Hi ProcessBench Team,
I am the author of the paper 'ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection' (https://arxiv.org/abs/2410.04509), which was uploaded to arXiv in October 2024.
We believe ProcessBench is solid work with good motivation, but it appears to share a highly similar task setting with ErrorRadar, namely error identification in mathematical reasoning. We therefore wonder whether your team could discuss and/or compare ErrorRadar in the Intro/Related Work section, so as to give the community a thorough review of this line of research. Emphasizing the similarities and differences between the two works would make for a fairer comparison.
In addition, our team released a survey titled 'A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges' (https://arxiv.org/abs/2412.11936) in December last year. After internal discussion, we will include ProcessBench in the next version of the survey, given its research value. However, we do not ask that you cite this survey: the two are essentially parallel works (the interval between the arXiv uploads is less than a month), so whether to cite it is up to you.
Lastly, good luck with ProcessBench! We are both committed to pushing the boundaries of LLM reasoning for the research community.
Best,
ErrorRadar Team