Hi ProcessBench Team,
I am the author of the paper 'ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection' (https://arxiv.org/abs/2410.04509), which was uploaded to arXiv in October 2024.
We believe ProcessBench is solid work with good motivation, but it appears to share a highly similar task setting with ErrorRadar, namely error identification in mathematical reasoning. We therefore wonder whether your team could discuss and/or compare ErrorRadar in the Intro/Related Work section, so as to give the community a thorough review of this line of research. Emphasizing the similarities and differences between the two works would make for a fairer comparison.
In addition, our team released a survey titled 'A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges' (https://arxiv.org/abs/2412.11936) in December last year. After internal discussion, we will include ProcessBench in the next version of the survey, given its research value. However, we do not ask that you cite this survey: the two are essentially parallel works (the interval between the arXiv uploads is less than a month), so whether to cite it is up to you.
Lastly, good luck with ProcessBench! We are both committed to pushing the boundaries of LLM reasoning for the research community.
Best,
ErrorRadar Team