Similar setting with previous work ErrorRadar #10

Open
StupidBuluchacha opened this issue Jan 10, 2025 · 0 comments


Hi ProcessBench Team,

I am an author of the paper 'ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via Error Detection' (https://arxiv.org/abs/2410.04509), which was uploaded to arXiv in October 2024.

We believe ProcessBench is solid, well-motivated work, but its task setting appears highly similar to that of ErrorRadar: error identification in mathematical reasoning. We would therefore ask your team to consider discussing and/or comparing against ErrorRadar in the Introduction or Related Work section, so that readers get a thorough review of this line of research. Spelling out the similarities and differences between the two benchmarks would make for a fair comparison.

In addition, our team released a survey, 'A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges' (https://arxiv.org/abs/2412.11936), in December 2024. After discussion, we plan to include ProcessBench in the next version of the survey, given its research value. We do not ask you to cite the survey, since the two are essentially concurrent works (uploaded to arXiv less than one month apart); whether to cite it is up to you.

Lastly, good luck with ProcessBench! We are both committed to pushing the boundaries of LLM reasoning for the research community.

Best,
ErrorRadar Team
