[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.

CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
[NAACL 2025 🔥]

Overview

CAMEL-Bench is a Comprehensive Arabic LMM Benchmark designed to evaluate and improve the capabilities of Large Multimodal Models (LMMs) in Arabic. It aims to bridge the gap in multimodal model evaluation for Arabic, a language with over 400 million speakers worldwide.

CAMEL-Bench Diversity

The benchmark includes eight diverse domains and 38 sub-domains to rigorously assess the performance of LMMs in visual reasoning and understanding tasks. It comprises over 29K questions, curated by native Arabic speakers, ensuring high-quality evaluation.

Key Features

  • Eight Domains of Evaluation: Multimodal Understanding and Reasoning, OCR and Document Understanding, Chart and Diagram Understanding, Video Understanding, Cultural-Specific Understanding, Medical Imaging, Agricultural Image Understanding, and Remote Sensing Understanding.
  • Over 29,000 Questions: Carefully curated by native Arabic speakers to ensure quality and accuracy.
  • Broad Scope: Evaluates models in domains such as medical imaging, cultural-specific understanding, and remote sensing.
  • Open and Closed Source Evaluation: We provide a leaderboard featuring results from both closed-source models (e.g., GPT-4o) and open-source LMMs.

📢 Latest Updates

  • Oct 2024 🔥 CAMEL-Bench is released on HuggingFace: CAMEL-Bench Dataset 🤗.
  • Jan 2025 🔥🔥 CAMEL-Bench is accepted to the NAACL 2025 conference.

Leaderboard

Leaderboard on HuggingFace

Our leaderboard provides a performance comparison of different models evaluated on CAMEL-Bench. Current top performers include GPT-4o with an overall score of 62% and other notable models such as Gemini-1.5-Pro.

Leaderboard Snapshot

Installation

To get started with CAMEL-Bench, clone the repository and install the dependencies:

$ git clone https://github.com/mbzuai-oryx/Camel-Bench.git
$ cd Camel-Bench
$ pip install -r requirements.txt

Getting Started

The benchmark can be easily executed using the provided scripts:

$ python scripts/eval_qwen.py

To evaluate your own model, modify the generate_qwen function in scripts/eval_qwen.py.
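
As a rough illustration of what swapping in another model might involve, the sketch below assumes the generation function takes an image path and an Arabic question and returns an answer string. That signature, the field names (`image`, `question`, `answer`), the `generate_my_model` placeholder, and the exact-match metric are all assumptions made for the example; check scripts/eval_qwen.py for the actual interface and scoring.

```python
from typing import Callable, Iterable

# Hypothetical signature: the real generate_qwen in scripts/eval_qwen.py may
# take different arguments. Check the script before adapting this.
GenerateFn = Callable[[str, str], str]


def generate_my_model(image_path: str, question: str) -> str:
    """Placeholder for your own model call; returns a fixed string here."""
    # Replace this body with your model's inference code.
    return "نعم"


def exact_match_accuracy(samples: Iterable[dict], generate: GenerateFn) -> float:
    """Fraction of samples whose prediction equals the reference after stripping.

    Exact match is an illustrative metric only, not necessarily the scoring
    used by CAMEL-Bench.
    """
    total = 0
    correct = 0
    for sample in samples:
        prediction = generate(sample["image"], sample["question"])
        correct += int(prediction.strip() == sample["answer"].strip())
        total += 1
    # Avoid division by zero on an empty sample list.
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Tiny in-memory examples with hypothetical field names.
    demo = [
        {"image": "img1.png", "question": "هل هذه قطة؟", "answer": "نعم"},
        {"image": "img2.png", "question": "هل هذا كلب؟", "answer": "لا"},
    ]
    print(exact_match_accuracy(demo, generate_my_model))
```

Keeping the model call behind a single function makes it possible to change models without touching the scoring loop.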

Dataset

Our dataset is hosted on HuggingFace and can be accessed here: CAMEL-Bench Dataset 🤗.
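
One possible way to pull the data programmatically is the HuggingFace `datasets` library. The sketch below is a minimal example under stated assumptions: `repo_id` is a placeholder to be copied from the dataset page, the `test` split name is a guess, and the `domain` column used in the usage note is hypothetical.

```python
from collections import Counter
from typing import Iterable


def load_benchmark_split(repo_id: str, split: str = "test"):
    """Load one split of a HuggingFace dataset.

    repo_id is a placeholder: copy the real id from the dataset page.
    The default split name is an assumption and may not exist.
    """
    # Imported lazily so the helper below works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


def count_by_field(records: Iterable[dict], field: str) -> Counter:
    """Count records per value of `field` (for example a domain column)."""
    return Counter(record[field] for record in records)
```

For instance, `count_by_field(load_benchmark_split("<repo-id>"), "domain")` would tally questions per domain, provided the dataset exposes such a column.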

Citation

If you use CAMEL-Bench in your research, please consider citing:

@article{ghaboura2024camelbench,
  title={CAMEL-Bench: A Comprehensive Arabic LMM Benchmark},
  author={Sara Ghaboura and Ahmed Heakl and Omkar Thawakar and Ali Alharthi and Ines Riahi and Abduljalil Saif and Jorma Laaksonen and Fahad S. Khan and Salman Khan and Rao M. Anwer},
  journal={arXiv preprint arXiv:2410.18976},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Join Us!

We welcome contributions to CAMEL-Bench! Open a pull request or an issue to get started.

Contact

For questions or suggestions, feel free to reach out to us on GitHub Discussions.
