This repository contains code for an abstractive summarization model designed specifically for Arabic text. The project focuses on generating concise, coherent summaries that capture the essential information in longer documents. The model is built on AraBart, a transformer-based architecture, and demonstrates its effectiveness in handling the complexities of Arabic text.
We evaluated the model on the XL-Sum dataset, where it achieved a ROUGE-L score of 27.839 on the test set.
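For readers unfamiliar with the metric, ROUGE-L is based on the longest common subsequence (LCS) between a reference summary and a generated summary. The following is a minimal sketch, not the evaluation script used in this repository. It assumes plain whitespace tokenization and reports the F-measure on a 0 to 1 scale (a score such as 27.839 would presumably correspond to this value multiplied by 100). The function names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Tokens match: extend the LCS found so far.
                curr.append(prev[j - 1] + 1)
            else:
                # No match: carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F-measure between two strings (whitespace tokens, assumed)."""
    ref_tokens = reference.split()
    cand_tokens = candidate.split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(rouge_l_f1("a b c d", "a c d"))
```

Because the score depends only on overlapping token sequences, a summary that paraphrases the reference with different words can score low even when it is accurate.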
ROUGE-L alone is not enough for abstractive summarization, however, because much of a summary's quality lies in its semantic similarity to the baseline summary. On this measure, our model achieved a semantic similarity score of 93.1, which indicates close alignment between the content and context of the generated summaries and the baseline summaries.
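This README does not state how the semantic similarity score was computed. One common approach, shown here purely as an assumption, is to embed each generated summary and its baseline summary as vectors and average their cosine similarity, scaled to a percentage. The sketch below takes precomputed embedding vectors as input; the embedding model itself is left out, and the names `cosine_similarity` and `mean_similarity_percent` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity_percent(pairs):
    """Average cosine similarity over (generated, baseline) embedding pairs, times 100."""
    if not pairs:
        raise ValueError("at least one pair is required")
    scores = [cosine_similarity(u, v) for u, v in pairs]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, for illustration only.
    toy_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 1.0], [1.0, 0.0])]
    print(mean_similarity_percent(toy_pairs))
```

Unlike ROUGE-L, this kind of score can reward a summary that conveys the same meaning in different words, which is why the two metrics are reported together.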
To run the application, clone the repo and execute the following command:
python app.py
The datasets used for training and evaluation, along with the trained weights, are available at the following link: Datasets and Weights