Retrieving information from the Holy Quran is an important field for Quran scholars, Muslim researchers, and Arabic enthusiasts in general. There are two popular types of Quran searching techniques: lexical (keyword-based) and semantic (concept-based). Semantic search is a challenging task, especially in a complex corpus such as the Holy Quran. Quranic Search provides both lexical and semantic search in the Holy Quran.
Quranic Search is developed to help everyone, especially Muslims, work with the Holy Quran more easily and quickly, allowing them to search it for specific verses by keyword or by conceptual topic.
The Holy Quran is considered the primary reference for approximately 1.6 billion Muslims around the world and the leading resource for the classical Arabic language. Muslims and non-Muslims alike need to search the Holy Quran for certain information or retrieve verses that discuss a specific topic. It covers various topics, for example ethics, Islamic law, marital and family law, monetary transactions, morals, and the relationship between Islam/Muslims and other world religions.
- Incomplete results when searching by keywords
- Lexical search is not based on the meaning of the search query
- Relevant verses based on meaning, improving the accuracy of search
- Best ranking of the most similar verses, based on Word Embedding Representation
- Natural Language Processing based
- Displaying the first 50 results based on the best ranking
- Using the best pre-trained Word2Vec models
- Building a sentence embedding model based on Word2Vec (CBOW Architecture)
- Using different methods to represent the sentence vector
- Max similarity score between two words (A word in a query and a word in a Verse)
- Max frequency score of a specific similarity (0.3) between two words
- Average similarity score between two words
- Pooling; max pooling and average pooling
- Queries are preprocessed the same way as the models' training data, so that query and verse vectors are directly comparable
- Working first on the single word level
- Then iterating over the whole query and verse, summing the result of the chosen method for every pair of words (one from the query, one from the verse), and finally comparing the query against every verse of the Holy Quran text
- Combining methods results and models to get the best results
- Fast and low-cost compared with Transformer-based approaches
- Open-source
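The word-level scoring methods listed above can be illustrated with a minimal sketch. It uses toy 2-dimensional vectors in place of real Word2Vec vectors, and every function name here is hypothetical rather than taken from the project's source. The frequency score is read as "the number of word pairs whose similarity reaches the 0.3 threshold", which is an assumption about what the list item means.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(query_vecs, verse_vecs):
    """Similarity of every (query word, verse word) pair."""
    return [cosine(q, v) for q in query_vecs for v in verse_vecs]


def max_similarity(query_vecs, verse_vecs):
    """Highest similarity found between any query word and any verse word."""
    sims = pairwise_similarities(query_vecs, verse_vecs)
    return max(sims) if sims else 0.0


def average_similarity(query_vecs, verse_vecs):
    """Mean similarity over all word pairs."""
    sims = pairwise_similarities(query_vecs, verse_vecs)
    return sum(sims) / len(sims) if sims else 0.0


def frequency_score(query_vecs, verse_vecs, threshold=0.3):
    """Assumed reading: count of word pairs whose similarity reaches the threshold."""
    sims = pairwise_similarities(query_vecs, verse_vecs)
    return sum(1 for s in sims if s >= threshold)


def pool(word_vecs, mode="avg"):
    """Collapse word vectors into one sentence vector, dimension by dimension."""
    if not word_vecs:
        raise ValueError("need at least one word vector")
    columns = list(zip(*word_vecs))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "avg":
        return [sum(col) / len(col) for col in columns]
    raise ValueError("mode must be 'max' or 'avg'")


# Toy vectors, made up for illustration only.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
verse_vecs = [[1.0, 0.0], [1.0, 1.0]]

print(max_similarity(query_vecs, verse_vecs))
print(average_similarity(query_vecs, verse_vecs))
print(frequency_score(query_vecs, verse_vecs))
print(pool(query_vecs, mode="max"), pool(query_vecs, mode="avg"))
```

In a real setup the word vectors would come from a pre-trained Word2Vec model rather than hand-written lists; the scoring logic itself does not depend on where the vectors come from.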
When you make a lexical search:
- Lexical Search Django API interacts with the React UI
- Verses are retrieved based on the sequence of keywords using the Lexical Search API
- Verses are displayed in the results page lexicographically by Surah number and the Verse number in the Surah
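As a rough illustration of the lexical flow above, the following sketch matches a query as a contiguous sequence of words and orders the hits by Surah number, then Verse number. It works on an in-memory list with placeholder text instead of the project's Django models and SQLite database, and the function name and record fields (`surah`, `ayah`, `text`) are assumptions made for this example.

```python
def lexical_search(verses, query):
    """Return verses containing the query words as a contiguous sequence,
    ordered by (surah number, verse number)."""
    words = query.split()
    n = len(words)
    if n == 0:
        return []
    hits = []
    for verse in verses:
        tokens = verse["text"].split()
        # Slide a window of n tokens over the verse and compare with the query.
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            hits.append(verse)
    # Tuple keys sort by surah first, then by verse number within the surah.
    return sorted(hits, key=lambda v: (v["surah"], v["ayah"]))


# Placeholder records; the text is not real Quranic text.
verses = [
    {"surah": 2, "ayah": 3, "text": "alpha beta gamma"},
    {"surah": 1, "ayah": 5, "text": "beta gamma delta"},
    {"surah": 2, "ayah": 1, "text": "gamma beta"},
]

for hit in lexical_search(verses, "beta gamma"):
    print(hit["surah"], hit["ayah"])
```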
When you make a semantic search:
- Semantic Search API interacts with the UI
- Verse IDs are retrieved based on the meaning/topic of words using the Semantic Search API
- A set of pre-trained Word2Vec models is used to get the word vectors of the words in verses and search queries
- Sentence vectors are computed using several methods
- Combining the results of all methods by all models
- Verses are retrieved based on the similarity score between the query and the verse
- Computing distances by cosine similarity to retrieve the most similar verses
- All properties of the matching verses are then retrieved from the Lexical Search API
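The ranking and combination steps above might look roughly like the sketch below. It assumes sentence vectors have already been computed, averages the scores produced by different models and methods (the README does not say how results are combined, so averaging is only one plausible choice), and keeps the top 50 results. All names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_verses(query_vec, verse_vecs):
    """Map each verse ID to its cosine similarity with the query vector."""
    return {vid: cosine(query_vec, vec) for vid, vec in verse_vecs.items()}


def combine_scores(score_dicts):
    """Average each verse's score over all (model, method) runs that scored it.
    Averaging is an assumption; other combination rules are possible."""
    totals, counts = {}, {}
    for scores in score_dicts:
        for vid, s in scores.items():
            totals[vid] = totals.get(vid, 0.0) + s
            counts[vid] = counts.get(vid, 0) + 1
    return {vid: totals[vid] / counts[vid] for vid in totals}


def top_results(scores, top_k=50):
    """Verse IDs with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [vid for vid, _ in ranked[:top_k]]


# Toy sentence vectors keyed by "surah:verse" IDs; the numbers are made up.
verse_vecs = {"1:1": [1.0, 0.0], "2:255": [0.0, 1.0], "112:1": [1.0, 1.0]}
query_vec = [1.0, 0.0]

scores = score_verses(query_vec, verse_vecs)
print(top_results(scores, top_k=2))
```

The returned IDs would then be passed to the lexical API to fetch the full verse records for display.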
The tools used in this project.
Tool | Description |
---|---|
Visual Studio Code | IDE |
React.js | Frontend framework |
Django | Lexical Search API backend framework |
Flask | Semantic Search API backend framework |
Gensim | Topic modeling (Word2Vec, KeyedVectors) |
SQLite3 | Database for the Holy Quran |
quranic-search-v2
├── README.md <- This top-level README for this project
├── LICENSE
├── assets
│ ├── screenshots <- Screenshots from the project
│ └── tools <- Used tools in the project
├── backend
│ ├── api
│ │ ├── lexical
│ │ │ ├── api/ <- Lexical Django project with settings
│ │ │ ├── db/ <- Used databases in the project
│ │ │ ├── search/ <- Search application (static, templates, models, serializers, urls, views, tests, ..etc)
│ │ │ ├── db.sqlite3 <- Migrated database
│ │ │ ├── manage.py <- A command-line utility to interact with this Django project
│ │ │ └── requirements.txt <- All needed for installing the lexical search API
│ │ └── semantic
│ │ ├── data
│ │ │ ├── external/ <- Data from third-party sources
│ │ │ └── processed/ <- The final, canonical data sets for modeling
│ │ ├── models/ <- Trained and serialized models, model predictions, or model summaries
│ │ ├── notebooks/ <- All Jupyter notebooks
│ │ ├── src <- Source code for use in this project
│ │ │ ├── __init__.py <- Makes src a Python module
│ │ │ └── models <- Scripts to train models and then use trained models to make predictions
│ │ │ ├── pooling.py <- Pooling algorithms for sentence embeddings
│ │ │ ├── predict.py <- Resources of the semantic search API
│ │ │ ├── preprocess.py <- The frequent preprocessing methods
│ │ │ └── semantic_methods.py <- The semantic (word/sentence) search methods
│ │ ├── app.py <- The Flask application (entry point)
│ │ └── requirements.txt <- All needed for installing the semantic search API
│ └── run.sh <- Bootstrapping script to run the APIs
├── frontend
│ ├── node_modules <- Node.js modules
│ ├── public
│ │ ├── fonts <- Fonts used in the project
│ │ │ ├── amiri/
│ │ │ └── kufi/
│ │ ├── images
│ │ │ └── quran-logo.png
│ │ ├── 404.html
│ │ ├── index.html
│ │ ├── manifest.json
│ │ └── robots.txt
│ ├── src
│ │ ├── components <- React components
│ │ │ ├── HomeForm
│ │ │ │ ├── HomeForm.css
│ │ │ │ └── HomeForm.js
│ │ │ ├── Navbar/
│ │ │ ├── ResultsForm/
│ │ │ └── Verse/
│ │ ├── containers <- React containers/pages
│ │ │ ├── About
│ │ │ │ ├── About.css
│ │ │ │ └── About.js
│ │ │ ├── Bookmarks/
│ │ │ ├── Home/
│ │ │ └── Results/
│ │ ├── App.css <- CSS for the application
│ │ ├── App.js <- The application file
│ │ ├── App.test.js <- The application file for testing
│ │ ├── index.css <- CSS for the root (entire application)
│ │ ├── index.js <- The root application file
│ │ ├── reportWebVitals.js <- WebVitals reporting script
│ │ └── setupTests.js <- Setup script for testing
│ ├── package-lock.json <- Used to install dependencies
│ └── package.json <- Used to install dependencies
├── .github
│ └── workflows <- GitHub Actions workflows
│ ├── django.yml
│ └── node.js.yml
└── .gitignore
This project uses multiple pre-trained models in addition to the backend and frontend requirements. You can start by using the helper script to download a light model and install all requirements before running:
sh scripts/start.sh
- Clone this repository
git clone https://github.com/ahr9n/quranic-search-v2.git
cd quranic-search-v2
🔴 All commands must be executed in the root of the project.
- Run all services (lexical API, semantic API, then frontend)
sh scripts/run.sh
- Navigate to
http://localhost:3000
🟢 Now you are good to go!
🔴 Note that the scripts run all servers in the background, so you can stop all of them using the following command:
sh scripts/down.sh
- Omar Shamkh 💻
- Ahmad Almaghraby 💻
- Ahmad Abdulrahman 💻
- Abd El-Twab M. Fakhry 💻
- Ahmad Ateya 💻
Licensed under the GPL-v3 License.