Implementation of LSA, LDA, and SBERT for Semantic Search

This project applies Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Sentence-BERT (SBERT) to the MS MARCO dataset to support semantic search with each model. As a baseline for comparison, documents are also represented with GloVe embeddings and matched against the provided queries using cosine similarity. Model performance is evaluated with Precision, Recall, F1-Score, Average Precision, and Mean Average Precision (MAP).
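
The metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's own evaluation code: the function names and the toy document IDs are hypothetical, and Average Precision is normalized by the number of relevant documents, which is one common convention.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


def average_precision(ranked, relevant):
    """Sum precision@k at each rank k holding a relevant document,
    then divide by the number of relevant documents.
    Assumes `ranked` contains no duplicate IDs."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked, relevant) pairs, one pair per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical IDs): two queries with their ranked results
# and the documents judged relevant for each.
runs = [
    (["d3", "d1", "d7", "d2"], {"d1", "d2"}),
    (["d5", "d9"], {"d5"}),
]
precision, recall, f1 = precision_recall_f1(*runs[0])
ap_first = average_precision(*runs[0])
map_score = mean_average_precision(runs)
```

For the first toy query, relevant documents appear at ranks 2 and 4, so Average Precision is (1/2 + 2/4) / 2 = 0.5; the second query has its only relevant document at rank 1, giving 1.0, and a MAP of 0.75 across both.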

Dependencies

-- nltk
-- tqdm
-- gensim
-- scipy
-- numpy
-- scikit-learn
-- sentence_transformers
-- PyTorch
-- GloVe embeddings

Run Locally

Clone the project

    git clone https://github.com/zthsk/semantic_search.git

Go to the project directory

    cd semantic_search

Install dependencies

    pip install nltk
    pip install tqdm
    pip install gensim
    pip install scipy
    pip install numpy
    pip install scikit-learn
    pip install sentence-transformers
    pip install torch torchvision torchaudio

Train the LSA and LDA models and generate the SBERT and GloVe embeddings

    python train_models.py --bert sbert_embeddings.npy
    python train_models.py --lsa lsa_model.npy
    python train_models.py --lda lda_model.npy
    python train_models.py --glove glove_embeddings.npy

Query the model with a single query

    python query.py --model [bert, lsa, lda] --query "your query"
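
The overview describes matching document embeddings against a query by cosine similarity. The sketch below shows that ranking step in isolation, assuming the query and the documents have already been converted to vectors. It does not reproduce `query.py`: the function names, the `top_k` parameter, and the three-dimensional toy vectors are all hypothetical stand-ins.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.
    Returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Score every document against the query and return the
    top_k (doc_id, score) pairs, highest similarity first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real document and query embeddings.
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.1, 0.0]

results = rank_documents(query_vec, doc_vecs, top_k=2)
for doc_id, score in results:
    print(f"{doc_id}\t{score:.3f}")
```

With these toy vectors, `doc_a` ranks first and `doc_b` second, since the query points almost entirely along the first axis. In practice the vectors would come from whichever model is selected, and a vectorized implementation would be preferable for a corpus the size of MS MARCO.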

Query the model with a list of queries

    ./run_queries.sh  # just update the queries you want in queries.txt

Results of a query with different models

[Three screenshots of query results; images not reproduced here]

Use the analysis.ipynb file to produce the following images:

[Three images produced by analysis.ipynb; not reproduced here]