A university project: build an inverted-index search according to the documentation specifications, supporting:
- normal search
- wildcard search
- spell checking
This project uses some of the simplest NLP techniques in its preprocessing stage. First, all data-set files are read and parsed into JSON according to the year each comment was left. Next, NLP techniques are applied to preprocess the JSON files. Finally, an inverted index is built from the preprocessed files. The inverted index is then loaded and used for normal search, wildcard search, and spell checking.
- For wildcard search, the pyahocorasick library is used.
- For spell checking, the pyspellchecker library is used.
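
The index-building and search steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`build_inverted_index`, `search`, `wildcard_terms`), the in-memory `docs` layout, and the AND semantics for multi-term queries are all assumptions. The wildcard helper uses `fnmatch` from the standard library as a simple stand-in; it does not show how pyahocorasick is used in the project.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical layout)."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def search(index, query_terms):
    """Return sorted ids of documents that contain every query term."""
    postings = [set(index.get(term, {})) for term in query_terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def wildcard_terms(index, pattern):
    """Return indexed terms matching a shell-style pattern such as 'pl*'.

    Stand-in only: a linear scan over the vocabulary with fnmatch.
    """
    return sorted(term for term in index if fnmatchcase(term, pattern))


# Tokens are assumed to be already preprocessed (lowercased, stemmed).
docs = {
    "c1": ["good", "movi", "good", "plot"],
    "c2": ["bad", "plot"],
}
index = build_inverted_index(docs)
```

Here `search(index, ["good", "plot"])` intersects the posting lists of both terms, and `wildcard_terms(index, "p*")` first expands the pattern to matching vocabulary terms, which can then be passed to `search`.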
NOTICE: The preprocessing stage uses some of the simplest NLP techniques from NLTK, such as lemmatizing, stemming, and word position detection.
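
As a rough sketch of what such a preprocessing step might look like, the snippet below lowercases and tokenizes a comment, stems each token, and records its position. The `preprocess` function and the regular-expression tokenizer are assumptions for illustration. The stemmer is NLTK's `PorterStemmer`; if NLTK is not installed, the sketch falls back to leaving tokens unstemmed so that it still runs. Lemmatizing is omitted here because it requires downloading extra NLTK data.

```python
import re

try:
    from nltk.stem import PorterStemmer
    _stem = PorterStemmer().stem
except ImportError:
    # Fallback (assumption): without NLTK, tokens are left unstemmed.
    def _stem(word):
        return word


def preprocess(text):
    """Lowercase, tokenize, and stem text; return (position, term) pairs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [(position, _stem(token)) for position, token in enumerate(tokens)]


pairs = preprocess("Comments about searching")
```

Keeping the position next to each term is what lets the inverted index store where in a comment a word occurred, not just which comment contains it.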
First, create a virtualenv with Python 2.7, then run the following commands:
pip install -r requirment.txt
python search.py -h