This repository contains the Turkish version of Semantle.
- Create a virtual environment and install the required Python modules from `requirements.txt`.
- Use the `word2vec/train.sh` script to train a Turkish Word2Vec model on the Wikipedia corpus.
- Run `python dump-vecs.py` to initialize the SQLite database with the word vectors.
- Run `python dump-hints.py` to create the hints pickle.
- Run `python store-hints.py` to import the hints pickle into the database.
- Run `docker-compose up -d`. The project should then be up and running on HTTP port 80.
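To make the vector and hint steps more concrete, here is a minimal, stdlib-only sketch of what they amount to. It is not the code in `dump-vecs.py`, `dump-hints.py` or `store-hints.py`: the `word2vec` table, its columns, the float32 BLOB encoding and all helper names are assumptions for illustration. Like Semantle, it scores words by cosine similarity between their vectors, and it treats the top-N nearest neighbours of a secret word as that word's hints.

```python
import math
import pickle
import sqlite3
from array import array

# Hypothetical schema: one row per word, vector stored as a float32 BLOB.
SCHEMA = "CREATE TABLE IF NOT EXISTS word2vec (word TEXT PRIMARY KEY, vec BLOB)"


def store_vectors(conn, vectors):
    """Write {word: [floats]} into the hypothetical word2vec table."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO word2vec (word, vec) VALUES (?, ?)",
        ((w, array("f", v).tobytes()) for w, v in vectors.items()),
    )
    conn.commit()


def load_vector(conn, word):
    """Return the stored vector for `word`, or None if the word is unknown."""
    row = conn.execute("SELECT vec FROM word2vec WHERE word = ?", (word,)).fetchone()
    if row is None:
        return None
    vec = array("f")
    vec.frombytes(row[0])
    return list(vec)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(conn, secret, top_n):
    """Rank every other word by cosine similarity to `secret` (its hint list)."""
    target = load_vector(conn, secret)
    if target is None:
        raise KeyError(secret)
    scored = []
    for word, blob in conn.execute("SELECT word, vec FROM word2vec"):
        if word == secret:
            continue
        vec = array("f")
        vec.frombytes(blob)
        scored.append((cosine(target, vec), word))
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_vectors(conn, {
        "kedi": [0.9, 0.1, 0.0],
        "köpek": [0.8, 0.2, 0.1],
        "araba": [0.0, 0.1, 0.9],
    })
    hints = {"kedi": nearest(conn, "kedi", top_n=2)}
    blob = pickle.dumps(hints)  # stands in for the hints pickle file
    print(pickle.loads(blob))  # → {'kedi': ['köpek', 'araba']}
```

Scanning every row in pure Python is fine for a toy vocabulary but slow for a model trained on Wikipedia; a real implementation would more likely load all vectors into a matrix (for example with NumPy) and compute the similarities in one pass.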
This project was forked from the original Semantle repository, and some modifications to the `dump-hints.py` file were cherry-picked from the Semantle-es repository.
- Since Turkish is an agglutinative language, the lists of similar words contain many suffixed forms of the same root. I'm planning to clean up the corpus to work with lemmas or stems, which should improve performance.
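The stem-based cleanup is only planned, so the following is just an illustration of the idea, assuming a preprocessing pass over the Wikipedia text before `word2vec/train.sh` trains on it. If every token is reduced to its stem, the model learns one vector per stem instead of one per suffixed form. `naive_stem`, `stem_corpus_line` and the suffix list are hypothetical and far too crude for real Turkish morphology; a proper stemmer or lemmatizer would replace them.

```python
from typing import List

# Hypothetical, deliberately tiny list of plural and locative/ablative suffixes.
# Real Turkish morphology (vowel harmony, consonant alternation such as
# kitap -> kitabı, suffix ordering) needs a proper stemmer or lemmatizer.
SUFFIXES: List[str] = sorted(["ler", "lar", "den", "dan", "de", "da"], key=len, reverse=True)


def naive_stem(word: str, min_stem: int = 2) -> str:
    """Repeatedly strip a matching suffix (longest first) while the stem stays long enough."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def stem_corpus_line(line: str) -> str:
    """Replace each whitespace-separated token of an already-lowercased line with its stem."""
    return " ".join(naive_stem(token) for token in line.split())


if __name__ == "__main__":
    print(stem_corpus_line("evlerde okullar evden"))  # → ev okul ev
```

Lowercasing is left out on purpose: Python's `str.lower()` maps `I` to `i`, not the Turkish dotless `ı`, so it would need locale-aware handling.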
Contributions are welcome. Go ahead, they're always appreciated!
Made in Ankara with 💙