Releases: ddangelov/Top2Vec
gensim version fix
Fixes #152
1.0.23
1.0.22
added umap/hdbscan custom args
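A minimal sketch of what passing custom UMAP and HDBSCAN arguments might look like. The parameter names umap_args and hdbscan_args, and the values shown, are assumptions not stated in this release note; check them against the installed version.

```python
# Assumed parameter names (umap_args, hdbscan_args) and illustrative values.
umap_args = {"n_neighbors": 15, "n_components": 5, "metric": "cosine"}
hdbscan_args = {
    "min_cluster_size": 15,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
}


def train_with_custom_clustering(documents):
    """Train a model, forwarding the dictionaries above to UMAP and HDBSCAN."""
    # Imported lazily so the settings can be inspected without top2vec installed.
    from top2vec import Top2Vec

    return Top2Vec(documents, umap_args=umap_args, hdbscan_args=hdbscan_args)
```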
added use_embedding_model_tokenizer option
Added the use_embedding_model_tokenizer parameter. If set to True while using an embedding_model other than doc2vec, the model's tokenizer is used for document embedding.
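As a hedged illustration, the option could be enabled as follows. The choice of universal-sentence-encoder as the embedding model and the helper name train_with_model_tokenizer are assumptions made for this sketch.

```python
# Keyword arguments for a model that tokenizes with the embedding model itself.
# The embedding_model value is an illustrative choice, not mandated by the release.
MODEL_KWARGS = {
    "embedding_model": "universal-sentence-encoder",
    "use_embedding_model_tokenizer": True,
}


def train_with_model_tokenizer(documents):
    """Hypothetical helper: train with the embedding model's own tokenizer."""
    # Imported lazily; requires top2vec and the sentence encoder dependencies.
    from top2vec import Top2Vec

    return Top2Vec(documents, **MODEL_KWARGS)
```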
Fixed dependency issue with joblib.
Fixed issues with wordclouds caused by negative similarity scores.
fix saving bug
Fixed bug #91
word indexing
Added an option for indexing word vectors, which speeds up search for models with large vocabularies, specifically in search_words_by_vector and similar_words.
Added new method search_words_by_vector.
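A rough sketch of using the word index on a trained model. It assumes the index is built with index_word_vectors() and enabled per query with use_index=True, and that the optional indexing dependency is installed; the helper name is hypothetical.

```python
def index_and_search_words(model, keyword, num_words=10):
    """Hypothetical helper: build the word index, then query it.

    `model` is assumed to be a trained Top2Vec instance.
    """
    # In real use, build the index once after training, not on every query.
    model.index_word_vectors()
    words, scores = model.similar_words(
        keywords=[keyword], num_words=num_words, use_index=True
    )
    return list(zip(words, scores))
```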
document indexing
Added an option for indexing document vectors, which speeds up search for models with a large number of documents, specifically in search_documents_by_vector, search_documents_by_keywords, and search_documents_by_documents.
Added new method search_documents_by_vector.
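The document index can be sketched the same way. This assumes index_document_vectors() builds the index and that search_documents_by_vector accepts use_index and return_documents; the helper name is hypothetical.

```python
def index_and_search_documents(model, vector, num_docs=5):
    """Hypothetical helper: build the document index, then search by vector.

    `model` is assumed to be a trained Top2Vec instance and `vector` an
    array with the same dimensionality as the document vectors.
    """
    # In real use, build the index once after training, not on every query.
    model.index_document_vectors()
    # return_documents=False so this also works when documents were not kept.
    doc_scores, doc_ids = model.search_documents_by_vector(
        vector, num_docs=num_docs, return_documents=False, use_index=True
    )
    return list(zip(doc_ids, doc_scores))
```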
Added code to prevent hierarchical topic reduction error #79.
Separate dependencies
Dependencies for the universal sentence encoder and BERT sentence transformer options are now optional. Install them with pip install top2vec[sentence-encoders] and pip install top2vec[sentence_transformers].
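Because these dependencies are optional, calling code may want to check what is installed before choosing an embedding model. The sketch below is one possible approach. The module names tensorflow_hub and sentence_transformers, the model names returned, and the helper itself are assumptions for illustration, not part of the release.

```python
from importlib.util import find_spec


def pick_embedding_model():
    """Hypothetical helper: choose an embedding model from installed extras."""
    # Heuristic only: checks that a top-level module can be found, not that
    # every dependency of the option is usable.
    if find_spec("tensorflow_hub") is not None:
        return "universal-sentence-encoder"
    if find_spec("sentence_transformers") is not None:
        return "distiluse-base-multilingual-cased"
    # Fall back to the option that needs no optional dependencies.
    return "doc2vec"
```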
Faster cosine similarity.
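The release note does not say how the speedup was achieved. One common technique, shown here purely as an illustration and not as the library's implementation, is to normalize the stored vectors once so that each cosine similarity query reduces to a matrix-vector product.

```python
import numpy as np


def normalize_rows(vectors):
    """Scale each row to unit length (assumes no all-zero rows)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms


def cosine_scores(unit_vectors, query):
    """Cosine similarity of `query` against rows that are already unit length."""
    query = query / np.linalg.norm(query)
    # With unit-length rows, the dot product equals the cosine similarity.
    return unit_vectors @ query


# Normalize once up front, then reuse the result for every query.
docs = normalize_rows(np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]))
scores = cosine_scores(docs, np.array([2.0, 0.0]))
```

The stored vectors pay the normalization cost a single time; each later query only normalizes the query vector itself.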
logging bug fix and default change
The verbose parameter will be set to True by default.
Fixed a bug that stopped showing logging updates after downloading pre-trained models.