Home
Started out going down the path of a pure NLP approach: remove punctuation, tokenize, lemmatize, stem, then extract weighted features with TF-IDF.
I then fed these features into Doc2Vec, expecting a reasonable classification.
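The preprocessing-plus-TF-IDF step can be sketched in plain Python (a toy illustration of the idea, not the repo's code; in practice a library such as scikit-learn or NLTK would handle this, and the lemmatization/stemming steps are omitted here):

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Strip punctuation, lowercase, and split on whitespace.
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def tfidf(docs):
    # Weight each term by term frequency times inverse document frequency,
    # so words that appear in every document get weight 0.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

This also illustrates the limitation described below: the weights are driven purely by word frequency, with no notion of intent.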
However, this approach fails at identifying the "intents" behind natural language; it leans heavily on how frequently a word is used.
That certainly helps with identifying keywords in a broad sense, but not with the broader problem of recognizing the intention behind the text.
This is the moment I discovered Snips NLU, along with a similar package called Rasa NLU.
Open-source libraries specialized for contextual natural language understanding!
- p2.py - Outlier Seeker for NLU Problem -> Overview + Approach
- Snips NLU -> Work being done
In a nutshell, it is a simple function that finds the max score (number) h in a list such that h = count(elements in the list > h).
- See p1.py for more details.
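A minimal sketch of one reading of that description, assuming h ranges over 0..len(scores) and that -1 signals no valid h exists (both assumptions mine, not necessarily how p1.py implements it):

```python
def outlier_score(scores):
    # Find the largest h such that exactly h elements of `scores`
    # are strictly greater than h; return -1 if no such h exists.
    for h in range(len(scores), -1, -1):
        if sum(1 for s in scores if s > h) == h:
            return h
    return -1
```

For example, `outlier_score([5, 5, 5, 1])` is 3, since exactly three elements are greater than 3.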