Simple Chat bot - coffee shop

using PyTorch and NLTK

The NLP preprocessing pipeline

Tokenize
Lowercase + stem
Exclude punctuation characters
Generate bag of words
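The first three steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `preprocess` is hypothetical, and NLTK's regex-based `wordpunct_tokenize` (which needs no downloaded data) stands in for the Punkt-backed tokenizer described under Comments.

```python
import string

from nltk.stem.porter import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()


def preprocess(sentence):
    """Tokenize, drop punctuation tokens, then lowercase and stem each word.

    Hypothetical helper for illustration only.
    """
    # Assumption: wordpunct_tokenize is used here instead of a Punkt-based tokenizer.
    tokens = wordpunct_tokenize(sentence)
    # Keep only tokens that are not made up entirely of punctuation characters.
    words = [t for t in tokens if not all(ch in string.punctuation for ch in t)]
    return [stemmer.stem(w.lower()) for w in words]


stems = preprocess("Organizing, organizes!")
print(stems)
```

The resulting list of stems is what the bag-of-words step takes as its input sentence.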

Instructions

To train the neural network (this must be done once, before chatting for the first time):

python train.py

To chat:

python chat.py

Alternatively, you can use the Jupyter notebook.

Comments

About PyTorch modules
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English.
Punkt Sentence Tokenizer: This tokenizer divides a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences.
nltk tokenize: splits a sentence into an array of words/tokens. A token can be a word, a punctuation character, or a number.
nltk stem: stemming finds the root form of a word. Example:

from nltk.stem.porter import PorterStemmer
stem = PorterStemmer().stem  # NLTK's Porter stemmer
words = ["organize", "organizes", "organizing"]
words = [stem(w) for w in words]
# result: ["organ", "organ", "organ"]

bag of words: returns a bag-of-words array with 1 for each known word that appears in the sentence and 0 otherwise. Example:

sentence = ["hello", "how", "are", "you"]
words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
bag   = [  0 ,    1 ,    0 ,   1 ,    0 ,    0 ,      0]
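The example above can be reproduced with a short function. This is only a sketch: the name `bag_of_words` and the choice of a NumPy `float32` vector are assumptions, and the sentence tokens are assumed to be already lowercased and stemmed.

```python
import numpy as np


def bag_of_words(tokenized_sentence, all_words):
    """Return a float32 vector with 1.0 where a known word occurs in the sentence.

    Hypothetical helper; assumes tokens were lowercased and stemmed beforehand.
    """
    present = set(tokenized_sentence)
    bag = np.zeros(len(all_words), dtype=np.float32)
    for idx, word in enumerate(all_words):
        if word in present:
            bag[idx] = 1.0
    return bag


sentence = ["hello", "how", "are", "you"]
words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
bag = bag_of_words(sentence, words)
print(bag)  # [0. 1. 0. 1. 0. 0. 0.]
```

The vector always has the length of the known-word list, so every sentence maps to a fixed-size input for the network regardless of how many words it contains.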

PyTorch Dataset: a map-style dataset "represents a map from (possibly non-integral) indices/keys to data samples."
PyTorch nn.CrossEntropyLoss: computes the cross-entropy loss between raw class scores (logits) and target classes.
Adam algorithm: an algorithm "for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments."
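The three PyTorch pieces above fit together roughly as in the sketch below. Everything specific here is an assumption made for illustration: the class name `ChatDataset`, the toy data, the layer sizes, the learning rate, and the number of epochs are not taken from train.py.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class ChatDataset(Dataset):
    """Map-style dataset: integer index -> (bag-of-words vector, intent label)."""

    def __init__(self, features, labels):
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]

    def __len__(self):
        return len(self.labels)


torch.manual_seed(0)

# Toy data (made up): four bag-of-words vectors over a 3-word vocabulary, two intents.
X = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [0, 0, 1, 1]
dataset = ChatDataset(X, y)
loader = DataLoader(dataset, batch_size=2, shuffle=True)

# Small feed-forward classifier; the sizes are arbitrary for this sketch.
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))
criterion = nn.CrossEntropyLoss()  # takes raw logits and class indices
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

with torch.no_grad():
    initial_loss = criterion(model(dataset.features), dataset.labels).item()

for epoch in range(200):
    for batch_x, batch_y in loader:
        loss = criterion(model(batch_x), batch_y)
        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # backpropagate
        optimizer.step()       # Adam parameter update

with torch.no_grad():
    final_loss = criterion(model(dataset.features), dataset.labels).item()

print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Note that the model outputs raw scores with no softmax layer, because nn.CrossEntropyLoss applies the log-softmax internally.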

Libraries

PyTorch
NumPy
NLTK (Natural Language Processing Toolkit)
