Sentiment analysis, also known as opinion mining, is a way of determining how positive or negative the content of a text document is, based on the relative numbers of positive and negative words it contains.
The technique draws on natural language processing, text analysis, and computational linguistics, and it is heavily used on reviews and survey responses, online and social media content, and healthcare materials, for applications ranging from marketing to customer service to clinical medicine.
In this example, we develop a system that predicts the sentiment of a textual movie review as either positive or negative.
For background, refer to this arXiv research paper: Sentiment Analysis
Open a command prompt and type:
pip install -r requirements.txt
This will install all the dependencies for you.
Alternatively, you can install them manually:
Assuming you are using Anaconda (currently this is where I do everything; Linux is on the to-do list):
- Numpy : conda install -c anaconda numpy
- NLTK : conda install -c anaconda nltk
- Keras : conda install -c conda-forge keras
Visualization is done through the Keras visualization utilities. Simply add these two lines before returning your model:

from keras.utils.vis_utils import plot_model
plot_model(model, to_file='multichannel.png', show_shapes=True, show_layer_names=True)
The data is already fairly clean, but we still need to turn each document into tokens:
- Split into tokens by white space
- Remove punctuation from each token
- Filter out stop words
- Filter out short tokens
- Load all docs in a directory
- Walk through all files in the folder
- Skip any reviews in the test set
- Create the full path of the file to open
- Load the doc
- Clean the doc
- Add it to the list
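A minimal sketch of that loading loop, assuming the Movie Review polarity dataset's naming convention, where filenames starting with `cv9` form the held-out test set (the function names are my own):

```python
from os import listdir

def load_doc(filename):
    # read a whole file into a single string
    with open(filename, encoding='utf-8') as f:
        return f.read()

def process_docs(directory, is_train):
    documents = []
    # walk through all files in the folder
    for filename in listdir(directory):
        # skip reviews belonging to the other split (cv9xx files are the test set)
        if is_train and filename.startswith('cv9'):
            continue
        if not is_train and not filename.startswith('cv9'):
            continue
        # create the full path of the file to open
        path = directory + '/' + filename
        # load the doc; cleaning (e.g. the token cleaning step above) would go here
        doc = load_doc(path)
        # add to list
        documents.append(doc)
    return documents
```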
The model has three channels:
- Channel 1: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
- Channel 2: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
- Channel 3: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
- Merge: Concatenate all the Flatten layers
- Apply a Dense layer with the ReLU activation function
- Apply a Dense layer with the sigmoid activation function
- Specify the learning process with:
  - binary_crossentropy for the loss (the objective the model will try to minimize)
  - The Adam optimizer to optimize the model
  - A list of metrics: accuracy
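Putting the three channels and the learning settings together, a sketch of the model definition using the Keras functional API (the filter counts, kernel sizes, and layer widths here are assumptions for illustration, not values taken from the script):

```python
from keras.models import Model
from keras.layers import (Input, Embedding, Conv1D, Dropout,
                          MaxPooling1D, Flatten, Dense, concatenate)

def define_model(length, vocab_size):
    """Build the three-channel CNN for binary sentiment classification."""
    # channel 1: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
    inputs1 = Input(shape=(length,))
    embedding1 = Embedding(vocab_size, 100)(inputs1)
    conv1 = Conv1D(filters=32, kernel_size=4, activation='relu')(embedding1)
    drop1 = Dropout(0.5)(conv1)
    pool1 = MaxPooling1D(pool_size=2)(drop1)
    flat1 = Flatten()(pool1)
    # channel 2: same stack with a different kernel size
    inputs2 = Input(shape=(length,))
    embedding2 = Embedding(vocab_size, 100)(inputs2)
    conv2 = Conv1D(filters=32, kernel_size=6, activation='relu')(embedding2)
    drop2 = Dropout(0.5)(conv2)
    pool2 = MaxPooling1D(pool_size=2)(drop2)
    flat2 = Flatten()(pool2)
    # channel 3: same stack with a third kernel size
    inputs3 = Input(shape=(length,))
    embedding3 = Embedding(vocab_size, 100)(inputs3)
    conv3 = Conv1D(filters=32, kernel_size=8, activation='relu')(embedding3)
    drop3 = Dropout(0.5)(conv3)
    pool3 = MaxPooling1D(pool_size=2)(drop3)
    flat3 = Flatten()(pool3)
    # merge: concatenate all the Flatten layers
    merged = concatenate([flat1, flat2, flat3])
    # Dense layer with ReLU, then a sigmoid output for binary sentiment
    dense1 = Dense(10, activation='relu')(merged)
    outputs = Dense(1, activation='sigmoid')(dense1)
    model = Model(inputs=[inputs1, inputs2, inputs3], outputs=outputs)
    # binary cross-entropy loss, Adam optimizer, accuracy metric
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model
```

The sigmoid output yields a probability in [0, 1], which is thresholded at 0.5 to decide positive vs. negative.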
To run the model:

python NN_for_SAnalysis.py
All credit goes to machinelearningmastery.
The first image is from Google.