Sentiment Analysis:


be negative or positive

Overview:


Sentiment analysis, also known as opinion mining, is a way of determining how positive or negative the content of a text document is, based on the relative numbers of words it contains that are classified as either positive or negative.
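As a rough illustration of that word-counting definition, here is a minimal sketch. The word lists `POSITIVE` and `NEGATIVE` and the function `lexicon_score` are made up for this example and do not come from a real lexicon. This is the simple counting approach, not the neural network trained in this repository.

```python
# Hypothetical, hand-made word lists; a real lexicon would be far larger.
POSITIVE = {"good", "great", "fun", "excellent"}
NEGATIVE = {"bad", "boring", "awful", "poor"}


def lexicon_score(text):
    """Return a score in [-1, 1]: +1 all positive words, -1 all negative."""
    # Naive whitespace split; punctuation attached to a word prevents a match.
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # No sentiment-bearing words at all: treat the text as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


score = lexicon_score("great fun but boring ending")
print(score)  # two positive words, one negative word
```

A score above zero means positive words outnumber negative ones. The approach ignores negation and context, which is one reason to train a model instead.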

The technique draws on natural language processing, text analysis, and computational linguistics. It is heavily used on reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.


In this example, we develop a system that predicts the sentiment of a textual movie review as either positive or negative.

Source:


Refer to this arXiv research paper: Sentiment Analysis

A bit about this model:


Libraries used:


Open cmd and type:

pip install -r requirements.txt


It will install all the dependencies for you.

Or install them manually, as shown below.

This assumes you are using Anaconda, which is currently where I do everything. Linux is on the to-do list.

  • NumPy : conda install -c anaconda numpy
  • NLTK : conda install -c anaconda nltk
  • Keras : conda install -c conda-forge keras

Neural Network architecture:


NN-architecture
The visualization is done with the Keras visualization utilities (plot_model needs pydot and Graphviz installed). Add these two lines just before returning your model:

from keras.utils.vis_utils import plot_model


plot_model(model, to_file='multichannel.png', show_layer_names=True, show_shapes=True)


Tokenization procedure:


The data is already fairly clean, but each document still has to be turned into tokens. The steps are:

  • Split into tokens by white space
  • Remove punctuation from each token
  • Filter out stop words
  • Filter out short tokens
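The steps above can be sketched as follows. This is a minimal sketch, not the exact code in the repository. `clean_doc`, `STOP_WORDS`, and `min_len` are illustrative names, and `STOP_WORDS` is a small hypothetical stand-in. With NLTK installed you could use its English stop-word list instead.

```python
import string

# Hypothetical stand-in for a full stop-word list such as NLTK's.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}


def clean_doc(text, stop_words=STOP_WORDS, min_len=2):
    """Turn one raw document into a list of cleaned tokens."""
    # 1. Split into tokens by white space.
    tokens = text.split()
    # 2. Remove punctuation from each token.
    table = str.maketrans("", "", string.punctuation)
    tokens = [t.translate(table) for t in tokens]
    # 3. Filter out stop words (compared case-insensitively).
    tokens = [t for t in tokens if t.lower() not in stop_words]
    # 4. Filter out short tokens, including ones emptied by step 2.
    tokens = [t for t in tokens if len(t) >= min_len]
    return tokens


print(clean_doc("This movie was great, a truly fun ride!"))
# ['movie', 'great', 'truly', 'fun', 'ride']
```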

Data processing procedure:


  • Load all docs in a directory
  • walk through all files in the folder
  • Skip any reviews in the test set
  • Create the full path of the file to open
  • Load the doc
  • Clean doc
  • Add to list
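A minimal sketch of this loop, under stated assumptions. `process_docs`, `load_doc`, `clean`, and `is_test_file` are illustrative names. The README does not say how test-set reviews are identified, so that rule is passed in as a predicate rather than hard-coded.

```python
import os


def load_doc(path):
    """Read one review file into a string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def process_docs(directory, clean, is_test_file):
    """Load and clean every training document in `directory`.

    clean:        callable turning raw text into tokens (assumed helper).
    is_test_file: callable deciding from the filename whether a review
                  belongs to the test set (hypothetical rule).
    """
    documents = []
    # Walk through all files in the folder (sorted for a stable order).
    for filename in sorted(os.listdir(directory)):
        # Skip any reviews in the test set.
        if is_test_file(filename):
            continue
        # Create the full path of the file to open.
        path = os.path.join(directory, filename)
        # Load the doc, clean it, and add it to the list.
        documents.append(clean(load_doc(path)))
    return documents
```

For example, `process_docs("some_dir", clean=str.split, is_test_file=lambda name: name.startswith("held_out"))` would keep every file whose name does not start with `held_out`. Both the directory and the prefix are placeholders.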

The neural network consists of three parallel channels that are merged, so it is built with the Keras functional API rather than as a Sequential linear stack of layers:


Three channels

  • Channel 1: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
  • Channel 2: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
  • Channel 3: Input -> Embedding -> Conv1D -> Dropout -> MaxPooling1D -> Flatten
  • Merge: Concatenate all the Flatten layers
  • Apply a Dense layer with the ReLU activation function
  • Apply a Dense layer with the Sigmoid activation function
  • Compile the model, specifying the learning process:
    • binary_crossentropy for the loss (the objective the model will try to minimize)
    • The Adam optimizer to optimize the model
    • Accuracy as the metric
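To see what the merge step concatenates, it can help to work out the flattened size of each channel. The sketch below is shape arithmetic only, not the model itself. The sequence length, filter count, pool size, and kernel sizes are hypothetical values chosen for illustration, and it assumes the channels differ only in kernel size and that Conv1D and MaxPooling1D use the Keras default "valid" padding.

```python
def conv1d_out_len(seq_len, kernel_size):
    # Conv1D with 'valid' padding and stride 1 shortens the sequence.
    return seq_len - kernel_size + 1


def maxpool1d_out_len(length, pool_size):
    # MaxPooling1D with stride equal to pool_size and 'valid' padding.
    return length // pool_size


def channel_flat_size(seq_len, kernel_size, filters, pool_size):
    # Embedding and Dropout leave the sequence length unchanged.
    conv_len = conv1d_out_len(seq_len, kernel_size)
    pooled_len = maxpool1d_out_len(conv_len, pool_size)
    # Flatten turns (pooled_len, filters) into one vector.
    return pooled_len * filters


# Hypothetical hyperparameters, for illustration only.
SEQ_LEN, FILTERS, POOL_SIZE = 100, 32, 2
KERNEL_SIZES = (4, 6, 8)

sizes = [channel_flat_size(SEQ_LEN, k, FILTERS, POOL_SIZE) for k in KERNEL_SIZES]
merged = sum(sizes)  # length of the concatenated vector fed to the Dense layer
print(sizes, merged)  # [1536, 1504, 1472] 4512
```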

After 10 epochs with a batch size of 16, the model accuracy was 86%.
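For readers unfamiliar with how the loss and the accuracy are derived from a sigmoid output, here is a minimal pure-Python sketch. The labels and raw scores are toy values invented for this example and are unrelated to the 86% result. The helper names are illustrative, and the 0.5 threshold is the usual default for binary accuracy.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Toy data: 1 = positive review, 0 = negative review.
labels = [1, 0, 1, 0]
raw_scores = [2.0, -1.0, -0.5, 0.3]
probs = [sigmoid(z) for z in raw_scores]

print(accuracy(labels, probs))  # 0.5: the last two reviews are misclassified
print(binary_crossentropy(labels, probs))
```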

Run the model with:


python NN_for_SAnalysis.py


All the credit goes to machinelearningmastery

First image from Google