GitHub - rickynazarrudin/sklearntweetclassification

Sklearn Tweet Classification

This project performs text classification on a dataset split into three classes. I use several machine learning algorithms provided by sklearn and cross-validate each one to find the best. Before classifying, however, I have to deal with the main challenge of the dataset: the tweet text cannot be used directly because it still contains 'trash' information such as @ mentions, # hashtags, RT markers, usernames, punctuation, and local abbreviations. So I clean the data first. These are the main steps of the project:

Load dataset
Clean dataset of unnecessary characters (@, #, etc.)
Word normalization (yg > yang, dgn > dengan)
Stopword removal
Stemming
Train
Test
Report
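A minimal sketch of the cleaning and normalization steps, assuming each tweet is a plain Python string. The regular expressions and the NORMALIZATION dictionary are illustrative assumptions rather than the notebook's code; only the yg/dgn pairs come from the list above. The sketch keeps the hashtag word and drops only the "#" symbol, which is one possible design choice.

```python
import re

# Hypothetical normalization dictionary: only these two pairs come from the
# README; a real project would need a much larger abbreviation lexicon.
NORMALIZATION = {
    "yg": "yang",
    "dgn": "dengan",
}


def clean_tweet(text: str) -> str:
    """Remove links, RT markers, mentions, '#' symbols and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # links (before '#' handling)
    text = re.sub(r"\brt\b", " ", text)        # retweet marker
    text = re.sub(r"@\w+", " ", text)          # @username mentions
    text = text.replace("#", " ")              # keep hashtag word, drop '#'
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # punctuation, symbols, emoji
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


def normalize(text: str) -> str:
    """Replace local abbreviations with their full forms, token by token."""
    return " ".join(NORMALIZATION.get(tok, tok) for tok in text.split())


print(normalize(clean_tweet("RT @budi: Makan siang yg enak dgn teman! #kuliner https://t.co/abc")))
# makan siang yang enak dengan teman kuliner
```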
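Stopword removal and stemming could be sketched as follows. The STOPWORDS set is a hypothetical, tiny sample, and strip_particles is a toy stemmer that only drops a few Indonesian particle and possessive suffixes (-nya, -lah, -kah, -pun). It will over-stem words such as "sekolah", so a dictionary-based Indonesian stemmer (the Sastrawi library is a common choice) would be more appropriate in practice. Neither piece is claimed to be what the notebook uses; stem_text accepts any word-level stemmer so a real one can be swapped in.

```python
from typing import Callable, Iterable

# Hypothetical, deliberately tiny Indonesian stopword sample (illustration only).
STOPWORDS = frozenset({"yang", "dengan", "di", "dan", "ke", "dari", "ini", "itu"})

# Toy suffix list: the possessive -nya and three particles, checked in order.
SUFFIXES = ("nya", "lah", "kah", "pun")


def remove_stopwords(text: str, stopwords: Iterable[str] = STOPWORDS) -> str:
    """Drop every token that appears in the stopword collection."""
    stop = set(stopwords)
    return " ".join(tok for tok in text.split() if tok not in stop)


def strip_particles(word: str) -> str:
    """Toy stemmer: remove one trailing suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_text(text: str, stem: Callable[[str], str] = strip_particles) -> str:
    """Apply a word-level stemmer to each token; swap in a real stemmer here."""
    return " ".join(stem(tok) for tok in text.split())


print(stem_text(remove_stopwords("makan siang yang enak dengan temannya")))
# makan siang enak teman
```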
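The train, test, and report steps, including the cross-validation used to pick the best algorithm, might look like this with scikit-learn. The three candidate models, the TF-IDF features, the 25% test split, and the 3-fold stratified cross-validation are all assumptions; the README does not say which algorithms or settings the notebook actually compares.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def candidate_models():
    """Hypothetical shortlist of sklearn classifiers, each with its own TF-IDF step."""
    return {
        "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "logistic_regression": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    }


def select_best(texts, labels, n_splits=3):
    """Return the name of the model with the highest mean CV accuracy, plus all scores."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {
        name: cross_val_score(model, texts, labels, cv=cv).mean()
        for name, model in candidate_models().items()
    }
    return max(scores, key=scores.get), scores


def train_test_report(texts, labels, n_splits=3):
    """Split, cross-validate on the training part, refit the winner, report on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=0
    )
    best, scores = select_best(X_train, y_train, n_splits)
    model = candidate_models()[best].fit(X_train, y_train)
    report = classification_report(y_test, model.predict(X_test), zero_division=0)
    return best, scores, report
```

Call train_test_report(texts, labels) with the cleaned tweets and their class labels; it returns the best model's name, the mean cross-validation accuracy of each candidate, and the text report for the test split. Cross-validation runs only on the training split, so the held-out test set does not influence which model is chosen.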

Repository contents (4 commits):
Sentiment
README.md
Sklearn Tweet Classification (Bahasa).ipynb
