Lobster Thermidor aux crevettes with a Mornay sauce, garnished with truffle pâté, brandy and a fried egg on top and Spam #3

Open
wants to merge 4 commits into
base: master
63 changes: 2 additions & 61 deletions README.md
@@ -1,62 +1,3 @@
# Classify and Cluster SMS Messages
The code (and the data journey) is all included in the IPython notebook titled "sms spam". The functions are also available in `sms_spam.py`.

## Description

Use Bayesian classification and K-means clustering to analyze SMS messages.

## Objectives

### Learning Objectives

After completing this assignment, you should understand:

* The uses of classification and clustering
* Good feature extraction

### Performance Objectives

After completing this assignment, you should be able to:

* Parse text into features
* Classify textual data
* Cluster textual data

## Details

### Deliverables

* A Git repo called sms-spam containing at least:
  * a `README.md` file explaining how to run your project
  * a `requirements.txt` file

### Requirements

* No PEP8 or Pyflakes warnings or errors

## Normal Mode

Download the [SMS Spam collection](https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection) from the UCI Machine Learning Repository.

Choose a set of features to use in order to separate SMS ham from spam.

Write a program to extract the features you want from each SMS message and then classify each SMS as ham or spam. Iterate on your feature extraction until you reach a classification success rate you are comfortable with (more than 75% at minimum).
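
As a rough illustration of the classify-and-measure loop described above, here is a minimal, dependency-free sketch of a simplified naive Bayes classifier that scores only the words present in a message. It is not the notebook's code and not scikit-learn's implementation; the helper names (`tokenize`, `train`, `classify`, `accuracy`) and the four toy messages are assumptions made up for this example, standing in for the real dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased, whitespace-separated words; a set records presence only.
    return set(text.lower().split())


def train(labeled_messages):
    """Count, per label, how many messages contain each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_messages:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability score."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    words = tokenize(text) & vocab  # ignore words never seen in training
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for word in words:
            # Laplace smoothing keeps unseen (label, word) pairs above zero.
            score += math.log((word_counts[label][word] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, labeled_messages):
    """Fraction of messages whose predicted label matches the true label."""
    data = list(labeled_messages)
    hits = sum(classify(model, text) == label for label, text in data)
    return hits / len(data)


# Hypothetical toy data, not taken from the SMS Spam collection.
toy_data = [
    ("spam", "claim your prize now"),
    ("spam", "winner claim cash now"),
    ("ham", "see you at lunch"),
    ("ham", "call me when you are home"),
]
model = train(toy_data)
```

In practice the accuracy should be measured on messages held out from training, since a score computed on the training data itself overstates how well the features separate ham from spam.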

## Hard Mode

In addition to the requirements from **Normal Mode**:

Write a program to cluster the SMS messages into differing numbers of groups. Examine the clusters to see whether they have meaning. Write up your findings.
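
The clustering step can be sketched with a small, pure-Python K-means (Lloyd's algorithm). This is an illustrative sketch only: the `vectorize` features (word count and digit count), the function names, and the sample messages are assumptions chosen for the example, and a real solution would more likely use scikit-learn from `requirements.txt`.

```python
import random


def vectorize(message):
    # Hypothetical two-dimensional feature vector: (word count, digit count).
    return (len(message.split()), sum(ch.isdigit() for ch in message))


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Return (centroids, labels); labels[i] is the cluster index of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k starting points at random
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical sample messages, not taken from the dataset.
messages = [
    "ok",
    "see you soon",
    "home now",
    "WIN 1000000 call 08001234567 now",
    "claim 5000000 prize text 80082 today",
]
points = [vectorize(m) for m in messages]
centroids, labels = kmeans(points, k=2)
```

Running `kmeans` with several values of `k` and printing the messages grouped by label is one way to examine whether the clusters have meaning.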

## Notes

Some features you might want to try:

* Presence of the following words: claim, winner
* Presence of money symbols
* Presence of numbers
* Presence of first-person words
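
The features listed above could be extracted along these lines. This is a minimal sketch; the function name `extract_features`, the chosen currency symbols, and the first-person word list are assumptions for illustration rather than anything specified by the assignment.

```python
import re

SPAM_WORDS = {"claim", "winner"}
MONEY_SYMBOLS = "$£€"  # assumed set of currency symbols
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}  # assumed list


def extract_features(message):
    """Map one SMS message to a dict of boolean presence features."""
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "has_spam_word": any(w in SPAM_WORDS for w in words),
        "has_money_symbol": any(ch in MONEY_SYMBOLS for ch in message),
        "has_number": any(ch.isdigit() for ch in message),
        "has_first_person": any(w in FIRST_PERSON for w in words),
    }
```

Each message becomes a small dictionary of booleans, which can then be fed to whichever classifier is used.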

## Additional Resources

* [TextBlob](http://textblob.readthedocs.org/en/dev/)
Run `pip install -r requirements.txt` to make sure that all necessary packages are installed.
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1,5 +1,6 @@
ipython[all]
scikit-learn
pandas
numpy
matplotlib
textblob
textblob