Skip to content

A machine learning project aimed at detecting spam messages using text classification techniques. The project involves preprocessing text data, feature extraction, and building a classification model to distinguish between spam and legitimate messages.

License

Notifications You must be signed in to change notification settings

KanAfridi/Sms-Spam-Detection-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SMS Spam Detection Project

This project focuses on detecting spam messages using machine learning techniques to classify textual data as spam or not spam. The dataset, sourced from Kaggle, contains labeled examples of both types of messages. Key steps in this project include preprocessing the text data, applying natural language processing techniques, and building a classification model. The model's performance was improved through hyperparameter tuning and evaluation to ensure accurate and reliable predictions.

Exploratory Data analysis help me out to understand how different distributed. Visualization methods were used to create plots comparing the frequencies of spam and non-spam messages, providing a clear overview of their distribution. Word clouds were generated to highlight the most common words in spam and non-spam messages, offering insights into the language patterns of each category.

Software And Tools Requirements

  1. Github Account
  2. Vs Code IDE
  3. Git Cli
  4. Streamlit\io

📋 Data Cleaning

The data was cleaned to ensure quality and consistency through the following steps:

  1. Handling Null Values:
  2. Text Standardization:
    • Converted text to lowercase.
    • Removed punctuation.
    • Eliminated stopwords.
  3. Tokenization and Lemmatization:
    • Tokenized the text into meaningful units.
    • Applied lemmatization for better semantic representation.
  4. Function Input at least 5 words

🔧 Feature Engineering

Meaningful features were extracted using:

  • Word Frequency Analysis: Analyzed the distribution of words in spam and non-spam messages.
  • Term Importance: Used statistical measures to identify words crucial for spam classification.

🔄 Data Preparation

Text data was transformed into a format suitable for machine learning models using TF-IDF (Term Frequency-Inverse Document Frequency). This process represented text numerically, emphasizing the importance of words relative to the dataset.


🧪 Model Selection and Optimization

Several machine learning algorithms were tested, and the Random Forest Classifier delivered the best performance. Key metrics included:

  • Accuracy: ~97%
  • Precision: ~100%

Precision was prioritized to minimize false positives, as misclassifying non-spam messages as spam can cause significant issues. The spam detection threshold was adjusted to 75% to further enhance precision.


📚 Libraries Used

  • NLTK: For text cleaning, tokenization, and lemmatization.
  • Scikit-learn: For building and evaluating the model.
  • Pandas: For data manipulation.
  • Matplotlib/Seaborn: For visualizations during EDA.

🚀 Key Features

  • High precision and accuracy in spam detection.
  • Threshold-based classification to reduce false positives.
  • Robust preprocessing pipeline for textual data.

🏁 Conclusion

This project demonstrates how a well-structured pipeline can achieve exceptional results in spam detection tasks. The Random Forest Classifier, combined with TF-IDF and robust preprocessing techniques, delivered a highly reliable solution.

📈 Future Improvements

  1. Experimenting with other algorithms like XGBoost.
  2. Fine-tuning hyperparameters for further optimization.

Feel free to explore and improve this project!

About

A machine learning project aimed at detecting spam messages using text classification techniques. The project involves preprocessing text data, feature extraction, and building a classification model to distinguish between spam and legitimate messages.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published