- Introduction
- Objectives
- Dataset
- Installation
- Usage
- Modeling
- Evaluation
- Visualizing the Data
- Contributions
- Acknowledgments
- Change Log
- License
This repository contains a machine learning project that detects credit card fraud using Decision Tree and Support Vector Machine (SVM) classifiers. It compares models built with scikit-learn against models built with Snap ML, focusing on training speed and accuracy. The project is written in Python and uses scikit-learn, pandas, matplotlib, and Snap ML.
The primary objectives of this project are:
- To implement and compare Decision Tree classifiers using scikit-learn and Snap ML.
- To implement and compare Support Vector Machine classifiers using scikit-learn and Snap ML.
- To evaluate the performance of these models on a large-scale credit card fraud dataset.
- To visualize class distributions and model performance metrics.
The dataset used in this project, creditcard.csv, contains data on credit card transactions labeled as fraudulent or non-fraudulent. The dataset includes the following columns:
- Time: Number of seconds elapsed between this transaction and the first transaction in the dataset.
- V1 to V28: The result of a PCA transformation applied to the original features (due to confidentiality).
- Amount: The transaction amount.
- Class: Label where 1 indicates fraud and 0 indicates non-fraud.
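As a quick sanity check on the column layout described above, the following sketch validates a loaded DataFrame and reports the class balance. The helper name `summarize_transactions` and the assumption that creditcard.csv sits in the working directory are illustrative and not part of the repository.

```python
import pandas as pd

# Column names as described above: Time, V1..V28, Amount, Class.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def summarize_transactions(df: pd.DataFrame) -> dict:
    """Check that all expected columns exist and return basic class statistics.

    Hypothetical helper for illustration only.
    """
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    counts = df["Class"].value_counts()
    n_fraud = int(counts.get(1, 0))      # Class == 1 means fraud
    n_non_fraud = int(counts.get(0, 0))  # Class == 0 means non-fraud
    return {
        "rows": len(df),
        "fraud": n_fraud,
        "non_fraud": n_non_fraud,
        "fraud_ratio": n_fraud / len(df) if len(df) else 0.0,
    }


# Usage with the real file (path is an assumption):
# df = pd.read_csv("creditcard.csv")
# print(summarize_transactions(df))
```

A very small fraud ratio is a signal that accuracy alone will be a misleading metric, which is why per-class precision and recall matter for this dataset.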
Ensure the following dependencies are installed:
- Python 3.x
- numpy
- pandas
- matplotlib
- scikit-learn==1.0.2
- snapml
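One way to confirm that the packages listed above are present before running anything is a small check built on the standard library's `importlib.metadata`. This is a sketch: the `REQUIRED` list simply mirrors the dependency list, and `missing_packages` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names taken from the dependency list above.
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn", "snapml"]


def missing_packages(names):
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_packages(REQUIRED) or "none")
```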
To run this project locally, clone the repository and install the required packages:
git clone https://github.com/Drexregion/Credit-Card-Detector
cd Credit-Card-Detector
pip install -r requirements.txt
To use this repository, follow these steps:
- Clone the repository:
git clone https://github.com/Drexregion/Credit-Card-Detector
- Navigate to the project directory:
cd Credit-Card-Detector
- Run the classification script:
python classify_fraud.py
The modeling process involves the following steps:
- Data Loading: Import the dataset and inflate it to increase the size for training.
- Data Preprocessing: Standardize the features and normalize the data.
- Data Splitting: Split the data into training and testing sets.
- Model Training: Train both Decision Tree and Support Vector Machine models using scikit-learn and Snap ML.
- Model Prediction: Make predictions on the test set using the trained models.
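The steps above can be sketched with scikit-learn alone. This is a minimal illustration on synthetic data, not the repository's script: the function name `prepare_and_train`, the replication factor, the L1 normalization, the 70/30 split, and the hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def prepare_and_train(X, y, n_replicas=1, seed=0):
    """Inflate, preprocess, split, and train two scikit-learn models.

    Hypothetical helper; the parameter values are illustrative assumptions.
    """
    # Data loading: inflate the data by repeating every row n_replicas times.
    X = np.repeat(X, n_replicas, axis=0)
    y = np.repeat(y, n_replicas, axis=0)

    # Data preprocessing: standardize each feature, then normalize each row.
    X = StandardScaler().fit_transform(X)
    X = normalize(X, norm="l1")

    # Data splitting: stratify so both sets keep the same class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    # Model training: a shallow tree and a linear SVM with balanced class weights.
    tree = DecisionTreeClassifier(max_depth=4, random_state=seed).fit(X_train, y_train)
    svm = LinearSVC(class_weight="balanced", random_state=seed).fit(X_train, y_train)
    return {"tree": tree, "svm": svm}, (X_test, y_test)


# Synthetic stand-in for the real features and labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] > 1.0).astype(int)

models, (X_test, y_test) = prepare_and_train(X_demo, y_demo, n_replicas=2)

# Model prediction: predict on the held-out test set.
predictions = {name: model.predict(X_test) for name, model in models.items()}
```

Snap ML provides its own `DecisionTreeClassifier` and `SupportVectorMachine` classes with a similar `fit`/`predict` interface; swapping them in is not shown here, and their exact parameters should be checked against the Snap ML documentation. Note also that inflating by replication before splitting places copies of the same row in both sets, so this setup is better suited to timing comparisons than to estimating generalization.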
The performance of the models is evaluated using the test dataset. The key metrics used for evaluation include:
- Confusion Matrix: A table used to describe the performance of a classification model.
- Classification Report: This includes precision, recall, F1-score, and support for each class.
- Training Time Speedup: A comparison of the training times between scikit-learn and Snap ML.
Here is an example of how to evaluate the models using a confusion matrix and classification report:
from sklearn.metrics import confusion_matrix, classification_report
# Assuming y_test contains the actual values and pred contains the predicted values
print("Confusion Matrix:\n", confusion_matrix(y_test, pred))
print("Classification Report:\n", classification_report(y_test, pred))
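The training time speedup listed among the metrics can be measured by timing each model's `fit` call and dividing the two durations. The sketch below uses only the standard library; `timed_fit` and `speedup` are hypothetical helper names, and any object with a `fit(X, y)` method can be passed in.

```python
import time


def timed_fit(model, X, y):
    """Fit the model and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate trained than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Usage (model objects and training data are assumed to exist):
# sklearn_time = timed_fit(sklearn_model, X_train, y_train)
# snapml_time = timed_fit(snapml_model, X_train, y_train)
# print(f"Snap ML vs. scikit-learn speedup: {speedup(sklearn_time, snapml_time):.2f}x")
```

A single timing run can be noisy, so repeating the measurement a few times and comparing the medians gives a more stable figure.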
The target class distribution can be visualized using a pie chart. The following code snippet demonstrates how to create this visualization:
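import matplotlib.pyplot as plt
import pandas as pd
# Count the transactions in each class (file path assumed to be the working directory)
class_counts = pd.read_csv("creditcard.csv")["Class"].value_counts()
labels = class_counts.index.astype(str)
sizes = class_counts.values
# Plot the Class column as a pie chart
plt.figure(figsize=(8, 8))
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title("Target Class Values")
plt.show()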
We welcome contributions from the community to improve this project. To contribute, please follow these steps:
- Fork the Repository: Click the "Fork" button at the top right of the repository page to create a copy of this repository on your GitHub account.
- Clone the Repository: Clone your forked repository to your local machine (replace <your-username> with your GitHub username).
git clone https://github.com/<your-username>/Credit-Card-Detector
- Create a New Branch: Create a new branch for your feature or bug fix.
git checkout -b feature-name
- Make Changes: Make your changes to the codebase.
- Commit Your Changes: Commit your changes with a clear and descriptive commit message.
git commit -m "Description of your changes"
- Push to Your Branch: Push your changes to your forked repository.
git push origin feature-name
- Open a Pull Request: Open a pull request to merge your changes into the main repository. Provide a detailed description of your changes in the pull request.
We appreciate your contributions and will review your pull request as soon as possible. Thank you for helping to improve this project!
This project is licensed under the MIT License. See the LICENSE file for more details.