Skip to content

AzaamAhmed/Titanic-Survival-Prediction-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Titanic Survival Prediction: A Comparative Analysis

Predicting the Fate of Titanic Passengers

This repository examines the tragic story of the RMS Titanic through data science, utilizing machine learning techniques to analyze and predict passenger survival based on various factors. The project provides insights into survival probabilities and evaluates multiple predictive models.


Project Overview

The main objectives of this project are:

  1. Data Exploration: Understanding the dataset and identifying trends.
  2. Feature Engineering: Creating meaningful features to improve model performance.
  3. Model Implementation: Comparing different machine learning algorithms.
  4. Model Evaluation: Using metrics to assess model accuracy and performance.
  5. Insights and Future Directions: Drawing conclusions and suggesting improvements.

Data Exploration and Preprocessing

Steps Undertaken:

  • Data Cleaning:

    • Addressed missing values in columns such as Age and Embarked.
    • Removed outliers that could skew the results.
    • Standardized and normalized features for consistent model input.
  • Feature Engineering:

    • Created new features like:
      • FamilySize (combination of SibSp and Parch).
      • IsAlone (indicating whether a passenger was alone or with family).
      • Title (extracted from passenger names).
    • Encoded categorical variables (e.g., Sex, Embarked) using one-hot encoding.
  • Data Splitting:

    • Divided the dataset into:
      • Training Set: 80% of the data for model training.
      • Testing Set: 20% of the data for evaluation.

Model Implementation and Evaluation

The following machine learning models were implemented and evaluated:

Model Accuracy
Decision Tree Classifier 0.7989
Random Forest Classifier 0.7877
Logistic Regression 0.7821
AdaBoost 0.7374
k-Nearest Neighbors 0.6704
Support Vector Classifier 0.6369

Evaluation Metrics:

  • Accuracy: Measures the percentage of correct predictions.
  • Precision: Ratio of true positive predictions to total positive predictions.
  • Recall: Ratio of true positive predictions to actual positive cases.
  • F1-Score: The harmonic mean of precision and recall.
  • Confusion Matrix: A matrix representation of true positive, true negative, false positive, and false negative predictions.

Key Findings

  • Decision Tree Classifier was the top performer, achieving the highest accuracy.
  • Random Forest Classifier and Logistic Regression also produced strong results and are viable options for further optimization.
  • Models like AdaBoost, k-Nearest Neighbors, and Support Vector Classifier had lower accuracies and may require additional tuning or feature adjustments.

Future Directions

To further improve the analysis and predictions, the following steps are proposed:

  1. Hyperparameter Tuning:

    • Use grid search or randomized search to optimize hyperparameters for each model.
  2. Ensemble Methods:

    • Combine models (e.g., bagging or stacking) to leverage the strengths of multiple algorithms.
  3. Advanced Feature Engineering:

    • Explore domain-specific features or additional transformations to enhance predictive power.
  4. Deep Learning Approaches:

    • Implement deep neural networks to uncover complex patterns in the data.
  5. Class Imbalance Handling:

    • Apply techniques like SMOTE (Synthetic Minority Oversampling Technique) to address class imbalance.

How to Use This Repository

Clone the Repository:

git clone https://github.com/AzaamAhmed/Titanic-Survival-Prediction-Analysis.git
cd Titanic-Survival-Prediction-Analysis

Set Up the Environment:

Ensure you have Python and the required libraries installed. Install dependencies using:

pip install -r requirements.txt

Run the Project:

  1. Data Preprocessing:

    • Execute the notebook or script to clean and preprocess the dataset.
  2. Model Training:

    • Train the machine learning models on the training dataset.
  3. Evaluation:

    • Use the testing dataset to evaluate and compare model performance.
  4. Visualization:

    • Generate visualizations to understand data trends and model predictions.

Repository Structure

Titanic-Survival-Prediction-Analysis/
├── data/                     # Dataset files
├── notebooks/                # Jupyter notebooks for analysis and modeling
├── scripts/                  # Python scripts for data processing and model implementation
├── requirements.txt          # Dependencies and libraries
└── README.md                 # Project documentation

Contributing

Contributions are welcome! If you'd like to improve the project, feel free to submit a pull request. Ensure that your changes are well-documented and tested.


License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgments

  • Kaggle Titanic Dataset for providing the dataset.
  • Open-source contributors and the data science community for inspiration and guidance.

Enjoy exploring the Titanic dataset and building predictive models!

About

Titanic Survival Prediction - Machine Learning Project in Python

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published