The objective of this project is to analyze and predict passenger survival on the Titanic based on historical data. The model will be built using features such as age, gender, class, and family relationships to identify the factors most strongly associated with survival rates.
The sinking of the Titanic in 1912 is one of the most tragic events in maritime history: more than 1,500 passengers and crew lost their lives. The passenger dataset from this event makes it possible to analyze the factors that influenced survival, such as age, gender, and class. This project will develop a predictive model and look for meaningful patterns in the dataset.
This project aims to build a model that predicts whether a passenger survived the Titanic disaster from features such as the passenger's age, gender, class, and family relationships. The goal is to identify the factors that contribute most to survival rates.
Titanic Survival Dataset
https://www.kaggle.com/c/titanic/data
- train.csv: Contains data with the target variable Survived (whether the passenger survived or not).
- test.csv: Contains data without the target variable. Predictions made on it are scored by Kaggle, so it cannot be used to measure accuracy locally.
- PassengerId: Unique ID for each passenger.
- Survived: Target variable (0 = No, 1 = Yes).
- Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd).
- Name: Passenger’s full name.
- Sex: Gender.
- Age: Age in years.
- SibSp: Number of siblings/spouses aboard.
- Parch: Number of parents/children aboard.
- Ticket: Ticket number.
- Fare: Fare paid.
- Cabin: Cabin number (where available).
- Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).
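As a rough illustration of the columns listed above, the sketch below reads a CSV with pandas and checks that the expected columns are present. The helper name `load_passengers` and the three sample rows are made up for this example (they are not real passenger records). In practice the argument would be the path to the downloaded train.csv.

```python
import io

import pandas as pd

# Columns described above, in the order they are listed.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def load_passengers(source):
    """Read a Titanic CSV (path or file-like object) and check its columns.

    Hypothetical helper for illustration only.
    """
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    # test.csv has no Survived column, so only other gaps count as errors.
    unexpected_gaps = [c for c in missing if c != "Survived"]
    if unexpected_gaps:
        raise ValueError(f"missing columns: {unexpected_gaps}")
    return df


# Made-up rows standing in for train.csv.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A1,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,B2,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,C3,7.92,,
"""

df = load_passengers(io.StringIO(SAMPLE_CSV))
print(df.dtypes)        # type of each column
print(df.isna().sum())  # number of missing values per column
```

Calling `load_passengers("train.csv")` on the real file should work the same way, assuming the file sits in the working directory.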
- Load the dataset and inspect its structure.
- Explore the distribution of features and look for any trends related to survival.
- Handle missing values in Age, Cabin, and Embarked.
- Encode categorical features like Sex and Embarked.
- Create new features such as family size from SibSp and Parch.
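The preprocessing steps above could look roughly like the following. This is a minimal sketch, not the project's confirmed implementation: the median fill for Age, the `HasCabin` flag, the most-frequent-port fill for Embarked, and the names `preprocess` and `FamilySize` are all assumptions chosen for illustration, and the sample rows are made up.

```python
import pandas as pd


def preprocess(df):
    """Return a model-ready copy of a Titanic-style DataFrame (illustrative)."""
    out = df.copy()
    # Age: fill gaps with the median (one common choice, not the only one).
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Cabin: often missing, so keep only a "cabin was recorded" flag.
    out["HasCabin"] = out["Cabin"].notna().astype(int)
    # Embarked: fill gaps with the most frequent port.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Sex: binary encoding.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    # Family size: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    # Embarked: one-hot encoding into Embarked_<port> columns.
    out = pd.get_dummies(out, columns=["Embarked"], prefix="Embarked", dtype=int)
    return out.drop(columns=["Cabin"])


# Made-up rows, used only to exercise the function.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 30.0],
    "SibSp": [1, 1, 0, 0],
    "Parch": [0, 0, 2, 0],
    "Cabin": [None, "C85", None, None],
    "Embarked": ["S", "C", None, "S"],
})

clean = preprocess(sample)
print(clean)
```

Computing fill values on the training split only, rather than on the full data, would avoid leaking information into the evaluation. The sketch skips that for brevity.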
- Visualize survival rates by gender, age, class, and other features.
- Look for patterns that may be predictive of survival.
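Because Survived is coded 0/1, the survival rate of a group is simply the mean of that column within the group. The sketch below shows this with a pandas `groupby`. The helper `survival_rate` and the six rows are invented for illustration and say nothing about the real data.

```python
import pandas as pd

# Made-up rows, not real passenger records.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Survived": [0, 1, 1, 1, 1, 0],
})


def survival_rate(df, by):
    """Share of survivors per group: the mean of the 0/1 Survived column."""
    return df.groupby(by)["Survived"].mean()


by_sex = survival_rate(toy, "Sex")
by_class = survival_rate(toy, "Pclass")
print(by_sex)
print(by_class)
```

With Matplotlib installed, `by_sex.plot(kind="bar")` would draw the same numbers as a bar chart. Seaborn's `barplot` is another option.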
- Select relevant features and target variable for prediction.
- Split the data into training and testing sets.
- Train a Logistic Regression model and evaluate its accuracy on the held-out testing set.
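The modeling steps can be sketched with scikit-learn as follows. The data here is synthetic, generated only so the example runs end to end. The feature list, the 80/20 split, the stratified sampling, and `max_iter=1000` are assumptions rather than settings taken from the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed training data.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),      # 1, 2 or 3
    "Sex": rng.integers(0, 2, size=n),         # already encoded: 0 = male, 1 = female
    "Age": rng.uniform(1, 70, size=n),
    "FamilySize": rng.integers(1, 6, size=n),
})
# Made-up labelling rule so the model has a pattern to learn.
data["Survived"] = ((data["Sex"] == 1) | (data["Pclass"] == 1)).astype(int)

# Select features and target.
features = ["Pclass", "Sex", "Age", "FamilySize"]
X, y = data[features], data["Survived"]

# Hold out 20% of the rows, keeping the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the classifier and score it on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.3f}")
```

Accuracy alone can hide class imbalance, so `sklearn.metrics.confusion_matrix` or `classification_report` on the same `y_test` and `y_pred` would give a fuller picture.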
- Insights into Titanic Survival: Understand how different factors such as gender, class, and age impacted survival rates.
- Predictive Model: A model that can predict whether a passenger survived the disaster based on various features.
- Performance Evaluation: Assess the accuracy and performance of the Logistic Regression model.
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn