This repository contains an exercise on regression metrics using an income dataset to predict happiness. The exercise includes data preprocessing, model training, evaluation, and visualization.
- Coded by: Himel Sarder
- Contact: info.himelcse@gmail.com
- LinkedIn: Himel Sarder
- `Exercise ~ Regression Metrics.ipynb`: Jupyter notebook containing the regression analysis.
- `LICENSE`: License information.
- `Mymodel.pkl`: Serialized model file.
- `README.md`: This README file.
- `Regression Metrics.ipynb`: Additional notebook for regression metrics.
- `income.csv`: Dataset containing income and happiness data.
The dataset `income.csv` contains the following columns:
- `Unnamed: 0`: Index column.
- `income`: Income values.
- `happiness`: Happiness scores.
- Python 3.x
- Jupyter Notebook
- Required Python libraries:
- pandas
- numpy
- matplotlib
- scikit-learn
- Clone the repository:
  ```bash
  git clone https://github.com/Himel-Sarder/ML-Exercise-Income-Dataset-Regression-Metrics.git
  ```
- Navigate to the project directory:
  ```bash
  cd ML-Exercise-Income-Dataset-Regression-Metrics
  ```
- Install the required libraries:
  ```bash
  pip install pandas numpy matplotlib scikit-learn
  ```
Load the dataset using pandas and display its structure:
```python
import pandas as pd

df = pd.read_csv('income.csv')
print(df.head())
print(df.shape)
df.info()  # info() prints its summary directly, so no print() wrapper is needed
```
Visualize the relationship between income and happiness:
```python
import matplotlib.pyplot as plt

plt.scatter(df['income'], df['happiness'], c=df['happiness'], cmap='coolwarm')
plt.xlabel('Income')
plt.ylabel('Happiness')
plt.colorbar(label='Happiness')
plt.show()
```
Split the data into training and test sets:
```python
from sklearn.model_selection import train_test_split

X = df.iloc[:, 1:2]  # income column, kept 2-D as a feature matrix
y = df.iloc[:, -1]   # happiness column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
```
Train a Linear Regression model:
```python
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)
```
Evaluate the model using various metrics:
```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = lr.predict(X_test)
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))
print("Root Mean Squared Error:", np.sqrt(mean_squared_error(y_test, y_pred)))
```
Save the trained model to a file:
```python
import pickle

with open('Mymodel.pkl', 'wb') as f:
    pickle.dump(lr, f)
```
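The saved model can later be reloaded and used for predictions without retraining. Below is a minimal, self-contained sketch of the round trip; it fits a stand-in model on toy data (the notebook uses the income data instead):

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a small stand-in model on a perfect line y = 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
lr = LinearRegression().fit(X, y)

# Save the model, then load it back the same way Mymodel.pkl would be reused.
with open('Mymodel.pkl', 'wb') as f:
    pickle.dump(lr, f)
with open('Mymodel.pkl', 'rb') as f:
    model = pickle.load(f)

# The reloaded model makes the same predictions as the original.
print(model.predict([[5.0]]))
```

Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.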
Test the impact of adding random features and recalculating R² and adjusted R² scores.
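This extension can be sketched as follows. On training data, ordinary least squares R² never decreases when features are added, even pure noise, which is exactly what adjusted R² corrects for via the penalty term (n − 1)/(n − k − 1). The snippet below uses synthetic stand-in data (not the income dataset) and a hypothetical `adjusted_r2` helper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: one informative feature plus noise.
n = 200
X = rng.normal(size=(n, 1))
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=n)

def adjusted_r2(r2, n, k):
    # Adjusted R-squared penalizes R-squared for the number of predictors k.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

results = {}
for extra in [0, 5, 20]:
    # Append `extra` purely random columns to the informative feature.
    X_aug = np.hstack([X, rng.normal(size=(n, extra))]) if extra else X
    r2 = r2_score(y, LinearRegression().fit(X_aug, y).predict(X_aug))
    results[extra] = (r2, adjusted_r2(r2, n, X_aug.shape[1]))
    print(f"extra={extra:2d}  R²={results[extra][0]:.4f}  adj. R²={results[extra][1]:.4f}")
```

Training R² creeps upward as random columns are added, while adjusted R² stays flat or drops, flagging the extra features as uninformative.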
This project is licensed under the MIT License - see the `LICENSE` file for details.
- Thank you to everyone who contributed to this project.
If you have any questions or feedback, feel free to contact me at info.himelcse@gmail.com.
Happy coding! 😺