This project predicts whether a company's existing health insurance customers are likely to be interested in its vehicle insurance. The dataset includes customer demographics, vehicle information, and policy details.
- Prediction Task: Build a predictive model to identify customers likely to purchase vehicle insurance.
- Model Comparison: Implemented three models (Logistic Regression, RandomForestClassifier, and XGBClassifier) and compared their performance.
- Data Exploration: Explored and preprocessed the dataset, addressing issues like missing values and outliers.
- Visualization: Utilized data visualization techniques to gain insights into the relationships between variables.
- Model Evaluation: Assessed model performance using metrics like accuracy, precision, recall, and F1-score.
- Feature Importance: Analyzed feature importance to understand the factors influencing predictions.
- Addressing Imbalance: Explored techniques to handle imbalanced data.
- Logistic Regression: Achieved 87% accuracy on both the training and test sets.
- RandomForestClassifier: Achieved a training accuracy of 95% and a test accuracy of 87%.
- XGBClassifier: Achieved a training accuracy of 88% and a test accuracy of 87%.
- Python
- Jupyter Notebooks
- scikit-learn, XGBoost, statsmodels, seaborn, matplotlib, pandas, NumPy
- Data Preparation: Explore the dataset using the provided Colab Notebooks.
- Model Training: Review the implementation of Logistic Regression, RandomForestClassifier, and XGBClassifier for predicting vehicle insurance interest.
- Evaluation: Review the evaluation metrics to assess model performance.
- Visualization: Explore visualizations for insights into the dataset.
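
The training and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it uses synthetic data from `make_classification` as a stand-in for the insurance dataset, the 88/12 class split and all hyperparameters are assumptions, and XGBClassifier is included only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the insurance data (assumed ~88% negative class).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.88, 0.12], random_state=42,
)
# Stratify so the train and test sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hyperparameters here are illustrative, not the project's settings.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=50, random_state=42),
}
try:
    from xgboost import XGBClassifier
    models["XGBClassifier"] = XGBClassifier(n_estimators=50, random_state=42)
except ImportError:
    pass  # xgboost is optional for this sketch


def evaluate(model):
    """Fit the model, then report train/test accuracy and positive-class metrics."""
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, test_pred, average="binary", zero_division=0
    )
    return {
        "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "test_accuracy": accuracy_score(y_test, test_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

Reporting training and test accuracy side by side makes a gap between the two easy to spot, and the positive-class precision, recall, and F1 show how well each model finds interested customers, which accuracy alone can hide.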
Feel free to fork and adapt this project for your own datasets and predictive modeling tasks.
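
When adapting the project to another dataset, it may help to compare any accuracy figure against a majority-class baseline first, since on imbalanced data a model can score high accuracy without identifying any positive cases. The sketch below computes the four metrics by hand in plain Python. The `binary_metrics` helper and the 88/12 label split are hypothetical and chosen only for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 88 uninterested customers (0) and 12 interested ones (1).
y_true = [0] * 88 + [1] * 12
# A trivial "model" that always predicts the majority class.
always_negative = [0] * len(y_true)

baseline = binary_metrics(y_true, always_negative)
print(baseline)
```

In this made-up example the trivial predictor reaches 88% accuracy while its recall and F1 are both zero, which is why the recall and F1 of the interested class are worth reading alongside accuracy.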