Bank XYZ wants to expand its borrower base efficiently by improving campaign conversion rates using digital transformation strategies. Develop a machine learning model to identify potential borrowers for focused marketing.
Build a machine learning model that predicts which liability customers (depositors) are likely to convert to asset customers (borrowers).
The dataset consists of two CSV files:
- Data1 (5000 rows, 8 columns)
- Data2 (5000 rows, 7 columns)
- Customer ID
- Age
- Customer Since
- Highest Spend
- Zip Code
- Hidden Score
- Monthly Average Spend
- Level
- Mortgage
- Security
- Fixed Deposit Account
- Internet Banking
- Credit Card
- Loan on Card
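A minimal sketch of how the two files might be combined. The 14 columns listed above against 8 + 7 = 15 columns across the files suggests that one column, presumably the customer ID, appears in both and can serve as the join key. That is an assumption, as are the column spellings (`ID`, `CreditCard`, `LoanOnCard`) and the split of columns between the files. Small in-memory frames stand in for the real CSVs, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Tiny in-memory stand-ins for Data1 and Data2 (hypothetical values).
# Which columns live in which file is assumed for illustration only.
data1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "Age": [25, 45, 39],
    "Mortgage": [0, 155, 0],
})
data2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "CreditCard": [0, 1, 0],
    "LoanOnCard": [0.0, 1.0, None],
})

# Inner join on the shared customer identifier.
# validate="one_to_one" raises if an ID is duplicated in either frame.
customers = data1.merge(data2, on="ID", how="inner", validate="one_to_one")

# Count missing values per column before deciding how to handle them.
missing_per_column = customers.isna().sum()
print(customers.shape)
print(missing_per_column)
```

Using `validate="one_to_one"` is a cheap guard: if either file contains a repeated customer ID, the merge fails loudly instead of silently duplicating rows.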
- Language: Python
- Libraries: numpy, pandas, matplotlib, seaborn, sklearn, pickle, imblearn
- Import required libraries and read the dataset.
- Exploratory Data Analysis (EDA) including data visualization.
- Feature Engineering:
  - Remove unnecessary columns
  - Handle missing values
  - Check for intercorrelation and remove highly correlated features
- Model Building:
  - Split data into training and test sets
  - Train various models: Logistic Regression, Weighted Logistic Regression, Naive Bayes, SVM, Decision Tree, Random Forest
- Model Validation:
  - Evaluate models using common metrics: accuracy, confusion matrix, AUC, recall, precision, F1-score
- Handle imbalanced data using imblearn.
- Hyperparameter Tuning using GridSearchCV for Support Vector Machine Model.
- Create the final model and make predictions.
- Save the model with the highest accuracy as a pickle file.
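The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the data is synthetic (`make_classification` with about 10% positives), the helper `drop_correlated`, the 0.9 correlation threshold, the parameter grid, and the `f1` tuning score are all hypothetical choices. Resampling with imblearn is not shown; the weighted logistic regression here relies on scikit-learn's `class_weight="balanced"` instead.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def drop_correlated(df, threshold=0.9):
    """Drop the later column of each pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic, imbalanced stand-in for the bank data (assumption, not the real files).
X_arr, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                               n_redundant=2, weights=[0.9, 0.1], random_state=42)
X = drop_correlated(pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)]))

# Stratify so the minority class keeps the same share in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "weighted_logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000, class_weight="balanced")),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Small illustrative grid for the SVM; GridSearchCV refits the best setting.
    "svm_tuned": GridSearchCV(
        make_pipeline(StandardScaler(), SVC(probability=True, random_state=42)),
        param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
        scoring="f1", cv=3),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, proba),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }

# Select by accuracy, as the step list states, then round-trip through pickle.
best_name = max(results, key=lambda n: results[n]["accuracy"])
best_model = models[best_name]
restored = pickle.loads(pickle.dumps(best_model))
print(best_name, round(results[best_name]["accuracy"], 3))
```

The sketch pickles to an in-memory byte string; writing to a file would use `pickle.dump` with an open binary file handle. With imbalanced classes, accuracy can favor a model that rarely predicts the minority class, so it may be worth comparing the recall and F1 entries in `results` before settling on the selected model.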
Folders:
- input: Contains the data (Data1 and Data2).
- src: Contains modularized code for different project steps, including engine.py and ML_Pipeline.
- output: Contains the best-fitted model.
- lib: Reference folder with the original IPython notebook.