Data Science PROJECT
Client: Bank GoodCredit | Category: Banking - Risk
This README gives only a brief overview of the project. For more information, e-mail me at srikantvaijapur@gmail.com
Business Case:
The bank wants to predict a credit score for its current credit card customers. The credit score denotes a customer's creditworthiness and helps the bank reduce credit default risk.
Target variable →
Bad_label:
0 – Customer has Good credit history
1 – Customer has Bad credit history
ASSUMPTIONS:
The provided data set is imbalanced.
Existing Data set:
The client provided three data sets: Customer account, Enquiry details, and Customer_Demographics.
-The business requirement is to predict the target variable, i.e. whether the customer has a Good or a Bad credit history.
Steps to predict the Good or Bad credit history:
The client provides the data set on a SQL server, accessed with a USERNAME, PASSWORD, HOST and PORT.
Step 1:
Import the data set into Jupyter and convert it to a CSV file so that it is convenient to analyze.
Step 2:
After converting to CSV, open the file in Excel or in Tableau.
Step 3:
Analyze the data carefully and build your own separate table of the columns to take into consideration.
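Step 1 can be sketched roughly as follows. This is only an illustration: the client's SQL server and its credentials are not part of this README, so an in-memory SQLite database stands in for it, and the table name cust_demo and the helper export_table_to_csv are hypothetical.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def export_table_to_csv(conn, table_name, csv_path):
    """Read one table through an open database connection and save it as CSV."""
    # table_name is interpolated into the query, so pass only trusted names.
    df = pd.read_sql_query(f"SELECT * FROM {table_name}", conn)
    df.to_csv(csv_path, index=False)
    return df


# Stand-in for the client's SQL server: an in-memory SQLite database
# holding a tiny, made-up table (assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_demo (customer_no INTEGER, Bad_label INTEGER)")
conn.executemany("INSERT INTO cust_demo VALUES (?, ?)", [(1, 0), (2, 1), (3, 0)])
conn.commit()

csv_path = os.path.join(tempfile.mkdtemp(), "cust_demo.csv")
exported = export_table_to_csv(conn, "cust_demo", csv_path)
conn.close()
```

With the real server, the same helper would be given a connection built from the USERNAME, PASSWORD, HOST and PORT supplied by the client.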
DATA CLEANING & DATA MINING (in brief)
1.Read the CSV_file
2.Remove the irrelevant columns
3.Replace the placeholder values ['?','*','$',' ',' ',''] with NaN
4.Drop the duplicate columns
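The four cleaning steps above might look like the sketch below. The sample CSV, its column names and the helper clean_frame are made up for illustration; the placeholder list follows the one given in step 3.

```python
import io

import numpy as np
import pandas as pd

# Placeholder values that should be treated as missing (from step 3).
PLACEHOLDERS = ["?", "*", "$", " ", ""]

# Hypothetical sample standing in for the exported CSV file.
raw_csv = io.StringIO(
    "customer_no,income,city,city_copy,notes\n"
    "1,?,Pune,Pune,x\n"
    "2,5000,Delhi,Delhi,y\n"
    "3,$,Pune,Pune,z\n"
)


def clean_frame(df, irrelevant_cols):
    """Drop irrelevant columns, mark placeholders as NaN, drop duplicate columns."""
    df = df.drop(columns=irrelevant_cols, errors="ignore")
    df = df.replace(PLACEHOLDERS, np.nan)
    # A column is a duplicate if its values repeat an earlier column exactly.
    df = df.loc[:, ~df.T.duplicated()]
    return df


cleaned = clean_frame(pd.read_csv(raw_csv), irrelevant_cols=["notes"])
```

Which columns count as irrelevant is a judgment made in Step 3 of the analysis; the list passed here is only an example.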
Enquiry File Transformation (Data Mining)
1.Read the CSV of cust_enquiery
2.Take the relevant columns
3.Add one more column, enq_eqn_amt (which adds up the number of transactions made by every single customer)
4.Group by customer_no (which is included in all 3 data sets)
5.Join customer_no with df_amt and df_count
6.Save the file
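A minimal sketch of the per-customer aggregation follows. The README does not spell out exactly how the derived column is computed, so this assumes df_amt is the total enquiry amount per customer and df_count is the number of enquiries per customer; the column names enq_amt, enq_total_amt and enq_count are hypothetical.

```python
import pandas as pd

# Hypothetical enquiry data: one row per enquiry.
enquiry = pd.DataFrame({
    "customer_no": [1, 1, 2, 3, 3, 3],
    "enq_amt": [1000, 500, 200, 50, 50, 100],  # assumed column name
})

grouped = enquiry.groupby("customer_no")["enq_amt"]

# One value per customer: total amount and number of enquiries.
df_amt = grouped.sum().rename("enq_total_amt")
df_count = grouped.count().rename("enq_count")

# Both series share the customer_no index, so concat aligns them row by row.
enquiry_features = pd.concat([df_amt, df_count], axis=1).reset_index()
```

The result has one row per customer_no, which is what makes the later join with the other two files straightforward.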
Account File Transformation
1.Read the file
2.Take the relevant columns
3.fill NaN with '0'
4.Group by Customer_no
5.Save the file
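The account transformation could be sketched as below. The aggregation used after grouping is not stated above, so a per-customer sum is assumed here, and the columns cur_balance and amt_past_due are hypothetical. The fill value is the number 0 rather than the string '0' so that the columns stay numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical account data: one row per account, several per customer.
account = pd.DataFrame({
    "customer_no": [1, 1, 2],
    "cur_balance": [100.0, np.nan, 250.0],   # assumed column name
    "amt_past_due": [np.nan, 20.0, np.nan],  # assumed column name
})

# Fill missing values with 0, then collapse to one row per customer.
account_features = (
    account.fillna(0)
    .groupby("customer_no", as_index=False)
    .sum()
)
```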
JOINING THE DATA
1.Join all 3 files on customer_no with a left join
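As an illustration of the left join, the sketch below merges three small made-up frames on customer_no. Treating the demographics file as the left-hand table is an assumption; the README only says that a left join is used.

```python
import pandas as pd

# Hypothetical, already-transformed inputs with one row per customer.
demographics = pd.DataFrame({"customer_no": [1, 2, 3], "Bad_label": [0, 1, 0]})
enquiry_features = pd.DataFrame({"customer_no": [1, 2], "enq_count": [2, 1]})
account_features = pd.DataFrame({"customer_no": [1, 3], "cur_balance": [100.0, 40.0]})

# A left join keeps every customer from the left table, even when the
# enquiry or account file has no matching row (those cells become NaN).
merged = (
    demographics
    .merge(enquiry_features, on="customer_no", how="left")
    .merge(account_features, on="customer_no", how="left")
)
```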
Processing the Data
1.Convert the categorical data to numerical data with LabelEncoder
2.Take Bad_label as the target (dependent) variable
3.Treat the other columns as independent variables
4.Split the data into training and test sets with sklearn.model_selection
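These processing steps can be sketched as follows. The sample data, the 70/30 split and the stratify option are assumptions for illustration; stratifying is suggested here only because the data set is described as imbalanced.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical joined data set with two categorical columns.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "M", "F", "M", "F"],
    "city": ["Pune", "Delhi", "Pune", "Goa", "Goa", "Delhi", "Pune", "Goa", "Delhi", "Pune"],
    "income": [30, 45, 52, 28, 61, 39, 47, 33, 58, 41],
    "Bad_label": [0, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})

# Encode each categorical column as integers with its own LabelEncoder.
categorical_cols = ["gender", "city"]
for col in categorical_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

# Bad_label is the target; every other column is an independent variable.
X = df.drop(columns="Bad_label")
y = df["Bad_label"]

# stratify=y keeps the Good/Bad ratio similar in both splits (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
```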
Applying ML algorithm
Fit the model with the Random Forest algorithm and, after training, compute y_predict and the accuracy.
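A minimal sketch of this step, using synthetic imbalanced data from scikit-learn in place of the bank's data. The hyperparameters (n_estimators=100, the fixed random_state) are assumptions, not the settings behind the reported result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: roughly 90% class 0 and 10% class 1.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the Random Forest on the training split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict on the held-out split and compare with the true labels.
y_predict = model.predict(X_test)
accuracy = accuracy_score(y_test, y_predict)
print(f"Accuracy: {accuracy:.4f}")
```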
Plot the Graph
Plot the feature importances against the independent variables.
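One way such a plot might be produced is sketched below with matplotlib. The data is synthetic, the feature names are placeholders, and the horizontal bar chart is a choice made for this example rather than a description of the original graph.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names.
feature_names = [f"feature_{i}" for i in range(6)]
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# feature_importances_ holds one score per independent variable.
importances = pd.Series(model.feature_importances_, index=feature_names).sort_values()

# Horizontal bars, with the most important feature at the top.
fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(importances.index, importances.values)
ax.set_xlabel("Feature importance")
ax.set_ylabel("Independent variable")
fig.tight_layout()
```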
Result:
Achieved 96.13% accuracy with the fitted Random Forest model.
Tools Used:
1.SQL Server
2.Excel
3.Jupyter
4.ML Algorithm