pump_it_up

First, checked the given data for missing values and identified the features that contain them.
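A minimal sketch of this check, assuming the data is loaded into a pandas DataFrame. The toy frame below is a hypothetical stand-in for the real training file, which has many more columns.

```python
import pandas as pd

# Toy stand-in for the training data (hypothetical values).
df = pd.DataFrame({
    "scheme_name": ["A", None, None, "B"],
    "permit": [True, None, False, True],
    "amount_tsh": [0.0, 25.0, 10.0, 0.0],
})

# Count missing values per column and keep only the columns that have any.
missing_counts = df.isnull().sum()
features_with_missing = missing_counts[missing_counts > 0].sort_values(ascending=False)
print(features_with_missing)
```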

Then investigated the statistical summary of each available feature.

Then plotted a correlation heat map for the numerical features; no significant correlation was found between any two features.
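The correlation check could look roughly like the sketch below. The data is randomly generated for illustration, so the column values are assumptions and not taken from the real dataset. The resulting matrix can be passed to a plotting function such as `seaborn.heatmap` to draw the heat map.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical features filled with independent random values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "amount_tsh": rng.normal(size=200),
    "population": rng.normal(size=200),
    "construction_year": rng.normal(size=200),
    "basin": ["x"] * 200,  # non-numerical column, excluded below
})

# Pearson correlation between the numerical columns only.
corr = df.select_dtypes(include="number").corr()

# Hide the diagonal (always 1.0) and find the strongest pairwise correlation.
off_diagonal = corr.where(~np.eye(len(corr), dtype=bool))
max_abs_corr = off_diagonal.abs().max().max()
print(corr.round(2))
print("strongest off-diagonal correlation:", round(max_abs_corr, 3))
```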

The scheme_name feature has a lot of missing values. Those values were filled using the mode value for each region.
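One way to fill missing values with a per-region mode is a grouped transform, sketched below on toy data. The helper name `fill_with_group_mode` is hypothetical, and leaving a region untouched when it has no known values is an assumption.

```python
import pandas as pd

# Toy data (hypothetical values); the real dataset has many more rows and regions.
df = pd.DataFrame({
    "region": ["Iringa", "Iringa", "Iringa", "Mara", "Mara", "Mara"],
    "scheme_name": ["S1", "S1", None, "S2", None, "S2"],
})

def fill_with_group_mode(values: pd.Series) -> pd.Series:
    """Fill missing entries with the most frequent value of the group."""
    modes = values.mode()  # mode() ignores missing values
    if modes.empty:
        # Assumption: a group with no known values is left unchanged.
        return values
    return values.fillna(modes.iloc[0])

df["scheme_name"] = df.groupby("region")["scheme_name"].transform(fill_with_group_mode)
print(df)
```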

The scheme_management feature has only 12 unique values, and its missing values were replaced with the mode value of scheme_management.

Missing values in the public_meeting and permit features were replaced with the mode value of the respective field. For the funder and installer features, only the top 10 values were kept as separate categories. All other values were grouped into a single category named "other".
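The top-10 grouping can be sketched as follows. The helper `keep_top_categories` and the generated funder names are hypothetical. Note that in this sketch missing values also end up in "other", which is an assumption about the intended behaviour.

```python
import pandas as pd

def keep_top_categories(values: pd.Series, n: int = 10, other_label: str = "other") -> pd.Series:
    """Keep the n most frequent categories and map everything else to other_label."""
    top = values.value_counts().nlargest(n).index
    # where() keeps entries that are in the top list and replaces the rest.
    return values.where(values.isin(top), other_label)

# Hypothetical funder column: funder_0 appears 12 times, funder_11 once.
funder = pd.Series([f"funder_{i}" for i in range(12) for _ in range(12 - i)])

grouped = keep_top_categories(funder)
print(grouped.value_counts())
```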

Dropped the subvillage feature.

Then plotted a graph using longitude and latitude to observe the geographical distribution of the data. This identified 1812 rows with longitude and latitude of 0,0, which are clearly outliers caused by false values.

The data has an additional field, region code, which gives some idea of the geographical location. Calculated the median longitude and latitude for each region and used the respective values to correct the outliers. After correcting the outliers, the longitude and latitude distribution was as follows.
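The correction can be sketched as below on toy data. Two assumptions are made here: the placeholder coordinates are exactly 0 (real data may need a small tolerance), and the outlier rows are excluded when the per-region median is computed.

```python
import pandas as pd

# Toy data (hypothetical coordinates); rows 2 and 4 carry the 0,0 placeholder.
df = pd.DataFrame({
    "region_code": [1, 1, 1, 2, 2, 2],
    "longitude": [35.0, 36.0, 0.0, 30.0, 0.0, 32.0],
    "latitude": [-8.0, -9.0, 0.0, -2.0, 0.0, -4.0],
})

# Flag rows where both coordinates are the 0,0 placeholder.
is_outlier = (df["longitude"] == 0) & (df["latitude"] == 0)

for col in ["longitude", "latitude"]:
    # Hide the outliers so they do not distort the regional median.
    valid = df[col].mask(is_outlier)
    region_median = valid.groupby(df["region_code"]).transform("median")
    # Overwrite only the flagged rows with the median of their region.
    df.loc[is_outlier, col] = region_median[is_outlier]

print(df)
```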

Other dropped columns: [date_recorded, gps_height, wpt_name, num_private, subvillage, lga, ward, recorded_by, extraction_type, management, management_group, payment, quality_group, quantity, source_type, waterpoint_type_group, region]

Used k-fold cross-validation to evaluate the models. Models tested:

Random Forest
XGBoost classifier
SVM

Random forest achieved the best cross-validation scores, so it was used for the final prediction after training on the whole dataset. This model achieved a score of 0.8124 on the DrivenData test dataset.
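The evaluation loop can be sketched with scikit-learn as below. This is an illustration under stated assumptions and not the exact setup used here: the data is synthetic, the fold count and hyperparameters are arbitrary, and XGBoost is left out to keep the example dependent on scikit-learn only (an `xgboost.XGBClassifier` could be added to the dictionary in the same way).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

# Synthetic three-class data standing in for the preprocessed features.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Candidate models (hyperparameters are assumptions).
models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(),
}

# 5-fold cross-validation; the number of folds is an assumption.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

# Pick the model with the best mean score and refit it on all the data.
best_name = max(mean_scores, key=mean_scores.get)
final_model = models[best_name].fit(X, y)
print(mean_scores, "->", best_name)
```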

