This project uses data sets provided by drivendata.org to predict whether water pumps are functional, need repair, or are non-functional.
Data-explore shows my preliminary analysis of the data, with count plots made in seaborn.
Clean-data shows how I removed features and created new ones for operation year, season, and rural/urban setting.
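
A rough sketch of what derived features like these could look like, using only the standard library. The field names (`date_recorded`, `construction_year`, `population`), the month-to-season mapping, and the population cut-off are assumptions for illustration; the notebook's actual rules may differ.

```python
from datetime import date

# Hypothetical cut-off: the population at which a site counts as "urban".
URBAN_POPULATION_THRESHOLD = 1000


def operation_years(date_recorded, construction_year):
    """Years in operation at recording time, or None if the year is unknown."""
    if construction_year <= 0:  # assumption: 0 marks a missing construction year
        return None
    return date_recorded.year - construction_year


def season(date_recorded):
    """Map the recording month to a coarse season (assumed mapping)."""
    month = date_recorded.month
    if month in (3, 4, 5):
        return "long_rains"
    if month in (10, 11, 12):
        return "short_rains"
    return "dry"


def setting(population):
    """Label a site rural or urban from its population (assumed rule)."""
    return "urban" if population >= URBAN_POPULATION_THRESHOLD else "rural"


def add_features(row):
    """Return a copy of the row with the three derived features added."""
    out = dict(row)
    out["operation_years"] = operation_years(row["date_recorded"], row["construction_year"])
    out["season"] = season(row["date_recorded"])
    out["setting"] = setting(row["population"])
    return out


example = add_features(
    {"date_recorded": date(2013, 3, 14), "construction_year": 1999, "population": 250}
)
```
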
Train-test-split shows accuracy calculations, log loss, a classification report, and a confusion matrix.
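
For reference, a small pure-Python sketch of how accuracy, multiclass log loss, and a confusion matrix are computed. The labels and toy predictions are made up for illustration and are not results from the notebook; in practice scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `log_loss`, `classification_report`, and `confusion_matrix`.

```python
import math

# Assumed class labels for the three pump states.
LABELS = ["functional", "functional needs repair", "non functional"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def log_loss(y_true, probabilities, labels=LABELS, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for truth, probs in zip(y_true, probabilities):
        p = probs[labels.index(truth)]
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(y_true)


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy data, invented for this example.
y_true = ["functional", "non functional", "functional", "functional needs repair"]
y_pred = ["functional", "non functional", "non functional", "functional needs repair"]
probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.1, 0.7],
    [0.4, 0.1, 0.5],
    [0.1, 0.6, 0.3],
]

acc = accuracy(y_true, y_pred)
loss = log_loss(y_true, probs)
matrix = confusion_matrix(y_true, y_pred)
```
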
Model-and-predict-data shows the model, a random forest classifier. With the random forest classifier I currently have a score of 0.8020, which ranks me 176 out of 1553 competitors.
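
A minimal sketch of fitting a random forest with scikit-learn, assuming it is installed. It runs on synthetic three-class data from `make_classification`, not the pump data, and the hyperparameters (`n_estimators=100`, the 25% test split) are illustrative assumptions rather than the settings behind the score above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned feature matrix and the three status labels.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)

# Hold out a quarter of the rows, keeping class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Illustrative hyperparameters, not tuned values.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)       # mean accuracy on the held-out rows
class_probabilities = model.predict_proba(X_test)  # one column per class
```
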
Map of data using cartodb: https://jdills26.cartodb.com/viz/fceaae6e-0f04-11e6-ba94-0ef24382571b/public_map