Capstone: West Nile Virus

Competition Description:

West Nile virus is most commonly spread to humans through infected mosquitos. Around 20% of people who become infected with the virus develop symptoms ranging from a persistent fever to serious neurological illnesses that can result in death.

In 2002, the first human cases of West Nile virus were reported in Chicago. By 2004 the City of Chicago and the Chicago Department of Public Health (CDPH) had established a comprehensive surveillance and control program that is still in effect today.

Every week from late spring through the fall, mosquitos in traps across the city are tested for the virus. The results of these tests influence when and where the city will spray airborne pesticides to control adult mosquito populations.

Given weather, location, testing, and spraying data, this competition asks you to predict when and where different species of mosquitos will test positive for West Nile virus. A more accurate method of predicting outbreaks of West Nile virus in mosquitos will help the City of Chicago and CDPH more efficiently and effectively allocate resources towards preventing transmission of this potentially deadly virus.

Submissions are evaluated on area under the ROC curve between the predicted probability that West Nile Virus is present and the observed outcomes.
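As an illustration of the metric, the area under the ROC curve equals the probability that a randomly chosen positive observation receives a higher predicted probability than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name `roc_auc` and the toy labels and scores are made up for this example and are not taken from the competition data. scikit-learn's `roc_auc_score` computes the same quantity.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 1 = virus present, 0 = virus absent.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```

Because the metric depends only on how observations are ranked, it is not distorted by the heavy class imbalance in the way plain accuracy would be.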

Approach Summary:

Data Analysis:

Matplotlib and Seaborn used to visualise key features.

Data Pre-Processing:

PCA used to compress the weather features
Random oversampling used to address class imbalance (the minority class makes up about 5% of observations)

Modelling:

Imblearn pipeline used so that re-sampling is applied only to the training folds during GridSearchCV cross-validation
Classification models compared: Logistic Regression, Decision Tree, Random Forest, Bagging, Gradient Boosting
Cross-validation scoring metric: ROC AUC (roc_auc)

Prediction:

Tableau used to map the quality of predictions geographically and temporally.
The Tableau .twb file is also provided in this repo.

Outcome:

XGBoost was the selected classifier and clearly outperformed the other classifiers. However, despite a cross-validated AUC of 93%, the test score was 69%.
Conclusions are detailed in the Python notebook.

Potential Improvements:

Apply a neural network to improve performance on the test set.

noahberhe/capstone_westnilevirus
