An analysis in Python using the scikit-learn and statsmodels APIs.
Data provided by DrivenData: https://www.drivendata.org/competitions/44/dengai-predicting-disease-spread/
Notebooks:
Exploration and benchmark model [core]: this notebook includes initial data exploration, correlation plots, and a decision tree benchmark model
Exploration and visuals [additional]: this notebook includes visuals from the training dataset to better understand total dengue counts over time
Time series - closer look [additional]: this notebook briefly explores ARIMA for time series forecasting and time series decomposition
Evaluating models [core]: this notebook evaluates the following algorithms on the dataset: decision tree, random forest, SVR, and linear regression
Final models [core]: this notebook showcases two final models, negative binomial regression and SVR. Grid search cross-validation is performed, and the models are evaluated with mean absolute error and mean squared error
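
For readers who want a feel for the workflow before opening the notebooks, here is a minimal sketch of grid search cross-validation on an SVR, scored with mean absolute error and mean squared error. It is not the notebooks' code. The data is synthetic rather than the DengAI dataset, and the parameter grid, the feature scaling step, the 80/20 chronological split, and the use of TimeSeriesSplit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for weekly features and case counts (NOT the DengAI data).
rng = np.random.default_rng(0)
n_weeks = 200
X = rng.normal(size=(n_weeks, 3))
y = rng.poisson(lam=np.exp(1.0 + 0.5 * X[:, 0]))

# Assumed split: keep chronological order, hold out the last 20% of weeks.
split = int(n_weeks * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Scale features before the SVR; make_pipeline names the steps
# "standardscaler" and "svr", hence the "svr__" prefix in the grid.
pipeline = make_pipeline(StandardScaler(), SVR())
param_grid = {"svr__C": [0.1, 1.0, 10.0], "svr__epsilon": [0.1, 1.0]}  # illustrative values

# Assumed CV scheme: TimeSeriesSplit so each fold validates on later weeks.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=TimeSeriesSplit(n_splits=3),
)
search.fit(X_train, y_train)

# Evaluate the refit best estimator on the held-out weeks.
y_pred = search.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("best params:", search.best_params_)
print("MAE:", mae, "MSE:", mse)
```

Reporting both metrics is useful because mean squared error penalizes large misses (such as outbreak peaks) more heavily than mean absolute error does.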