A linear regression project that predicts box office success using data scraped from Box Office Mojo.
In this repo you will find the following files:
- My PowerPoint presentation on this project, which goes over each step of the process as well as my results. This is the best place to start.
- The Jupyter notebook I used for scraping Box Office Mojo.
- The Jupyter notebook I used for feature engineering.
- The Jupyter notebook I used for modeling.
- The CSV file of all Hollywood Blacklist films from 2008 through 2017.
The data combines what I scraped from Box Office Mojo with a Kaggle dataset containing all of the films featured on the "Hollywood Blacklist" of promising scripts.
I deployed this model to the web with a Python file that contains the pickled model results and uses the user-friendly Streamlit library to collect the input variables needed for a prediction. The code for this deployment can be found in the 'box_office_stream_deploy.py' file.
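For readers who want a feel for the pattern before opening the deployment file, here is a minimal sketch of loading pickled regression results and serving a prediction through Streamlit. It is not the code from this repo: the parameter layout, the feature names (`budget_musd`, `runtime_min`) and the helper functions are all hypothetical, chosen only for illustration.

```python
import pickle


def load_model(pickled_bytes):
    """Restore the saved regression results from pickled bytes."""
    return pickle.loads(pickled_bytes)


def predict_gross(params, features):
    """Apply a linear model: intercept + sum(coefficient * feature value).

    `params` is assumed (hypothetically) to be a dict with an "intercept"
    and a "coefficients" mapping keyed by feature name.
    """
    return params["intercept"] + sum(
        params["coefficients"][name] * value for name, value in features.items()
    )


def run_app(params):
    """Streamlit front end; streamlit is imported here so the helpers above
    can be used and tested without it installed."""
    import streamlit as st

    st.title("Box office prediction (sketch)")
    # Hypothetical inputs; the real app's features may differ.
    budget = st.number_input("Budget (millions USD)", min_value=0.0, value=50.0)
    runtime = st.number_input("Runtime (minutes)", min_value=1.0, value=110.0)
    if st.button("Predict"):
        estimate = predict_gross(
            params, {"budget_musd": budget, "runtime_min": runtime}
        )
        st.write(f"Predicted gross: {estimate:.1f} million USD")
```

A script that loads the pickle and calls `run_app` would be started with `streamlit run` followed by the script name. Keeping the prediction logic in a plain function, separate from the widgets, makes it easy to check the arithmetic without launching the app.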
This app was first deployed via a detached tmux session on an AWS free tier server. It now lives on Streamlit Share! Find it here