A data analytics project that draws conclusions from US accidents data covering June 2016 to June 2020.
For a simple EDA process, I started by eliminating unnecessary features. This is still a work in progress. My main goals for this project over the next few weeks are:
- Set up an Airflow-Spark data scheduler.
- Completely understand the ETL processes required for the data.
- Look at how the severity of accidents is affected by location.

The dataset analyzed in this project is the US Accidents dataset from Kaggle: https://www.kaggle.com/sobhanmoosavi/us-accidents

So far, visualizing the results on a map and categorizing accidents by severity level has revealed several interesting things, such as which cities and states witnessed the most accidents in the US.
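
The feature elimination and the severity-by-location question described above could be sketched roughly as follows with pandas. This is a minimal illustration, not the project's actual code: the column names `ID`, `Source`, `Description`, `State`, and `Severity` are assumptions about the Kaggle CSV, the list of columns to drop is hypothetical, and the sample rows are made up purely to show the shape of the output.

```python
import pandas as pd

# Hypothetical list of columns treated as unnecessary for the EDA.
# Adjust it after inspecting the real dataset.
UNUSED_COLUMNS = ["ID", "Source", "Description"]


def drop_unused_features(df, columns=UNUSED_COLUMNS):
    """Return a copy of df without the listed columns (missing ones are skipped)."""
    return df.drop(columns=columns, errors="ignore")


def severity_by_location(df, location_col="State"):
    """Count accidents and average their severity for each location."""
    return (
        df.groupby(location_col)["Severity"]
        .agg(accidents="count", mean_severity="mean")
        .sort_values("accidents", ascending=False)
    )


if __name__ == "__main__":
    # Made-up rows for illustration only; they are not taken from the dataset.
    sample = pd.DataFrame(
        {
            "ID": ["A-1", "A-2", "A-3"],
            "State": ["CA", "CA", "TX"],
            "Severity": [2, 4, 3],
        }
    )
    cleaned = drop_unused_features(sample)
    print(severity_by_location(cleaned))
```

Passing `location_col="City"` would give the same summary per city, assuming the data has such a column.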