Design and implement a data warehouse to manage automobile accident cases across all 49 states in the US, using a star schema and Snowflake for the data warehouse architecture.
- Data Source: This project uses two datasets from Kaggle: **US Accidents (2016 - 2023)** and **Traffic Accidents and Vehicles**.
  - **US Accidents (2016 - 2023)**: a countrywide car accident dataset covering 49 states of the USA; the accident data were collected from February 2016 to March 2023.
  - **Traffic Accidents and Vehicles**: each line in the file represents the involvement of a unique vehicle in a unique traffic accident, with various vehicle and passenger properties as columns.
- Extract Data: Data is extracted from `csv` files and ingested into the `bronze` folder of the `MinIO` data lake using `Python` and `Airflow`.
- Transform Data: Data is retrieved from `MinIO`'s `bronze` directory using `Spark` and `FastAPI` for transformation and cleaning, and the output is loaded into `MinIO`'s `silver` directory.
- Load Data: Once the data has been cleaned, it is loaded into the `Staging` schema of the `Snowflake` data warehouse using `Python` and `Airflow`.
- Warehouse: From the `staging` schema in `Snowflake`, the data warehouse is built and deployed with a `Star Schema` architecture by creating `dimension` and `fact` tables; `dbt` is used to transform and test the data.
- Serving: Analyze the data to improve road safety: identify high-risk accident areas so preventative measures can be implemented, and identify factors that contribute to accidents (weather, road conditions, human error). Visualizations and reports are built with `Power BI`.
- Package and Orchestration: Components are packaged using `Docker` and orchestrated with `Apache Airflow`.
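In the project the transformation step runs on Spark, but the kind of cleaning applied between the `bronze` and `silver` layers can be illustrated with a minimal plain-Python sketch. The field names `Severity`, `Start_Time`, and `State` come from the US Accidents dataset; the specific cleaning rules below (drop rows with a missing severity or unparseable timestamp, normalize state codes) are illustrative assumptions, not the project's exact logic:

```python
from __future__ import annotations

from datetime import datetime


def clean_record(raw: dict) -> dict | None:
    """Apply simple cleaning rules to one accident record.

    Returns the cleaned record, or None if the row should be dropped.
    """
    # Drop rows without a severity rating or a start time.
    if not raw.get("Severity") or not raw.get("Start_Time"):
        return None
    try:
        start = datetime.strptime(raw["Start_Time"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Unparseable timestamp -> drop the row.
        return None
    return {
        "severity": int(raw["Severity"]),
        "start_time": start.isoformat(),
        "state": raw.get("State", "").strip().upper(),
    }


records = [
    {"Severity": "3", "Start_Time": "2019-06-01 08:15:00", "State": "ca"},
    {"Severity": "", "Start_Time": "2019-06-01 09:00:00", "State": "TX"},
]
cleaned = [r for r in (clean_record(x) for x in records) if r is not None]
```

In the pipeline itself the same rules would be expressed as Spark DataFrame operations so they scale to the full dataset.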
Apache Airflow
Apache Spark
Docker
dbt
Snowflake
MinIO
FastAPI
Power BI
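As a sketch of how a fact table in the `Star Schema` might be modeled with `dbt`, the snippet below joins a staging model to date, location, and weather dimensions. All model and column names here (`stg_accidents`, `dim_date`, `dim_location`, `dim_weather`, and their keys) are hypothetical examples, not the project's actual models:

```sql
-- models/marts/fct_accidents.sql (hypothetical model name)
-- Builds the central fact table of the star schema from the staging layer,
-- linking each accident to its date, location, and weather dimensions.
select
    stg.accident_id,
    dd.date_key,
    dl.location_key,
    dw.weather_key,
    stg.severity
from {{ ref('stg_accidents') }}   as stg
join {{ ref('dim_date') }}        as dd on dd.date_day  = cast(stg.start_time as date)
join {{ ref('dim_location') }}    as dl on dl.state     = stg.state
join {{ ref('dim_weather') }}     as dw on dw.condition = stg.weather_condition
```

With the dimensions in place, `dbt test` can enforce uniqueness and referential integrity on the surrogate keys as part of the "check data" step described above.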