Using Python and SQL to build ETL pipelines that clean, transform, and load datasets into a database.
- Python 3.7.6, JupyterLab 2.2.6
- PostgreSQL 12.2, pgAdmin 4.20
- Movie data sourced from IMDB and Kaggle (note: due to the size of the raw data files, they are not included in this repo)
The purpose of this project is to create a refactorable, intuitive ETL pipeline that helps automate the processing of large datasets.
- Extract
This stage covers the initial retrieval and reading of data in various formats (CSV, JSON) into a Python environment that can interpret it, as sketched below.
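A minimal extract sketch using pandas and the standard `json` module; the file paths are placeholders, since the raw IMDB/Kaggle files are not included in this repo:

```python
import json

import pandas as pd

def extract(wiki_json_path, kaggle_csv_path):
    """Read the raw source files into in-memory structures."""
    # JSON source: parsed into a list of dictionaries
    with open(wiki_json_path, mode="r") as file:
        wiki_movies = json.load(file)
    # CSV source: read into a DataFrame; low_memory=False avoids
    # mixed-type inference warnings on large files
    kaggle_metadata = pd.read_csv(kaggle_csv_path, low_memory=False)
    return wiki_movies, kaggle_metadata
```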
- Transform
This stage involves several more granular steps, including but not limited to the following (a sketch follows this list):
- Cleaning data: assessing missing values and corrupt data, fixing formatting
- Transforming data: filtering, formatting, and classifying (data types are redefined/changed to better suit analysis), then merging the sources
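A transform sketch of those steps in pandas; the column names (`imdb_id`, `budget`, `release_date`) and the 90% null threshold are assumptions for illustration, not the actual schema:

```python
import pandas as pd

def transform(wiki_movies, kaggle_metadata):
    """Clean, reclassify, and merge the raw inputs into one DataFrame."""
    wiki_df = pd.DataFrame(wiki_movies)

    # Cleaning: drop rows missing the join key, then drop columns
    # that are more than 90% null (assumed threshold)
    wiki_df = wiki_df.dropna(subset=["imdb_id"])
    keep = [col for col in wiki_df.columns if wiki_df[col].isnull().mean() < 0.9]
    wiki_df = wiki_df[keep]

    # Classifying: redefine data types to better suit analysis
    kaggle_metadata["budget"] = pd.to_numeric(kaggle_metadata["budget"], errors="coerce")
    kaggle_metadata["release_date"] = pd.to_datetime(
        kaggle_metadata["release_date"], errors="coerce"
    )

    # Merging: combine the two sources on the shared identifier
    return pd.merge(wiki_df, kaggle_metadata, on="imdb_id", suffixes=("_wiki", "_kaggle"))
```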
- Load
This stage involves connecting to a database server from the Python environment and loading the data into the appropriate tables/schemas, as shown below.
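A load sketch using SQLAlchemy against the PostgreSQL instance listed above; the database name (`movie_data`), table name (`movies`), and credentials are placeholders to adjust for your setup:

```python
from sqlalchemy import create_engine

def load(movies_df, db_password):
    """Write the cleaned DataFrame to a PostgreSQL table."""
    # Assumed connection string: local PostgreSQL on the default port
    # with a database named "movie_data"
    engine = create_engine(
        f"postgresql://postgres:{db_password}@localhost:5432/movie_data"
    )
    # if_exists="replace" drops and recreates the table each run,
    # keeping the pipeline rerunnable end to end
    movies_df.to_sql(name="movies", con=engine, if_exists="replace", index=False)
```

Note that `to_sql` requires a PostgreSQL driver such as psycopg2 to be installed alongside SQLAlchemy.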