
Movies-ETL

Utilizing Python and SQL to build out ETL pipelines that clean, transform, and load datasets into a database.

Resources

Overview

The purpose of this project is to create an intuitive, refactorable ETL pipeline that helps automate the processing of large datasets.

Primary steps & stages of the pipeline

  • Extract
    This stage involves the initial retrieval and reading of data in various formats (CSV, JSON) from within a Python environment that can parse the data (a minimal code sketch follows the screenshot below).

Screenshot: pipeline_1 (Extract stage)
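The sketch below illustrates one way the Extract stage can look: pandas reads the CSV inputs and the built-in json module reads the JSON input. The file names (movies_metadata.csv, ratings.csv, wikipedia_movies.json) are placeholder assumptions, not necessarily this repository's actual inputs.

```python
# Minimal Extract sketch -- file names are placeholders.
import json
import pandas as pd

# Read tabular data from CSV files into DataFrames.
movies_metadata = pd.read_csv("movies_metadata.csv", low_memory=False)
ratings = pd.read_csv("ratings.csv")

# Read raw JSON records into a list of dictionaries.
with open("wikipedia_movies.json", mode="r") as file:
    wiki_movies_raw = json.load(file)

print(f"{len(movies_metadata)} metadata rows, {len(wiki_movies_raw)} JSON records")
```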

  • Transform
    This stage involves several more granular steps, including but not limited to (a minimal code sketch follows the screenshot below):
    • Cleaning data: assessing missing values and any corrupt data, formatting
    • Transforming: filtering, formatting, classifying (data types are redefined/changed to better suit analysis and interpretation), and merging data

Screenshot: pipeline_2 (Transform stage)
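As an illustration of the Transform stage, the sketch below shows cleaning (dropping missing titles, coercing corrupt dates), reformatting and retyping a currency column, and merging two frames. The column names and sample values are invented for the example and are not taken from this project's data.

```python
# Minimal Transform sketch -- column names and sample data are assumptions.
import pandas as pd

raw = pd.DataFrame({
    "title": ["Movie A", "Movie B", None],
    "release_date": ["1995-10-30", "not a date", "2001-07-20"],
    "box_office": ["$1,000,000", None, "$2,500,000"],
})

# Cleaning: drop rows with a missing title; coerce corrupt dates to NaT.
clean = raw.dropna(subset=["title"]).copy()
clean["release_date"] = pd.to_datetime(clean["release_date"], errors="coerce")

# Transforming: strip currency formatting and redefine the dtype for analysis.
clean["box_office"] = (
    clean["box_office"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Merging: combine with another (assumed) dataset on the title column.
ratings = pd.DataFrame({"title": ["Movie A"], "rating": [4.5]})
merged = clean.merge(ratings, on="title", how="left")
print(merged.dtypes)
```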

  • Load
    This stage involves connecting to a database/server from the Python environment and loading the data into the appropriate tables/schemas (a minimal code sketch follows the screenshot below).

Screenshot: pipeline_4 (Load stage)
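The sketch below shows one way the Load stage can be wired up with SQLAlchemy and pandas: create an engine from a connection string and write a cleaned DataFrame to a table. The connection string, credentials, database name (movie_data), and table name (movies) are placeholders, not this project's actual configuration.

```python
# Minimal Load sketch -- connection details and table name are placeholders.
import pandas as pd
from sqlalchemy import create_engine

db_string = "postgresql://postgres:password@localhost:5432/movie_data"
engine = create_engine(db_string)

movies_df = pd.DataFrame({"title": ["Movie A"], "release_date": ["1995-10-30"]})

# Write the cleaned DataFrame into the target table, replacing prior contents.
movies_df.to_sql(name="movies", con=engine, if_exists="replace", index=False)
```

For very large inputs (e.g. a multi-gigabyte ratings file), pandas' to_sql also accepts a chunksize argument so rows can be loaded in batches rather than all at once.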