
Movies-ETL

Using Python and SQL to build ETL pipelines that clean, transform, and load datasets into a database.

Resources

Overview

The purpose of this project is to create a refactorable, intuitive ETL pipeline that automates the processing of large datasets.

Primary steps & stages of the pipeline

  • Extract
    This stage involves the initial retrieval and reading of data in various formats (CSV, JSON) using a Python environment that can interpret the data.

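The extract step might look like the following minimal sketch. The file contents and column names here are hypothetical stand-ins for the project's raw movie files, read with pandas and the standard-library `json` module:

```python
import io
import json
import pandas as pd

# Hypothetical sample data standing in for the project's raw source files.
CSV_TEXT = "id,title,year\n1,Inception,2010\n2,Arrival,2016\n"
JSON_TEXT = json.dumps([{"id": 3, "title": "Dune", "year": 2021}])

def extract(csv_source, json_source):
    """Read raw CSV and JSON sources into pandas DataFrames."""
    movies_csv = pd.read_csv(csv_source)
    movies_json = pd.DataFrame(json.load(json_source))
    return movies_csv, movies_json

# io.StringIO stands in for open file handles on disk.
csv_df, json_df = extract(io.StringIO(CSV_TEXT), io.StringIO(JSON_TEXT))
```

In the real pipeline the `StringIO` objects would be replaced by paths to the raw data files.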

  • Transform
    This stage involves several more granular steps, including but not limited to:
    • Cleaning data: assessing missing values and corrupt data, formatting
    • Transforming: filtering, formatting, classifying (data types are redefined/changed to better suit analysis), merging data

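The cleaning, filtering, type-conversion, and merging steps above can be sketched with pandas. The sample rows and the `genres` lookup table are hypothetical illustrations, not the project's actual data:

```python
import pandas as pd

# Hypothetical raw data with a missing title and a string-typed year column.
raw = pd.DataFrame({
    "title": ["Inception", "Arrival", None, "Dune"],
    "year": ["2010", "2016", "1999", "2021"],
    "rating": [8.8, 7.9, None, 8.0],
})

def transform(df):
    out = df.dropna(subset=["title"]).copy()   # clean: drop rows missing a title
    out["year"] = out["year"].astype(int)      # classify: redefine dtype for analysis
    out = out[out["year"] >= 2010].copy()      # filter to the rows of interest
    out["rating"] = out["rating"].fillna(out["rating"].mean())  # impute missing ratings
    return out

# Merge step: join a hypothetical genre lookup onto the cleaned data.
genres = pd.DataFrame({"title": ["Inception", "Arrival", "Dune"],
                       "genre": ["sci-fi", "sci-fi", "sci-fi"]})
clean = transform(raw).merge(genres, on="title", how="left")
```

Returning a fresh DataFrame from `transform` (rather than mutating the input) keeps the step refactorable and easy to test in isolation.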

  • Load
    This stage involves connecting to a database/server from the Python environment and loading the data into the appropriate tables/schemas.

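The load step can be sketched with SQLAlchemy and `DataFrame.to_sql`. An in-memory SQLite engine stands in here for the project's real database server, and the table name `movies` is illustrative:

```python
import pandas as pd
from sqlalchemy import create_engine

movies = pd.DataFrame({"title": ["Inception", "Arrival"],
                       "year": [2010, 2016]})

# In-memory SQLite stands in for a real database connection string,
# e.g. "postgresql://user:password@host:5432/movie_data".
engine = create_engine("sqlite://")

# Load the cleaned data into the target table, replacing any prior run.
movies.to_sql("movies", engine, if_exists="replace", index=False)

loaded = pd.read_sql("SELECT COUNT(*) AS n FROM movies", engine)
```

Using `if_exists="replace"` makes repeated pipeline runs idempotent; `if_exists="append"` would be the choice for incremental loads.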
