ETL Data Pipeline Automation with mage.ai

About

This project shows how mage.ai can be used to automate a data pipeline.

The dataset is extracted from a website that records the historical deaths of hip hop and rap artists around the world.

Data Source Here

Data Pipeline

Extract data from the source URL using Python
Load raw data into Google Cloud Storage data lake
Connect to data lake storage and perform transformation on raw data
Load transformed data into staging area on Google Cloud Storage
Connect Power BI tool to enable visualization and insights
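
As an illustration of the transformation step, here is a minimal sketch that cleans raw rows before they are written to the staging area. The column names (name, date_of_death), the date format, and the helper names transform_rows and to_csv are assumptions made for this example and are not taken from the project code. It works on in-memory CSV text, so the Google Cloud Storage reads and writes are left out.

```python
import csv
import io
from datetime import datetime


def transform_rows(raw_csv: str) -> list[dict]:
    """Trim whitespace, normalise dates to ISO format, and drop incomplete rows.

    Assumes hypothetical columns "name" and "date_of_death", with dates
    written like "March 9, 1997".
    """
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        name = (row.get("name") or "").strip()
        raw_date = (row.get("date_of_death") or "").strip()
        if not name or not raw_date:
            continue  # skip rows missing a required field
        try:
            died = datetime.strptime(raw_date, "%B %d, %Y").date()
        except ValueError:
            continue  # skip rows whose date does not match the assumed format
        cleaned.append(
            {"name": name, "date_of_death": died.isoformat(), "year": died.year}
        )
    return cleaned


def to_csv(rows: list[dict]) -> str:
    """Serialise cleaned rows back to CSV text, ready to upload to staging."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "date_of_death", "year"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Adding a derived year column here is only one possible choice; it tends to make grouping by year in a tool such as Power BI simpler.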

Using the Project

Clone the GitHub repository: git clone repo-url
Change into the etl_mage directory: cd etl_mage
Locate the io_config_copy.yaml file and rename it to io_config.yaml
Inside the io_config.yaml file, locate the GOOGLE_SERVICE_ACC_KEY section and fill in your Google service account credentials
Alternatively, you can give a path to your service account key using the GOOGLE_SERVICE_ACC_KEY_FILEPATH option
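
Before pointing the config at a key file, it can help to check that the file looks like a service account key. The sketch below is a hypothetical helper (missing_key_fields is not part of mage.ai or this project) that reports which commonly required fields are absent or empty in the JSON key. It does not verify that the credentials are valid.

```python
import json
from pathlib import Path

# Fields found in a Google service account JSON key that are needed to authenticate.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(key_path: str) -> list[str]:
    """Return the required fields that are absent or empty in the key file."""
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    return [field for field in REQUIRED_FIELDS if not data.get(field)]
```

An empty list means all four fields are present; anything else names the fields to fix before starting the pipeline.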

Run the pipeline

Change into the etl_mage directory
Run mage start in the terminal to start the development environment

To-Do's

Code refactor and optimization
Data Visualization

Lessons Learnt

Setting up a coding environment on a Google Cloud Compute Engine instance
Connecting to the Compute Engine instance over SSH using VS Code
Setting up a Google Cloud service account key
Setting up a mage.ai instance and developing its data pipeline
Extracting data from a web source using Python web scraping tools (BeautifulSoup, requests)
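
To illustrate the scraping idea, here is a minimal sketch that pulls table rows out of an HTML string. The project itself uses BeautifulSoup and requests; this example swaps in Python's standard-library html.parser so that it runs without extra dependencies, and it does not fetch the real source page. The class name TableRows and the function extract_rows are hypothetical, as is the assumption that the data sits in an HTML table.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html: str) -> list[list[str]]:
    """Return every table row in the HTML as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

In a real extract step, the HTML string would come from an HTTP response body rather than a literal, and nested tables or merged cells would need extra handling.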

julian-King22/etl_with_mage_ai
