Amazing Prime is hosting a hackathon in which teams of analysts collaborate on projects that use data to solve problems.
Raw movie data gathered from Wikipedia and Kaggle was extracted, transformed, and loaded into a SQL database. The four deliverables below automate this ETL process. Automation saves time and removes the need to update the code by hand whenever the data is refreshed.
The process was automated by refactoring the existing code and wrapping it in a function.
Resource files:
movies_metadata.csv from Kaggle
wikipedia-movies.json from Wikipedia
ratings.csv from MovieLens
An ETL function was defined to read all three data files.
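The extract step described above can be sketched as a single function that reads the three resource files. This is a minimal illustration, not the project's code. The name `read_sources` is hypothetical. The sketch uses only Python's standard library (`json`, `csv`) in place of a dataframe library, and it assumes that `wikipedia-movies.json` holds a JSON array of movie records.

```python
import csv
import json


def read_sources(wiki_path, kaggle_path, ratings_path):
    """Hypothetical extract step: load the three raw files.

    Returns (wiki_movies, kaggle_rows, ratings_rows), each a list of dicts.
    """
    # Assumption: the Wikipedia file is a JSON array of movie objects.
    with open(wiki_path, encoding="utf-8") as f:
        wiki_movies = json.load(f)

    # Both CSV files are read into dicts keyed by their header row.
    # Values stay as strings; type conversion belongs to the transform step.
    with open(kaggle_path, newline="", encoding="utf-8") as f:
        kaggle_rows = list(csv.DictReader(f))
    with open(ratings_path, newline="", encoding="utf-8") as f:
        ratings_rows = list(csv.DictReader(f))

    return wiki_movies, kaggle_rows, ratings_rows
```

Keeping all three reads behind one call means a data refresh only requires pointing the function at new file paths.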
The Wikipedia movie data was extracted and transformed, using nested functions to clean it.
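The nested-function cleaning described above might look like the sketch below. An outer function filters the records. An inner `clean_movie` merges alternate titles, and a further nested helper renames columns. All function names, the record keys (`"Directed by"`, `"imdb_link"`, `"No. of episodes"`), and the alternate-title keys are assumptions made for illustration, not the project's actual schema or code.

```python
def clean_wiki_movies(wiki_movies):
    """Hypothetical transform step for raw Wikipedia movie records."""

    # Hypothetical subset of keys that hold alternate-language titles.
    alt_title_keys = ("Also known as", "French", "Japanese", "Original title", "Russian")

    def clean_movie(movie):
        movie = dict(movie)  # work on a copy so the raw record is untouched

        # Gather scattered alternate-title fields into one dict.
        alt_titles = {key: movie.pop(key) for key in alt_title_keys if key in movie}
        if alt_titles:
            movie["alt_titles"] = alt_titles

        # Nested helper: rename a key only if it is present.
        def change_column_name(old, new):
            if old in movie:
                movie[new] = movie.pop(old)

        change_column_name("Directed by", "Director")
        change_column_name("Music by", "Composer(s)")
        change_column_name("Produced by", "Producer(s)")
        return movie

    # Keep only film-like entries: a director and an IMDb link, and no
    # episode count (which would indicate a TV series).
    films = [
        m for m in wiki_movies
        if ("Director" in m or "Directed by" in m)
        and "imdb_link" in m
        and "No. of episodes" not in m
    ]
    return [clean_movie(m) for m in films]
```

Nesting `change_column_name` inside `clean_movie` lets the helper modify the current record without passing it around, which keeps the renaming calls short.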
The Kaggle data was extracted and transformed by adding cleaning steps to the existing function.
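The Kaggle cleaning could take roughly the following form. This is a sketch under stated assumptions. The column names (`adult`, `video`, `budget`, `id`, `popularity`, `release_date`) follow the usual layout of movies_metadata.csv. The rules are illustrative rather than the project's exact code: drop adult or corrupted rows, convert types, and turn unparseable values into `None`. The names `clean_kaggle` and `_to_number` are hypothetical.

```python
from datetime import date


def _to_number(value, cast):
    """Convert value with cast (int or float); return None if it cannot be parsed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def clean_kaggle(rows):
    """Hypothetical transform step for movies_metadata.csv rows (dicts of strings)."""
    cleaned = []
    for row in rows:
        # Keep only rows explicitly marked non-adult; this also drops
        # corrupted rows where the column holds some other text.
        if row.get("adult") != "False":
            continue
        row = {k: v for k, v in row.items() if k != "adult"}

        # Convert string columns to proper types.
        row["video"] = row.get("video") == "True"
        row["budget"] = _to_number(row.get("budget"), int)
        row["id"] = _to_number(row.get("id"), int)
        row["popularity"] = _to_number(row.get("popularity"), float)
        try:
            row["release_date"] = date.fromisoformat(row.get("release_date"))
        except (TypeError, ValueError):
            row["release_date"] = None

        cleaned.append(row)
    return cleaned
```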
A movies database was created in SQL from within the existing function, with the elapsed load time reported.
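The load step with elapsed-time reporting can be sketched as a chunked insert. As an assumption, this example uses Python's built-in `sqlite3` as a stand-in for the project's SQL database. The table layout (`userId`, `movieId`, `rating`, `timestamp`) and the function name `load_ratings` are hypothetical.

```python
import csv
import sqlite3
import time
from itertools import islice


def load_ratings(conn, ratings_path, chunksize=1_000_000):
    """Hypothetical load step: insert ratings.csv in chunks, printing elapsed time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings "
        "(userId INTEGER, movieId INTEGER, rating REAL, timestamp INTEGER)"
    )
    start = time.time()
    rows_imported = 0
    with open(ratings_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        while True:
            chunk = list(islice(reader, chunksize))
            if not chunk:
                break
            print(f"importing rows {rows_imported} to {rows_imported + len(chunk)}...", end="")
            # SQLite's column affinity converts numeric strings to INTEGER/REAL.
            conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", chunk)
            conn.commit()
            rows_imported += len(chunk)
            print(f" done. {time.time() - start:.2f} total seconds elapsed")
    return rows_imported
```

Reading in fixed-size chunks keeps memory use bounded for a large ratings file. The running elapsed time also shows how far the load has progressed.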