ETL-Data-Pipeline-using-Scala-Hive-AWS-Athena-JDBC-Driver

An automated ETL data pipeline that extracts complex JSON data from a web API service (GBFS BIXI data) and converts it to CSV for loading into the HDFS data warehouse. Hive then processes the data further through external and managed tables. The same procedure is also applied on AWS with S3 and Athena.
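
The JSON-to-CSV step described above might look roughly like the following Scala sketch. It covers only the flattening of already-parsed records into CSV rows. The `Station` case class, its field names (modeled on the public GBFS `station_information` feed), and the `StationCsv` object are assumptions for illustration, not the project's actual code; fetching and parsing the JSON are left out.

```scala
// Hypothetical record type: the fields mirror the public GBFS
// station_information feed; the project's real schema is not shown here.
final case class Station(
    stationId: String,
    name: String,
    lat: Double,
    lon: Double,
    capacity: Int
)

object StationCsv {
  // Header row written once at the top of the CSV file.
  val header: String = "station_id,name,lat,lon,capacity"

  // Quote a field when it contains a comma, a double quote, or a line break,
  // doubling any embedded double quotes (RFC 4180 style).
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Turn one station record into a single CSV line.
  def toRow(s: Station): String =
    Seq(s.stationId, s.name, s.lat.toString, s.lon.toString, s.capacity.toString)
      .map(escape)
      .mkString(",")

  // Header followed by one line per station.
  def toCsv(stations: Seq[Station]): String =
    (header +: stations.map(toRow)).mkString("\n")
}
```

Calling `StationCsv.toCsv(stations)` on a sequence of parsed records yields a string that can be written to a local file and then copied to HDFS or S3. Station names often contain commas or slashes, which is why the sketch quotes fields rather than joining them blindly.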

Two types of ETL pipelines

  • On-premise ETL data pipeline using HDFS, Hive, and Scala
  • AWS cloud-based ETL data pipeline using S3, Athena, and Lambda
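
Both pipelines expose the CSV files through an external table and then derive a managed table from it. The sketch below only builds the DDL strings in Scala; it does not connect to Hive or Athena. The column list, the `HiveDdl` object, and the table and path arguments are hypothetical, and the delimited row format assumes fields without embedded commas (a CSV SerDe would be needed otherwise). Athena accepts a similar `CREATE EXTERNAL TABLE` statement with an `s3://` location.

```scala
object HiveDdl {
  // Hypothetical column list; the project's actual schema is not shown.
  val columns: Seq[(String, String)] = Seq(
    "station_id" -> "STRING",
    "name"       -> "STRING",
    "lat"        -> "DOUBLE",
    "lon"        -> "DOUBLE",
    "capacity"   -> "INT"
  )

  // External table: Hive reads the CSV files in place and does not own them,
  // so dropping the table leaves the files untouched.
  def externalTable(table: String, location: String): String = {
    val cols = columns.map { case (n, t) => s"  $n $t" }.mkString(",\n")
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $table (
       |$cols
       |)
       |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
       |STORED AS TEXTFILE
       |LOCATION '$location'
       |TBLPROPERTIES ('skip.header.line.count'='1')""".stripMargin
  }

  // Managed table: Hive owns the data, here copied from the external table
  // into Parquet with a CREATE TABLE ... AS SELECT statement.
  def managedTable(table: String, source: String): String =
    s"CREATE TABLE IF NOT EXISTS $table STORED AS PARQUET AS SELECT * FROM $source"
}
```

The resulting strings could be passed to `java.sql.Statement.execute` over a JDBC connection to Hive; the connection URL and driver setup depend on the cluster and are not shown.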

Project Description

image

image

image