Proof-of-Concept (POC) Data Pipelines for various use cases such as data streaming/ingestion, batch data processing, orchestration, and storage. Includes technologies such as Apache Airflow, Apache Spark, Apache Kafka, AWS, Python, and more.

njfritter/poc-data-pipelines

Proof of Concept (POC) Data Pipelines

This repo is dedicated to testing out data technologies as well as highlighting my proficiency at building various types of data pipelines.

To start, I will explore simpler use cases that combine technologies I have varying amounts of experience with. This will let me learn the nuances and functionality of data technologies I have less experience with (e.g. streaming data use cases) while also learning how to piece them together with technologies I know better (e.g. batch data processing).

I will leverage the power of the cloud to simulate "production" conditions for these pipelines as much as possible.

I plan on using the information gained to tackle more complex use cases (including domains I am generally interested in), which will be placed in separate repos.

All of these pipelines will be guided by simulated "business use cases" that might be posed to a data engineer by a product organization, team of analysts, etc.

One option for generating pseudo-real data (realistic data produced synthetically): EventSim

The Pipelines

This section will be updated as I build out each of the pipelines:

  1. Kafka Spark Streaming Pipeline with data from Coinbase API
