Proof of Concept (POC) Data Pipelines

This repo is dedicated to testing out data technologies and to demonstrating my proficiency in building various types of data pipelines.

To start, I will explore simpler use cases that combine technologies I have varying amounts of experience with. This will let me learn the nuances and functionality of technologies I have less experience with (e.g. streaming data use cases) while also learning how to integrate them with technologies I know better (e.g. batch data processing).

Where possible, I will run these pipelines on cloud infrastructure to simulate "production" conditions.

I plan to use what I learn here to tackle more complex use cases (including domains I am personally interested in), which will live in separate repos.

All of these pipelines will be guided by simulated "business use cases" that might be posed to a data engineer by a product organization, team of analysts, etc.

One option for generating pseudo-real data (synthetic data that mimics real event data): EventSim

The Pipelines

This section will be updated as I build out each of the pipelines:

Kafka Spark Streaming Pipeline with data from Coinbase API

