Apache Spark Engine

This is the implementation of the Engine contract of Open Data Fabric using the Apache Spark data processing framework. It is currently in use in the kamu-cli data management tool.

Features

The Spark engine currently provides the richest SQL dialect for map/filter-style transformations
Integrates GeoSpark to provide geo-spatial SQL functions
It is used by kamu-cli for ingesting data into Parquet
It is used by kamu-cli along with Apache Livy to provide SQL query functionality in Jupyter notebooks

Known Issues

Takes a long time to start up, which hurts the user experience
Does not support temporal table joins
- You might be better off using the Flink-based engine for joining and aggregating event streams
TODO
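
For readers unfamiliar with the term, a temporal table join pairs each event with the version of another table that was valid at the event's time. The sketch below illustrates that semantics in plain Python. It is not code from this engine or from Flink, and the tuple shapes, sample data, and the `temporal_join` helper are all hypothetical names chosen for the illustration.

```python
from bisect import bisect_right


def temporal_join(events, versions):
    """Pair each event with the latest version row at or before its time.

    events:   list of (event_time, key, payload) tuples
    versions: list of (valid_from, key, value) tuples
    Both shapes are assumptions made for this illustration.
    """
    # Group version rows by key, ordered by the time they became valid.
    by_key = {}
    for valid_from, key, value in sorted(versions, key=lambda v: v[0]):
        times, values = by_key.setdefault(key, ([], []))
        times.append(valid_from)
        values.append(value)

    joined = []
    for event_time, key, payload in events:
        times, values = by_key.get(key, ([], []))
        # Index of the last version with valid_from <= event_time.
        i = bisect_right(times, event_time) - 1
        matched = values[i] if i >= 0 else None
        joined.append((event_time, key, payload, matched))
    return joined


# Hypothetical data: exchange-rate versions and trades that reference them.
rates = [(1, "USD", 1.30), (5, "USD", 1.35)]
trades = [(0, "USD", 100), (3, "USD", 100), (7, "USD", 200)]

for row in temporal_join(trades, rates):
    print(row)
```

The trade at time 0 precedes every rate version and so gets no match, while the trades at times 3 and 7 pick up the rates that became valid at times 1 and 5 respectively. A plain equi-join on the key alone would instead pair every trade with every rate version, which is why this kind of join needs dedicated engine support.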

Developing

See the Developer Guide
