Home
Tansu Dasli edited this page May 11, 2019
·
1 revision
Welcome to the spark-sandbox wiki!
-
cardinality
- bounded: data that is finite in size. mostly processed by traditional batch engines
- unbounded: data that is (at least in theory) infinite in size. mostly processed by streaming or micro-batch engines
-
encoding
- table vs dataset
-
consistency
- at-most-once vs at-least-once vs exactly-once
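
a minimal sketch of the three guarantees, assuming a hypothetical unreliable link where the first attempt for a message can lose either the message or its acknowledgement. `send_all` and its parameters are illustrative names, not part of any library. here exactly-once is approximated as at-least-once delivery plus deduplication by message id on the receiver.

```python
def send_all(messages, drop_message, drop_ack, retry, dedup):
    """Deliver (msg_id, payload) pairs over a simulated unreliable link.

    drop_message / drop_ack: sets of msg_ids whose FIRST attempt loses
    the message or the acknowledgement (an assumed failure model).
    retry: resend until an ack is received.
    dedup: receiver ignores msg_ids it has already processed.
    """
    received = []  # payloads the receiver actually processed, in order
    seen = set()   # msg_ids already processed by the receiver
    for msg_id, payload in messages:
        attempt = 0
        while True:
            attempt += 1
            first = attempt == 1
            arrived = not (first and msg_id in drop_message)
            if arrived and not (dedup and msg_id in seen):
                received.append(payload)
                seen.add(msg_id)
            acked = arrived and not (first and msg_id in drop_ack)
            if acked or not retry:
                break
    return received


messages = [(1, "a"), (2, "b"), (3, "c")]
faults = dict(drop_message={2}, drop_ack={3})

# no retries: message 2 is lost for good
at_most_once = send_all(messages, retry=False, dedup=False, **faults)
# retries: nothing is lost, but message 3 is processed twice
at_least_once = send_all(messages, retry=True, dedup=False, **faults)
# retries + dedup: every message is processed once
exactly_once = send_all(messages, retry=True, dedup=True, **faults)

print(at_most_once)   # ['a', 'c']
print(at_least_once)  # ['a', 'b', 'c', 'c']
print(exactly_once)   # ['a', 'b', 'c']
```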
-
in processing-time windowing, there is no way to handle late data: elements are grouped by when they arrive, not by when they occurred
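
a small sketch of why this happens, assuming fixed windows and events given as hypothetical `(event_time, arrival_time, value)` tuples. the window is chosen from the arrival time only, so an event that occurred early but arrived late simply lands in whatever window is open when it shows up.

```python
from collections import defaultdict


def processing_time_windows(events, size):
    """Group events into fixed windows keyed by ARRIVAL time.

    events: iterable of (event_time, arrival_time, value) tuples.
    Returns {window_start: [values]}.
    """
    windows = defaultdict(list)
    for _event_time, arrival_time, value in events:
        # event time is ignored on purpose: that is the point of the sketch
        window_start = (arrival_time // size) * size
        windows[window_start].append(value)
    return dict(windows)


events = [
    (1, 2, "a"),      # occurred at t=1, arrived at t=2
    (12, 13, "b"),    # occurred at t=12, arrived at t=13
    (4, 15, "late"),  # occurred at t=4, but arrived at t=15
]
by_arrival = processing_time_windows(events, size=10)
print(by_arrival)  # {0: ['a'], 10: ['b', 'late']}
```

by event time, "late" belongs to the window starting at 0, but it is counted in the window starting at 10.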
-
in event-time windowing, more buffering of data is required because windows stay open longer. the other issue is completeness (deciding when a window can be closed), which is estimated via watermarks
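
a rough sketch of both points, assuming fixed windows and a simple heuristic watermark (largest event time seen so far minus a fixed `allowed_delay`). the function name, the watermark rule, and the policy of dropping data behind the watermark are assumptions made for illustration, not how any particular engine does it.

```python
from collections import defaultdict


def event_time_windows(events, size, allowed_delay):
    """Fixed event-time windows, closed by a heuristic watermark.

    events: iterable of (event_time, value) tuples in ARRIVAL order.
    A window [start, start + size) is emitted once the watermark
    reaches its end; values arriving after that are dropped.
    Returns (emitted, dropped), where emitted is [(window_start, values)].
    """
    open_windows = defaultdict(list)  # buffered windows, not yet complete
    emitted, dropped = [], []
    watermark = float("-inf")
    for event_time, value in events:
        start = (event_time // size) * size
        if start + size <= watermark:
            dropped.append(value)  # its window was already declared complete
            continue
        open_windows[start].append(value)
        watermark = max(watermark, event_time - allowed_delay)
        for s in sorted(open_windows):
            if s + size <= watermark:
                emitted.append((s, open_windows.pop(s)))
    # only for this finite demo: flush whatever is still buffered
    for s in sorted(open_windows):
        emitted.append((s, open_windows.pop(s)))
    return emitted, dropped


arrivals = [(1, "a"), (12, "b"), (4, "c"), (17, "d"), (3, "too late"), (25, "e")]
emitted, dropped = event_time_windows(arrivals, size=10, allowed_delay=5)
print(emitted)  # [(0, ['a', 'c']), (10, ['b', 'd']), (20, ['e'])]
print(dropped)  # ['too late']
```

"c" arrives out of order but still joins its window because that window was kept buffered. "too late" arrives after the watermark has passed the end of its window, so it is dropped.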
- stream: an element-by-element view of a dataset over time
- micro-batch streaming: uses repeated executions of a batch processing engine to process unbounded data
- tuple-based windowing: windows whose sizes are counted in numbers of elements (esp. in SQL-based systems)
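
tuple-based windowing can be sketched with a plain generator. `tuple_windows` is a hypothetical helper, and the input can be any iterator, including an unbounded one.

```python
from itertools import count, islice


def tuple_windows(stream, n):
    """Yield consecutive windows of n elements from a (possibly unbounded) iterator."""
    it = iter(stream)
    while True:
        window = list(islice(it, n))
        if not window:
            return  # source exhausted (never happens for unbounded input)
        yield window


# unbounded source: take only the first three windows
first_three = list(islice(tuple_windows(count(1), 3), 3))
print(first_three)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# bounded source: the last window may be smaller than n
tail = list(tuple_windows(range(5), 2))
print(tail)  # [[0, 1], [2, 3], [4]]
```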