Ingest and analyze event data streams for timely insights
Visualize statistics about taxi rides while the event data is streamed from an external program
Built for anyone interested in analyzing event data as it streams, this code pattern uses a Jupyter notebook, Spark SQL, and matplotlib to show taxi trip statistics while the events are still arriving. A Java program streams the data into IBM Db2 Event Store, which is optimized for event-driven data processing and analytics.
By Jacques Roy and Mark Sturdevant
In this code pattern, a Java program runs as a daemon and submits events to IBM Db2 Event Store. A Jupyter notebook shows how to interact with the event store using Python, and an animated matplotlib chart visualizes the changing data while the events are streaming. Taxi trip data serves as the event stream: the average trip duration for each start time is continuously updated, and the chart also shows the trip count to help visualize the growing database of taxi trips.
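The statistics the chart tracks are simple aggregates: an average trip duration per start time plus a running trip count. In the code pattern these come from querying the event store; the minimal sketch below instead maintains them incrementally in plain Python, only to make the arithmetic concrete. The `TripEvent` and `TripStats` names and the string start-time buckets are hypothetical and not part of the code pattern.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict


@dataclass
class TripEvent:
    # Hypothetical event shape: a start-time bucket and a duration in seconds.
    start_time: str
    duration_s: float


class TripStats:
    """Running per-start-time average durations plus a total trip count."""

    def __init__(self) -> None:
        self._total: Dict[str, float] = defaultdict(float)  # sum of durations per bucket
        self._count: Dict[str, int] = defaultdict(int)      # number of trips per bucket

    def add(self, event: TripEvent) -> None:
        # Constant-time update for each incoming event.
        self._total[event.start_time] += event.duration_s
        self._count[event.start_time] += 1

    def snapshot(self) -> Dict[str, float]:
        # Average duration per start time, ordered by start time for plotting.
        return {t: self._total[t] / self._count[t] for t in sorted(self._count)}

    def trip_count(self) -> int:
        return sum(self._count.values())


stats = TripStats()
for event in [TripEvent("08:00", 600.0), TripEvent("08:00", 900.0), TripEvent("09:00", 300.0)]:
    stats.add(event)
print(stats.snapshot())    # {'08:00': 750.0, '09:00': 300.0}
print(stats.trip_count())  # 3
```

Adding an event is a constant-time dictionary update, and taking a snapshot walks only the start-time buckets rather than every trip, so it stays cheap to refresh on each chart frame.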
We chose taxi data from a CSV file so that you can run this code pattern without signing up for an external data feed, but the code pattern is designed to demonstrate event-driven data processing and analytics that can scale to massive amounts of data. It can easily be modified to work with your own event stream. Because our data has timestamps, it is easy to see simple statistics on all of the data, including the latest events. With your own events, you can use the notebook to experiment with charts and show how your events are trending with up-to-the-minute statistics.
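Replaying a CSV file as a stream amounts to reading it row by row and handing each row on as an event, optionally pausing to mimic a live feed. The sketch below assumes hypothetical `pickup_datetime` and `trip_duration` columns; the code pattern's own Java loader and its column names may differ. Any other iterator that yields one event at a time could stand in for the hypothetical `replay_csv` function when you switch to your own event stream.

```python
import csv
import io
import time
from typing import Dict, Iterator, TextIO


def replay_csv(source: TextIO, delay_s: float = 0.0) -> Iterator[Dict[str, str]]:
    """Yield each CSV row as an event dict, optionally pausing to mimic a live feed."""
    for row in csv.DictReader(source):
        yield dict(row)
        if delay_s > 0:
            time.sleep(delay_s)


# Hypothetical sample data; a real run would open the taxi CSV file instead.
sample = io.StringIO(
    "pickup_datetime,trip_duration\n"
    "2016-01-01 08:00:00,600\n"
    "2016-01-01 08:05:00,900\n"
)
for event in replay_csv(sample, delay_s=0.1):
    print(event["pickup_datetime"], event["trip_duration"])
```

Yielding rows lazily keeps memory use flat regardless of file size, which matches how a genuine event source delivers data one event at a time.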
When the reader has completed this code pattern, they will understand how to:
- Install IBM Db2 Event Store developer edition
- Interact with Db2 Event Store using Python and a Jupyter notebook
- Use a Java program to insert events into IBM Db2 Event Store
- Query the database while inserts are in progress
- Show live updates with an animated chart
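Querying while inserts are in progress works because each query aggregates whatever has been written so far, so repeated queries return growing counts and shifting averages. The sketch below uses Python's built-in `sqlite3` module purely as a local stand-in for Db2 Event Store (the code pattern itself queries the event store with Spark SQL), and it interleaves batches of inserts with queries rather than running them concurrently. The `trips` table, its columns, and the helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

# sqlite3 is only a local stand-in; the code pattern queries Db2 Event Store via Spark SQL.
STATS_SQL = """
    SELECT start_hour, AVG(duration_s) AS avg_duration, COUNT(*) AS trips
    FROM trips
    GROUP BY start_hour
    ORDER BY start_hour
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (start_hour INTEGER, duration_s REAL)")
    return conn


def insert_batch(conn: sqlite3.Connection, rows: Iterable[Tuple[int, float]]) -> None:
    conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    conn.commit()  # commit each batch, as a streaming writer would


def query_stats(conn: sqlite3.Connection) -> List[Tuple[int, float, int]]:
    # Average duration and trip count per start hour over all rows written so far.
    return conn.execute(STATS_SQL).fetchall()


conn = make_db()
for batch in ([(8, 600.0), (9, 300.0)], [(8, 900.0)]):
    insert_batch(conn, batch)
    print(query_stats(conn))
# [(8, 600.0, 1), (9, 300.0, 1)]
# [(8, 750.0, 2), (9, 300.0, 1)]
```

Re-running a comparable aggregate query on each chart refresh is one way a notebook could produce the growing averages and counts that the animated chart displays.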
- The user runs the Jupyter notebook in IBM Data Science Experience (DSX) Local
- The notebook connects to Db2 Event Store to analyze the live event stream
- An external Java program sends live events
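The flow above is a producer/consumer arrangement: an external program produces events, the event store ingests them, and the notebook reads snapshots of what has arrived so far. The following sketch models those three roles with Python threads and a `queue.Queue`; it is an illustration under stated assumptions, not how Db2 Event Store or the Java program are implemented. The function names are hypothetical, and a `None` sentinel marks the end of the stream.

```python
import queue
import threading
import time


def producer(events, out: queue.Queue, delay_s: float = 0.0) -> None:
    """Stand-in for the external Java daemon: push events, then a None sentinel."""
    for event in events:
        out.put(event)
        if delay_s:
            time.sleep(delay_s)
    out.put(None)


def consumer(inbox: queue.Queue, store: list, lock: threading.Lock) -> None:
    """Stand-in for event store ingest: append each event until the sentinel arrives."""
    while True:
        event = inbox.get()
        if event is None:
            break
        with lock:
            store.append(event)


def snapshot(store: list, lock: threading.Lock) -> list:
    """Stand-in for a notebook query: copy whatever has been ingested so far."""
    with lock:
        return list(store)


events = [("08:00", 600), ("08:00", 900), ("09:00", 300)]
inbox: queue.Queue = queue.Queue()
store: list = []
lock = threading.Lock()
workers = [
    threading.Thread(target=producer, args=(events, inbox, 0.01)),
    threading.Thread(target=consumer, args=(inbox, store, lock)),
]
for worker in workers:
    worker.start()
print("seen so far:", len(snapshot(store, lock)))  # may be fewer than 3 mid-stream
for worker in workers:
    worker.join()
print(snapshot(store, lock))  # [('08:00', 600), ('08:00', 900), ('09:00', 300)]
```

Because the reader only takes a copy under the lock, it never blocks ingestion for long, which mirrors the goal of querying the store while events keep flowing in.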
- IBM Db2 Event Store: In-memory database optimized for event-driven data processing and analysis.
- IBM Data Science Experience: Analyze data using RStudio, Jupyter, and Python in a configured, collaborative environment that includes IBM value-adds, such as managed Spark.
- Jupyter Notebook: An open source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text.
- Python: A programming language that lets you work more quickly and integrate your systems more effectively.
- Java: A secure, object-oriented programming language for creating applications.
- Databases: Repository for storing and managing collections of data.
- Analytics: Delivers the value of data for the enterprise.
- Data Science: Systems and scientific methods to analyze structured and unstructured data in order to extract knowledge and insights.