In this file we document how the GraphZeppelin control works using flowcharts and step by step descriptions. In our flow chart we connect objects A
and B
with a directed edge A --> B
that indicates object A
calling a method of object B
.
The driver is responsible for managing and coordinating the CPU WorkerThreadGroup
, the GutteringSystem
, and the SketchAlgorithm
. The driver can do this for any vertex based sketch algorithm (which the driver is templatized upon) so long as the algorithm implements the required functions.
Steps to initialize the sketch algorith and driver.
- User constructs the
SketchAlgorithm
. - User constructs the
GraphSketchDriver
. GraphSketchDriver
pullsnum_vertices
from algorithm.GraphSketchDriver
requests the desired batch size from the algorithm.GraphSketchDriver
tells the algorithm to allocate space for its worker threads.GraphSketchDriver
constructsGutteringSystem
.GraphSketchDriver
constructsWorkerThreadGroup
.
flowchart TD
A[User] -->|2. Construct| B[GraphSketchDriver]
A -->|1. Construct| C
B -->|3. get_num_vertices\n4. get_desired_update_batch\n5. allocate_worker_memory\n8. print_configuration| C[Sketch Algorithm]
B -->|6. Construct| D[GutteringSystem]
B -->|7. Construct| E[WorkerThreadGroup]
When processing a stream, the driver coordinates its own threads, the GutteringSystem
which batches updates, the WorkerThreadGroup
which applies sketch updates, and application of updates to the graph sketch algorithm. Once the setup steps 1-2 complete, for each stream update until the breakpoint (either query or end of stream) we perform steps 3-7.
- User tells the driver to read stream until a query breakpoint or end of stream.
GraphSketchDriver
resumes its CPU worker threads (may have been paused by a query).- When processing an update
GraphSketchDriver
first calls the algorithm'spre_insert
function to allow the algorithm to do per update work before the updates are batched. Maintaining a cached query result for Connected Components is an example of why we allow this option. GraphSketchDriver
inserts the update into theGutteringSystem
.WorkerThreadGroup
pulls a batch of updates from theGutteringSystem
.WorkerThreadGroup
does not know what sketch algorithm we are running so callsGraphSketchDriver
'sbatch_callback
function.GraphSketchDriver
calls the algorithm'sapply_update_batch()
function.
flowchart TD
A[User] -->|1. process_stream_until| B[GraphSketchDriver]
B -->|2. resume| E[WorkerThreadGroup]
B -->|4. insert| D[GutteringSystem]
E -->|5. get_data| D
E -->|6. batch_callback| B
B --->|3. pre_insert\n7. apply_update_batch| C[Sketch Algorithm]
To perform a query, the user must first call driver.prep_query()
in which the driver ensures the query is safe to perform. Specifically, the driver must ensure that all stream updates have been processed before allowing the query to continue. If step 2 has_cached_query()
returns true, the driver can safely skip steps 3-4 and immediately allow the user to perform the query.
- User wants to preform a query so calls
prep_query
. GraphSketchDriver
checks if the algorithm has a valid cached query answer. In this case it can immediately return control to the user.GraphSketchDriver
flushes theGutteringSystem
of all updates.GraphSketchDriver
callspause
to wait for theWorkerThreadGroup
to finish applying all updates.- The user preforms the desired query. For example
connected_components()
.
flowchart TD
A[User] -->|1. prep_query| B[GraphSketchDriver]
A -->|5. query| C
B -->|2. has_cached_query| C[Sketch Algorithm]
B -->|3. flush| D[GutteringSystem]
B -->|4. pause| E[WorkerThreadGroup]