
Experiments


Experimental Results (Final Draft)

Continuous Query Timing

stream used: kron17
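
Each GZ entry below reports a query as a buffer flush followed by a connected-components (CC) computation. As a rough illustration, here is a minimal C++ sketch of how such a query might be timed; `Sketch`, `flush_buffers()`, and `connected_components()` are assumed stand-ins, not the actual GraphStreamingCC API:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical shape of one continuous query, matching the
// "flush, cc = total" decomposition reported in the tables below.
template <class Sketch>
auto timed_query(Sketch& g) {
  using clock = std::chrono::steady_clock;
  auto t0 = clock::now();
  g.flush_buffers();                   // drain buffered updates into the sketches
  auto t1 = clock::now();
  auto cc = g.connected_components();  // answer the query from the sketches
  auto t2 = clock::now();
  std::printf("%.2f, %.2f = %.2f\n",   // flush, cc = total (seconds)
              std::chrono::duration<double>(t1 - t0).count(),
              std::chrono::duration<double>(t2 - t1).count(),
              std::chrono::duration<double>(t2 - t0).count());
  return cc;
}
```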

No Memory Limit

| Stream ingested | Terrace (sec) | Aspen (sec) | GZ, buffer=100: flush, CC = total (sec) |
|---|---|---|---|
| 10% | 0.723748 | 0.294496 | 0.88, 0.86 = 1.75 |
| 20% | 1.32206 | 0.518603 | 0.90, 0.48 = 1.38 |
| 30% | 2.00126 | 0.735033 | 0.90, 0.48 = 1.39 |
| 40% | 2.54583 | 0.929199 | 0.90, 0.51 = 1.41 |
| 50% | 3.0512 | 1.13409 | 0.90, 0.52 = 1.42 |
| 60% | 2.59427 | 1.35275 | 0.90, 0.51 = 1.41 |
| 70% | 2.985 | 1.53211 | 0.90, 0.49 = 1.39 |
| 80% | 3.34039 | 1.79285 | 0.90, 0.48 = 1.38 |
| 90% | N/A | 1.94658 | 0.90, 0.51 = 1.41 |
| 100% | N/A | 2.12543 | 0.90, 0.49 = 1.39 |

GraphZeppelin had an insertion rate of 3,946,000 updates per second on this workload.

Note: Terrace crashes 83% of the way through the kron17 stream.

12GB Limit

| Stream ingested | Terrace (sec) | Aspen (sec) | GZ, buffer=standard: flush, CC = total (sec) |
|---|---|---|---|
| 10% | - | 0.312 | 15.6, 7.96 = 23.6 |
| 20% | - | 0.511 | 15.9, 8.43 = 24.4 |
| 30% | - | 0.757 | 16.0, 8.52 = 24.5 |
| 40% | - | 0.902 | 15.9, 8.45 = 24.4 |
| 50% | - | 1.127 | 16.0, 8.43 = 24.4 |
| 60% | - | 1.338 | 15.9, 8.44 = 24.4 |
| 70% | - | 1.560 | 15.9, 8.48 = 24.4 |
| 80% | - | 53.88 | 16.0, 8.47 = 24.5 |
| 90% | - | 94.62 | 15.9, 8.49 = 24.4 |
| 100% | - | 142.3 | 16.0, 8.45 = 24.4 |

Aspen had an insertion rate of 91,670 updates per second.
GraphZeppelin had an insertion rate of 4,147,000 updates per second.

Sketch speed + size

Sketch Speed

This is a comparison between our initial powermod l0-sampling implementation (AGM) and our current implementation (CubeSketch). Both versions of sketching only calculate the checksum value once per column. These are vector sketches, not node sketches (thus lg n x lg n buckets); see the code sketch after the table.

| Vector size | AGM 128-bit buckets (updates/s) | AGM 64-bit buckets (updates/s) | CubeSketch (updates/s) |
|---|---|---|---|
| 10^3 | 121,076 | 221,330 | 7,322,360 |
| 10^4 | 66,726 | 122,598 | 5,180,510 |
| 10^5 | 31,635 | 54,956 | 4,384,660 |
| 10^6 | 19,021 | 29,331 | 3,730,720 |
| 10^7 | 13,581 | 20,936 | 3,177,070 |
| 10^8 | 10,577 | 16,352 | 2,825,880 |
| 10^9 | 7,390 | 13,200 | 2,587,790 |
| 10^10 | 1,352 | N/A | 2,272,880 |
| 10^11 | 916 | N/A | 2,108,760 |
| 10^12 | 835 | N/A | 1,963,690 |
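
To illustrate the gap in the table above, here is a minimal, hypothetical C++20 sketch of an XOR-based column update in the spirit of CubeSketch. All names and the hash mixer are illustrative, not the actual GraphStreamingCC code; the point is that each update costs two hashes (the checksum computed once per column) and a few XORs, versus the modular powermod arithmetic of the AGM buckets.

```cpp
#include <algorithm>
#include <bit>
#include <cstdint>
#include <vector>

// One column of lg(n) XOR buckets: bucket r samples the indices whose
// column hash has at least r trailing zero bits (geometric sampling).
struct Bucket {
  uint64_t alpha = 0;  // XOR of inserted vector indices
  uint64_t gamma = 0;  // XOR of their checksums
};

struct Column {
  std::vector<Bucket> buckets;  // assumed sized to lg(n)
  uint64_t seed = 1;

  static uint64_t mix(uint64_t x, uint64_t seed) {  // any good 64-bit mixer works
    x += seed;
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
  }

  void update(uint64_t idx) {
    uint64_t col_hash = mix(idx, seed);
    uint64_t checksum = mix(idx, seed ^ 0x9e3779b97f4a7c15ULL);  // once per column
    size_t depth = std::min<size_t>(
        std::countr_zero(col_hash | (1ULL << 63)), buckets.size() - 1);
    for (size_t r = 0; r <= depth; ++r) {  // delete == insert: the XOR cancels
      buckets[r].alpha ^= idx;
      buckets[r].gamma ^= checksum;
    }
  }
};
```

A bucket in this style holds exactly one surviving index precisely when its gamma equals the checksum of its alpha, which is the test a query would use to extract a sample.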

Sketch Size

This is a comparison between our initial powermod l0-sampling implementation (AGM) and our current implementation (CubeSketch). These are vector sketches, not node sketches.

| Vector size | AGM 128-bit (bytes) | AGM 64-bit (bytes) | CubeSketch (bytes) |
|---|---|---|---|
| 10^3 | 5,360 | 2,696 | 1,243 |
| 10^4 | 10,160 | 5,096 | 2,395 |
| 10^5 | 14,768 | 7,400 | 3,511 |
| 10^6 | 20,240 | 10,136 | 4,843 |
| 10^7 | 28,880 | 14,456 | 6,955 |
| 10^8 | 36,368 | 18,200 | 8,791 |
| 10^9 | 44,720 | 22,376 | 10,843 |
| 10^10 | 57,200 | N/A | 13,915 |
| 10^11 | 67,568 | N/A | 16,471 |
| 10^12 | 78,800 | N/A | 19,243 |

Memory + Speed Experiments

Memory

GraphZeppelin (Gutter tree with 16 GB restriction)

| Dataset | RES (GiB) | SWAP (GiB) | DISK (GiB) | Total (GiB) | Total w/o DISK (GiB) |
|---|---|---|---|---|---|
| Kron13 | 0.75 | 0 | 0.90 | 1.65 | 0.75 |
| Kron15 | 6.20 | 0 | 4.57 | 10.77 | 6.2 |
| Kron16 | 8.90 | 0 | 10.23 | 19.13 | 8.9 |
| Kron17 | 14.8 | 0.05 | 22.76 | 37.61 | 14.85 |
| Kron18 | 15.5 | 12.5 | 50.44 | 78.44 | 28 |

GraphZeppelin (Leaf Only; unrestricted)

| Dataset | RES (GiB) | SWAP (GiB) | Total (GiB) |
|---|---|---|---|
| Kron13 | 0.58 | 0 | 0.58 |
| Kron15 | 3.1 | 0 | 3.1 |
| Kron16 | 7.0 | 0 | 7.0 |
| Kron17 | 15.7 | 0 | 15.7 |
| Kron18 | 35.1 | 0 | 35.1 |

Aspen

| Nodes | RES (bytes) | SWAP (bytes) |
|---|---|---|
| 8192 | 352684 | 0 |
| 32768 | 3650722201 | 0 |
| 65536 | 6871947673 | 0 |
| 131072 | 16965120819 | 1073741824 |
| 262144 | 16965120819 | 7516192768 |

Terrace

| Nodes | RES (bytes) | SWAP (bytes) |
|---|---|---|
| 8192 | 557788 | 0 |
| 32768 | 6764573491 | 0 |
| 65536 | 17072495001 | 8375186227 |
| 131072 | 16320875724 | 86758339379 |

Speed

Restricted (16 GiB)

GraphZeppelin (Gutter Tree)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 3.93 | 0.02 |
| Kron15 | 3.77 | 0.10 |
| Kron16 | 3.59 | 0.22 |
| Kron17 | 3.26 | 0.44 |
| Kron18 | 2.50 | 97.5 |

GraphZeppelin (Leaf Only)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 5.22 | 0.02 |
| Kron15 | 4.87 | 0.10 |
| Kron16 | 4.51 | 0.19 |
| Kron17 | 4.24 | 0.42 |
| Kron18 | 2.49 | 103 |

Aspen

| Nodes | Ingestion rate (updates/s) | CC time (seconds) |
|---|---|---|
| 8192 | 4982400 | 0.04095 |
| 32768 | 3540570 | 0.2022 |
| 65536 | 2543120 | 0.746 |
| 131072 | 1899720 | 3.11 |
| 262144 | 11337.5 | 0 |

Terrace

| Nodes | Ingestion rate (updates/s) | CC time (seconds) |
|---|---|---|
| 8192 | 137929 | 0.126 |
| 32768 | 132723 | 0.8 |
| 65536 | 143286 | 1.26 |
| 131072 | 25538.6 | 0 |

Unrestricted

GraphZeppelin (Gutter Tree)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 3.96 | 0.03 |
| Kron15 | 3.80 | 0.10 |
| Kron16 | 3.69 | 0.20 |
| Kron17 | 3.73 | 0.45 |
| Kron18 | 3.52 | 0.89 |

GraphZeppelin (Leaf Only)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 5.36 | 0.02 |
| Kron15 | 4.98 | 0.07 |
| Kron16 | 4.70 | 0.19 |
| Kron17 | 4.41 | 0.29 |
| Kron18 | 4.27 | 0.59 |
NOTE: This table of results is from after the submission and includes some optimizations not present in the other results. It reflects the performance of the main branch of the GraphStreamingCC system as of 02/28/2022, using a queue_factor of -2.

Aspen has an ingestion rate of 1.6019 million updates per second on kron18 unrestricted.

Parallel Experiment

| Total Threads | Num_groups | Group_size | Insertions/sec (millions) |
|---|---|---|---|
| 1 | 1 | 1 | 0.17 |
| 4 | 4 | 1 | 0.65 |
| 8 | 8 | 1 | 1.28 |
| 12 | 12 | 1 | 1.76 |
| 16 | 16 | 1 | 2.17 |
| 20 | 20 | 1 | 2.53 |
| 24 | 24 | 1 | 2.98 |
| 28 | 28 | 1 | 3.24 |
| 32 | 32 | 1 | 3.46 |
| 36 | 36 | 1 | 3.67 |
| 40 | 40 | 1 | 3.88 |
| 44 | 44 | 1 | 4.10 |
| 46 | 46 | 1 | 4.21 |

| Num_groups | Group_size | Insertions/sec (millions) |
|---|---|---|
| 40 | 1 | 3.89 |
| 20 | 2 | 3.77 |
| 10 | 4 | 3.61 |
| 4 | 10 | 2.52 |
| 2 | 20 | 2.59 |

Buffering Experiment

Memory Limited to 8 GB

| Number of Updates per Buffer | Proportion of a Sketch (%) | Insertions/sec (millions) |
|---|---|---|
| 1 | 0.0048 | 0.002 |
| 250 | 1.2048 | 0.44 |
| 500 | 2.4096 | 0.88 |
| 1037 | 4.9976 | 1.60 |
| 2075 | 10 | 2.00 |
| 3458 | 16.6651 | 2.48 |
| 5187 | 24.9976 | 2.89 |
| 6916 | 33.3301 | 2.91 |
| 10375 | 50 | 3.01 |
| 20750 | 100 | 2.87 |
| 41500 | 200 | 3.14 |

Memory 'Unlimited' (64 GB)

| Number of Updates per Buffer | Proportion of a Sketch (%) | Insertions/sec (millions) |
|---|---|---|
| 1 | 0.0048 | 0.13 |
| 250 | 1.2048 | 4.18 |
| 500 | 2.4096 | 4.24 |
| 1037 | 4.9976 | 4.27 |
| 2075 | 10 | 4.28 |
| 3458 | 16.6651 | 4.27 |
| 5187 | 24.9976 | 4.26 |
| 6916 | 33.3301 | 4.24 |
| 10375 | 50 | 4.21 |
| 20750 | 100 | 4.09 |
| 41500 | 200 | 3.89 |

Experimental Results (SIGMOD submission)


Memory

GraphZeppelin (Gutter tree with 16 GB restriction)

| Dataset | RES (GiB) | SWAP (GiB) | DISK (GiB) | Total (GiB) | Total w/o DISK (GiB) |
|---|---|---|---|---|---|
| Kron13 | 0.78 | 0 | 0.56 | 1.34 | 0.78 |
| Kron15 | 6.7 | 0 | 3.1 | 9.8 | 6.7 |
| Kron16 | 10.4 | 0 | 7.25 | 17.65 | 10.4 |
| Kron17 | 15.6 | 3.5 | 16.89 | 35.99 | 19.1 |
| Kron18 | 15.5 | 24.3 | 39.17 | 78.97 | 39.8 |

GraphZeppelin (Leaf Only; unrestricted)

| Dataset | RES (GiB) | SWAP (GiB) | Total (GiB) |
|---|---|---|---|
| Kron13 | 0.52 | 0 | 0.52 |
| Kron15 | 3.2 | 0 | 3.2 |
| Kron16 | 7.7 | 0 | 7.7 |
| Kron17 | 18.6 | 0 | 18.6 |
| Kron18 | 44.1 | 0 | 44.1 |

Aspen

| Nodes | RES (bytes) | SWAP (bytes) |
|---|---|---|
| 8192 | 352684 | 0 |
| 32768 | 3650722201 | 0 |
| 65536 | 6871947673 | 0 |
| 131072 | 16965120819 | 1073741824 |
| 262144 | 16965120819 | 7516192768 |

Terrace

| Nodes | RES (bytes) | SWAP (bytes) |
|---|---|---|
| 8192 | 557788 | 0 |
| 32768 | 6764573491 | 0 |
| 65536 | 17072495001 | 8375186227 |
| 131072 | 16320875724 | 86758339379 |

Speed

Restricted

GraphZeppelin (Gutter Tree)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 3.29 | 0.11 |
| Kron15 | 2.76 | 0.53 |
| Kron16 | 2.55 | 1.22 |
| Kron17 | 2.09 | 31.7 |
| Kron18 | 1.63 | 537 |

GraphZeppelin (Leaf Only)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 4.51 | 0.1 |
| Kron15 | 3.51 | 0.52 |
| Kron16 | 3.03 | 1.21 |
| Kron17 | 2.67 | 48.79 |
| Kron18 | 1.88 | 553 |

Aspen

| Nodes | Ingestion rate (updates/s) | CC time (seconds) |
|---|---|---|
| 8192 | 4982400 | 0.04095 |
| 32768 | 3540570 | 0.2022 |
| 65536 | 2543120 | 0.746 |
| 131072 | 1899720 | 3.11 |
| 262144 | 11337.5 | 0 |

Terrace

| Nodes | Ingestion rate (updates/s) | CC time (seconds) |
|---|---|---|
| 8192 | 137929 | 0.126 |
| 32768 | 132723 | 0.8 |
| 65536 | 143286 | 1.26 |
| 131072 | 25538.6 | 0 |

Unrestricted

GraphZeppelin (Leaf Only)

| Dataset | Insertions/sec (millions) | CC time (seconds) |
|---|---|---|
| Kron13 | 4.51 | 0.1 |
| Kron15 | 3.5 | 0.55 |
| Kron16 | 3.04 | 1.21 |
| Kron17 | 2.72 | 2.77 |
| Kron18 | 2.43 | 6.46 |

Aspen has an ingestion rate of 1.6019 million updates per second on kron18 unrestricted.

Parallel Experiment

| Total Threads | Num_groups | Group_size | Insertions/sec (millions) |
|---|---|---|---|
| 1 | 1 | 1 | 0.11 |
| 4 | 4 | 1 | 0.43 |
| 8 | 8 | 1 | 0.85 |
| 12 | 12 | 1 | 1.17 |
| 16 | 16 | 1 | 1.44 |
| 20 | 20 | 1 | 1.69 |
| 24 | 24 | 1 | 1.99 |
| 28 | 28 | 1 | 2.13 |
| 32 | 32 | 1 | 2.27 |
| 36 | 36 | 1 | 2.40 |
| 40 | 40 | 1 | 2.53 |
| 44 | 44 | 1 | 2.67 |
| 46 | 46 | 1 | 2.74 |

Buffering Experiment

Memory Limited to 8 GB

| Number of Updates per Buffer | Proportion of a Sketch (%) | Insertions/sec (millions) |
|---|---|---|
| 1 | 0.0034 | 0.002 |
| 250 | 0.8475 | 0.19 |
| 500 | 1.6949 | 0.39 |
| 1000 | 3.3898 | 0.80 |
| 1750 | 5.9322 | 1.19 |
| 3000 | 10.1695 | 1.64 |
| 5000 | 16.9492 | 2.12 |
| 7500 | 25.4237 | 2.31 |
| 10000 | 33.8983 | 2.39 |
| 14750 | 50 | 2.43 |
| 29500 | 100 | 2.41 |

Memory 'Unlimited' (64 GB)

| Number of Updates per Buffer | Proportion of a Sketch (%) | Insertions/sec (millions) |
|---|---|---|
| 1 | 0.0034 | 0.14 |
| 250 | 0.8475 | 2.52 |
| 500 | 1.6949 | 2.55 |
| 1000 | 3.3898 | 2.56 |
| 1750 | 5.9322 | 2.57 |
| 3000 | 10.1695 | 2.57 |
| 5000 | 16.9492 | 2.63 |
| 7500 | 25.4237 | 2.62 |
| 10000 | 33.8983 | 2.61 |
| 14750 | 50 | 2.58 |
| 29500 | 100 | 2.53 |

Old experiment results (version 2)


1. Sketch speed experiment

This is a comparison between our initial powermod l0 sampling implementation and our current implementation. These are vector sketches, not node sketches.

| Vector size | 128-bit buckets (updates/s) | 64-bit buckets (updates/s) | XOR hashing, 0.5 bucket factor (updates/s) |
|---|---|---|---|
| 10^3 | 101840.357 | 174429.354 | 7156454.406 |
| 10^4 | 55797.655 | 96216.757 | 6588353.109 |
| 10^5 | 25971.327 | 45342.837 | 6015725.105 |
| 10^6 | 17071.867 | 25876.702 | 5036844.517 |
| 10^7 | 12132.933 | 18442.902 | 4407694.070 |
| 10^8 | 9490.997 | 14383.936 | 4096379.619 |
| 10^9 | 7652.748 | 11602.058 | 3673283.474 |
| 10^10 | 1269.098 | N/A | 3296869.951 |
| 10^11 | 912.666 | N/A | 3146148.013 |
| 10^12 | 805.574 | N/A | 2888870.913 |

2. Sketch size experiment

This is a comparison between our initial powermod l0 sampling implementation and our current implementation. These are vector sketches, not node sketches.

| Vector size | 128-bit buckets | 64-bit buckets | XOR hashing, 0.5 bucket factor |
|---|---|---|---|
| 10^3 | 5KiB | 3KiB | 770B |
| 10^4 | 10KiB | 5KiB | 1363B |
| 10^5 | 14KiB | 7KiB | 1843B |
| 10^6 | 20KiB | 10KiB | 2627B |
| 10^7 | 28KiB | 14KiB | 3715B |
| 10^8 | 36KiB | 18KiB | 4483B |
| 10^9 | 44KiB | 22KiB | 5682B |
| 10^10 | 56KiB | N/A | 7151B |
| 10^11 | 66KiB | N/A | 7487B |
| 10^12 | 76KiB | N/A | 9955B |

3. Continuous correctness test

We update an adjacency matrix alongside our data structure as the stream arrives; periodically we pause the stream and compare the connected components computed from the adjacency matrix against those returned by our algorithm.

In each individual run of the correctness test, we ran 100 checks over the course of the input stream.

For kron17, we repeated these runs 5 times, resulting in a total of 500 checks. We saw no failures.

For p2p-Gnutella31, we repeated these runs 5 times, resulting in a total of 500 checks. We saw no failures.
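
A hypothetical harness for this test might look as follows. The `Sketch` template parameter stands in for the real data structure, and we assume its `connected_components()` returns one component label per node; the reference side (adjacency matrix plus union-find) is concrete.

```cpp
#include <cstdint>
#include <numeric>
#include <unordered_map>
#include <vector>

struct Edge { uint32_t src, dst; };

// Relabel components by first occurrence so two labelings can be compared.
static std::vector<uint32_t> normalize(std::vector<uint32_t> labels) {
  std::unordered_map<uint32_t, uint32_t> remap;
  for (auto& l : labels)
    l = remap.try_emplace(l, (uint32_t)remap.size()).first->second;
  return labels;
}

// Exact connected components from the adjacency matrix via union-find.
static std::vector<uint32_t> reference_cc(const std::vector<std::vector<bool>>& adj) {
  uint32_t n = adj.size();
  std::vector<uint32_t> parent(n);
  std::iota(parent.begin(), parent.end(), 0u);
  auto find = [&](uint32_t x) {
    while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
    return x;
  };
  for (uint32_t u = 0; u < n; ++u)
    for (uint32_t v = u + 1; v < n; ++v)
      if (adj[u][v]) parent[find(u)] = find(v);
  for (uint32_t u = 0; u < n; ++u) parent[u] = find(u);
  return parent;
}

template <class Sketch>
size_t run_correctness_checks(Sketch& sketch, const std::vector<Edge>& stream,
                              uint32_t n, size_t num_checks) {
  std::vector<std::vector<bool>> adj(n, std::vector<bool>(n, false));
  size_t check_every = stream.size() / num_checks, failures = 0;
  for (size_t i = 0; i < stream.size(); ++i) {
    const Edge& e = stream[i];
    adj[e.src][e.dst] = adj[e.dst][e.src] = !adj[e.src][e.dst];  // updates toggle
    sketch.update(e.src, e.dst);                                 // insert or delete
    if ((i + 1) % check_every == 0 &&                            // pause and compare
        normalize(reference_cc(adj)) != normalize(sketch.connected_components()))
      ++failures;
  }
  return failures;  // we observed 0 across all 500 checks above
}
```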

4. System speed test

SSD swapping


Aspen

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron13 | 4691080 | 3.7 | 3.8 |
| kron15 | 3512890 | 80 | 80 |
| kron16 | 2461450 | 454 | 455 |
| kron17 | 1843460 | 2427 | 2430 |
| kron18 | 71600* | DNF after 24hr | DNF after 24hr |

*The insertion rate is an estimate because we killed the run 34% of the way through the stream.

Terrace

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron13 | 137596 | 127.5 | 127.6 |
| kron15 | 130826 | 2140 | 2141 |
| kron16 | 137200 | 8159 | 8160 |
| kron17 | 33,300* | DNF after 24 hours | DNF after 24 hours |
| kron18 | --- | --- | --- |

NVMe swapping

Aspen

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron17 | 1.8333 x 10^6 | 2441 | 2443 |
| kron18* | N/A | DNF after 24hr | DNF after 24hr |

*Made it 34% through the stream.

Terrace

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron16 | 138148 | 8103 | 8104 |
| kron17* | N/A | DNF after 24 hours | DNF after 24 hours |

*Made it 71% through the stream.

Our System

In-RAM Buffering

| Dataset | Insertions/sec (millions) | Insertion Time | Connected Components Time |
|---|---|---|---|
| kron_13 | 3.16 | 5.6 seconds | 0.1 seconds |
| kron_15 | 2.75 | 102 seconds | 0.6 seconds |
| kron_16 | 2.47 | 7.55 minutes | 1.27 seconds |
| kron_17 | 2.08 | 35.9 minutes | 59.5 seconds |
| kron_18 | 1.30 | 3hr 49 minutes | 10.1 minutes |

Gutter Tree Buffering

| Dataset | Insertions/sec (millions) | Insertion Time | Connected Components Time |
|---|---|---|---|
| kron_13 | 2.20 | 7.98 seconds | 0.10 seconds |
| kron_15 | 1.93 | 145.2 seconds | 0.56 seconds |
| kron_16 | 1.84 | 10.2 minutes | 1.29 seconds |
| kron_17 | 1.71 | 43.6 minutes | 55.6 seconds |
| kron_18 | 1.52 | 3hr 16 minutes | 10.9 minutes |

5. System memory size test


Aspen

Using top logging

| Input Stream | RES (GiB) | SWAP (GiB) |
|---|---|---|
| kron13 | 1.9 | 0 |
| kron15 | 3.4 | 0 |
| kron16 | 6.4 | 0 |
| kron17 | 15.9 | 1 |
| kron18 | 15.9 | 6.3 |

Using Aspen's own memory footprint tool:

| Input Graph | Fully Compressed with Difference Encoding (GiB) | Without Difference Encoding (GiB) | Without C-Trees (GiB) |
|---|---|---|---|
| kron13 | 0.05 | 0.18 | 0.98 |
| kron15 | 0.83 | 2.94 | 15.90 |
| kron16 | 3.34 | 11.79 | 63.74 |
| kron17 | 13.38 | 47.25 | 255.5 |
| kron18* | 18.49 | 64.92 | 349.7 |

*kron18 stopped 34% through the stream.

Terrace

| Input Stream | RES (GiB) | SWAP (GiB) |
|---|---|---|
| kron_13 | 0.52 | 0 |
| kron_15 | 5.90 | 0 |
| kron_16 | 15.90 | 7.70 |
| kron_17 | 15.90 | 31.8 |

kron_17 numbers are from only 74% of the way through the stream.

Aspen & Terrace without Memory Restrictions

Using top logging

| System | Input | RES (GiB) | SWAP (GiB) | Total (GiB) |
|---|---|---|---|---|
| Aspen | kron18 | 57.7 | 0 | 57.7 |
| Terrace | kron17 | 60.3 | 35.7 | 96.0 |

*Terrace only got 88% through the stream

Using Aspen's memory footprint tool:

| Input Graph | Fully Compressed with Difference Encoding (GiB) | Without Difference Encoding (GiB) | Without C-Trees (GiB) |
|---|---|---|---|
| kron18 | 53.97 | 189.6 | 1022 |

Our system

In-RAM Buffering

All numbers in GiB

| Dataset | RAM usage (GiB) | Swap usage (GiB) | Total (GiB) |
|---|---|---|---|
| kron_13 | 0.55 | 0 | 0.55 |
| kron_15 | 3.30 | 0 | 3.30 |
| kron_16 | 8.00 | 0 | 8.00 |
| kron_17 | 15.9 | 3.30 | 19.2 |
| kron_18 | 15.9 | 29.7 | 45.6 |

Gutter Tree Buffering

| Dataset | RAM usage (GiB) | Swap usage (GiB) | Total (GiB) |
|---|---|---|---|
| kron_13 | 0.78 | 0 | 0.78 |
| kron_15 | 6.80 | 0 | 6.80 |
| kron_16 | 10.6 | 0 | 10.6 |
| kron_17 | 15.7 | 4.60 | 20.3 |
| kron_18 | 15.7 | 25.8 | 41.5 |

*These numbers don't include the size of the GutterTree file. How should we report the size of this GutterTree file?

6. Parallel test

Run upon big boi with a truncated version of kron_17 found at /home/evan/trunc_streams/kron_17_stream_trunc.txt.
We ran two experiments. In the first, we varied the total number of threads available for stream ingestion from 1 to 46 while keeping a constant group_size of one. In the second, we kept a constant 40 threads but varied the group_size to see the effect on performance. In both experiments the full 64 GB of memory was available and we used the in-RAM buffering system.

We record the average and median insertion rate because there is a fair amount of single-threaded overhead at the beginning of the stream (filling the buffers for the first time, allocating memory, creating sketches, etc.); the median should therefore provide a better indication of performance over the full kron_17 stream.
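
For reference, a small helper of the kind that could summarize the periodic throughput samples (illustrative only, not the actual measurement harness):

```cpp
#include <algorithm>
#include <vector>

struct RateSummary { double average, median; };

// Summarize periodic insertion-rate samples (updates/sec, assumed non-empty).
// The median is robust to the slow single-threaded startup phase described above.
RateSummary summarize(std::vector<double> rates) {
  double sum = 0;
  for (double r : rates) sum += r;
  std::sort(rates.begin(), rates.end());
  size_t n = rates.size();
  double median = (n % 2) ? rates[n / 2]
                          : 0.5 * (rates[n / 2 - 1] + rates[n / 2]);
  return {sum / n, median};
}
```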

Varying Group Size


| Group_size | Num_groups | Insertions/sec (millions) | Median Rate (millions) |
|---|---|---|---|
| 1 | 40 | 1.41 | 2.21 |
| 2 | 20 | 1.18 | 2.08 |
| 4 | 10 | 1.29 | 1.95 |
| 10 | 4 | 1.20 | 1.724 |
| 20 | 2 | 1.18 | 1.722 |
| 40 | 1 | 0.84 | 1.05 |

Varying Number of Threads


| Total Threads | Num_groups | Group_size | Insertions/sec (millions) | 70th Percentile Rate (millions) |
|---|---|---|---|---|
| 1 | 1 | 1 | 0.10 | 0.10 |
| 4 | 4 | 1 | 0.36 | 0.38 |
| 8 | 8 | 1 | 0.67 | 0.76 |
| 12 | 12 | 1 | 0.87 | 1.06 |
| 16 | 16 | 1 | 1.01 | 1.29 |
| 20 | 20 | 1 | 1.11 | 1.51 |
| 24 | 24 | 1 | 1.23 | 1.79 |
| 28 | 28 | 1 | 1.29 | 1.89 |
| 32 | 32 | 1 | 1.32 | 2.00 |
| 36 | 36 | 1 | 1.36 | 2.11 |
| 40 | 40 | 1 | 1.39 | 2.22 |
| 44 | 44 | 1 | 1.43 | 2.33 |
| 46 | 46 | 1 | 1.44 | 2.38 |

TODO

  • Rerun the varying-group-size experiment with a constant work queue size. Perhaps a reduced work queue is part of why group_size=1 was always faster. Also be sure to record the median insertion rate this time.
  • If the optimal group size is not 2, re-run the varying-number-of-threads experiment with the optimal group_size.

7. Distributed test

8. Other Small Experiments

Buffer size reduction for in-RAM tests

How far can we reduce the size of the in-RAM buffers while maintaining comparable performance?
We run upon a truncated version of kron_18 with only 2 billion insertions. Memory was limited to 16GB and we ran with 44 graph workers, each of size 1.

| Buffer Size | Number of Updates | Insertions/sec (millions) |
|---|---|---|
| Full | 17496 | 1.09 |
| Half | 8748 | 1.14 |
| Third | 5832 | 1.10 |
| Quarter | 4374 | 1.09 |
| Fifth | 3499 | 1.04 |
| Sixth | 2916 | 1.00 |
| Tenth | 1749 | 0.84 |
| 1/15 | 1166 | 0.69 |
| 1/20 | 874 | 0.62 |
| 1/25 | 699 | 0.55 |

Buffer Size for Enabling Parallelism

The purpose of this test is to establish that buffering is necessary for parallelism to have a benefit (see the code sketch after the table below).
We ran these tests upon the full version of kron_15 with a variety of buffer sizes.
These tests were run without a memory limitation.
TODO: Would it be better to run one of the larger graphs with a memory limitation so that we see the IO performance? Or is this an entirely in-memory concern?

| Number of Updates per Buffer | Insertions/sec (millions) |
|---|---|
| 1 | 0.16 |
| 4 | 0.68 |
| 16 | 2.02 |
| 64 | 2.45 |
| 256 | 2.65 |
| 1024 | 2.73 |
| 4096 | 2.64 |
| 20248 | 2.15 |
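
The sketch below (assumed names, not the real buffering code) illustrates the mechanism: updates are appended to a cheap per-node buffer, and only full buffers pass through the synchronized work queue, so a buffer of b updates pays one lock and one wakeup per b updates rather than per update.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

struct Gutter { uint32_t node = 0; std::vector<uint32_t> neighbors; };

// Workers pull whole gutters, so synchronization cost is amortized per batch.
class WorkQueue {
  std::queue<Gutter> q_;
  std::mutex m_;
  std::condition_variable cv_;
 public:
  void push(Gutter g) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(g)); }
    cv_.notify_one();
  }
  Gutter pop() {  // blocks until a batch is available
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [&] { return !q_.empty(); });
    Gutter g = std::move(q_.front());
    q_.pop();
    return g;
  }
};

// The stream thread appends to per-node buffers and hands off full ones.
class Gutters {
  std::vector<Gutter> bufs_;
  size_t capacity_;  // "updates per buffer", the x-axis of the table above
  WorkQueue& out_;
 public:
  Gutters(uint32_t n, size_t capacity, WorkQueue& out)
      : bufs_(n), capacity_(capacity), out_(out) {
    for (uint32_t i = 0; i < n; ++i) bufs_[i].node = i;
  }
  void insert(uint32_t u, uint32_t v) {
    bufs_[u].neighbors.push_back(v);       // cheap append, no locking
    if (bufs_[u].neighbors.size() >= capacity_) {
      uint32_t node = bufs_[u].node;
      out_.push(std::move(bufs_[u]));      // one queue operation per batch
      bufs_[u] = Gutter{node, {}};
    }
  }
};
```

With a capacity of 1, every single update pays the full synchronization cost, which matches the large gap between the first and last rows of the table above.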

Large Amounts of Memory Restriction In-RAM vs Gutter Tree

For kron_18 we expect the in-RAM version of our algorithm to require at least 2GB to perform well. In this experiment we limit memory to as little as 1GB and compare the performance of in-RAM buffering and the Gutter Tree. We run upon a truncated version of kron_18 with only 2 billion insertions. All of these experiments were run with 44 Graph_Workers, each of size 1.

| System | Memory Restriction | Total Insertion Runtime | Insertions/sec |
|---|---|---|---|
| In-RAM | Limit to 16GB | 30.4 minutes | 1,098,110 |
| Gutter Tree | Limit to 16GB | 36.1 minutes | 922,627 |
| In-RAM | Limit to 3GB | 38.6 minutes | 863,377 |
| Gutter Tree | Limit to 3GB | 42.2 minutes | 690,923 |
| In-RAM | Limit to 1GB | DNF after 9 hours | At most 62 |
| Gutter Tree | Limit to 1GB | 1 hour 6.5 minutes | 516,737 |

Gutter Tree Leaf Size

In this experiment we seek to establish how big the leaves of the buffer tree need to be for good performance. Larger leaves mean the buffer tree can keep working even when the work queue is full, but we would also like the leaves to be as small as possible.
We ran this experiment upon the truncated version of kron_18, with 46 Graph Workers of size 1 and a memory restriction of 16GB.

| Leaf Size | Sketches per leaf | Insertions/sec |
|---|---|---|
| 1 MB | 7.5 | 1.09 million |
| 512 KB | 3.75 | 1.10 million |
| 256 KB | 1.87 | 1.09 million |
| 161 KB | 1* | 1.10 million |

*A sketch is actually about 137KB, but we pad each leaf by the size of the blocks we write to the children (24KB). The threshold at which we remove data from the tree is still 137KB, and 137KB is the value used to compute the other sketches-per-leaf figures (e.g., 1MB / 137KB ≈ 7.5).

Old experiment results (version 1)

1. Sketch speed experiment

This is a comparison between our initial powermod l0 sampling implementation and our current implementation. These are vector sketches, not node sketches.

| Vector size | 128-bit buckets (updates/s) | 64-bit buckets (updates/s) | XOR hashing, 0.5 bucket factor (updates/s) |
|---|---|---|---|
| 10^3 | 101840.357 | 174429.354 | 7156454.406 |
| 10^4 | 55797.655 | 96216.757 | 6588353.109 |
| 10^5 | 25971.327 | 45342.837 | 6015725.105 |
| 10^6 | 17071.867 | 25876.702 | 5036844.517 |
| 10^7 | 12132.933 | 18442.902 | 4407694.070 |
| 10^8 | 9490.997 | 14383.936 | 4096379.619 |
| 10^9 | 7652.748 | 11602.058 | 3673283.474 |
| 10^10 | 1269.098 | N/A | 3296869.951 |
| 10^11 | 912.666 | N/A | 3146148.013 |
| 10^12 | 805.574 | N/A | 2888870.913 |

2. Sketch size experiment

This is a comparison between our initial powermod l0 sampling implementation and our current implementation. These are vector sketches, not node sketches.

| Vector size | 128-bit buckets | 64-bit buckets | XOR hashing, 0.5 bucket factor |
|---|---|---|---|
| 10^3 | 5KiB | 3KiB | 770B |
| 10^4 | 10KiB | 5KiB | 1363B |
| 10^5 | 14KiB | 7KiB | 1843B |
| 10^6 | 20KiB | 10KiB | 2627B |
| 10^7 | 28KiB | 14KiB | 3715B |
| 10^8 | 36KiB | 18KiB | 4483B |
| 10^9 | 44KiB | 22KiB | 5682B |
| 10^10 | 56KiB | N/A | 7151B |
| 10^11 | 66KiB | N/A | 7487B |
| 10^12 | 76KiB | N/A | 9955B |

3. Continuous correctness test

We update an adjacency matrix alongside our data structure as the stream arrives; periodically we pause and compare the connected components computed from the adjacency matrix against those returned by our algorithm. We ran 50 checks over the course of the kron17 stream and saw no discrepancies.

4. System speed test

Aspen

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron13 | 3.41272e+06 | 6.43542 | 6.45255 |
| kron15 | 3.19362e+06 | 92.6225 | 92.7011 |
| kron16 | 2.99069e+06 | 372.126 | 372.44 |
| kron17 | 2.91467e+06 | 1532.25 | 1533.1 |
| kron18 | 303291 | 58916.7 | 59369.1 |

Terrace

| Input Stream | Overall Throughput (updates/second) | Total Ingestion Time (seconds) | Total Runtime (seconds) |
|---|---|---|---|
| kron13 | 141012 | 155.748 | 155.807 |
| kron15 | 135892 | 2176.73 | 2177.15 |
| kron16 | 128040 | 8691.9 | 8693.32 |
| kron17 | 61905.5 | 72142.4 | 72232 |

Note: Terrace crashed on kron18 around the 16 hour mark


Our System

In-RAM Buffering

| Dataset | Insertions/sec (millions) | Insertion Time | Connected Components Time |
|---|---|---|---|
| kron_13 | 2.48 | 8.8 seconds | 0.1 seconds |
| kron_15 | 2.21 | 134 seconds | 0.5 seconds |
| kron_16 | 1.98 | 9.3 minutes | 1.3 seconds |
| kron_17 | 1.71 | 43.5 minutes | 149 seconds |
| kron_18 | 1.21 | 4.1 hours | 10.4 minutes |

Gutter Tree Buffering

| Dataset | Insertions/sec (millions) | Insertion Time | Connected Components Time |
|---|---|---|---|
| kron_13 | 2.59 | 8.4 seconds | 0.1 seconds |
| kron_15 | 2.33 | 127 seconds | 0.6 seconds |
| kron_16 | 1.87 | 9.9 minutes | 1.3 seconds |
| kron_17 | 1.24 | 59.9 minutes | 44.8 seconds |
| kron_18 | 0.95 | 5.2 hours | 11.7 minutes |

TODO

  • See if Gutter Tree performance can be improved (is sharing the same disk as the swap causing the problem? would it help to have multiple threads inserting into the tree?).

5. System memory size test

Aspen

Using top logging

| Input Stream | RES (GiB) | SWAP (GiB) |
|---|---|---|
| kron13 | 2.4 | 0 |
| kron15 | 2.8 | 0 |
| kron16 | 4.1 | 0 |
| kron17 | 7.3 | 0 |
| kron18 | 15.7 | 12.6 |

Using Aspen's own memory footprint tool:

| Input Graph | Without C-Trees (GiB) | Without Difference Encoding (GiB) | Fully Compressed with Difference Encoding (GiB) |
|---|---|---|---|
| kron13 | 1.104 | 0.2053 | 0.05725 |
| kron15 | 15.18 | 2.802 | 0.7875 |
| kron16 | 57.26 | 10.58 | 2.991 |
| kron17 | 128 | 23.72 | 6.711 |

TODO: Convert kron17 and kron18 to adj format and run Aspen's memory footprint tool on these as well.

Terrace

| Input Stream | RES (GiB) | SWAP (GiB) |
|---|---|---|
| kron13 | 0.577 | 0 |
| kron15 | 6.0 | 0 |
| kron16 | 15.9 | 7.8 |
| kron17 | 15.9 | 33.3 |
| kron18* | 15.6 | 50.1 |

*Last numbers reported by top before Terrace crashed around the 16 hour mark.


Our system

In-RAM Buffering

| Dataset | RAM usage (GB) | Swap usage (GB) | Total (GB) |
|---|---|---|---|
| kron_13 | 0.80 | 0 | 0.80 |
| kron_15 | 5.1 | 0 | 5.1 |
| kron_16 | 12.5 | 0 | 12.5 |
| kron_17 | 15.9 | 18.6 | 34.5 |
| kron_18 | 15.6 | 61.8 | 77.4 |

Gutter Tree Buffering

| Dataset | RAM usage (GB) | Swap usage (GB) | Total (GB) |
|---|---|---|---|
| kron_13 | 1.3 | 0 | 1.3 |
| kron_15 | 6.9 | 0 | 6.9 |
| kron_16 | 10.7 | 0 | 10.7 |
| kron_17 | 15.7 | 4.7 | 20.4 |
| kron_18 | 15.6 | 25.7 | 41.3 |

6. Parallel test

Run upon big boi with a truncated version of kron_17 found at /home/evan/trunc_streams/kron_17_stream_trunc.txt.
We ran two experiments. In the first, we varied the total number of threads available for stream ingestion from 1 to 46 while keeping a constant group_size of two. In the second, we kept a constant 40 threads but varied the group_size to see the effect on performance. In both experiments the full 64 GB of memory was available and we used the in-RAM buffering system.

We record the average and median insertion rate because there is a fair amount of single-threaded overhead at the beginning of the stream (filling the buffers for the first time, allocating memory, creating sketches, etc.); the median should therefore provide a better indication of performance over the full kron_17 stream.

Varying Group Size

| Group_size | Num_groups | Insertions/sec (millions) |
|---|---|---|
| 1 | 40 | 1.42 |
| 2 | 20 | 1.35 |
| 4 | 10 | 1.32 |
| 10 | 4 | 1.22 |
| 20 | 2 | 1.20 |
| 40 | 1 | 0.85 |

Varying Number of Threads

| Total Threads | Group_size | Num_groups | Insertions/sec (millions) | Median Rate (millions) |
|---|---|---|---|---|
| 1 | 1 | 1 | 0.10 | 0.10 |
| 4 | 2 | 2 | 0.34 | 0.36 |
| 8 | 4 | 2 | 0.64 | 0.71 |
| 12 | 6 | 2 | 0.83 | 0.99 |
| 16 | 8 | 2 | 0.97 | 1.22 |
| 20 | 10 | 2 | 1.09 | 1.43 |
| 24 | 12 | 2 | 1.20 | 1.69 |
| 28 | 14 | 2 | 1.23 | 1.75 |
| 32 | 16 | 2 | 1.28 | 1.90 |
| 36 | 18 | 2 | 1.32 | 1.98 |
| 40 | 20 | 2 | 1.36 | 2.09 |
| 44 | 22 | 2 | 1.41 | 2.23 |
| 46 | 23 | 2 | 1.42 | 2.27 |

TODO

  • Rerun the varying-group-size experiment with a constant work queue size. Perhaps a reduced work queue is part of why group_size=1 was always faster. Also be sure to record the median insertion rate this time.
  • If the optimal group size is not 2, re-run the varying-number-of-threads experiment with the optimal group_size.

7. Distributed test

8. Other Small Experiments

Buffer size reduction for in-RAM tests

How far can we reduce the size of the in-RAM buffers while maintaining comparable performance?
We run upon a truncated version of kron_18 with only 2 billion insertions. Memory was limited to 16GB and we ran with 44 graph workers, each of size 1.

| Buffer Size | Number of Updates | Insertions/sec (millions) |
|---|---|---|
| Full | 34992 | 1.28 |
| Half | 17494 | 1.44 |
| Third | 11664 | 1.43 |
| Quarter | 8748 | 1.40 |
| Sixth | 5832 | 1.27 |
| Eighth | 4372 | 1.17 |
| Tenth | 3498 | 1.08 |

Buffer Size for Enabling Parallelism

The purpose of this test is to establish that buffering is necessary for parallelism to have a benefit.
We ran these tests upon the full version of kron_15 with a variety of buffer sizes.
These tests were run without a memory limitation.
TODO: Would it be better to run one of the larger graphs with a memory limitation so that we see the IO performance? Or is this an entirely in-memory concern?

| Number of Updates per Buffer | Insertions/sec (millions) |
|---|---|
| 1 | 0.15 |
| 4 | 0.61 |
| 16 | 1.87 |
| 64 | 2.45 |
| 256 | 2.70 |
| 1024 | 2.66 |
| 4096 | 2.41 |

Large Amounts of Memory Restriction In-RAM vs Gutter Tree

For kron_18 we expect the in-RAM version of our algorithm to require at least 2GB to perform well. In this experiment we limit memory to as little as 1GB and compare the performance of in-RAM buffering and the Gutter Tree. We run upon a truncated version of kron_18 with only 2 billion insertions. All of these experiments were run with 44 Graph_Workers, each of size 1.

| System | Memory Restriction | Total Insertion Runtime | Insertions/sec |
|---|---|---|---|
| In-RAM | Limit to 16GB | 30.4 minutes | 1,098,110 |
| Gutter Tree | Limit to 16GB | 36.1 minutes | 922,627 |
| In-RAM | Limit to 3GB | 38.6 minutes | 863,377 |
| Gutter Tree | Limit to 3GB | 42.2 minutes | 690,923 |
| In-RAM | Limit to 1GB | DNF after 9 hours | At most 62 |
| Gutter Tree | Limit to 1GB | 1 hour 6.5 minutes | 516,737 |

Gutter Tree Leaf Size

In this experiment we seek to establish how big the leaves of the buffer tree need to be for good performance. Larger leaves mean the buffer tree can keep working even when the work queue is full, but we would also like the leaves to be as small as possible.
We ran this experiment upon the truncated version of kron_18, with 46 Graph Workers of size 1 and a memory restriction of 16GB.

| Leaf Size | Sketches per leaf | Insertions/sec |
|---|---|---|
| 1 MB | 7.5 | 1.09 million |
| 512 KB | 3.75 | 1.10 million |
| 256 KB | 1.87 | 1.09 million |
| 161 KB | 1* | 1.10 million |

*A sketch is actually about 137KB, but we pad each leaf by the size of the blocks we write to the children (24KB). The threshold at which we remove data from the tree is still 137KB, and 137KB is the value used to compute the other sketches-per-leaf figures (e.g., 1MB / 137KB ≈ 7.5).

TODO

  • Buffer size experiment to show that a large-ish buffer is necessary for parallelism.
  • Faster queries via multi-threading?

Old experiment notes

!!DATASET LOCATIONS!!

  • Graphs and streams can be found in /home/experiment_inputs on big boi.
  • Static input graphs in edge-list format may be found in /home/experiment_inputs/graphs.
  • Streams from these input graphs may be found in /home/experiment_inputs/streams.
  • UPDATE: Needed to regenerate streams due to an off-by-one error. Only kron_17 is yet unfinished.

1. Continuous correctness test

Give the algorithm a stream defining a graph. Every X stream updates, pause the stream and run the post-processing algorithm. Make sure there are no random failures and then continue. This will show that our algorithm's failure probability is very low (basically unobservable) in practice. We compare against an ultra-compact data structure which Kenny will write.

Dataset: Kronecker with 1/4 max density on 100K nodes. Ahmed will generate this and streamify it.

Algorithm to run: Our core alg using in-memory buffering (there should be more than enough space).

Challenges: Under normal circumstances, running our algorithm on a stream this size should be fairly fast (probably < 1 hr). But since we have to keep stopping, flushing all buffers, processing the flushed updates (which may not be very parallel), and then computing connected components, this could take a long time.

What needs doing?

  • in-memory buffering needs to be completed. in pull request, evan and/or kenny will check ASAP -- Done

  • we need to generate and streamify the dataset. done

  • half reps needs to be enabled on the branch that is testing this. Evan will do. -- Done

  • it needs to have all of our bells and whistles that make things faster. evan did it already

  • kenny is going to add some sketch copying functionality to rewind sketch smushing. done.

  • we should test on a small prefix of the stream to see how long this will take. based on that we can decide how many data points we want for the experiment, if we have to do fewer queries and augment with more tests on smaller graphs, etc. kenny will do this on old machine

  • Run the actual test!

2. Speed Test

Run our algorithm (internal and external) and 1 or 2 in-memory dynamic graph systems and compare their performance as graph sizes grow larger. Ligra, Aspen, Terrace are likely targets. No correctness testing required.

Datasets: nearly full-density graphs on 10K, 50K, 100K, 200K nodes. Generated with both Kronecker and Erdos-Renyi if we have enough time to run on both. They need to be generated and streamified.

Algorithms to run: Aspen, Terrace and our algorithm with in-mem buffering and WOD buffering. Maybe also Ligra but it is a static graph system so it may not be applicable.

Challenge: weirdness that I still don't totally understand about how our system uses disk. Does the program think that everything is in memory, with the OS swapping data to disk when the data structures grow too large? Is there a way to force the buffer tree onto disk rather than our sketches when swapping is necessary, assuming there's enough space for the sketches?

What needs doing?

  • in-memory buffering stuff. in pull request. -- Done

  • buffer tree finishing touches, particularly basement nodes and the new graph worker scheme (circular queue) / sketches on disk (probably done, ask Victor) needs to be completed. EVAN SAYS just use smaller leaves; this change is in a pull request. -- All done

  • Need to make Terrace ready to accept our graph streams and only do connected components at the end.

  • get aspen running, figure out aspen batching issue

  • are input streams generated and streamified?

  • buy and set up disk hardware so systems can page to disk - hardware is shipping soon.

  • Estimate size of our data structures and aspen/terrace so we know when to expect us to start doing better. once they page to disk, they might be really bad. if so, do we have to run them for a long time? maybe we stop them after 24h? but this means locking up our machines for a long time.

  • Run the experiments.

  • victor will incorporate the memtest stuff into a new branch and make a few changes to it to automate finding PID etc

  • evan will incorporate other important components into that branch

  • limit mem to 16GB, run the experiments and see what the results are

  • try running aspen and terrace and see when they start doing poorly

  • based on results we may adjust memory size or use larger inputs, change our stream generation to make the graph denser at some points during the stream

3. Memory use test

We need to verify that our algorithm uses the (small) amount of space we expect it to. Basically we run the algorithm and chart its memory use over time. This may need to be done on a big graph at least once to make the point that memory usage is not dependent on graph size. This could be mixed with experiment 2.

Datasets: same as experiment 2.

What needs doing?

  • Victor set up a memtest tool which is lightweight enough that we can run this during the speedtest.
  • Run the experiments. This one should be pretty straightforward after the work we did setting up experiment 2.

4. Parallelizability

Show that our stream ingestion goes faster when we're given more cores. We don't need to do this for a full graph stream, just a significant portion of one so we can see the speedups. Evan has basically already made this I think.

Datasets: probably we can just do that for 1 or 2 graphs at around 100K nodes. One of the graphs we used for earlier tests. Not sure what's the best choice yet.

What needs doing?

  • Experiment DONE. We may want to run a similar experiment on a system with more cores if we can get access to one and have the time.

5. Distributed Version

We need to finish the implementation (Tyler) and get the cluster set up on as many nodes as Mike Fergman will give us (Abi). Tyler will run pilot experiments and we'll figure out what to do based on their results.
