I wanted to bring to your attention a potential issue with the usage of CUDA Graphs in TC-GNN. According to the PyTorch documentation and tutorials, a CUDA Graph captures and replays GPU kernels issued on a specified stream. However, in TC-GNN the manually implemented kernels (e.g., TCGNN_conv/TCGNN_kernel.cu) are launched without setting that stream. As a result, these kernels are neither captured nor replayed (i.e., never executed) by the graph.
It seems that the speedup attributed to CUDA Graphs is a consequence of these kernels being skipped entirely. In my profiling, the CUDA Graph run was over five times faster than the non-CUDA-Graph test that actually executes the kernels. The lower test accuracy reported in #5 also supports this observation.
I kindly suggest correcting the published results that were measured with CUDA Graphs enabled.
Thank you for your attention to this matter.
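For illustration, a minimal sketch of the kind of change described above: a PyTorch C++/CUDA extension kernel must be launched on the stream PyTorch is currently using, or `torch.cuda.graph()` capture will not see it. The kernel and function names here are hypothetical placeholders, not the actual TC-GNN code.

```cuda
// Hypothetical sketch (names are placeholders, not TC-GNN's real kernels).
// Launching with <<<grid, block>>> alone uses the legacy default stream,
// which is NOT the capture stream during torch.cuda.graph(); the launch
// below passes the current stream explicitly so capture works.
#include <ATen/cuda/CUDAContext.h>
#include <torch/extension.h>

__global__ void spmm_kernel(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i];  // placeholder body
}

void spmm_forward(torch::Tensor x, torch::Tensor y) {
    const int n = static_cast<int>(x.numel());
    // Fetch the stream PyTorch is currently using; during graph capture
    // this is the capture stream, so the launch is recorded in the graph.
    cudaStream_t stream = at::cuda::getCurrentCUDAStream();
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    spmm_kernel<<<blocks, threads, 0, stream>>>(
        x.data_ptr<float>(), y.data_ptr<float>(), n);
}
```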
Thanks for bringing this to our attention. Our current observation is that CUDA Graphs in PyTorch seem to have trouble supporting kernels with dynamic arrays (e.g., edge_list or row_ptr) in GNN workloads.
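To illustrate the dynamic-array constraint: a CUDA Graph freezes kernel launch parameters and argument pointers at capture time, so variable-size inputs such as edge_list or row_ptr would have to live in fixed, maximum-size buffers that are updated in place before each replay. The sketch below uses the raw CUDA stream-capture API; `my_kernel` and the buffer names are hypothetical.

```cuda
// Hedged sketch of stream capture with the raw CUDA runtime API.
// All launch shapes and device pointers are frozen when the graph is
// captured; replays re-run the identical launches, so dynamic inputs
// must be pre-allocated at a fixed max size and refilled before replay.
#include <cuda_runtime.h>

__global__ void my_kernel(const int* row_ptr, const int* edge_list,
                          float* out, int max_n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < max_n) out[i] = 0.0f;  // placeholder body
}

void capture_and_replay(int* d_row_ptr, int* d_edge_list,
                        float* d_out, int max_n) {
    cudaStream_t s;
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamCreate(&s);

    // Everything issued on `s` between Begin/EndCapture goes into the graph.
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    my_kernel<<<(max_n + 255) / 256, 256, 0, s>>>(
        d_row_ptr, d_edge_list, d_out, max_n);
    cudaStreamEndCapture(s, &graph);

    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    // Refill d_edge_list/d_row_ptr in place, then replay the fixed launches.
    cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);
}
```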
Here are results we recently measured without CUDA Graphs, compared against DGL on an RTX 3090.
The original conclusion for performance advantage over DGL still holds.
| GCN-model | DGL | TC-GNN (w/o CUDA Graph) | Speedup (x) |
|---|---|---|---|
| citeseer | 7.27 | 3.75 | 1.94 |
| cora | 7.05 | 3.68 | 1.92 |
| pubmed | 7.34 | 3.74 | 1.96 |
| ppi | 7.56 | 4.46 | 1.70 |
| PROTEINS_full | 7.48 | 3.75 | 2.00 |
| OVCAR-8H | 69.45 | 66.95 | 1.04 |
| Yeast | 63.67 | 61.07 | 1.04 |
| DD | 13.35 | 10.53 | 1.27 |
| YeastH | 114.87 | 111.48 | 1.03 |
| amazon0505 | 20.58 | 22.70 | 0.91 |
| artist | 7.45 | 4.50 | 1.66 |
| com-amazon | 16.70 | 16.69 | 1.00 |
| soc-BlogCatalog | 7.56 | 9.41 | 0.80 |
| amazon0601 | 19.58 | 19.55 | 1.00 |
| **Average** | | | **1.38** |
| AGNN-model | DGL | TC-GNN (w/o CUDA Graph) | Speedup (x) |
|---|---|---|---|
| citeseer | 31.25 | 10.31 | 3.03 |
| cora | 31.08 | 10.34 | 3.01 |
| pubmed | 31.38 | 10.59 | 2.96 |
| ppi | 40.28 | 19.89 | 2.03 |
| PROTEINS_full | 31.47 | 10.48 | 3.00 |
| OVCAR-8H | 143.94 | 112.05 | 1.28 |
| Yeast | 131.67 | 100.85 | 1.31 |
| DD | 44.31 | 23.29 | 1.90 |
| YeastH | 231.63 | 184.51 | 1.26 |
| amazon0505 | 69.63 | 118.42 | 0.59 |
| artist | 40.40 | 38.71 | 1.04 |
| com-amazon | 50.67 | 41.60 | 1.22 |
| soc-BlogCatalog | 50.72 | 81.73 | 0.62 |
| amazon0601 | 61.05 | 47.42 | 1.29 |
| **Average** | | | **1.75** |
We will soon update our current code repo to fix this error.
Improper use of CUDA Graph in TC-GNN