
Slow training process same as issue #11 #28

Open
ccyycc1994 opened this issue May 11, 2020 · 7 comments

@ccyycc1994

@chrischoy @sjnarmstrong, thanks for sharing your code. I tried it on the 3DMatch dataset with the default configuration and found that the training process is very slow: one epoch takes about an hour and a half. (As you mentioned in the paper, FCGF is trained for 100 epochs, which would take more than a week with my setup.) GPU memory usage is under 5000 MB and GPU utilization is below 10%, while CPU utilization is high. Is this normal, and which part is the most time-consuming? I am training on a V100, and I also find that training on a GTX 1080 Ti is faster than on the V100.
I could not find a solution in issue #11, so could you suggest another way to solve this problem?

Thanks a lot.

@ccyycc1994
Author

I also tried some of the methods in NVIDIA/MinkowskiEngine#121, but they did not work either.

@chrischoy
Owner

chrischoy commented May 11, 2020

For the V100 being slower than the 1080 Ti, use export OMP_NUM_THREADS=20 or lower.
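
A minimal sketch of how one might cap the CPU thread pools from inside the training script rather than the shell; the thread count of 8 is only an example, not a recommended value:

```python
# Cap the OpenMP thread pool before torch / MinkowskiEngine are imported;
# the value 8 is an arbitrary example, tune it for your machine.
import os
os.environ["OMP_NUM_THREADS"] = "8"

import torch
torch.set_num_threads(8)  # also limit PyTorch's intra-op CPU threads
```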

@ccyycc1994
Author

@chrischoy If you don't mind me asking, how long did one epoch take you on the 3DMatch dataset? It takes me 1.5 hours per epoch on a GTX 1080; is that slow, or is that a common speed?

@chrischoy
Owner

Yes, that is the usual speed.

The default configuration uses batch size = 4, which uses only a fraction of the GPU. Try increasing the batch size.

Also, the codebase is not particularly optimized, but I think some parts could be sped up significantly if you tune the hard-negative mining parameters.
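
A minimal launch sketch, assuming the training script is train.py and the batch size is exposed as a --batch_size flag; check config.py in the repository for the exact flag names:

```python
# Launch training with a larger batch size; the flag name is an assumption,
# verify it against config.py in the FCGF repository.
import subprocess

subprocess.run(
    ["python", "train.py", "--batch_size", "8"],  # default batch size is 4
    check=True,
)
```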

@ccyycc1994
Author

@chrischoy Have you tried other PyTorch spatially sparse convolution libraries such as spconv (https://github.com/traveller59/spconv) or SparseConvNet (https://github.com/facebookresearch/SparseConvNet)? Could those libraries speed up training? Thanks a lot.

@chrischoy
Owner

chrischoy commented May 12, 2020

No, I haven't. There are several poorly written parts in the data loader that take up huge resources. One of them is https://github.com/chrischoy/FCGF/blob/master/lib/data_loaders.py#L257, which uses parallel KD trees to create a large set of indices and tends to hog CPU resources.

This is not really necessary for computing the loss, since we can determine whether a correspondence is correct or not from the ground-truth transformation.

I was planning to replace this part with on-the-fly loss computation, but I didn't have much time and I just left it there.
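
A minimal sketch of that idea, assuming the ground-truth transform T_gt maps the source point cloud onto the target one; the function name, tensor shapes, and the 0.1 distance threshold are illustrative, not the repo's actual API:

```python
import torch

def is_correct_pair(xyz0, xyz1, pairs, T_gt, thresh=0.1):
    """Check candidate correspondences against the ground-truth transform.

    xyz0: (N, 3) source points, xyz1: (M, 3) target points,
    pairs: (K, 2) indices into xyz0 / xyz1, T_gt: (4, 4) transform from xyz0 to xyz1.
    Returns a boolean mask that is True where the warped source point lies
    within `thresh` of its matched target point.
    """
    p0 = xyz0[pairs[:, 0]]
    p1 = xyz1[pairs[:, 1]]
    p0_warped = p0 @ T_gt[:3, :3].T + T_gt[:3, 3]  # apply rotation and translation
    return (p0_warped - p1).norm(dim=1) < thresh
```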

@jingyibo123

jingyibo123 commented Nov 17, 2020

> No, I haven't. There are several poorly written parts in the data loader that take up huge resources. One of them is https://github.com/chrischoy/FCGF/blob/master/lib/data_loaders.py#L257, which uses parallel KD trees to create a large set of indices and tends to hog CPU resources.
>
> This is not really necessary for computing the loss, since we can determine whether a correspondence is correct or not from the ground-truth transformation.
>
> I was planning to replace this part with on-the-fly loss computation, but I didn't have much time and I just left it there.

Hi Chris,
I managed to rewrite the function generate_rand_negative_pairs using T_gt to filter out correct correspondences; however, I haven't figured out how to generate correct correspondences on the fly on the GPU without using a KD tree or KNN.
Any ideas @chrischoy?
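
For reference, a sketch of the kind of rewrite described above, not the actual code from this thread: sample random index pairs and drop any that happen to be correct matches under T_gt, so no KD tree is needed for negative mining. Names, shapes, and the 0.1 distance threshold are assumptions:

```python
import torch

def rand_negative_pairs(xyz0, xyz1, T_gt, num_pairs, thresh=0.1):
    """Sample random (i, j) index pairs and keep only true negatives,
    i.e. pairs that are NOT matches under the ground-truth transform T_gt."""
    idx0 = torch.randint(len(xyz0), (num_pairs,), device=xyz0.device)
    idx1 = torch.randint(len(xyz1), (num_pairs,), device=xyz1.device)
    p0_warped = xyz0[idx0] @ T_gt[:3, :3].T + T_gt[:3, 3]
    dist = (p0_warped - xyz1[idx1]).norm(dim=1)
    keep = dist > thresh  # reject accidental positives
    return torch.stack([idx0[keep], idx1[keep]], dim=1)
```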
