DenseNet-121 is faster than CondenseNet-74 (C=G=4) on GTX 1080 Ti #3
I compared the forward-pass speed of the larger ImageNet CondenseNet model with DenseNet-121, and the latter actually runs faster. After benchmarking, my guess is that the CondenseConv layer causes the slowdown, due to the memory transfers in the ShuffleLayer and in torch.index_select (sketched below).
@ShichenLiu, can you comment on this? Did you get better performance than DenseNet-121 in your experiments?
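For context, the suspected cost is pure data movement rather than arithmetic. Here is a rough sketch, not the repository's actual implementation, of the indexing step in a CondenseConv-style layer; the class name and the random index buffer are illustrative only (in CondenseNet the index is learned):

```python
import torch
import torch.nn as nn

class CondenseConvSketch(nn.Module):
    """Illustrative only: gather a permutation of input channels
    before a grouped 1x1 conv. CondenseNet learns the index during
    training; a random buffer stands in for it here."""

    def __init__(self, in_channels: int, out_channels: int, groups: int):
        super().__init__()
        self.register_buffer("index", torch.randperm(in_channels))
        self.conv = nn.Conv2d(in_channels, out_channels,
                              kernel_size=1, groups=groups, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gather below performs zero FLOPs but copies the entire
        # activation tensor; on a GPU this memory traffic can dominate
        # the cheap grouped 1x1 convolution that follows.
        x = torch.index_select(x, dim=1, index=self.index)
        return self.conv(x)

layer = CondenseConvSketch(128, 128, groups=4)
y = layer(torch.randn(8, 128, 28, 28))
```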
Our model is mainly designed for mobile devices, on which the actual inference time correlates strongly with the theoretical complexity. However, the group convolution and the index/shuffle operations are not efficiently implemented on GPUs.
GPUs tend to be memory-bound rather than compute-bound, particularly for small models such as ShuffleNets and CondenseNets that require additional memory transfers. On mobile devices, embedded systems, etc., the ratio between compute (in FLOPS) and memory bandwidth is very different: convnets tend to be compute-bound on such platforms. If you ran the same comparison on such a platform, you would find that CondenseNet is much faster than DenseNet (see Table 5 of the paper for actual timing results on an ARM processor).
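To make the memory-bound point concrete: a channel shuffle performs no floating-point work at all; its entire cost is the activation copy forced by `.contiguous()`. A minimal sketch of the standard shuffle (the repository's ShuffleLayer may differ in details):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups: zero FLOPs, pure memory traffic."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2)
    # transpose() only changes strides; .contiguous() materializes the
    # shuffled layout with a full copy of the tensor -- that copy is the
    # memory transfer discussed above.
    return x.contiguous().view(n, c, h, w)

x = torch.randn(8, 64, 32, 32)
y = channel_shuffle(x, groups=4)
assert y.shape == x.shape
```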
Thanks for the clarification. I already suspected that was the reason after I measured the time spent in the 1x1 bottleneck layer and the grouped 3x3 layer: the forward pass spends twice as much time in the 1x1 layer as in the 3x3 layer.
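A minimal sketch of how such a per-layer measurement can be made. CUDA kernels launch asynchronously, so CUDA events (or explicit synchronization) are needed to get meaningful numbers; the layer shapes below are illustrative, not taken from the model:

```python
import torch
import torch.nn as nn

def mean_forward_ms(layer: nn.Module, x: torch.Tensor, iters: int = 100) -> float:
    """Mean forward time in milliseconds, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    with torch.no_grad():
        for _ in range(10):          # warm-up, excluded from timing
            layer(x)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            layer(x)
        end.record()
        torch.cuda.synchronize()     # wait for all launched kernels
    return start.elapsed_time(end) / iters

# Illustrative shapes only (not the paper's configuration):
x = torch.randn(32, 128, 28, 28, device="cuda")
conv1x1 = nn.Conv2d(128, 128, kernel_size=1, groups=4, bias=False).cuda()
conv3x3 = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=4, bias=False).cuda()

print(f"grouped 1x1: {mean_forward_ms(conv1x1, x):.3f} ms")
print(f"grouped 3x3: {mean_forward_ms(conv3x3, x):.3f} ms")
```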