# Internal Knowledge Distillation

Residual networks can be viewed as ensembles of many shallower sub-networks. The idea is to train a residual network so that the knowledge of this ensemble is distilled into its sub-networks within a single training procedure. The advantages of this approach are:

  1. Improved accuracy of the original ResNet
  2. Training of residual networks of multiple depths in a single, efficient procedure
  3. A better approach to knowledge distillation than traditional distillation methods
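A minimal sketch of the idea is shown below, assuming a PyTorch setup: a residual network with a classifier ("exit") after every block, trained with cross-entropy on the deepest exit plus a distillation term that pushes each shallower sub-network toward the deepest exit's softened predictions. All module names, depths, and loss weights are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative sketch of internal (self) knowledge distillation; the
# architecture and hyperparameters below are assumptions for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """A simple residual block for illustration."""
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.fc2(F.relu(self.fc1(x)))


class MultiExitResNet(nn.Module):
    """Residual network with a classifier after every block, so each
    prefix of blocks forms a shallower sub-network."""
    def __init__(self, dim=64, num_blocks=4, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(ResidualBlock(dim) for _ in range(num_blocks))
        self.exits = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(num_blocks))

    def forward(self, x):
        logits = []
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits.append(exit_head(x))
        return logits  # logits[-1] comes from the full-depth network


def internal_kd_loss(logits, targets, temperature=3.0, alpha=0.5):
    """Cross-entropy on the deepest exit, plus distillation of every
    shallower exit toward the deepest exit's softened predictions."""
    teacher = logits[-1]
    loss = F.cross_entropy(teacher, targets)
    soft_teacher = F.softmax(teacher.detach() / temperature, dim=1)
    for student in logits[:-1]:
        loss += alpha * F.cross_entropy(student, targets)
        loss += (1 - alpha) * temperature ** 2 * F.kl_div(
            F.log_softmax(student / temperature, dim=1), soft_teacher,
            reduction="batchmean")
    return loss


# Toy training step on random data, showing the single training procedure
# that optimizes the full network and all of its sub-networks together.
model = MultiExitResNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(32, 64), torch.randint(0, 10, (32,))
loss = internal_kd_loss(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because every exit is supervised in the same pass, no separate pre-trained teacher or second distillation stage is needed; the deeper part of the network acts as the teacher for its own shallower sub-networks.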