
Commit
Added paragraph on BN and BN bib item
andreped authored Jan 21, 2024
1 parent 261252e commit 4d623bc
Showing 2 changed files with 18 additions and 1 deletion.
13 changes: 13 additions & 0 deletions paper/paper.bib
@@ -57,3 +57,16 @@ @article{helland2023postopglioblastoma
journal = {Scientific Reports},
doi = {10.1038/s41598-023-45456-x}
}

@inproceedings{ioffe2015batchnormalization,
author = {Ioffe, Sergey and Szegedy, Christian},
title = {Batch normalization: accelerating deep network training by reducing internal covariate shift},
year = {2015},
publisher = {JMLR.org},
abstract = {Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82\% top-5 test error, exceeding the accuracy of human raters.},
booktitle = {Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37},
pages = {448–456},
numpages = {9},
location = {Lille, France},
series = {ICML'15}
}
6 changes: 5 additions & 1 deletion paper/paper.md
@@ -3,6 +3,7 @@ title: 'GradientAccumulator: Efficient and seamless gradient accumulation for Te
tags:
- Python
- TensorFlow
- Optimization
- Deep Learning
- Gradient Descent
authors:
@@ -50,7 +51,10 @@ GradientAccumulator has already been used in several research studies [@pedersen
`GradientAccumulator` implements two main approaches to add gradient accumulation support to an existing TensorFlow model. GA support can be added either through model or optimizer wrapping. By wrapping the model, the `train_step` of a given Keras [@chollet2015keras] model is overridden such that the accumulated gradients are only applied after a user-defined number of backward steps. Wrapping the optimizer works similarly, but the update control is handled directly in the optimizer itself. This is done in such a way that _any_ optimizer can be used with this approach. A minimal sketch of the two approaches is shown below.
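
A minimal sketch of both wrapping approaches, assuming the `GradientAccumulateModel` and `GradientAccumulateOptimizer` wrapper names and the `accum_steps` argument as described in the package documentation; the exact signatures may differ:

```python
import tensorflow as tf
# Wrapper names and arguments assumed from the GradientAccumulator documentation.
from gradient_accumulator import GradientAccumulateModel, GradientAccumulateOptimizer


def build_model():
    inputs = tf.keras.Input(shape=(32,))
    outputs = tf.keras.layers.Dense(10)(inputs)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


# Approach 1: model wrapping -- train_step is overridden so gradients are
# accumulated over `accum_steps` batches before a single weight update.
model = build_model()
model = GradientAccumulateModel(accum_steps=4, inputs=model.input, outputs=model.output)
model.compile(optimizer="sgd", loss="mse")

# Approach 2: optimizer wrapping -- the accumulation logic lives in the
# optimizer, so any Keras optimizer can be wrapped and used as usual.
model2 = build_model()
opt = GradientAccumulateOptimizer(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2), accum_steps=4)
model2.compile(optimizer=opt, loss="mse")
```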


The package solely depends on `TensorFlow`, and hence adds no additional dependencies to your project.


`BatchNormalization` (BN) [@ioffe2015batchnormalization] is one of the most widely used normalization techniques, particularly with convolutional neural networks. BN has to be handled with care, as it is one of the few commonly used layers that updates its state in both the forward and backward steps. Thus, if this layer is used naively with any of the wrappers, the running mean and standard deviation will not be updated correctly. To add proper support, we have implemented a drop-in replacement for batch normalization called `AccumBatchNormalization`. To use it in a pretrained network for fine-tuning applications, you can simply use the provided `replace_batchnorm_layers` method, which reuses the weights of the old `BatchNormalization` layer for each replaced layer. Note that the mean and standard deviation are computed in full precision, as in the original Keras implementation.
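
As a hedged illustration only, assuming the import paths and the `accum_steps` argument follow the package documentation (`AccumBatchNormalization` as a layer, `replace_batchnorm_layers` as a utility); the exact signatures are not confirmed here:

```python
import tensorflow as tf
# Import paths and signatures assumed from the GradientAccumulator documentation.
from gradient_accumulator.layers import AccumBatchNormalization
from gradient_accumulator.utils import replace_batchnorm_layers

# Building a new model: use AccumBatchNormalization in place of
# tf.keras.layers.BatchNormalization so its statistics are accumulated
# consistently with the gradient accumulation steps.
inputs = tf.keras.Input(shape=(28, 28, 1))
x = tf.keras.layers.Conv2D(8, 3)(inputs)
x = AccumBatchNormalization(accum_steps=4)(x)
outputs = tf.keras.layers.Dense(10)(tf.keras.layers.Flatten()(x))
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Fine-tuning a pretrained network: swap every BatchNormalization layer for
# AccumBatchNormalization, reusing the original layers' weights.
pretrained = tf.keras.applications.MobileNetV2(weights=None)  # weights=None keeps the example offline
pretrained = replace_batchnorm_layers(pretrained, accum_steps=4)
```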


More details and tutorials on getting started with the `GradientAccumulator` package can be found in the \href{https://gradientaccumulator.readthedocs.io/}{documentation}.
