forked from mohitgulla/Edge

Energy Efficient ML and DL Computing for GE Research


Prasham8897/Edge

 
 


Energy Efficient Deep Learning

Repository of Capstone Work at Data Science Institute, Columbia University in collaboration with GE Research

Table of Contents

  1. About The Project
  2. Contributors
  3. Usage
  4. Methodology
  5. Future Work

About The Project

The project aims to develop techniques for training and inference of machine learning models with a reduced carbon footprint. Recent estimates suggest that training deep learning models such as BERT, ELMo and GPT-2 requires multiple GPUs running for several days, and that the carbon emissions from training a single large model can be equivalent to about five times the lifetime emissions of an average car. GE therefore requires lighter, low-latency deep learning models that do not compromise accuracy and can be deployed on GE's EDGE devices. Our objective is to explore techniques that let us store a model in lower precision and to assess the effect of doing so during inference.

Contributors

Mentors:

Tapan Shah - Lead Machine Learning Scientist, GE Research
Eleni Drinea - Lecturer, Data Science Institute, Columbia University

Capstone Team:

Mohit Gulla, Kumari Nishu, Neelam Patodia, Prasham Sheth, Pritam Biswas

Usage

Demo / Tutorial

For a detailed walkthrough of the main techniques, i.e. multi-point mixed precision post-training quantization, pruning, and quantization aware training, please refer to notebook Demo_Code.ipynb.

Directory Structure

  • data - contains .py files with the PyTorch dataset class definitions and the corresponding .dat files. The datasets explored are: ANN based Classification (Churn and Telescope), ANN based Regression (MV Data and California Housing), and CNN based Classification (CIFAR-100 and FMNIST). A subdirectory results contains .csv files which track accuracy and loss at different precision levels from the experiments we conducted.

  • model - contains .py files with model class definition for Dense Neural Networks (DNNs) and Convolutional Neural Networks (CNNs). The various architectures of each model type are defined as separate class objects within its corresponding .py file.

  • model_artifacts - contains .pt files of full precision trained models.

  • utils - contains .py files with the post-training quantization, pruning and quantization-aware training methods that were explored. For post-training quantization we have implemented single-point methods (mid-rise quantization, regular rounding and stochastic rounding) and a multi-point method (mixed precision multi-point quantization). Each method is designed to work as standalone functionality. The directory also contains utility code for fetching datasets, plotting graphs, etc.

All *.ipynb and *.py files in the main directory contain the comprehensive code for model training, weight quantization and evaluation. They leverage the code base from the sub-directories.

Methodology

Post Training Quantization

Single-point Quantization approximates a weight value using a single low precision number.

  1. Mid-Rise
  • Delta - controls the granularity of quantization; a larger Delta means coarser quantization and a greater loss of information
  • Uniform division of the range of weight values into 2^p bins for precision p
  • w_quantized = Delta * (floor(w/Delta) + 0.5)
  2. Regular Rounding
  • Quantization Set - collect a set of landmark values using uniform bins, a histogram, or a normal prior on the weight values
  • Map each weight value to the nearest landmark value from the quantization set
  3. Stochastic Rounding
  • Quantization Set - collect a set of landmark values using uniform bins, a histogram, or a normal prior on the weight values
  • Assign each weight value probabilistically to either the closest smaller value or the closest larger value from the quantization set
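The three single-point schemes above can be sketched in plain Python as follows. This is a minimal illustration, not the code in utils: the function names are hypothetical, the quantization set is built with uniform bins only, and the stochastic rule assumes the round-up probability is proportional to how close the weight is to the larger landmark, which the description above does not specify.

```python
import bisect
import math
import random


def mid_rise(w, delta):
    """Mid-rise quantizer: snap w to the midpoint of its bin of width delta."""
    return delta * (math.floor(w / delta) + 0.5)


def uniform_quantization_set(w_min, w_max, precision):
    """Return 2**precision evenly spaced landmark values over [w_min, w_max].

    Assumes precision >= 1 so that at least two landmarks exist.
    """
    levels = 2 ** precision
    step = (w_max - w_min) / (levels - 1)
    return [w_min + i * step for i in range(levels)]


def regular_round(w, q_set):
    """Map w to the nearest landmark value in the quantization set."""
    return min(q_set, key=lambda q: abs(q - w))


def stochastic_round(w, q_set, rng=random):
    """Round w to the neighbouring landmark below or above it at random.

    Assumption: P(round up) = (w - lo) / (hi - lo), so the result equals w
    in expectation. Values outside the set are clamped to its ends.
    """
    q_sorted = sorted(q_set)
    if w <= q_sorted[0]:
        return q_sorted[0]
    if w >= q_sorted[-1]:
        return q_sorted[-1]
    i = bisect.bisect_right(q_sorted, w)  # q_sorted[i-1] <= w < q_sorted[i]
    lo, hi = q_sorted[i - 1], q_sorted[i]
    p_up = (w - lo) / (hi - lo)
    return hi if rng.random() < p_up else lo
```

For a whole layer, each function would be applied element-wise to the weight tensor; passing a seeded `random.Random` as `rng` makes the stochastic variant reproducible.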

Multi-point Quantization approximates a weight value using a linear combination of multiple low-precision values.

  1. Multi-point - mixed precision method
  • Assign more bits to important layers, and fewer bits to unimportant layers to balance the accuracy and cost more efficiently
  • Achieves the same flexibility as mixed precision hardware while using only a single precision level
  • The quantization set is constructed using a uniform grid on [-1, 1] with increment epsilon and each weight value w is approximated as a linear combination of low precision weight vectors.
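One way to picture the linear-combination idea is a greedy residual fit: quantize the weight vector onto the grid, subtract the scaled result, and repeat on what is left. The sketch below is a simplified illustration under that assumption, not the repository's implementation. The function names, the max-magnitude normalisation and the least-squares coefficient are all choices made here for the example.

```python
def quantize_to_grid(x, eps):
    """Clip x to [-1, 1] and round it to the nearest multiple of eps."""
    x = max(-1.0, min(1.0, x))
    return round(x / eps) * eps


def multipoint_approx(w, eps, n_points):
    """Greedily approximate w as sum_i a_i * q_i with each q_i on the grid.

    Returns a list of (a_i, q_i) pairs. Each step fits the current residual,
    so adding points never increases the approximation error.
    """
    residual = list(w)
    terms = []
    for _ in range(n_points):
        scale = max(abs(r) for r in residual)
        if scale == 0.0:
            break  # residual is already exactly zero
        # Low-precision vector: normalised residual snapped to the grid.
        q = [quantize_to_grid(r / scale, eps) for r in residual]
        # Least-squares optimal coefficient for this fixed q.
        a = sum(r * qi for r, qi in zip(residual, q)) / sum(qi * qi for qi in q)
        terms.append((a, q))
        residual = [r - a * qi for r, qi in zip(residual, q)]
    return terms


def reconstruct(terms, length):
    """Rebuild the approximated weight vector from its (a_i, q_i) terms."""
    out = [0.0] * length
    for a, q in terms:
        out = [o + a * qi for o, qi in zip(out, q)]
    return out
```

Layers that matter more for accuracy can be given a larger `n_points`, which is how a single precision level can imitate a mixed precision assignment.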

Pruning

Pruning is a compression method that removes the least contributing weights from a trained model.

  • Parameters that we estimate to be unnecessary connections between the layers of the neural network are set to zero.
  • The magnitude of each weight is used as the measure of its importance to the model’s performance.
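A minimal sketch of magnitude-based pruning on a flat list of weights is shown below. The function name and the `sparsity` parameter (the fraction of weights to zero out) are assumptions made for this example; the repository's own pruning utilities may choose the threshold differently, for instance per layer.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Returns a new list; the input is left unchanged.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

For example, `magnitude_prune([0.8, -0.05, 0.3, -0.6, 0.01, 0.2], 0.5)` keeps the three largest-magnitude weights and zeroes the rest.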

Quantization-Aware Training

Quantization-aware training (QAT) trains the model under the assumption that it will be quantized later for inference.

The steps involved in QAT are:

  1. Initialize a full precision model
  2. Quantize model weights per layer
  3. Forward propagate with the quantized weights and compute the loss
  4. Compute gradients, passing them through the quantization step with the straight-through estimator
  5. Update the full precision weights, and return the quantized model once training ends
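The loop can be illustrated on a toy problem: fitting a single weight w in y = w * x by gradient descent. This is a sketch only, assuming a mean squared error loss and a simple round-to-nearest quantizer with step `delta`; the names `quantize` and `qat_fit` are hypothetical and the repository trains full networks in PyTorch instead.

```python
def quantize(w, delta):
    """Round w to the nearest multiple of delta."""
    return delta * round(w / delta)


def qat_fit(xs, ys, delta=0.25, lr=0.05, epochs=50):
    """Fit y = w * x with quantization-aware training on one scalar weight.

    Returns (full_precision_w, quantized_w).
    """
    w = 0.0  # step 1: full precision (latent) weight
    for _ in range(epochs):
        w_q = quantize(w, delta)  # step 2: quantize the weight
        # Step 3: forward pass with w_q; gradient of the MSE w.r.t. w_q.
        grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Step 4: straight-through estimator treats d(w_q)/d(w) as 1,
        # so the same gradient is used for the full precision weight.
        # Step 5: update the full precision weight.
        w -= lr * grad
    return w, quantize(w, delta)
```

The rounding step has zero gradient almost everywhere, so without the straight-through estimator the full precision weight would never move; copying the gradient across the quantizer is what lets training proceed.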

Future Work

Model Size

To get a complete picture of each method’s effectiveness, we need to observe model size at different levels of precision. This relates to our objective of reducing the carbon footprint of deep learning models.
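As a rough starting point, weight storage can be estimated directly from the parameter count and the bits per weight. The helper below is a back-of-the-envelope sketch with a hypothetical name; it ignores file format overhead, quantization metadata such as scales, and the extra vectors that multi-point quantization stores.

```python
import math


def model_size_bytes(num_params, bits_per_weight):
    """Estimate the bytes needed to store num_params weights at a given bit width."""
    return math.ceil(num_params * bits_per_weight / 8)
```

Under these assumptions, a model with one million parameters needs 4,000,000 bytes at 32 bits per weight, 1,000,000 bytes at 8 bits and 500,000 bytes at 4 bits.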

Quantize Activations

In addition to quantizing weights, explore quantizing activations.

Improve Training Algorithm

Most of the carbon emissions are caused by the intensive computation required during training. For example, BERT and GPT-3 require a large amount of computation to learn their parameters. We can explore techniques that make smarter weight updates and reduce the computation required during training.

Hardware Simulations

Experiment on specialized low-precision hardware to accurately evaluate different quantization techniques.

