Checkpointing in practice for memory-efficient training on the edge

Pratyush Kumar

doi:10.1109/HPCC/SmartCity/DSS.2019.00387

Published in Institute of Electrical and Electronics Engineers Inc.

2019

Pages: 2759 - 2766

Abstract

Training deep neural networks is memory-intensive: the activation maps produced by every layer in the forward pass must be stored so that gradients can be computed in the backward pass. When training on edge devices, large models may not fit in memory at all, or may run only with very small batch sizes. Checkpointing has been proposed as a solution: during the forward pass, the activation maps of only some layers are stored as checkpoints, and the rest are recomputed during the backward pass, starting from the closest preceding checkpoint. In practice, however, checkpointing requires a careful choice of which layers to checkpoint. In this paper, we empirically evaluate checkpointing for different networks. We then establish an analytical approach that estimates the memory requirement of each layer in a network (using a linear regression model) and thereby identifies the layers that have to be checkpointed. With this method, we reduced the memory consumption of the MobileNet and ResNet-18 architectures by factors of 2.6 and 1.8, respectively. Finally, the networks are tested on a Raspberry Pi 3 Model B board. For MobileNet, our checkpointing approach allowed us to increase the batch size from 4 to 12. © 2019 IEEE.
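
The store-some, recompute-the-rest idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the scalar tanh layers, the function names, and the uniform "checkpoint every k-th layer" rule are all assumptions made for illustration (the paper instead selects checkpoints from per-layer memory estimates). The sketch only shows that recomputation yields the same gradient while fewer activations are held at once.

```python
import math

def forward_layer(w, x):
    # Hypothetical toy layer: y = tanh(w * x)
    return math.tanh(w * x)

def backward_layer(w, x_in, grad_out):
    # Chain rule for the toy layer: dy/dx = w * (1 - tanh(w * x)^2)
    y = math.tanh(w * x_in)
    return grad_out * w * (1.0 - y * y)

def grad_full(weights, x0):
    """Baseline: store the input of every layer during the forward pass."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = forward_layer(w, x)
    grad = 1.0  # the loss is taken to be the network output itself
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad = backward_layer(w, x_in, grad)
    return grad, len(inputs)

def grad_checkpointed(weights, x0, every):
    """Store only every `every`-th layer input; recompute the rest per segment."""
    n = len(weights)
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = x  # keep this activation as a checkpoint
        x = forward_layer(w, x)
    grad = 1.0
    peak = 0
    # Process segments from the last checkpoint back to the first.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, n)
        seg_inputs = []
        x = checkpoints[start]
        for i in range(start, end):  # recompute this segment's activations
            seg_inputs.append(x)
            x = forward_layer(weights[i], x)
        # seg_inputs[0] is the checkpoint itself, so do not count it twice.
        # Checkpoints are never freed here, so this count is conservative.
        peak = max(peak, len(checkpoints) + len(seg_inputs) - 1)
        for i in range(end - 1, start - 1, -1):
            grad = backward_layer(weights[i], seg_inputs[i - start], grad)
    return grad, peak

weights = [0.5 + 0.1 * i for i in range(12)]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, every=4)
```

With 12 toy layers and a checkpoint every 4 layers, the baseline holds 12 activations while the checkpointed version holds at most 6 at any time (3 checkpoints plus 3 recomputed activations), at the cost of one extra forward computation per segment.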

About the journal

Journal	Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Open Access	No

Authors (1)

Pratyush Kumar
- Department of Computer Science and Engineering

Concepts (16)

Chemical activation
Computer vision
Data communication systems
Deep learning
Deep neural networks
Regression analysis
Smart city
Analytical approach
Checkpointing
Edge
Linear regression models
Memory consumption
Memory efficient
Memory requirements
Training network
Multilayer neural networks
