From the course: Deep Learning: Model Optimization and Tuning

Epoch and batch size tuning

- [Instructor] Let us begin our optimization journey with the most common training parameters, namely batch sizes and epochs. The general format we will follow for optimization in this course is a quick review of the hyperparameter, followed by an exercise to try out various values and compare their relative performance. We will only touch on the background concepts informally; we recommend additional reading on these topics to learn more. What is a batch size? A batch size is the number of samples sent through the ANN in a single pass. The input data is broken up into multiple batches, and each batch is passed through the network to obtain predictions and update parameters. The maximum batch size is the size of the input data, and batch sizes are usually configured as powers of two (2^n). Higher batch sizes lead to better GPU utilization, since the samples in a batch can be processed in parallel. They also mean fewer training iterations, but they can introduce instability in gradient descent. The recommendation is to experiment with the model to find the right size; a size of 32 has been found to work well for most use cases. We now look at epochs. Epochs are the number of times the entire training set is passed through the network. Epochs, like batch sizes, only control training, not inference. As epochs increase, the gains taper off as the model gains accuracy, and increasing them beyond a certain point can also trigger instability. It is recommended to choose the earliest value at which accuracy stabilizes during the training process. The recommendation is to figure out the right batch size and number of epochs first, and then use those values for further experimentation. In the next video, we will experiment with epochs and batch sizes. A short sketch of where these two parameters are set follows below.
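
The following is a minimal sketch of where batch size and epochs are configured, assuming a Keras/TensorFlow setup (the specific framework, data, and model here are illustrative assumptions, not the course's exercise files). It also shows early stopping as one way to pick the earliest epoch at which accuracy stabilizes.

    # Minimal sketch, assuming Keras/TensorFlow; the data and model are illustrative only.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # Illustrative random data: 1,000 samples, 10 features, binary labels.
    X = np.random.rand(1000, 10)
    y = np.random.randint(0, 2, size=(1000,))

    model = keras.Sequential([
        keras.Input(shape=(10,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Early stopping ends training once validation accuracy stops improving,
    # approximating "the earliest epoch where accuracy stabilizes."
    early_stop = keras.callbacks.EarlyStopping(
        monitor="val_accuracy", patience=3, restore_best_weights=True
    )

    history = model.fit(
        X, y,
        batch_size=32,        # samples per forward/backward pass; usually a power of two
        epochs=50,            # upper bound on full passes over the training set
        validation_split=0.2,
        callbacks=[early_stop],
        verbose=0,
    )
    print("Epochs actually run:", len(history.history["loss"]))

In practice you would repeat this run over a small grid of candidate batch sizes (for example 16, 32, 64, 128) and compare validation accuracy, which is the pattern the upcoming exercise follows.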
