Epoch, Batch, and Iterations
Understanding epoch, batch, and iterations is fundamental to training neural networks effectively.
Sample
A sample is a single row from the training dataset. If your dataset contains 10,000 rows, then you have 10,000 samples.
Each sample typically consists of:
- Input features
- Corresponding output/label
Epoch
An epoch means one complete pass through the entire training dataset.
During one epoch:
- Every sample is used once
- Each sample goes through:
- One forward pass
- One backward pass
If you have 10,000 samples, going through all 10,000 once equals 1 epoch.
In practice, one epoch is rarely enough.
We repeat the dataset multiple times so the model can gradually improve its weights and reduce training error.
Batch
Instead of feeding the whole dataset into the model at once, we divide it into smaller groups called batches.
The batch size determines how many samples are processed before updating the model weights.
Different training strategies use batches differently.
Batch Gradient Descent
- Entire dataset is one batch
- Weight update happens once per epoch
If n = 10,000 samples
Then:
- Batch size = 10,000
- Iterations per epoch = 1
Stochastic Gradient Descent (SGD)
- Each sample is its own batch
- Weight update happens after every single sample
If n = 10,000 samples
Then:
- Batch size = 1
- Iterations per epoch = 10,000
This produces frequent but noisy updates.
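The contrast between these two extremes can be sketched with plain arithmetic (the sample count mirrors the example above):

```python
n = 10_000  # total training samples

# Batch Gradient Descent: the whole dataset is one batch
batch_gd_updates = n // n  # 1 weight update per epoch

# Stochastic Gradient Descent: every sample is its own batch
sgd_updates = n // 1       # 10,000 weight updates per epoch

print(batch_gd_updates)  # 1
print(sgd_updates)       # 10000
```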
Mini-Batch Gradient Descent (Most Common)
This is the most widely used approach. We choose a batch size manually (e.g., 32, 64, 128).
If:
- n = 10,000 samples
- m = 100 batch size
Then:
Number of batches per epoch = n / m = 10,000 / 100 = 100
So there will be 100 weight updates per epoch.
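A minimal sketch of how a dataset is split into mini-batches; here `dataset` is a hypothetical list standing in for 10,000 samples:

```python
dataset = list(range(10_000))  # placeholder for 10,000 samples
m = 100                        # batch size

# Slice the dataset into consecutive chunks of m samples each
batches = [dataset[i:i + m] for i in range(0, len(dataset), m)]

print(len(batches))     # 100 batches per epoch
print(len(batches[0]))  # 100 samples in each batch
```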
Iterations
An iteration refers to one weight update. For each batch processed, one iteration occurs.
Iterations Per Epoch
If:
- n = total samples
- m = batch size
Then: Iterations per epoch = n / m
(If n is not evenly divisible by m, the last batch is smaller, so iterations per epoch = ⌈n / m⌉.)
Total Iterations Across Training
If:
- L = number of epochs
Then:
Total iterations = L × (n / m)
Example
Suppose:
- n = 10,000 samples
- m = 100 batch size
- L = 5 epochs
Then:
Iterations per epoch = 10,000 / 100 = 100
Total iterations = 5 × 100 = 500
So the model updates its weights 500 times during training.
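The same worked example as plain Python arithmetic:

```python
n = 10_000  # total samples
m = 100     # batch size
L = 5       # number of epochs

iterations_per_epoch = n // m            # 10,000 / 100 = 100
total_iterations = L * iterations_per_epoch

print(iterations_per_epoch)  # 100
print(total_iterations)      # 500
```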
Important Practical Detail
In most machine learning training workflows, the dataset is shuffled at the start of every epoch.
Before training begins for that epoch:
- The entire dataset is randomly shuffled.
- The shuffled dataset is divided into mini-batches.
- Training proceeds batch by batch.
Because the dataset is reshuffled at the start of every epoch, the mini-batch compositions differ from epoch to epoch. This prevents the model from learning the order of the training data, makes gradient updates more diverse and representative, and can improve convergence and reduce the risk of overfitting.
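The shuffle-then-batch pattern can be sketched as a bare training-loop skeleton; `dataset`, `m`, and `update_weights` here are placeholders, not a real framework API:

```python
import random

dataset = list(range(10))  # placeholder for the training samples
m = 2                      # batch size
iterations = 0             # count of weight updates across training

def update_weights(batch):
    pass  # one iteration: forward pass, backward pass, weight update

for epoch in range(3):
    random.shuffle(dataset)  # reshuffle at the start of every epoch
    for i in range(0, len(dataset), m):
        batch = dataset[i:i + m]
        update_weights(batch)
        iterations += 1

print(iterations)  # 15 (3 epochs × 5 batches per epoch)
```

In real frameworks this shuffling is usually handled for you, e.g. by a data-loading utility with a shuffle option.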
Putting It All Together
- Sample → One row of dataset
- Batch → Group of samples processed before updating weights
- Epoch → One full pass through dataset
- Iteration → One weight update
Intuitively:
- Epoch controls how many times the model sees the data.
- Batch controls how much data is used before updating weights.
- Iterations count how many times weights are updated.
Understanding this relationship helps in:
- Estimating training time
- Choosing batch sizes
- Debugging convergence issues
- Explaining model training clearly in interviews