Epoch, Batch, and Iterations
Understanding epoch, batch, and iterations is fundamental to training neural networks effectively.
Sample
A sample is a single row from the training dataset. If your dataset contains 10,000 rows, then you have 10,000 samples.
Each sample typically consists of:
- Input features
- Corresponding output/label
Epoch
An epoch means one complete pass through the entire training dataset.
During one epoch:
- Every sample is used once
- Each sample goes through:
- One forward pass
- One backward pass
If you have 10,000 samples, going through all 10,000 once equals 1 epoch.
In practice, one epoch is rarely enough.
We repeat the dataset multiple times so the model can gradually improve its weights and reduce training error.
Batch
Instead of feeding the whole dataset into the model at once, we divide it into smaller groups called batches.
The batch size determines how many samples are processed before updating the model weights.
Different training strategies use batches differently.
Batch Gradient Descent
- Entire dataset is one batch
- Weight update happens once per epoch
If n = 10,000 samples
Then:
- Batch size = 10,000
- Iterations per epoch = 1
Stochastic Gradient Descent (SGD)
- Each sample is its own batch
- Weight update happens after every single sample
If n = 10,000 samples
Then:
- Batch size = 1
- Iterations per epoch = 10,000
This produces frequent but noisy updates.
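The contrast between these two extremes can be sketched with plain arithmetic (the sample count mirrors the example above):

```python
n = 10_000  # total training samples

# Batch Gradient Descent: the whole dataset is one batch
batch_gd_updates = n // n  # 1 weight update per epoch

# Stochastic Gradient Descent: every sample is its own batch
sgd_updates = n // 1       # 10,000 weight updates per epoch

print(batch_gd_updates)  # 1
print(sgd_updates)       # 10000
```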
Mini-Batch Gradient Descent (Most Common)
This is the most widely used approach. We choose a batch size manually (e.g., 32, 64, 128).
If:
- n = 10,000 samples
- m = 100 batch size
Then:
Number of batches per epoch = n / m = 10,000 / 100 = 100
So there will be 100 weight updates per epoch.
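A minimal sketch of how a dataset is split into mini-batches; here `dataset` is a hypothetical list standing in for 10,000 samples:

```python
dataset = list(range(10_000))  # placeholder for 10,000 samples
m = 100                        # batch size

# Slice the dataset into consecutive chunks of m samples each
batches = [dataset[i:i + m] for i in range(0, len(dataset), m)]

print(len(batches))     # 100 batches per epoch
print(len(batches[0]))  # 100 samples in each batch
```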
Iterations
An iteration refers to one weight update. For each batch processed, one iteration occurs.
Iterations Per Epoch
If:
- n = total samples
- m = batch size
Then: Iterations per epoch = n / m
(If n is not evenly divisible by m, the last batch is smaller, so iterations per epoch = ⌈n / m⌉.)
Total Iterations Across Training
If:
- L = number of epochs
Then:
Total iterations = L × (n / m)
Example
Suppose:
- n = 10,000 samples
- m = 100 batch size
- L = 5 epochs
Then:
Iterations per epoch = 10,000 / 100 = 100
Total iterations = 5 × 100 = 500
So the model updates its weights 500 times during training.
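The same worked example as plain Python arithmetic:

```python
n = 10_000  # total samples
m = 100     # batch size
L = 5       # number of epochs

iterations_per_epoch = n // m            # 10,000 / 100 = 100
total_iterations = L * iterations_per_epoch

print(iterations_per_epoch)  # 100
print(total_iterations)      # 500
```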
Important Practical Detail
In most machine learning training workflows, the dataset is shuffled at the start of every epoch.
Before training begins for that epoch:
- The entire dataset is randomly shuffled.
- The shuffled dataset is divided into mini-batches.
- Training proceeds batch by batch.
Because the dataset is reshuffled at the start of every epoch, the mini-batch compositions differ from epoch to epoch. This prevents the model from learning the order of the training data, makes gradient updates more diverse and representative, and can improve convergence and reduce the risk of overfitting.
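The shuffle-then-batch pattern can be sketched as a bare training-loop skeleton; `dataset`, `m`, and `update_weights` here are placeholders, not a real framework API:

```python
import random

dataset = list(range(10))  # placeholder for the training samples
m = 2                      # batch size
iterations = 0             # count of weight updates across training

def update_weights(batch):
    pass  # one iteration: forward pass, backward pass, weight update

for epoch in range(3):
    random.shuffle(dataset)  # reshuffle at the start of every epoch
    for i in range(0, len(dataset), m):
        batch = dataset[i:i + m]
        update_weights(batch)
        iterations += 1

print(iterations)  # 15 (3 epochs × 5 batches per epoch)
```

In real frameworks this shuffling is usually handled for you, e.g. by a data-loading utility with a shuffle option.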
Putting It All Together
- Sample → One row of dataset
- Batch → Group of samples processed before updating weights
- Epoch → One full pass through dataset
- Iteration → One weight update
Intuitively:
- Epoch controls how many times the model sees the data.
- Batch controls how much data is used before updating weights.
- Iterations count how many times weights are updated.
Understanding this relationship helps in:
- Estimating training time
- Choosing batch sizes
- Debugging convergence issues
- Explaining model training clearly in interviews