What is Backpropagation?

Mario Sanchez · 3 min read

Backpropagation is the cornerstone of training neural networks. It's a supervised learning algorithm that enables a neural network to learn from its mistakes by adjusting its internal parameters.

These parameters include weights and biases, which the network tunes during training to minimize the difference between its predictions and the actual target values.

Let's dive into a simple example to illustrate the backpropagation process.

Step 1: Forward Pass

Imagine we have a tiny neural network with one input neuron, one hidden neuron, and one output neuron (a single hidden layer in total). The network is designed to predict whether an email is spam (1) or not spam (0) based on a single feature, such as the number of exclamation marks in the email subject line.

1. Input: We start with an input value, say x = 3 (number of exclamation marks).

2. Weight Initialization: Randomly initialize the weights and biases. Let's assume w1 = 0.5, b1 = 0.2 for the connection between the input and hidden neurons, and w2 = -0.8, b2 = 0.1 for the connection between the hidden and output neurons.

3. Compute the Hidden Layer Output: Calculate the output of the hidden layer using a sigmoid activation function:

z1 = (x * w1) + b1 = (3 * 0.5) + 0.2 = 1.7
a1 = sigmoid(z1) = 1 / (1 + e^(-1.7)) ≈ 0.846

4. Compute the Output Layer Output: Calculate the output of the network using another sigmoid activation function:

z2 = (a1 * w2) + b2 = (0.846 * -0.8) + 0.1 ≈ -0.577
a2 = sigmoid(z2) = 1 / (1 + e^(-(-0.577))) ≈ 0.360
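
Here is a minimal Python sketch of this forward pass (variable names mirror the walkthrough; the print line is just for inspection):

import math

def sigmoid(z):
    # Logistic sigmoid: squashes any real number into (0, 1).
    return 1 / (1 + math.exp(-z))

# Input and the randomly initialized parameters from above
x = 3                # number of exclamation marks
w1, b1 = 0.5, 0.2    # input -> hidden connection
w2, b2 = -0.8, 0.1   # hidden -> output connection

# Forward pass
z1 = x * w1 + b1     # 1.7
a1 = sigmoid(z1)     # ≈ 0.846
z2 = a1 * w2 + b2    # ≈ -0.577
a2 = sigmoid(z2)     # ≈ 0.360, the network's "spam score"

print(round(z1, 3), round(a1, 3), round(z2, 3), round(a2, 3))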

Step 2: Calculate the Error

Determine how far off the network's prediction (a2) is from the actual target value (y_true). For instance, if the email is spam (y_true = 1), the error (E) can be calculated as:

E = 1/2 * (a2 - y_true)^2 = 1/2 * (0.360 - 1)^2 ≈ 0.205
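
In the Python sketch started above, this error calculation is:

y_true = 1                      # the email really is spam
E = 0.5 * (a2 - y_true) ** 2    # ≈ 0.205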


Step 3: Backpropagation

Now, we propagate the error backward through the network to adjust the weights and biases.

1. Calculate the Output Layer's Error (δ2):

δ2 = (a2 - y_true) * sigmoid_derivative(z2) = (0.360 - 1) * 0.360 * (1 - 0.360) ≈ -0.147

2. Update the Output Layer Weights and Bias (w2 and b2), using a learning rate of 0.1:

w2_new = w2 - learning_rate * δ2 * a1 = -0.8 - 0.1 * (-0.147) * 0.846 ≈ -0.788
b2_new = b2 - learning_rate * δ2 = 0.1 - 0.1 * (-0.147) ≈ 0.115

3. Calculate the Hidden Layer's Error (δ1):

δ1 = δ2 * w2 * sigmoid_derivative(z1) = (-0.147) * (-0.8) * 0.846 * (1 - 0.846) ≈ 0.015

4. Update the Hidden Layer Weights and Bias (w1 and b1):

w1_new = w1 - learning_rate * δ1 * x = 0.5 - 0.1 * 0.015 * 3 ≈ 0.495
b1_new = b1 - learning_rate * δ1 = 0.2 - 0.1 * 0.015 ≈ 0.198
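
The same backward pass, continuing the Python sketch from the forward pass (a simplified illustration of one gradient-descent step, not production training code):

def sigmoid_derivative(z):
    # Derivative of the sigmoid, written in terms of the sigmoid itself.
    s = sigmoid(z)
    return s * (1 - s)

learning_rate = 0.1   # step size for gradient descent

# 1. Output layer error
delta2 = (a2 - y_true) * sigmoid_derivative(z2)   # ≈ -0.147

# 2. Output layer updates
w2_new = w2 - learning_rate * delta2 * a1         # ≈ -0.788
b2_new = b2 - learning_rate * delta2              # ≈ 0.115

# 3. Hidden layer error (uses the original w2, not the updated one)
delta1 = delta2 * w2 * sigmoid_derivative(z1)     # ≈ 0.015

# 4. Hidden layer updates
w1_new = w1 - learning_rate * delta1 * x          # ≈ 0.495
b1_new = b1 - learning_rate * delta1              # ≈ 0.198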

Repeat these steps (forward pass, error calculation, backpropagation) for a predefined number of iterations (epochs), or until the error converges to a minimum value or some other stopping criterion is met. In practice you would also loop over many training emails rather than a single one.
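
Wrapped into a loop, the whole procedure for this single training example might look like the following sketch (it reuses the sigmoid function defined earlier; the function name and epoch count are illustrative choices, not fixed conventions):

def train(x, y_true, w1, b1, w2, b2, learning_rate=0.1, epochs=1000):
    # Repeat forward pass, error measurement, and backpropagation for a fixed number of epochs.
    for _ in range(epochs):
        # Forward pass
        z1 = x * w1 + b1
        a1 = sigmoid(z1)
        z2 = a1 * w2 + b2
        a2 = sigmoid(z2)
        # Backward pass: layer errors, computed before any weights change
        delta2 = (a2 - y_true) * a2 * (1 - a2)
        delta1 = delta2 * w2 * a1 * (1 - a1)
        # Gradient-descent updates
        w2 -= learning_rate * delta2 * a1
        b2 -= learning_rate * delta2
        w1 -= learning_rate * delta1 * x
        b1 -= learning_rate * delta1
    return w1, b1, w2, b2

# Using the numbers from this article:
# w1, b1, w2, b2 = train(3, 1, 0.5, 0.2, -0.8, 0.1)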


About Mario Sanchez

Mario is a Staff Engineer specialising in Frontend at Vercel, as well as being a co-founder of Acme and the content management system Sanity. Prior to this, he was a Senior Engineer at Apple.
