[Image: multiple code windows and terminals arranged in a grid, symbolizing the parallel, efficient nature of mini-batch processing in SGD]

Lesson 7 - Coding Lesson: Implementing Mini-Batch SGD in Python

Code commanders, prepare for an upgrade! 🚀 In this coding lesson, we're taking our Gradient Descent implementation to the next level by coding Mini-Batch Stochastic Gradient Descent (SGD) in Python! You'll build upon your Batch GD code from Lesson 3, adding the crucial elements of mini-batches and epochs to create a more efficient and powerful optimization algorithm. Let's code SGD and unleash its speed!


Reusing and Modifying Your Batch GD Code

We'll start by reusing and modifying the Batch Gradient Descent code you wrote in Lesson 3. This will save us time and highlight the key changes needed to implement Mini-Batch SGD.

Start by opening your Jupyter Notebook or Colab notebook from Lesson 3 (or create a new notebook and copy over your Batch GD code if needed). We'll modify the batch_gradient_descent function to create our mini_batch_gradient_descent function.

We'll keep the calculate_mse and calculate_gradients functions exactly the same – they will be reused in our Mini-Batch SGD implementation. No need to rewrite those!

import numpy as np # NumPy is used by both helper functions below

# (Code from Lesson 3 - calculate_mse function - REUSE AS IS)
def calculate_mse(y_true, y_predicted):
    """Calculates the Mean Squared Error."""
    n = len(y_true)
    mse = (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)
    return mse

# (Code from Lesson 3 - calculate_gradients function - REUSE AS IS)
def calculate_gradients(X, y_true, y_predicted):
    """Calculates gradients of MSE cost function."""
    n = len(y_true)
    errors = y_predicted - y_true
    gradient_beta0 = (1 / n) * np.sum(errors)
    gradient_beta1 = (1 / n) * np.sum(X * errors)
    return gradient_beta0, gradient_beta1

Reuse calculate_mse and calculate_gradients: Ensure you have these two functions defined in your notebook (copy them over if needed).
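Before moving on, you can sanity-check the two reused helpers on a tiny hand-computable example (the arrays below are made up for illustration; they are not part of the lesson's dataset). If predictions are perfect, both the cost and the gradients should be exactly zero; if every prediction is off by +1, the MSE should be 0.5 and the gradients easy to verify by hand:

```python
import numpy as np

def calculate_mse(y_true, y_predicted):
    """Calculates the Mean Squared Error."""
    n = len(y_true)
    mse = (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)
    return mse

def calculate_gradients(X, y_true, y_predicted):
    """Calculates gradients of MSE cost function."""
    n = len(y_true)
    errors = y_predicted - y_true
    gradient_beta0 = (1 / n) * np.sum(errors)
    gradient_beta1 = (1 / n) * np.sum(X * errors)
    return gradient_beta0, gradient_beta1

X_check = np.array([1.0, 2.0, 3.0])
y_check = np.array([2.0, 4.0, 6.0])  # exactly y = 2x

# Perfect predictions: cost and gradients should all be zero
print(calculate_mse(y_check, 2.0 * X_check))                     # 0.0
print(calculate_gradients(X_check, y_check, 2.0 * X_check))      # (0.0, 0.0)

# Every prediction off by +1: errors are all 1, so
# MSE = (1/6) * 3 = 0.5, grad_beta0 = 1.0, grad_beta1 = (1+2+3)/3 = 2.0
print(calculate_mse(y_check, y_check + 1.0))                     # 0.5
print(calculate_gradients(X_check, y_check, y_check + 1.0))      # (1.0, 2.0)
```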


Step 1: Implement Mini-Batch SGD Function (Core Logic)

Now, let's create the mini_batch_gradient_descent function. We'll start with the core logic of iterating through epochs and mini-batches.

from sklearn.utils import shuffle # Import for shuffling

def mini_batch_gradient_descent(X, y, learning_rate, n_epochs, mini_batch_size):
    """Implements Mini-Batch Stochastic Gradient Descent for linear regression."""
    beta_0 = 0
    beta_1 = 0
    history_cost = []
    n_data = len(X)

    for epoch in range(n_epochs):
        # Shuffle data at the start of each epoch (NEW!)
        X_shuffled, y_shuffled = shuffle(X, y)

        for i in range(0, n_data, mini_batch_size): # Iterate through mini-batches (NEW!)
            X_mini_batch = X_shuffled[i:i + mini_batch_size]
            y_mini_batch = y_shuffled[i:i + mini_batch_size]

            # --- Gradient and Parameter Update Code will go here --- 
            pass # Placeholder

        # Calculate and store cost for full dataset (Moved to epoch loop)
        y_predicted_full = beta_0 + beta_1 * X
        cost = calculate_mse(y, y_predicted_full)
        history_cost.append(cost)

    return beta_0, beta_1, history_cost

Implement mini_batch_gradient_descent (Initial Structure): Copy and paste this code structure into a new cell. Let's understand the key additions for SGD:

- An outer epoch loop: each epoch is one full pass over the training data.
- Shuffling at the start of every epoch (via sklearn's shuffle utility), so mini-batches differ from epoch to epoch.
- An inner loop that slices the shuffled data into mini-batches of size mini_batch_size.
- The cost is computed once per epoch on the full dataset, so history_cost stores one value per epoch (not per update).

This structure sets up the core loops for epochs and mini-batches. Now, let's fill in the gradient calculation and parameter update logic inside the mini-batch loop.
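To see exactly how the inner loop carves the data up, it can help to run the slicing logic on its own with a toy array (the sizes here are arbitrary, chosen so the batch size doesn't divide the data evenly):

```python
import numpy as np

n_data = 10
mini_batch_size = 4
X_demo = np.arange(n_data)  # stand-in for the shuffled data

# Same pattern as the inner loop: start indices 0, 4, 8
batches = [X_demo[i:i + mini_batch_size] for i in range(0, n_data, mini_batch_size)]
for b in batches:
    print(b)
# [0 1 2 3]
# [4 5 6 7]
# [8 9]   <- the last batch is smaller when n_data % mini_batch_size != 0
```

NumPy slicing never raises an error when the end index runs past the array, which is why the final, partial mini-batch comes out automatically.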


Step 2: Add Gradient Calculation and Parameter Update (Mini-Batch Loop)

Let's complete the mini_batch_gradient_descent function by adding the gradient calculation and parameter update steps within the mini-batch loop.

# (Add this code inside the 'for i in range(0, n_data, mini_batch_size):' loop, replacing 'pass')

            y_predicted_mini_batch = beta_0 + beta_1 * X_mini_batch # Predictions for mini-batch
            gradient_beta0, gradient_beta1 = calculate_gradients(X_mini_batch, y_mini_batch, y_predicted_mini_batch) # Gradients on mini-batch

            beta_0 = beta_0 - learning_rate * gradient_beta0
            beta_1 = beta_1 - learning_rate * gradient_beta1

Complete Mini-Batch Loop Logic: Copy and paste the code snippet above to replace the pass statement in your mini_batch_gradient_descent function. This code block does the following for each mini-batch:

- Computes predictions for the current mini-batch only.
- Calculates gradients using just the mini-batch data (reusing calculate_gradients unchanged).
- Updates beta_0 and beta_1 immediately, so the parameters change many times within a single epoch.

With this code added, your mini_batch_gradient_descent function is now complete! It implements the full Mini-Batch SGD algorithm.
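If you'd like to check the finished function end to end, here is a fully assembled version run on synthetic data with a known intercept and slope. The data-generation numbers, seeds, and hyperparameters below are illustrative choices, not values from the lesson, and this sketch shuffles with NumPy's permutation, which plays the same role as sklearn's shuffle utility:

```python
import numpy as np

def calculate_mse(y_true, y_predicted):
    n = len(y_true)
    return (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)

def calculate_gradients(X, y_true, y_predicted):
    n = len(y_true)
    errors = y_predicted - y_true
    return (1 / n) * np.sum(errors), (1 / n) * np.sum(X * errors)

def mini_batch_gradient_descent(X, y, learning_rate, n_epochs, mini_batch_size):
    beta_0, beta_1 = 0.0, 0.0
    history_cost = []
    n_data = len(X)
    rng = np.random.default_rng(0)           # seeded for reproducibility
    for epoch in range(n_epochs):
        perm = rng.permutation(n_data)       # shuffle indices each epoch
        X_shuffled, y_shuffled = X[perm], y[perm]
        for i in range(0, n_data, mini_batch_size):
            X_mb = X_shuffled[i:i + mini_batch_size]
            y_mb = y_shuffled[i:i + mini_batch_size]
            y_pred_mb = beta_0 + beta_1 * X_mb
            g0, g1 = calculate_gradients(X_mb, y_mb, y_pred_mb)
            beta_0 -= learning_rate * g0
            beta_1 -= learning_rate * g1
        history_cost.append(calculate_mse(y, beta_0 + beta_1 * X))
    return beta_0, beta_1, history_cost

# Synthetic data: y = 4 + 3x + noise (illustrative parameters)
rng = np.random.default_rng(42)
X = 2 * rng.random(200)
y = 4 + 3 * X + rng.normal(0, 0.5, 200)

b0, b1, costs = mini_batch_gradient_descent(X, y, 0.05, 100, 32)
print(f"beta_0 ~ {b0:.2f}, beta_1 ~ {b1:.2f}")  # should land near 4 and 3
```

If the learned parameters land close to the values used to generate the data, and the cost history trends downward, your implementation is working.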


Step 3: Run Mini-Batch SGD and Compare with Batch GD

Let's run our Mini-Batch SGD implementation and compare its performance and behavior to Batch Gradient Descent.

# Set hyperparameters for Mini-Batch SGD
learning_rate_sgd = 0.01
n_epochs_sgd = 10 # Each epoch makes many mini-batch updates, so fewer epochs are often enough
mini_batch_size = 32

# Run Mini-Batch SGD
learned_beta0_sgd, learned_beta1_sgd, cost_history_sgd = mini_batch_gradient_descent(X, y, learning_rate_sgd, n_epochs_sgd, mini_batch_size)

print(f"SGD Learned Intercept (beta_0): {learned_beta0_sgd:.2f}")
print(f"SGD Learned Slope (beta_1): {learned_beta1_sgd:.2f}")

# (Code for plotting cost history and regression line comparison - REUSE from Lesson 3 Coding - Step 4, but adapt labels for SGD)

Run Mini-Batch SGD and Compare: Copy and paste this code block (and reuse your plotting code from Lesson 3, adapting labels as needed) into a new cell and run it. This code:

- Sets the SGD hyperparameters: learning rate, number of epochs, and mini-batch size.
- Runs mini_batch_gradient_descent on the same X and y you used for Batch GD.
- Prints the learned intercept and slope so you can compare them with your Batch GD results.

Examine the plots and printed parameters. How does Mini-Batch SGD compare to Batch GD in terms of convergence speed (epochs vs. iterations), cost function behavior (smoothness vs. noise), and the final learned regression line?

Expected output plots: (1) Cost function comparison plot showing SGD cost potentially decreasing faster per epoch than Batch GD per iteration, but possibly with more oscillations. (2) Regression line comparison plot showing both SGD and Batch GD lines fitting the data reasonably well, potentially very similar.
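If you want to make the epochs-versus-iterations comparison concrete in numbers rather than plots, you can count parameter updates directly. The sketch below uses compact reimplementations of both algorithms on synthetic data; all numbers, seeds, and hyperparameters are illustrative assumptions, not values from the lesson:

```python
import numpy as np

rng = np.random.default_rng(0)
X = 2 * rng.random(200)
y = 4 + 3 * X + rng.normal(0, 0.5, 200)  # illustrative synthetic data
n = len(X)

def gradients(X, y, b0, b1):
    err = (b0 + b1 * X) - y
    return err.mean(), (X * err).mean()

def cost(X, y, b0, b1):
    return ((y - (b0 + b1 * X))**2).sum() / (2 * len(y))

lr = 0.05

# Batch GD: one parameter update per full pass over the data
b0, b1 = 0.0, 0.0
for _ in range(50):
    g0, g1 = gradients(X, y, b0, b1)
    b0, b1 = b0 - lr * g0, b1 - lr * g1
batch_updates = 50
batch_cost = cost(X, y, b0, b1)

# Mini-batch SGD: many parameter updates per epoch
b0, b1, batch = 0.0, 0.0, 32
for _ in range(50):
    perm = rng.permutation(n)
    Xs, ys = X[perm], y[perm]
    for i in range(0, n, batch):
        g0, g1 = gradients(Xs[i:i + batch], ys[i:i + batch], b0, b1)
        b0, b1 = b0 - lr * g0, b1 - lr * g1
sgd_updates = 50 * int(np.ceil(n / batch))
sgd_cost = cost(X, y, b0, b1)

print(f"Batch GD:       {batch_updates} updates, final cost {batch_cost:.4f}")
print(f"Mini-batch SGD: {sgd_updates} updates, final cost {sgd_cost:.4f}")
```

With the same number of epochs and learning rate, Batch GD makes one update per epoch while mini-batch SGD makes ceil(n / batch) updates per epoch, which is why SGD typically reaches a lower cost in the same number of passes over the data.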

Experiment and Reflect

Congratulations, Mini-Batch SGD coder! 🎉 You've implemented Mini-Batch Stochastic Gradient Descent and compared it to Batch GD. Now, let's experiment and reflect on the differences.

Stop and Think

Compare the 'Cost Function Comparison' plot. Is the cost function history for SGD smoother or noisier than for Batch GD? Does SGD seem to converge faster in terms of epochs compared to Batch GD in terms of iterations? What do you think causes these differences?


Experiment Tasks

Try the following experiments and note how the results change:

1. Mini-batch size: Re-run Mini-Batch SGD with mini_batch_size = 1 (pure stochastic GD) and with mini_batch_size = len(X) (equivalent to Batch GD). How does the noisiness of the cost history change as the mini-batch size grows?
2. Number of epochs: Increase and decrease n_epochs_sgd. Roughly how many epochs does Mini-Batch SGD need to reach a cost similar to Batch GD's final cost?

Coding Lesson Complete - SGD Master!

Fantastic job, Mini-Batch SGD implementer! 🚀 You've successfully coded Mini-Batch Stochastic Gradient Descent in Python, compared it to Batch GD, and experimented with mini-batch sizes and epochs. You now have a practical, hands-on understanding of SGD and its advantages. Share your code, plots, and experimental observations in the course forum – keep coding and keep optimizing!