Code commanders, prepare for an upgrade! 🚀 In this coding lesson, we're taking our Gradient Descent implementation to the next level by coding Mini-Batch Stochastic Gradient Descent (SGD) in Python! You'll build upon your Batch GD code from Lesson 3, adding the crucial elements of mini-batches and epochs to create a more efficient and powerful optimization algorithm. Let's code SGD and unleash its speed!
We'll start by reusing and modifying the Batch Gradient Descent code you wrote in Lesson 3. This will save us time and highlight the key changes needed to implement Mini-Batch SGD.
Start by opening your Jupyter Notebook or Colab notebook from Lesson 3 (or create a new notebook and copy over your Batch GD code if needed). We'll modify the batch_gradient_descent function to create our mini_batch_gradient_descent function.
We'll keep the calculate_mse and calculate_gradients functions exactly the same – they will be reused in our Mini-Batch SGD implementation. No need to rewrite those!
import numpy as np

# (Code from Lesson 3 - calculate_mse function - REUSE AS IS)
def calculate_mse(y_true, y_predicted):
    """Calculates the Mean Squared Error."""
    n = len(y_true)
    mse = (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)
    return mse
# (Code from Lesson 3 - calculate_gradients function - REUSE AS IS)
def calculate_gradients(X, y_true, y_predicted):
    """Calculates gradients of MSE cost function."""
    n = len(y_true)
    errors = y_predicted - y_true
    gradient_beta0 = (1 / n) * np.sum(errors)
    gradient_beta1 = (1 / n) * np.sum(X * errors)
    return gradient_beta0, gradient_beta1
Reuse calculate_mse and calculate_gradients: Ensure you have these two functions defined in your notebook (copy them over if needed).
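Before moving on, you can sanity-check the two reused helpers on a tiny toy dataset. This standalone snippet (the toy data below is illustrative, not the Lesson 3 dataset) redefines the functions so it runs on its own; for a perfect fit, both the cost and the gradients should be exactly zero:

```python
import numpy as np

def calculate_mse(y_true, y_predicted):
    """Calculates the Mean Squared Error."""
    n = len(y_true)
    return (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)

def calculate_gradients(X, y_true, y_predicted):
    """Calculates gradients of the MSE cost function."""
    n = len(y_true)
    errors = y_predicted - y_true
    gradient_beta0 = (1 / n) * np.sum(errors)
    gradient_beta1 = (1 / n) * np.sum(X * errors)
    return gradient_beta0, gradient_beta1

# Toy dataset where y = 2x exactly
X_toy = np.array([1.0, 2.0, 3.0])
y_toy = np.array([2.0, 4.0, 6.0])

# A perfect fit (predictions == targets) gives zero cost and zero gradients
print(calculate_mse(y_toy, 2.0 * X_toy))               # 0.0
print(calculate_gradients(X_toy, y_toy, 2.0 * X_toy))  # (0.0, 0.0)
```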
Now, let's create the mini_batch_gradient_descent function. We'll start with the core logic of iterating through epochs and mini-batches.
from sklearn.utils import shuffle  # Import for shuffling

def mini_batch_gradient_descent(X, y, learning_rate, n_epochs, mini_batch_size):
    """Implements Mini-Batch Stochastic Gradient Descent for linear regression."""
    beta_0 = 0
    beta_1 = 0
    history_cost = []
    n_data = len(X)
    for epoch in range(n_epochs):
        # Shuffle data at the start of each epoch (NEW!)
        X_shuffled, y_shuffled = shuffle(X, y)
        for i in range(0, n_data, mini_batch_size):  # Iterate through mini-batches (NEW!)
            X_mini_batch = X_shuffled[i:i + mini_batch_size]
            y_mini_batch = y_shuffled[i:i + mini_batch_size]
            # --- Gradient and Parameter Update Code will go here ---
            pass  # Placeholder
        # Calculate and store cost for the full dataset once per epoch
        y_predicted_full = beta_0 + beta_1 * X
        cost = calculate_mse(y, y_predicted_full)
        history_cost.append(cost)
    return beta_0, beta_1, history_cost
Implement mini_batch_gradient_descent (Initial Structure): Copy and paste this code structure into a new cell. Let's understand the key additions for SGD:
- shuffle: We import shuffle from sklearn.utils. It reorders X and y together with a matching permutation, so each target stays paired with its feature value.
- for epoch in range(n_epochs): iterates through epochs.
- X_shuffled, y_shuffled = shuffle(X, y) shuffles the data at the start of each epoch.
- for i in range(0, n_data, mini_batch_size): iterates through the shuffled data in steps of mini_batch_size, creating mini-batches X_mini_batch and y_mini_batch.
- The pass statement is a placeholder where we'll add the gradient calculation and parameter update code for each mini-batch in the next step.
This structure sets up the core loops for epochs and mini-batches. Now, let's fill in the gradient calculation and parameter update logic inside the mini-batch loop.
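To see concretely what the epoch and mini-batch loops produce, here is a small standalone demo of the slicing (the toy sizes are chosen purely for illustration; it shuffles with a NumPy permutation, which has the same effect as shuffling X and y together). Note that the last mini-batch can be smaller when mini_batch_size doesn't divide the dataset size evenly:

```python
import numpy as np

n_data = 10
mini_batch_size = 4
X = np.arange(n_data)  # stand-in feature values
y = 2 * X              # stand-in targets

# Shuffle X and y with the same permutation so pairs stay aligned
perm = np.random.permutation(n_data)
X_shuffled, y_shuffled = X[perm], y[perm]

for i in range(0, n_data, mini_batch_size):
    X_mini_batch = X_shuffled[i:i + mini_batch_size]
    y_mini_batch = y_shuffled[i:i + mini_batch_size]
    print(f"mini-batch starting at index {i}: {len(X_mini_batch)} points")
# With 10 points and batch size 4, the batch sizes come out as 4, 4, 2
```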
Let's complete the mini_batch_gradient_descent function by adding the gradient calculation and parameter update steps within the mini-batch loop.
# (Add this code inside the 'for i in range(0, n_data, mini_batch_size):' loop, replacing 'pass')
y_predicted_mini_batch = beta_0 + beta_1 * X_mini_batch  # Predictions for mini-batch
gradient_beta0, gradient_beta1 = calculate_gradients(X_mini_batch, y_mini_batch, y_predicted_mini_batch)  # Gradients on mini-batch
beta_0 = beta_0 - learning_rate * gradient_beta0  # Update intercept
beta_1 = beta_1 - learning_rate * gradient_beta1  # Update slope
Complete Mini-Batch Loop Logic: Copy and paste the code snippet above to replace the pass statement in your mini_batch_gradient_descent function. This code block does the following for each mini-batch:
- y_predicted_mini_batch = beta_0 + beta_1 * X_mini_batch calculates predictions using the current parameters, but only for the data points in the current mini-batch.
- gradient_beta0, gradient_beta1 = calculate_gradients(X_mini_batch, y_mini_batch, y_predicted_mini_batch) calls our calculate_gradients function to compute the gradients, but now based only on the current mini-batch data.
- beta_0 = beta_0 - learning_rate * gradient_beta0 and beta_1 = beta_1 - learning_rate * gradient_beta1 update the parameters using the mini-batch gradients and the learning rate.
With this code added, your mini_batch_gradient_descent function is now complete! It implements the full Mini-Batch SGD algorithm.
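Once the update code is in place, it can help to see the whole thing run end to end. Here's a self-contained sketch on synthetic data (the data-generating line y = 4 + 3x + noise, the random seeds, and the hyperparameter values are all illustrative choices, not the Lesson 3 setup; shuffling is done with a NumPy permutation):

```python
import numpy as np

def calculate_mse(y_true, y_predicted):
    n = len(y_true)
    return (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)

def calculate_gradients(X, y_true, y_predicted):
    n = len(y_true)
    errors = y_predicted - y_true
    return (1 / n) * np.sum(errors), (1 / n) * np.sum(X * errors)

def mini_batch_gradient_descent(X, y, learning_rate, n_epochs, mini_batch_size):
    beta_0, beta_1 = 0.0, 0.0
    history_cost = []
    n_data = len(X)
    for epoch in range(n_epochs):
        perm = np.random.permutation(n_data)       # reshuffle every epoch
        X_shuffled, y_shuffled = X[perm], y[perm]
        for i in range(0, n_data, mini_batch_size):
            X_mb = X_shuffled[i:i + mini_batch_size]
            y_mb = y_shuffled[i:i + mini_batch_size]
            y_pred_mb = beta_0 + beta_1 * X_mb     # predictions on the mini-batch only
            g0, g1 = calculate_gradients(X_mb, y_mb, y_pred_mb)
            beta_0 -= learning_rate * g0           # one update per mini-batch
            beta_1 -= learning_rate * g1
        history_cost.append(calculate_mse(y, beta_0 + beta_1 * X))  # cost once per epoch
    return beta_0, beta_1, history_cost

# Illustrative synthetic data: y = 4 + 3x + Gaussian noise
rng = np.random.default_rng(42)
X = 2 * rng.random(200)
y = 4 + 3 * X + rng.normal(0, 0.5, 200)

np.random.seed(0)  # makes the per-epoch shuffles reproducible
b0, b1, costs = mini_batch_gradient_descent(X, y, 0.05, 200, 32)
print(f"beta_0 ≈ {b0:.2f}, beta_1 ≈ {b1:.2f}")
```

If the learned parameters land near the true values used to generate the data (4 and 3 here), the implementation is working.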
Let's run our Mini-Batch SGD implementation and compare its performance and behavior to Batch Gradient Descent.
# Set hyperparameters for Mini-Batch SGD
learning_rate_sgd = 0.01
n_epochs_sgd = 10 # Fewer epochs needed for SGD
mini_batch_size = 32
# Run Mini-Batch SGD
learned_beta0_sgd, learned_beta1_sgd, cost_history_sgd = mini_batch_gradient_descent(X, y, learning_rate_sgd, n_epochs_sgd, mini_batch_size)
print(f"SGD Learned Intercept (beta_0): {learned_beta0_sgd:.2f}")
print(f"SGD Learned Slope (beta_1): {learned_beta1_sgd:.2f}")
# (Code for plotting cost history and regression line comparison - REUSE from Lesson 3 Coding - Step 4, but adapt labels for SGD)
Run Mini-Batch SGD and Compare: Copy and paste this code block (and reuse your plotting code from Lesson 3, adapting labels as needed) into a new cell and run it. This code:
- Sets the SGD hyperparameters: learning_rate_sgd, n_epochs_sgd, and mini_batch_size.
- Runs the mini_batch_gradient_descent function.
- Plots the SGD cost history (compare it to cost_history from Lesson 3 in your notebook).
- Prints the learned parameters (compare them to learned_beta0, learned_beta1 from Lesson 3).
Examine the plots and printed parameters. How does Mini-Batch SGD compare to Batch GD in terms of convergence speed (epochs vs. iterations), cost function behavior (smoothness vs. noise), and the final learned regression line?
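If your Lesson 3 plotting code isn't handy, here is a minimal plotting sketch you can adapt. The two cost histories below are dummy placeholder curves – substitute your real cost_history (Batch GD) and cost_history_sgd. Keep in mind the x-axis units differ: Batch GD cost is typically recorded per iteration, while our SGD records it once per epoch:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Dummy placeholder histories – replace with your real ones
cost_history = np.exp(-0.05 * np.arange(100))                                # Batch GD, per iteration
cost_history_sgd = np.exp(-0.5 * np.arange(10)) + 0.02 * np.random.rand(10)  # SGD, per epoch

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(cost_history)
axes[0].set(title="Batch GD", xlabel="Iteration", ylabel="Cost (MSE)")
axes[1].plot(cost_history_sgd, marker="o")
axes[1].set(title="Mini-Batch SGD", xlabel="Epoch", ylabel="Cost (MSE)")
fig.tight_layout()
fig.savefig("gd_vs_sgd_cost.png")
```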
Congratulations, Mini-Batch SGD coder! 🎉 You've implemented Mini-Batch Stochastic Gradient Descent and compared it to Batch GD. Now, let's experiment and reflect on the differences.
- Change the mini_batch_size value (e.g., to 1, 8, 64, or even the full dataset size, which effectively makes it Batch GD). Rerun the code for each mini-batch size and observe how the cost history and convergence behavior change. What happens as you decrease the mini-batch size towards 1 (pure SGD)? What happens as you increase it towards the full dataset size (approaching Batch GD)?
- Change the n_epochs_sgd value. Do you need more or fewer epochs for SGD to converge compared to the number of iterations needed for Batch GD to reach a similar cost level?
- Try different learning_rate_sgd values for SGD. Does SGD require a different (potentially smaller) learning rate compared to Batch GD for stable convergence? Why might this be the case?
Fantastic job, Mini-Batch SGD implementer! 🚀 You've successfully coded Mini-Batch Stochastic Gradient Descent in Python, compared it to Batch GD, and experimented with mini-batch sizes and epochs. You now have a practical, hands-on understanding of SGD and its advantages. Share your code, plots, and experimental observations in the course forum – keep coding and keep optimizing!
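For the mini-batch-size experiment specifically, a loop keeps the comparison organized. The sketch below is self-contained, so it re-creates the helper functions and illustrative synthetic data (the dataset and hyperparameter values are assumptions for the demo); in your notebook you'd reuse your existing mini_batch_gradient_descent, X, and y and keep just the final loop:

```python
import numpy as np

def calculate_mse(y_true, y_predicted):
    n = len(y_true)
    return (1 / (2 * n)) * np.sum((y_true - y_predicted)**2)

def calculate_gradients(X, y_true, y_predicted):
    n = len(y_true)
    errors = y_predicted - y_true
    return (1 / n) * np.sum(errors), (1 / n) * np.sum(X * errors)

def mini_batch_gradient_descent(X, y, learning_rate, n_epochs, mini_batch_size):
    beta_0, beta_1, history_cost = 0.0, 0.0, []
    for epoch in range(n_epochs):
        perm = np.random.permutation(len(X))
        X_s, y_s = X[perm], y[perm]
        for i in range(0, len(X), mini_batch_size):
            X_mb, y_mb = X_s[i:i + mini_batch_size], y_s[i:i + mini_batch_size]
            g0, g1 = calculate_gradients(X_mb, y_mb, beta_0 + beta_1 * X_mb)
            beta_0 -= learning_rate * g0
            beta_1 -= learning_rate * g1
        history_cost.append(calculate_mse(y, beta_0 + beta_1 * X))
    return beta_0, beta_1, history_cost

# Illustrative synthetic data (replace with your own X, y)
rng = np.random.default_rng(0)
X = 2 * rng.random(200)
y = 4 + 3 * X + rng.normal(0, 0.5, 200)

# Same epoch budget for every batch size: smaller batches mean more updates per epoch
np.random.seed(0)
results = {}
for size in [1, 8, 32, len(X)]:  # len(X) makes it plain Batch GD
    _, _, costs = mini_batch_gradient_descent(X, y, 0.01, 20, size)
    results[size] = costs[-1]
    print(f"mini_batch_size={size:>3}: final cost = {costs[-1]:.4f}")
```

With a fixed number of epochs, the smaller batch sizes get far more parameter updates, which is exactly the efficiency trade-off this lesson explores.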