Of course! Let's break down Mini-Batch Gradient Descent in Python, from the concept to a practical implementation.

What is Mini-Batch Gradient Descent?
Mini-Batch Gradient Descent is an optimization algorithm used to train machine learning models, especially neural networks. It's a compromise between two other popular methods: Batch Gradient Descent and Stochastic Gradient Descent (SGD).
Let's quickly understand the extremes first:
- Batch Gradient Descent: Calculates the gradient of the cost function using the entire training dataset for a single update step.
  - Pros: Stable convergence; guaranteed to find the global minimum for convex cost functions.
  - Cons: Very slow and computationally expensive for large datasets; can get stuck in local minima for non-convex functions.
- Stochastic Gradient Descent (SGD): Calculates the gradient using only a single, randomly selected training example for each update step.
  - Pros: Very fast updates; the noisy updates can help escape local minima.
  - Cons: Updates are very noisy (high variance), leading to a less stable convergence path; it may never settle at the exact minimum.
Mini-Batch Gradient Descent is the sweet spot in the middle. It splits the training dataset into small, random batches. For each update step, it calculates the gradient using one of these mini-batches.
Analogy: Imagine you're trying to learn the rules of a new language.
- Batch GD: You read the entire dictionary and every grammar book before you try to speak a single word. (Slow, but very thorough).
- SGD: You learn one word, try to use it, learn the next word, try to use it, and so on. (Fast, but your sentences are all over the place).
- Mini-Batch GD: You learn a small group of 10-20 words (a "mini-batch"), practice making sentences with them, then learn the next group of 10-20 words. (A good balance of speed and stability).
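The batching itself is simple to sketch in NumPy. This minimal illustration (the 1000-sample size and `batch_size=32` mirror the walkthrough below, but any values work) shows how a shuffled index array is sliced into mini-batches, including a final partial batch:

```python
import numpy as np

# 1000 samples with batch_size=32 give 31 full batches
# plus one partial batch of 8 samples.
m, batch_size = 1000, 32
rng = np.random.default_rng(0)

indices = rng.permutation(m)  # shuffle once per epoch
batches = [indices[i:i + batch_size] for i in range(0, m, batch_size)]

print(len(batches))      # 32 batches in total
print(len(batches[-1]))  # the last, partial batch holds the remaining 8 samples
```

Every sample appears in exactly one batch per epoch; only the order changes between epochs.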
Why Use Mini-Batches?
- Computational Efficiency: It's much faster to perform matrix operations (which are highly optimized in libraries like NumPy) on a small batch than on the entire dataset. This is because of hardware parallelism (especially GPUs).
- Faster Convergence: It converges faster than Batch GD because it updates the model more frequently.
- Stability: It converges more smoothly and stably than SGD because the updates are less noisy (they are averaged over a batch).
- Escape Local Minima: The noise from the random mini-batches can help the model jump out of shallow local minima, a common problem in the complex, non-convex loss landscapes of neural networks.
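The stability point can be made concrete with a small NumPy experiment (the synthetic data and batch sizes here are illustrative choices, not part of the walkthrough below): averaging the gradient over a mini-batch sharply reduces its variance compared to a single-sample estimate.

```python
import numpy as np

# Synthetic linear data: y = 3x + noise. We hold theta fixed at a wrong value
# and measure how noisy the gradient estimate is for different batch sizes.
rng = np.random.default_rng(0)
m = 10_000
x = rng.normal(size=m)
y = 3.0 * x + rng.normal(scale=0.5, size=m)
theta = 0.0  # deliberately wrong parameter

def batch_gradient(idx):
    # Gradient of the MSE w.r.t. theta, averaged over the samples in idx
    error = theta * x[idx] - y[idx]
    return np.mean(error * x[idx])

stds = {}
for batch_size in (1, 32, 256):
    grads = [batch_gradient(rng.choice(m, size=batch_size, replace=False))
             for _ in range(500)]
    stds[batch_size] = np.std(grads)
    print(batch_size, round(stds[batch_size], 3))
```

The standard deviation of the gradient estimate shrinks roughly with the square root of the batch size, which is exactly the "less noisy than SGD" trade-off described above.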
Python Implementation from Scratch
We'll implement a simple linear regression model with Mini-Batch Gradient Descent. To keep the mechanics visible, we won't use scikit-learn for the core algorithm, but we will use it to generate data and to evaluate the result.
The Steps:
- Generate Data: Create some sample data for a linear regression problem.
- Initialize Parameters: Set initial weights and a bias.
- Mini-Batch Training Loop: Loop through the epochs (passes over the entire dataset). For each epoch:
  - Shuffle the dataset and split it into mini-batches.
  - For each mini-batch, calculate the predictions, the loss, and the gradients.
  - Update the weights and bias using the gradients.
- Evaluate: Check the final model's performance.
The Code:
```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# 1. Generate Data
# We create a dataset with 1000 samples, 1 feature, and some noise.
X, y = make_regression(n_samples=1000, n_features=1, noise=20, random_state=42)

# Add a bias (intercept) term to X (a column of ones)
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_b, y, test_size=0.2, random_state=42)

# 2. Initialize Parameters
n_features = X_train.shape[1]
learning_rate = 0.01
n_epochs = 100
batch_size = 32  # The "mini-batch" size

# Initialize weights and bias (theta) with small random numbers
np.random.seed(42)
theta = np.random.randn(n_features, 1)

# Reshape y_train to be a column vector
y_train = y_train.reshape(-1, 1)

# Store the history of the cost function to plot later
cost_history = []

# 3. Mini-Batch Training Loop
m = len(X_train)  # Number of training samples

for epoch in range(n_epochs):
    # Shuffle the training data at the beginning of each epoch
    indices = np.random.permutation(m)
    X_shuffled = X_train[indices]
    y_shuffled = y_train[indices]

    # Iterate over the mini-batches
    for i in range(0, m, batch_size):
        # Get the mini-batch
        X_i = X_shuffled[i:i + batch_size]
        y_i = y_shuffled[i:i + batch_size]

        # Number of samples in the current mini-batch
        m_batch = len(X_i)

        # Calculate predictions (forward pass)
        predictions = X_i.dot(theta)

        # Calculate the error
        error = predictions - y_i

        # Calculate the cost (Mean Squared Error) for this mini-batch
        cost = (1 / (2 * m_batch)) * np.sum(error**2)

        # Calculate the gradients (backward pass)
        # The gradient of the cost w.r.t. theta
        gradients = (1 / m_batch) * X_i.T.dot(error)

        # Update the parameters (theta)
        theta = theta - learning_rate * gradients

    # Calculate and store the full training cost at the end of each epoch
    # This gives us a smoother curve to plot
    full_predictions = X_train.dot(theta)
    full_error = full_predictions - y_train
    full_cost = (1 / (2 * m)) * np.sum(full_error**2)
    cost_history.append(full_cost)

    # Optional: Print progress
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}/{n_epochs}, Cost: {full_cost:.4f}")

print("\nTraining finished.")
print(f"Final parameters (theta): \n{theta}")

# 4. Evaluate the model
# Make predictions on the test set
test_predictions = X_test.dot(theta)

# Calculate the Mean Squared Error on the test set
test_mse = mean_squared_error(y_test, test_predictions)
print(f"\nTest Mean Squared Error: {test_mse:.2f}")

# Plot the cost history
plt.figure(figsize=(10, 6))
plt.plot(cost_history)
plt.title("Cost Function History")
plt.xlabel("Epoch")
plt.ylabel("Cost (MSE)")
plt.grid(True)
plt.show()

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X_test[:, 1], y_test, color='blue', label='Actual Data')
plt.plot(X_test[:, 1], test_predictions, color='red', linewidth=2, label='Predictions')
plt.title("Linear Regression with Mini-Batch GD")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.grid(True)
plt.show()
```
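The training loop above relies on the analytic gradient formula `(1 / m_batch) * X_i.T.dot(error)`, so it is worth sanity-checking it against a numerical finite-difference estimate. This standalone sketch uses its own tiny random batch (not the data from the walkthrough):

```python
import numpy as np

rng = np.random.default_rng(0)
X_i = np.c_[np.ones((8, 1)), rng.normal(size=(8, 1))]  # tiny batch with bias column
y_i = rng.normal(size=(8, 1))
theta = rng.normal(size=(2, 1))

def cost(t):
    # Same cost as in the training loop: (1 / 2m) * sum of squared errors
    error = X_i.dot(t) - y_i
    return (1 / (2 * len(X_i))) * np.sum(error ** 2)

# Analytic gradient, same formula as in the training loop
analytic = (1 / len(X_i)) * X_i.T.dot(X_i.dot(theta) - y_i)

# Numerical gradient via central differences
eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(len(theta)):
    t_plus, t_minus = theta.copy(), theta.copy()
    t_plus[j] += eps
    t_minus[j] -= eps
    numeric[j] = (cost(t_plus) - cost(t_minus)) / (2 * eps)

# The two should agree to round-off level
print(np.max(np.abs(analytic - numeric)))
```

If the two disagree by more than round-off error, the gradient derivation (or its vectorized implementation) is wrong; this kind of check is standard practice before trusting a hand-written training loop.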
Implementation with TensorFlow/Keras
In practice, you will almost never implement this from scratch. Deep learning frameworks like TensorFlow/Keras handle mini-batching automatically. This is how you would define the same model using Keras.

Notice how simple and concise it is. The framework takes care of the data shuffling, batching, and gradient calculation.
```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# 1. Generate Data (same as before)
X, y = make_regression(n_samples=1000, n_features=1, noise=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Define the Model using the Keras Sequential API
# A single Dense layer with one unit is exactly linear regression.
model = keras.Sequential([
    layers.Dense(units=1, input_shape=(1,))  # 1 neuron for 1 output, 1 input feature
])

# 3. Compile the Model
# This configures the model for training.
# - Optimizer: 'sgd' is Stochastic Gradient Descent, but Keras applies it
#   to whatever mini-batch size you pass to fit().
# - Loss: 'mean_squared_error' is the standard loss for regression.
model.compile(optimizer='sgd', loss='mean_squared_error')

# Print model summary
model.summary()

# 4. Train the Model (Fit)
# Keras handles the mini-batching internally.
# - batch_size: We specify the batch size here.
# - epochs: Number of times to iterate over the entire dataset.
# - validation_data: Data to evaluate the model on at the end of each epoch.
history = model.fit(
    X_train,
    y_train,
    epochs=100,
    batch_size=32,
    validation_data=(X_test, y_test),
    verbose=1  # Set to 0 to hide the training progress
)

# 5. Evaluate the Model
test_loss = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest Mean Squared Error: {test_loss:.4f}")

# Get the learned weights and bias
weights, bias = model.get_weights()
print(f"\nLearned Weight: {weights[0][0]:.4f}")
print(f"Learned Bias: {bias[0]:.4f}")

# 6. Plot the results
# Plot training & validation loss values
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss (MSE)')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')

# Plot the regression line
plt.subplot(1, 2, 2)
plt.scatter(X_test, y_test, color='blue', label='Actual Data')
# Create a line for the predictions
x_line = np.linspace(X_test.min(), X_test.max(), 100)
y_line = weights[0][0] * x_line + bias[0]
plt.plot(x_line, y_line, color='red', linewidth=2, label='Regression Line')
plt.title("Linear Regression with Keras")
plt.xlabel("Feature")
plt.ylabel("Target")
plt.legend()
plt.grid(True)
plt.show()
```
Summary
| Feature | Batch Gradient Descent | Stochastic Gradient Descent (SGD) | Mini-Batch Gradient Descent |
|---|---|---|---|
| Update Data | Entire dataset | Single sample | Small batch of samples |
| Speed | Very slow | Fast | Fast |
| Memory | High (needs whole dataset) | Very Low | Low (needs only one batch) |
| Update Stability | Very stable (low variance) | Very noisy (high variance) | Stable, but some noise (good balance) |
| Convergence | Smooth; guaranteed for convex problems | Noisy; oscillates around the minimum | Mostly smooth, efficient convergence |
| Use Case | Rarely used for large datasets | Online learning, very large datasets | The standard for deep learning |
Mini-Batch Gradient Descent is the de facto standard for training neural networks because it provides the best of both worlds: the computational efficiency of matrix operations and the stable, fast convergence that allows models to effectively learn complex patterns.
