杰瑞科技汇

How do you implement a rolling mean in Python?

Calculating a rolling mean (also known as a moving average) is a very common task in data analysis, especially for time-series data. It smooths out short-term fluctuations and highlights longer-term trends.


Here’s a comprehensive guide on how to calculate rolling means in Python, covering the most popular and effective methods.

What is a Rolling Mean?

A rolling mean calculates the average of a fixed-size "window" of data points as it moves forward through a dataset. For a window size of N, the first value is the average of the first N data points, the second value is the average of data points 2 through N+1, and so on.
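The definition above can be written compactly. This is a notation sketch only: the symbols $x_i$ (data points), $n$ (number of points), and $m_k$ (the k-th rolling mean) are introduced here for illustration and do not appear in the original text.

```latex
m_k = \frac{1}{N} \sum_{i=k}^{k+N-1} x_i, \qquad k = 1, \dots, n - N + 1
```

For example, with the values 10, 12, 15, 14 and N = 3, the first rolling mean is (10 + 12 + 15) / 3 ≈ 12.33 and the second is (12 + 15 + 14) / 3 ≈ 13.67.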


Method 1: The Best and Most Common Way (using Pandas)

If you are working with any kind of tabular data, especially time-series, Pandas is the standard tool for the job. It's fast, efficient, and has a dedicated, easy-to-use function.

Step 1: Install Pandas

If you don't have it installed, open your terminal or command prompt and run:

pip install pandas

Step 2: Create a Pandas Series

A Pandas Series is a one-dimensional labeled array, which is perfect for this task.

import pandas as pd
import numpy as np
# Create some sample data (e.g., daily sales over 10 days)
data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
dates = pd.date_range(start='2025-01-01', periods=len(data))
# Create a Pandas Series
sales_series = pd.Series(data, index=dates)
print("Original Data:")
print(sales_series)

Step 3: Calculate the Rolling Mean

Use the .rolling() method followed by .mean().

# Calculate a 3-day rolling mean
window_size = 3
rolling_mean = sales_series.rolling(window=window_size).mean()
print(f"\n{window_size}-day Rolling Mean:")
print(rolling_mean)

Explanation of the Output

Notice the first two values in the rolling mean are NaN (Not a Number). This is because there aren't enough data points before the third day to calculate a 3-day average.

Original Data:
2025-01-01    10
2025-01-02    12
2025-01-03    15
2025-01-04    14
2025-01-05    16
2025-01-06    18
2025-01-07    20
2025-01-08    19
2025-01-09    22
2025-01-10    24
dtype: int64
3-day Rolling Mean:
2025-01-01       NaN
2025-01-02       NaN
2025-01-03    12.333333  # (10+12+15)/3
2025-01-04    13.666667  # (12+15+14)/3
2025-01-05    15.000000
2025-01-06    16.000000
2025-01-07    18.000000
2025-01-08    19.000000
2025-01-09    20.333333
2025-01-10    21.666667
dtype: float64

Handling the NaN Values

You can easily fill the NaN values using the .fillna() method. Common strategies are to fill with 0 or the first available value.

# Fill NaN values with 0
rolling_mean_filled_zero = rolling_mean.fillna(0)
# Fill the leading NaN values with the first valid value (backward fill).
# A forward fill would leave them untouched, because no valid value precedes them.
rolling_mean_filled_bfill = rolling_mean.bfill()
print("\nRolling Mean (NaN filled with 0):")
print(rolling_mean_filled_zero)
print("\nRolling Mean (NaN backward-filled):")
print(rolling_mean_filled_bfill)
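An alternative to filling values after the fact is to let Pandas average whatever points are available in the early, incomplete windows. The sketch below uses the `min_periods` parameter of `.rolling()` for this; the variable name `partial_mean` is made up for illustration, and the data repeats the sample series from Step 2.

```python
import pandas as pd

# Same sample data as in Step 2; the dates are only illustrative.
data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
dates = pd.date_range(start="2025-01-01", periods=len(data))
sales_series = pd.Series(data, index=dates)

# min_periods=1 lets a window emit a mean as soon as it holds one value,
# so the first two results average 1 and 2 points instead of being NaN.
partial_mean = sales_series.rolling(window=3, min_periods=1).mean()
print(partial_mean.head(3))
```

With this setting the first three values are 10.0, 11.0, and about 12.33. Whether that is appropriate depends on the analysis, since the early values are averaged over fewer points than the rest.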

Method 2: Using NumPy (The Manual Way)

NumPy is a powerful library for numerical operations. You can calculate a rolling mean with NumPy using a clever trick with np.cumsum() (cumulative sum), which is much faster than a manual loop for large datasets.

import numpy as np
# Use the same data from before
data = np.array([10, 12, 15, 14, 16, 18, 20, 19, 22, 24])
window_size = 3
# Calculate cumulative sum
cumsum = np.cumsum(data)
# Prepend a 0 so that cumsum[i + window_size] - cumsum[i]
# equals the sum of data[i : i + window_size]
cumsum = np.insert(cumsum, 0, 0)
# Calculate the rolling sum and then the mean
rolling_sum = (cumsum[window_size:] - cumsum[:-window_size])
rolling_mean_np = rolling_sum / window_size
print(f"NumPy {window_size}-day Rolling Mean:")
print(rolling_mean_np)

Note: This NumPy method gives you the result starting from the first complete window. It doesn't produce the NaN values at the beginning like the Pandas method.
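If the NumPy result needs to line up position by position with the input (as the Pandas output does), one option is to left-pad it with NaN. The following is a minimal sketch under that assumption; `rolling_mean_padded` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def rolling_mean_padded(values, window_size):
    """Cumulative-sum rolling mean, left-padded with NaN to the input length."""
    values = np.asarray(values, dtype=float)
    if window_size < 1 or window_size > values.size:
        raise ValueError("window_size must be between 1 and len(values)")
    # Leading 0 so that csum[i + w] - csum[i] is the sum of values[i : i + w]
    csum = np.concatenate(([0.0], np.cumsum(values)))
    means = (csum[window_size:] - csum[:-window_size]) / window_size
    # window_size - 1 NaNs mark the positions that have no full window yet
    return np.concatenate((np.full(window_size - 1, np.nan), means))

data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
padded = rolling_mean_padded(data, 3)
print(padded)
```

The output has the same length as the input, with NaN in the first `window_size - 1` positions. Note that for very long floating-point arrays, a cumulative sum can accumulate rounding error, so results may differ slightly from a direct per-window average.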


Method 3: Using a Simple Python Loop (For Understanding)

This method is great for understanding the underlying logic, but it is very slow for large arrays and should be avoided in production code. It's primarily for educational purposes.

def rolling_mean_loop(data, window_size):
    """Calculates the rolling mean using a simple for loop."""
    rolling_means = []
    for i in range(len(data) - window_size + 1):
        window = data[i : i + window_size]
        window_mean = sum(window) / window_size
        rolling_means.append(window_mean)
    return rolling_means
# Use the same data from before
data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
window_size = 3
rolling_mean_loop_result = rolling_mean_loop(data, window_size)
print(f"Loop-based {window_size}-day Rolling Mean:")
print(rolling_mean_loop_result)
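Part of the cost of the loop above is that it re-sums every window from scratch. A sketch that keeps a single running total and only adds the entering value and drops the leaving one is shown below; `rolling_mean_running_sum` is a name made up for this example. It is still plain Python and will usually be slower than the Pandas or NumPy versions.

```python
def rolling_mean_running_sum(data, window_size):
    """Rolling mean that updates one running total instead of re-summing each window."""
    if window_size < 1 or window_size > len(data):
        return []
    total = sum(data[:window_size])  # sum of the first full window
    means = [total / window_size]
    for i in range(window_size, len(data)):
        # Slide the window: add the entering value, drop the leaving one
        total += data[i] - data[i - window_size]
        means.append(total / window_size)
    return means

data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
result = rolling_mean_running_sum(data, 3)
print(result)
```

Each step does a constant amount of work regardless of the window size, which matters mostly when the window is large.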

Comparison and Recommendation

Method | Pros | Cons | Best For
--- | --- | --- | ---
Pandas | Fast, efficient, easy syntax, marks incomplete windows as NaN automatically, integrates with plotting. | Requires the Pandas library. | Almost all data analysis tasks, especially time-series. This is the recommended approach.
NumPy | Very fast; depends only on NumPy. | Syntax is less intuitive; doesn't pad incomplete windows or handle NaN by default. | Numerical computing and performance-critical applications where Pandas overhead is a concern.
Python Loop | Easy to understand the logic; no libraries needed. | Extremely slow for large datasets. | Learning and educational purposes.

Complete Example: Visualizing the Rolling Mean

A key benefit of using Pandas is how easily it integrates with plotting libraries like Matplotlib.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Create sample data with some noise
np.random.seed(42)
dates = pd.date_range(start='2025-01-01', periods=50)
values = np.random.randn(50).cumsum() + 50 # Random walk starting at 50
data_series = pd.Series(values, index=dates)
# Calculate rolling means with different window sizes
rolling_mean_3 = data_series.rolling(window=3).mean()
rolling_mean_7 = data_series.rolling(window=7).mean()
rolling_mean_15 = data_series.rolling(window=15).mean()
# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(data_series, label='Original Data', alpha=0.5, color='gray')
plt.plot(rolling_mean_3, label='3-day Rolling Mean', linewidth=2)
plt.plot(rolling_mean_7, label='7-day Rolling Mean', linewidth=2)
plt.plot(rolling_mean_15, label='15-day Rolling Mean', linewidth=2)
plt.title('Rolling Mean Smoothing')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

This plot will clearly show how a larger window size results in a smoother line, as it averages out more of the short-term noise.
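One rough way to check the smoothing effect numerically, rather than by eye, is to measure how much each curve changes from one step to the next. The sketch below is an illustration under stated assumptions: the signal is a hypothetical deterministic one (a linear trend plus alternating +2/-2 noise), not the random walk from the plot, and `roughness` is a helper name invented here.

```python
import pandas as pd

# Hypothetical test signal: a linear trend plus alternating +2 / -2 "noise".
values = [i + (2 if i % 2 == 0 else -2) for i in range(50)]
series = pd.Series(values, dtype="float64")

def roughness(s: pd.Series) -> float:
    """Standard deviation of step-to-step changes; lower means smoother."""
    return float(s.diff().dropna().std())

# Compare the same window sizes used in the plot above.
scores = {w: roughness(series.rolling(window=w).mean()) for w in (3, 7, 15)}
for w, score in scores.items():
    print(f"window={w:2d}  roughness={score:.3f}")
```

On this signal the score shrinks as the window grows, which matches the visual impression. On real data the trade-off is that larger windows also lag further behind genuine changes in the series.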
