How to Implement a Rolling Mean in Python? - 杰瑞科技汇

Calculating a rolling mean (also known as a moving average) is a common task in data analysis, especially for time-series data. It smooths out short-term fluctuations and highlights longer-term trends.

This guide covers three ways to calculate a rolling mean in Python: Pandas, NumPy, and a plain Python loop.

What is a Rolling Mean?

A rolling mean calculates the average of a fixed-size "window" of data points as it moves forward through a dataset. For a window size of N, the first value is the average of the first N data points, the second value is the average of data points 2 through N+1, and so on.

Method 1: The Best and Most Common Way (using Pandas)

If you are working with any kind of tabular data, especially time-series, Pandas is the standard tool for the job. It's fast, efficient, and has a dedicated, easy-to-use function.

Step 1: Install Pandas

If you don't have it installed, open your terminal or command prompt and run:

pip install pandas

Step 2: Create a Pandas Series

A Pandas Series is a one-dimensional labeled array, which is perfect for this task.

import pandas as pd
import numpy as np
# Create some sample data (e.g., daily sales over 10 days)
data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
dates = pd.date_range(start='2025-01-01', periods=len(data))
# Create a Pandas Series
sales_series = pd.Series(data, index=dates)
print("Original Data:")
print(sales_series)

Step 3: Calculate the Rolling Mean

Use the .rolling() method followed by .mean().

# Calculate a 3-day rolling mean
window_size = 3
rolling_mean = sales_series.rolling(window=window_size).mean()
print(f"\n{window_size}-day Rolling Mean:")
print(rolling_mean)

Explanation of the Output

Notice the first two values in the rolling mean are NaN (Not a Number). This is because there aren't enough data points before the third day to calculate a 3-day average.

Original Data:
2025-01-01    10
2025-01-02    12
2025-01-03    15
2025-01-04    14
2025-01-05    16
2025-01-06    18
2025-01-07    20
2025-01-08    19
2025-01-09    22
2025-01-10    24
dtype: int64
3-day Rolling Mean:
2025-01-01       NaN
2025-01-02       NaN
2025-01-03    12.333333  # (10+12+15)/3
2025-01-04    13.666667  # (12+15+14)/3
2025-01-05    15.000000
2025-01-06    16.000000
2025-01-07    18.000000
2025-01-08    19.000000
2025-01-09    20.333333
2025-01-10    21.666667
dtype: float64

Handling the NaN Values

You can easily fill the NaN values using the .fillna() method. Common strategies are to fill with 0 or the first available value.

# Fill NaN values with 0
rolling_mean_filled_zero = rolling_mean.fillna(0)
# Fill the leading NaN values with the first non-NaN value (backward fill).
# A forward fill would leave them unchanged, because no valid value precedes them.
rolling_mean_filled_bfill = rolling_mean.bfill()
print("\nRolling Mean (NaN filled with 0):")
print(rolling_mean_filled_zero)
print("\nRolling Mean (NaN backward-filled):")
print(rolling_mean_filled_bfill)
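Instead of filling values after the fact, another option is to let Pandas average whatever data is available in an incomplete window. The sketch below uses the min_periods argument of .rolling() for this; the variable name partial_mean is only illustrative, and the data is the same sample series as above.

```python
import pandas as pd

data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
dates = pd.date_range(start='2025-01-01', periods=len(data))
sales_series = pd.Series(data, index=dates)

# min_periods=1 lets a window produce a value as soon as it holds one observation,
# so the first two results average 1 and 2 points instead of being NaN
partial_mean = sales_series.rolling(window=3, min_periods=1).mean()
print(partial_mean)
```

Keep in mind that the first few values are then averages over fewer points than the rest, so they are less smoothed.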

Method 2: Using NumPy (The Manual Way)

NumPy is a powerful library for numerical operations. You can calculate a rolling mean with NumPy using a clever trick with np.cumsum() (cumulative sum), which is much faster than a manual loop for large datasets.

import numpy as np
# Use the same data from before
data = np.array([10, 12, 15, 14, 16, 18, 20, 19, 22, 24])
window_size = 3
# Calculate cumulative sum
cumsum = np.cumsum(data)
# Prepend a 0 so that cumsum[i] holds the sum of the first i elements;
# the first window can then be computed by the same subtraction as the rest
cumsum = np.insert(cumsum, 0, 0)
# Calculate the rolling sum and then the mean
rolling_sum = (cumsum[window_size:] - cumsum[:-window_size])
rolling_mean_np = rolling_sum / window_size
print(f"NumPy {window_size}-day Rolling Mean:")
print(rolling_mean_np)

Note: This NumPy method gives you the result starting from the first complete window. It doesn't produce the NaN values at the beginning like the Pandas method.
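If you need the NumPy result to line up with the original data the way the Pandas output does, one possible approach is to pad the front with NaN. The sketch below is an assumption about how you might do that, not part of the method above; it also swaps in np.convolve with mode="valid" as an alternative way to get the complete-window means.

```python
import numpy as np

data = np.array([10, 12, 15, 14, 16, 18, 20, 19, 22, 24], dtype=float)
window_size = 3

# A uniform kernel averages each complete window; mode="valid" skips partial windows
kernel = np.ones(window_size) / window_size
valid_means = np.convolve(data, kernel, mode="valid")

# Pad the front with NaN so the result has the same length as the input
padded_means = np.concatenate([np.full(window_size - 1, np.nan), valid_means])
print(padded_means)
```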

Method 3: Using a Simple Python Loop (For Understanding)

This method is great for understanding the underlying logic, but it is very slow for large arrays and should be avoided in production code. It's primarily for educational purposes.

def rolling_mean_loop(data, window_size):
    """Calculates the rolling mean using a simple for loop."""
    rolling_means = []
    for i in range(len(data) - window_size + 1):
        window = data[i : i + window_size]
        window_mean = sum(window) / window_size
        rolling_means.append(window_mean)
    return rolling_means
# Use the same data from before
data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
window_size = 3
rolling_mean_loop_result = rolling_mean_loop(data, window_size)
print(f"Loop-based {window_size}-day Rolling Mean:")
print(rolling_mean_loop_result)

Comparison and Recommendation

Method	Pros	Cons	Best For
Pandas	Fast, efficient, easy syntax, handles `NaN` automatically, integrates with plotting.	Requires Pandas library.	Almost all data analysis tasks, especially time-series. This is the recommended approach.
NumPy	Very fast; NumPy is the only dependency.	Syntax is less intuitive; returns only complete windows (no `NaN` padding) unless you add it yourself.	Numerical computing, performance-critical applications where Pandas overhead is a concern.
Python Loop	Easy to understand the logic, no libraries needed.	Extremely slow for large datasets.	Learning and educational purposes.
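As a quick sanity check, the three approaches can be run on the same sample data and compared. This is a minimal sketch; the variable names are illustrative, and the Pandas result has its leading NaN values dropped so that all three cover only complete windows.

```python
import numpy as np
import pandas as pd

data = [10, 12, 15, 14, 16, 18, 20, 19, 22, 24]
window_size = 3

# Pandas: drop the leading NaN values so only complete windows remain
pandas_result = pd.Series(data).rolling(window=window_size).mean().dropna().to_numpy()

# NumPy: cumulative-sum trick
cumsum = np.insert(np.cumsum(data), 0, 0)
numpy_result = (cumsum[window_size:] - cumsum[:-window_size]) / window_size

# Plain Python loop
loop_result = [
    sum(data[i:i + window_size]) / window_size
    for i in range(len(data) - window_size + 1)
]

# np.allclose tolerates tiny floating-point differences between the methods
methods_agree = bool(
    np.allclose(pandas_result, numpy_result)
    and np.allclose(numpy_result, loop_result)
)
print(methods_agree)
```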

Complete Example: Visualizing the Rolling Mean

A key benefit of using Pandas is how easily it integrates with plotting libraries like Matplotlib.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create sample data with some noise
np.random.seed(42)
dates = pd.date_range(start='2025-01-01', periods=50)
values = np.random.randn(50).cumsum() + 50 # Random walk starting at 50
data_series = pd.Series(values, index=dates)
# Calculate rolling means with different window sizes
rolling_mean_3 = data_series.rolling(window=3).mean()
rolling_mean_7 = data_series.rolling(window=7).mean()
rolling_mean_15 = data_series.rolling(window=15).mean()
# Plot the results
plt.figure(figsize=(12, 6))
plt.plot(data_series, label='Original Data', alpha=0.5, color='gray')
plt.plot(rolling_mean_3, label='3-day Rolling Mean', linewidth=2)
plt.plot(rolling_mean_7, label='7-day Rolling Mean', linewidth=2)
plt.plot(rolling_mean_15, label='15-day Rolling Mean', linewidth=2)
plt.title('Rolling Mean Smoothing')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.show()

This plot will clearly show how a larger window size results in a smoother line, as it averages out more of the short-term noise.
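The smoothing effect can also be checked numerically rather than by eye. The sketch below is one rough way to do it, under a few assumptions: it measures "roughness" as the standard deviation of step-to-step changes (a measure chosen here for illustration), and it uses a longer random walk with a different seed than the plotted example so the comparison is stable.

```python
import numpy as np
import pandas as pd

# Assumption: 2000 points instead of 50, so the estimates are not dominated by noise
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(2000).cumsum() + 50)

roughness = {}
for window in (3, 7, 15):
    smoothed = series.rolling(window=window).mean()
    # Standard deviation of consecutive differences: smaller means a smoother line
    roughness[window] = smoothed.diff().std()

for window, value in roughness.items():
    print(f"window={window:>2}: roughness={value:.3f}")
```

The roughness value should shrink as the window grows, which matches what the plot shows visually.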
