杰瑞科技汇

How can a rolling sum be implemented efficiently in Python?

The term "rolling sum" is very common in data analysis, time series, and signal processing. It means summing a fixed-size window of elements as that window slides along an array or list. For example, with a window size of 3, [1, 2, 3, 4] gives [6, 9].


In Python, there are several ways to implement a rolling sum, ranging from a simple manual loop to highly optimized functions in popular libraries.

Here's a complete guide covering the most common and effective methods.

The Manual Approach (Using a Loop)

This is the most fundamental way to understand the concept. It's great for learning but can be slow for large datasets.

The logic is:

  1. Loop over every starting index i from 0 up to N - window_size inclusive, where N is the length of the list.
  2. For each i, slice out the window data[i : i + window_size] and sum it.

def rolling_sum_manual(data, window_size):
    """
    Calculates the rolling sum using a simple loop.
    Args:
        data (list): A list of numbers.
        window_size (int): The size of the rolling window.
    Returns:
        list: A list of rolling sums.
    """
    if window_size > len(data) or window_size <= 0:
        return []
    rolling_sums = []
    # Loop through the data, stopping before the window goes out of bounds
    for i in range(len(data) - window_size + 1):
        # Slice the data from the current index to the end of the window
        window = data[i : i + window_size]
        # Sum the window and append it to the results
        rolling_sums.append(sum(window))
    return rolling_sums
# --- Example Usage ---
my_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window = 3
result = rolling_sum_manual(my_data, window)
print(f"Data: {my_data}")
print(f"Rolling Sum (window={window}): {result}")
# Expected Output: [6, 9, 12, 15, 18, 21, 24, 27]

Pros:

  • Easy to understand the logic.
  • No external libraries needed.

Cons:

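  • Inefficient: each window is re-summed from scratch even though consecutive windows share all but one element, so the work grows as O(n*w) for n elements and window size w. This becomes slow for large lists or wide windows.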
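If you want to avoid re-summing overlapping windows without any third-party library, one option (prefix sums, a different technique from the sliding window covered next) is to compute running totals once and take the difference of two totals for each window. The sketch below assumes a plain list of numbers; rolling_sum_prefix is an illustrative name, not a library function.

```python
from itertools import accumulate


def rolling_sum_prefix(data, window_size):
    """Rolling sum via prefix sums (rolling_sum_prefix is a hypothetical helper)."""
    if window_size <= 0 or window_size > len(data):
        return []
    # prefix[k] holds sum(data[:k]); the leading 0 represents the empty prefix
    prefix = [0] + list(accumulate(data))
    # The window data[i:i + window_size] sums to prefix[i + window_size] - prefix[i]
    return [prefix[i + window_size] - prefix[i]
            for i in range(len(data) - window_size + 1)]


print(rolling_sum_prefix([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
# [6, 9, 12, 15, 18, 21, 24, 27]
```

This is O(n) like the sliding-window version. With long floating-point series, subtracting two large running totals can lose some precision, so integers or modest-length float data suit it best.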

The Efficient Manual Approach (Using a Sliding Window)

This is a much more optimized manual approach. Instead of recalculating the entire window sum, it "slides" the window by subtracting the element that's leaving and adding the new element that's entering.

def rolling_sum_efficient(data, window_size):
    """
    Calculates the rolling sum using an efficient sliding window technique.
    Args:
        data (list): A list of numbers.
        window_size (int): The size of the rolling window.
    Returns:
        list: A list of rolling sums.
    """
    if window_size > len(data) or window_size <= 0:
        return []
    # Calculate the sum of the first window
    current_sum = sum(data[:window_size])
    rolling_sums = [current_sum]
    # Slide the window through the rest of the data
    for i in range(window_size, len(data)):
        # Subtract the element that's leaving the window (at the beginning)
        # Add the new element that's entering the window (at the end)
        current_sum = current_sum - data[i - window_size] + data[i]
        rolling_sums.append(current_sum)
    return rolling_sums
# --- Example Usage ---
my_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window = 3
result = rolling_sum_efficient(my_data, window)
print(f"Data: {my_data}")
print(f"Rolling Sum (window={window}): {result}")
# Expected Output: [6, 9, 12, 15, 18, 21, 24, 27]

Pros:

  • Highly Efficient: Runs in O(n) time, making it suitable for large datasets.
  • No external libraries needed.

Cons:

  • Slightly more complex logic than the simple loop.
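The same "subtract the value leaving, add the value entering" idea also works when the data arrives as a stream (a generator, a file reader, a socket) that you cannot index. A minimal sketch, assuming only the last window_size values need to be kept: collections.deque with maxlen holds them, and rolling_sum_stream is an illustrative name.

```python
from collections import deque


def rolling_sum_stream(iterable, window_size):
    """Yield rolling sums from any iterable (rolling_sum_stream is a hypothetical helper)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    window = deque(maxlen=window_size)  # keeps only the last window_size values
    current_sum = 0
    for value in iterable:
        if len(window) == window_size:
            # append() below will evict window[0], so take it out of the sum first
            current_sum -= window[0]
        window.append(value)
        current_sum += value
        if len(window) == window_size:
            yield current_sum


print(list(rolling_sum_stream(iter(range(1, 11)), 3)))
# [6, 9, 12, 15, 18, 21, 24, 27]
```

Because this is a generator, the ValueError for a non-positive window is raised when iteration starts, not when the function is called.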

The Best Practice: Using NumPy

For any serious numerical or data analysis work, NumPy is the de facto standard. It's fast, concise, and powerful.

The numpy.convolve function works well for this. The idea is to convolve your data with a "window" (a kernel) of ones: at each position, every value under the kernel is multiplied by 1 and the products are added, which is exactly the window sum.

import numpy as np
def rolling_sum_numpy(data, window_size):
    """
    Calculates the rolling sum using NumPy's convolve function.
    Args:
        data (list or np.array): A list or array of numbers.
        window_size (int): The size of the rolling window.
    Returns:
        np.array: An array of rolling sums.
    """
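    # np.convolve swaps its inputs when the kernel is longer than the data,
    # so handle out-of-range window sizes explicitly (matches the other functions)
    if window_size <= 0 or window_size > len(data):
        return np.array([])
    # Create a window (kernel) of ones with the desired size
    window = np.ones(window_size)
    # Use convolve to calculate the sum. 'valid' mode returns only
    # where the window completely overlaps the data.
    return np.convolve(data, window, 'valid')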
# --- Example Usage ---
my_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window = 3
result = rolling_sum_numpy(my_data, window)
print(f"Data: {my_data}")
print(f"Rolling Sum (window={window}): {result}")
# Expected Output: [ 6.  9. 12. 15. 18. 21. 24. 27.]  (floats, because np.ones creates a float64 kernel)
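If your NumPy is version 1.20 or newer, numpy.lib.stride_tricks.sliding_window_view offers another way to write the same computation: it builds a read-only view with one row per window (no data is copied), and summing along the last axis gives the rolling sums. This is a sketch under that version assumption; rolling_sum_windows is an illustrative name, and the approach is mainly a readability alternative, since every element of every window is still added.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_sum_windows(data, window_size):
    """Rolling sum via a strided window view (hypothetical helper; NumPy >= 1.20)."""
    arr = np.asarray(data)
    # Each row of `windows` is one window; raises ValueError if window_size > len(arr)
    windows = sliding_window_view(arr, window_size)
    return windows.sum(axis=-1)


print(rolling_sum_windows([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
# [ 6  9 12 15 18 21 24 27]
```

Unlike the convolve version, integer input stays integer here, because no float kernel is involved.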

Pros:

  • Fast: the convolution runs in compiled C code, so it is typically much faster than an equivalent pure Python loop.
  • Concise and Readable: A single line of code does the job.
  • Part of a powerful ecosystem: Integrates seamlessly with other libraries like Pandas and Matplotlib.

Cons:

  • Requires an external library (pip install numpy).
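One caveat: np.convolve computes each output directly from the kernel, so its work grows with the window size (roughly O(n*w)). For very wide windows, a common NumPy alternative is the cumulative-sum trick, which stays O(n) for any window size. A minimal sketch; rolling_sum_cumsum is an illustrative name, and as with any running-total method, long floating-point series can lose a little precision in the subtraction.

```python
import numpy as np


def rolling_sum_cumsum(data, window_size):
    """Rolling sum from a cumulative sum (hypothetical helper): O(n) for any window."""
    arr = np.asarray(data)
    if window_size <= 0 or window_size > arr.size:
        return arr[:0]  # empty array with the same dtype as the input
    # Prepend 0 so that c[k] == arr[:k].sum()
    c = np.concatenate(([0], np.cumsum(arr)))
    # The window arr[i:i + window_size] sums to c[i + window_size] - c[i]
    return c[window_size:] - c[:-window_size]


print(rolling_sum_cumsum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
# [ 6  9 12 15 18 21 24 27]
```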

The Pandas Approach (For Data Analysis)

If you are working with tabular data, time series, or anything that can be put into a DataFrame, Pandas is the go-to tool. It provides a built-in, optimized .rolling() method for window calculations.

import pandas as pd
def rolling_sum_pandas(data, window_size):
    """
    Calculates the rolling sum using a Pandas Series.
    Args:
        data (list): A list of numbers.
        window_size (int): The size of the rolling window.
    Returns:
        pd.Series: A Pandas Series of rolling sums.
    """
    # Create a Pandas Series from the data
    s = pd.Series(data)
    # Use the .rolling() method and then .sum()
    return s.rolling(window=window_size).sum()
# --- Example Usage ---
my_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
window = 3
result = rolling_sum_pandas(my_data, window)
print(f"Data: {my_data}")
print(f"Rolling Sum (window={window}):\n{result}")
# Expected Output:
# Data: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Rolling Sum (window=3):
# 0     NaN
# 1     NaN
# 2     6.0
# 3     9.0
# 4    12.0
# 5    15.0
# 6    18.0
# 7    21.0
# 8    24.0
# 9    27.0
# dtype: float64

Notice that Pandas returns NaN (Not a Number) for the first window_size - 1 elements because there isn't enough data to fill the window. You can easily remove these with .dropna().

# To get a clean list without NaNs:
clean_result = rolling_sum_pandas(my_data, window).dropna().tolist()
print(f"Clean Result: {clean_result}")
# Expected Output: [6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0, 27.0]
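If you would rather keep one output per input instead of dropping the leading NaN values, rolling() also accepts a min_periods argument: the minimum number of observations a window needs before it produces a value. With min_periods=1, the first entries become partial sums of the values seen so far.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Allow windows with as few as 1 value, so the start of the result is not NaN
partial = s.rolling(window=3, min_periods=1).sum()
print(partial.tolist())
# [1.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0, 27.0]
```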

Pros:

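  • Fast: the rolling aggregations run in compiled (Cython) code rather than Python loops.
  • Handles Missing Data: NaN values are supported, and the min_periods parameter controls how many non-NaN observations a window needs before it produces a result.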
  • Rich Functionality: .rolling() provides many other operations (mean, std, min, max, etc.).
  • Integrates with the rest of the Pandas ecosystem.
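As an illustration of the other operations, one Rolling object can compute several statistics at once with .agg(); each function name passed in becomes a column of the resulting DataFrame. A small sketch on the example data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
r = s.rolling(window=3)
# One column per aggregation, all computed over the same 3-element windows;
# dropna() removes the first two rows, where the window is not yet full
stats = r.agg(["sum", "mean", "max"]).dropna()
print(stats)
```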

Cons:

  • Requires an external library (pip install pandas).
  • Overkill for a simple, one-off calculation on a small list.

Summary and Recommendation

Method         | When to Use                                             | Performance             | Dependencies
---------------|---------------------------------------------------------|-------------------------|-------------
Manual Loop    | Learning the concept, or very small, simple scripts     | Slow, O(n*w)            | None
Efficient Loop | You can't use external libraries but need performance   | Fast, O(n)              | None
NumPy          | The best general-purpose choice for numerical computing | Very fast (C-optimized) | numpy
Pandas         | Data already in a DataFrame, or time series analysis    | Very fast               | pandas

(n = number of data points, w = window size)

Recommendation:

  • For general-purpose numerical work, use NumPy. It's the standard and offers the best balance of performance and simplicity.
  • If you are doing any kind of data analysis, cleaning, or working with time series, Pandas is the superior choice. Its .rolling() method is designed for exactly this kind of task.
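For time series specifically, rolling() can also take a time offset string instead of a row count when the index is a DatetimeIndex, so each window covers a span of time rather than a fixed number of rows. A minimal sketch; the dates and values below are made up for illustration, with one day deliberately missing.

```python
import pandas as pd

# Made-up daily readings; 2024-01-03 is intentionally missing
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"])
s = pd.Series([1, 2, 3, 4], index=idx)

# "2D": for each timestamp t, sum the rows with timestamps in (t - 2 days, t]
daily_sums = s.rolling("2D").sum()
print(daily_sums)
```

The sums come out as 1.0, 3.0, 3.0 and 7.0. The 2024-01-04 window contains only its own value, because no other row falls within the preceding two days, which a fixed 2-row window would not capture.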