杰瑞科技汇

How does sub() in Python's pandas subtract data?

The sub() method in pandas performs label-aligned subtraction on Series and DataFrames. It belongs to a family of arithmetic methods that also includes add(), mul(), div(), floordiv(), mod(), and pow().

Let's break down sub() with clear examples.

Basic Concept

The sub() method subtracts one object from another. Its syntax is:

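DataFrame.sub(other, axis='columns', level=None, fill_value=None)
Series.sub(other, level=None, fill_value=None, axis=0)
  • other: The value or object you want to subtract. This can be a scalar (single number), a list, a tuple, a Series, or a DataFrame.
  • axis: Used when a Series or list-like is subtracted from a DataFrame. It selects which set of DataFrame labels other is matched against.
    • 'columns' (or 1): Match other against the column labels, so it is subtracted from every row. This is the default for DataFrame.sub().
    • 'index' (or 0): Match other against the row index, so it is subtracted from every column.
  • level: For hierarchical (MultiIndex) indices, the level whose labels other is matched against. The values are broadcast across the remaining levels.
  • fill_value: A value substituted for a missing entry (an existing NaN, or a label present in only one of the two objects) before the subtraction. If both objects are missing a value at the same position, the result is still NaN. If None (default), NaN propagates.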
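
The level parameter is the least obvious of the four, so here is a minimal sketch of it. The data, the level names 'group' and 'item', and the variable names are all made up for illustration.

```python
import pandas as pd

# Hypothetical data: two groups ('x', 'y') with two items each
idx = pd.MultiIndex.from_product([['x', 'y'], [1, 2]], names=['group', 'item'])
s = pd.Series([10, 20, 30, 40], index=idx)

# One offset per group, indexed by the 'group' labels only
offsets = pd.Series([1, 2], index=pd.Index(['x', 'y'], name='group'))

# Match offsets against the 'group' level and broadcast over 'item'
result = s.sub(offsets, level='group')
print(result)
```

Every row in group 'x' has 1 subtracted and every row in group 'y' has 2 subtracted, regardless of its 'item' label.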

sub() on a Series

This is the most common use case. You subtract a scalar or another Series from a Series.

Example A: Subtracting a Scalar

You can subtract a single number from every element in the Series.

import pandas as pd
import numpy as np
# Create a sample Series
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
# Subtract 5 from every element
result = s.sub(5)
print("Original Series:")
print(s)
print("\nSeries after subtracting 5:")
print(result)

Output:

Original Series:
a    10
b    20
c    30
d    40
dtype: int64
Series after subtracting 5:
a     5
b    15
c    25
d    35
dtype: int64
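
Because sub() returns a new Series, it chains with the other arithmetic methods. As a rough sketch, min-max scaling (chosen here only as an illustration) can be written as a subtraction followed by a division:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Shift by the minimum, then divide by the range (max - min)
scaled = s.sub(s.min()).div(s.max() - s.min())
print(scaled)
```

The smallest element maps to 0.0 and the largest to 1.0; the original Series s is left unchanged.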

Example B: Subtracting Another Series (Element-wise)

When you subtract another Series, pandas aligns the data based on the index, not the position. This is a fundamental pandas concept.

import pandas as pd
# Create two Series with different indices
s1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'a', 'd'])
# Subtract s2 from s1. Note the alignment.
result = s1.sub(s2)
print("Series 1:")
print(s1)
print("\nSeries 2:")
print(s2)
print("\nResult of s1.sub(s2):")
print(result)

Output:

Series 1:
a    100
b    200
c    300
dtype: int64
Series 2:
b    10
a    20
d    30
dtype: int64
Result of s1.sub(s2):
a     80.0  # 100 (s1['a']) - 20 (s2['a'])
b    190.0  # 200 (s1['b']) - 10 (s2['b'])
c      NaN  # No matching index in s2
d      NaN  # No matching index in s1
dtype: float64

Notice how a and b were correctly matched by index, and the result for c and d is NaN because there was no corresponding value in the other Series.
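
If you actually want position-based subtraction, one option is to pass the raw values instead of the Series. The sketch below reuses s1 and s2 from the example and assumes both have the same length:

```python
import pandas as pd

s1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'a', 'd'])

# Label-based (default): only 'a' and 'b' exist in both Series
by_label = s1.sub(s2)

# Position-based: a plain NumPy array carries no labels to align on,
# so elements are paired by position and s1's index is kept
by_position = s1.sub(s2.to_numpy())
print(by_position)
```

Here by_position holds 90, 180, and 270 under the labels a, b, and c, while by_label still contains two NaN entries.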

sub() on a DataFrame

You can perform subtraction between DataFrames and scalars, or between two DataFrames.

Example A: Subtracting a Scalar from a DataFrame

The scalar is subtracted from every element in the DataFrame.

import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [10, 20, 30]
}, index=['row1', 'row2', 'row3'])
# Subtract 1 from every element
result_df = df.sub(1)
print("Original DataFrame:")
print(df)
print("\nDataFrame after subtracting 1:")
print(result_df)

Output:

Original DataFrame:
         A   B
row1    1  10
row2    2  20
row3    3  30
DataFrame after subtracting 1:
         A   B
row1    0   9
row2    1  19
row3    2  29

Example B: Subtracting Two DataFrames (Alignment is Key)

Just like with Series, pandas aligns DataFrames based on both index and column labels.

import pandas as pd
df1 = pd.DataFrame({
    'A': [10, 20],
    'B': [100, 200]
}, index=['x', 'y'])
df2 = pd.DataFrame({
    'A': [1, 2],
    'C': [5, 6]  # Note different column name 'C'
}, index=['y', 'z'])
# Subtract df2 from df1
result_df = df1.sub(df2)
print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)
print("\nResult of df1.sub(df2):")
print(result_df)

Output:

DataFrame 1:
      A    B
x    10  100
y    20  200
DataFrame 2:
      A  C
y    1  5
z    2  6
Result of df1.sub(df2):
       A    B    C
x    NaN  NaN  NaN
y   19.0  NaN  NaN
z    NaN  NaN  NaN
  • The value at ('y', 'A') is 20 - 1 = 19.
  • The value at ('y', 'B') is 200 - NaN = NaN.
  • The value at ('x', 'A') is 10 - NaN = NaN.
  • The value at ('z', 'C') is NaN - 6 = NaN.
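
When the NaN padding is unwanted, one possible approach is to restrict both frames to their shared labels before subtracting. This sketch uses DataFrame.align with join='inner'; the variable names left, right, and common are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [10, 20], 'B': [100, 200]}, index=['x', 'y'])
df2 = pd.DataFrame({'A': [1, 2], 'C': [5, 6]}, index=['y', 'z'])

# Keep only the row and column labels that both frames share
left, right = df1.align(df2, join='inner')
common = left.sub(right)
print(common)
```

Only row 'y' and column 'A' survive, so the result is a 1x1 DataFrame containing 19.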

Example C: Using the axis Parameter

The axis parameter is useful when you want to subtract a Series from a DataFrame along a specific axis.

  • axis=0 (or 'index'): The Series index is matched against the DataFrame's row index, so the Series is subtracted from each column.
  • axis=1 (or 'columns'): The Series index is matched against the DataFrame's column labels, so the Series is subtracted from each row. This is the default.

import pandas as pd
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [100, 200, 300]
}, index=['r1', 'r2', 'r3'])
# One value per column, indexed by the column labels
col_s = pd.Series([1, 2], index=['A', 'B'])
# One value per row, indexed by the row labels
row_s = pd.Series([1, 2, 3], index=['r1', 'r2', 'r3'])
# 1. axis=0: match row_s on the row index
# Each column has row_s subtracted from it
result_axis0 = df.sub(row_s, axis=0)
print("Subtraction with axis=0 (matched on the index):")
print(result_axis0)
# 2. axis=1: match col_s on the column labels
# Each row has col_s subtracted from it
result_axis1 = df.sub(col_s, axis=1)
print("\nSubtraction with axis=1 (matched on the columns):")
print(result_axis1)

Output:

Subtraction with axis=0 (matched on the index):
     A    B
r1   9   99
r2  18  198
r3  27  297
Subtraction with axis=1 (matched on the columns):
     A    B
r1   9   98
r2  19  198
r3  29  298
  • axis=0: Column by column, it computes df['A'] - row_s and df['B'] - row_s.
  • axis=1: Row by row, it computes df['A'] - col_s['A'] and df['B'] - col_s['B'].
  • If the Series labels do not match the chosen axis (for example, df.sub(col_s, axis=0)), nothing aligns and every value in the result is NaN.
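
A common practical use of axis is centering data. The sketch below, using the same df and illustrative variable names, subtracts the column means and then the row means; in each case axis names the set of labels that the Series of means is indexed by.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [100, 200, 300]
}, index=['r1', 'r2', 'r3'])

# df.mean() is indexed by column labels, so match it on the columns
col_centered = df.sub(df.mean(), axis='columns')

# df.mean(axis=1) is indexed by row labels, so match it on the index
row_centered = df.sub(df.mean(axis=1), axis='index')

print(col_centered)
print(row_centered)
```

After column centering each column has mean 0; after row centering each row has mean 0.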

The fill_value Parameter

The fill_value parameter is very useful for avoiding NaN results when indices don't align. It provides a default value to use where one of the objects has a missing value.

import pandas as pd
s1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'a', 'd'])
# Without fill_value, we get NaNs
result_nan = s1.sub(s2)
print("Result without fill_value:")
print(result_nan)
# With fill_value=0, missing values are treated as 0
result_fill = s1.sub(s2, fill_value=0)
print("\nResult with fill_value=0:")
print(result_fill)

Output:

Result without fill_value:
a     80.0
b    190.0
c      NaN
d      NaN
dtype: float64
Result with fill_value=0:
a     80.0  # 100 - 20
b    190.0  # 200 - 10
c    300.0  # 300 - 0
d    -30.0  # 0 - 30
dtype: float64
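
One detail worth checking by hand: fill_value only replaces a value that is missing on one side. A small sketch with made-up data, where label 'c' is NaN in both Series:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan, np.nan], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 5.0, np.nan], index=['a', 'b', 'c'])

result = s1.sub(s2, fill_value=0)
# 'a': 1 - 10
# 'b': s1 is missing, so 0 - 5
# 'c': missing on both sides, so the result stays NaN
print(result)
```

So fill_value does not guarantee a NaN-free result; positions missing in both inputs still need separate handling.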

Summary: sub() vs. the - Operator

For most simple cases you can use the standard - operator, which performs the same label-aligned subtraction as sub() with its default arguments.

Method | Syntax | Notes
sub() method | df.sub(other) | More explicit. Accepts extra parameters such as axis, level, and fill_value.
- operator | df - other | More concise. Suited to simple cases that need no extra parameters.

For example, df - 5 gives the same result as df.sub(5). Use sub() when you need axis, level, or fill_value; otherwise the choice is a matter of style.
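
The equivalence can be checked directly. This is a small sketch with made-up data; it also shows rsub(), the reversed counterpart of sub(), for the case where the DataFrame is on the right-hand side.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
s = pd.Series([1, 10], index=['A', 'B'])

# Scalar: the operator and the method agree
same_scalar = (df - 5).equals(df.sub(5))

# Series: the operator matches on column labels, like axis='columns'
same_series = (df - s).equals(df.sub(s, axis='columns'))

# rsub() swaps the operands: df.rsub(5) computes 5 - df
same_reversed = (5 - df).equals(df.rsub(5))

print(same_scalar, same_series, same_reversed)
```

All three comparisons print True.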
