杰瑞科技汇

sum axis 1 python

In Python, "sum axis 1" usually means passing axis=1 to a sum function in NumPy or Pandas. For a 2D array or DataFrame, it adds up the elements of each row, producing one total per row.

Let's break it down with clear examples for both libraries.


The Core Concept: Axis 0 vs. Axis 1

Imagine a 2D table (like a spreadsheet or a matrix):

         Column 0   Column 1   Column 2
Row 0       1          2          3
Row 1       4          5          6
Row 2       7          8          9

  • axis=0 (Sum along the columns): You move down the columns.

    • Sum of Column 0: 1 + 4 + 7 = 12
    • Sum of Column 1: 2 + 5 + 8 = 15
    • Sum of Column 2: 3 + 6 + 9 = 18
    • The result is a 1D array: [12, 15, 18]
  • axis=1 (Sum along the rows): You move across the rows.

    • Sum of Row 0: 1 + 2 + 3 = 6
    • Sum of Row 1: 4 + 5 + 6 = 15
    • Sum of Row 2: 7 + 8 + 9 = 24
    • The result is a 1D array: [6, 15, 24]

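Mnemonic: the axis you name is the one that gets collapsed. A 2D array has shape (rows, columns), so axis=1 is the column dimension. Summing over it removes that dimension and leaves one sum per row.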
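To make the "collapsed axis" idea concrete, here is a minimal sketch using a non-square array, so that the shapes show which dimension disappears. The 2x3 array is made up for illustration and is not the table above.

```python
import numpy as np

# Illustrative 2x3 array (not the 3x3 table above): a non-square shape
# makes it obvious which dimension is removed by each reduction.
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
])  # shape (2, 3): 2 rows, 3 columns

col_sums = data.sum(axis=0)  # collapses axis 0 (rows)    -> one value per column
row_sums = data.sum(axis=1)  # collapses axis 1 (columns) -> one value per row

print(data.shape)      # (2, 3)
print(col_sums.shape)  # (3,)
print(row_sums.shape)  # (2,)
print(col_sums)        # [5 7 9]
print(row_sums)        # [ 6 15]
```

The result's shape is the input shape with the named axis removed, which is a quick way to check that you picked the right one.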


Using NumPy

NumPy is the fundamental library for numerical operations in Python. Its reduction functions (sum, mean, max, and so on) all accept the same axis parameter.

Example:

import numpy as np
# Create a 2D NumPy array
data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print("Original Array:")
print(data)
# Original Array:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# Sum along axis 1 (sum each row)
row_sums = np.sum(data, axis=1)
print("\nSum along axis 1 (row sums):")
print(row_sums)
# Sum along axis 1 (row sums):
# [ 6 15 24]

Common NumPy Functions with axis=1

Most reduction functions in NumPy work the same way:

# Sum of squares for each row
row_sums_of_squares = np.sum(data**2, axis=1)
print("Sum of squares for each row:", row_sums_of_squares)
# Sum of squares for each row: [ 14  77 194]
# Mean of each row
row_means = np.mean(data, axis=1)
print("Mean of each row:", row_means)
# Mean of each row: [2. 5. 8.]
# Maximum value in each row
row_maxes = np.max(data, axis=1)
print("Max value in each row:", row_maxes)
# Max value in each row: [3 6 9]
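Row sums are often used right away to scale the original rows. The sketch below is one way to do that, assuming you want each row expressed as fractions of its own total. It uses NumPy's keepdims=True option so the result keeps shape (3, 1) and broadcasts against the original array. The variable names are illustrative.

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# keepdims=True keeps the collapsed axis as length 1: shape (3, 1), not (3,)
row_totals = data.sum(axis=1, keepdims=True)

# Broadcasting divides every element by the total of its own row
row_fractions = data / row_totals

print(row_totals.shape)           # (3, 1)
print(row_fractions.sum(axis=1))  # each row now sums to 1 (up to rounding)
```

Without keepdims=True the totals would have shape (3,), which for this square array would broadcast along the wrong axis and divide each column by a row total.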

Using Pandas

Pandas is built on top of NumPy and is designed for data manipulation, typically using DataFrames. The concept is identical, but you usually call the reduction as a method on the DataFrame (df.sum(axis=1)) rather than as a module-level function.

Example:

import pandas as pd
# Create a Pandas DataFrame
df = pd.DataFrame({
    'A': [1, 4, 7],
    'B': [2, 5, 8],
    'C': [3, 6, 9]
})
print("Original DataFrame:")
print(df)
# Original DataFrame:
#    A  B  C
# 0  1  2  3
# 1  4  5  6
# 2  7  8  9
# Sum along axis 1 (sum each row)
# Note: DataFrame.sum() defaults to axis=0 (one sum per column),
# so axis=1 must be passed explicitly to get row sums.
row_sums = df.sum(axis=1)
print("\nSum along axis 1 (row sums):")
print(row_sums)
# Sum along axis 1 (row sums):
# 0     6
# 1    15
# 2    24
# dtype: int64

Notice that the result is a Pandas Series, with the original DataFrame's index preserved.
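A small sketch of what "index preserved" means in practice, assuming a DataFrame with string row labels. The labels 'x', 'y', 'z' are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same values as above, but with hypothetical string labels as the index
df = pd.DataFrame(
    {'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]},
    index=['x', 'y', 'z'],
)

row_sums = df.sum(axis=1)

# The Series carries the DataFrame's row labels, so sums can be looked up by label
print(row_sums.index.tolist())  # ['x', 'y', 'z']
print(row_sums['y'])            # 15
```

Because the labels match, assigning the Series back as a new column lines up row by row without any extra work.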

Common Pandas Methods with axis=1

# Add the row sums as a new column to the DataFrame
df['Row_Sum'] = df.sum(axis=1)
print("\nDataFrame with new 'Row_Sum' column:")
print(df)
# DataFrame with new 'Row_Sum' column:
#    A  B  C  Row_Sum
# 0  1  2  3        6
# 1  4  5  6       15
# 2  7  8  9       24
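# Calculate the mean for each row.
# Select only the original columns: df now also contains 'Row_Sum',
# which would otherwise be included in the mean.
row_means = df[['A', 'B', 'C']].mean(axis=1)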
print("\nMean of each row:")
print(row_means)
# Mean of each row:
# 0    2.0
# 1    5.0
# 2    8.0
# dtype: float64

Important: The axis=1 Trap for Beginners

A very common point of confusion is that axis=1 in Pandas always names the column axis, yet df.sum(axis=1) returns one value per row.

  • df.drop(columns_to_drop, axis=1): This drops columns. The labels you pass are looked up on the column axis.
  • df.rename(new_column_names, axis=1): This renames columns.

This seems contradictory, but it is consistent once you read axis as the axis the operation works along. When you drop a column, the operation is applied along the column axis (axis=1), so a column disappears.

For aggregation functions like sum(), mean(), std(), etc., the named axis is the one that gets collapsed, exactly as in NumPy:

  • axis=0: Aggregate down the rows (result is one value per column).
  • axis=1: Aggregate across the columns (result is one value per row).
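The two behaviours above can be put side by side in a short sketch. It also uses the string aliases axis='columns' and axis='index', which Pandas accepts in place of 1 and 0 and which some readers find less ambiguous. The DataFrame is the same illustrative one used earlier.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})

# drop: the label 'C' is looked up on the column axis, so a column is removed
dropped = df.drop('C', axis=1)

# sum: the column axis is collapsed, so one value per row remains
row_sums = df.sum(axis=1)
col_sums = df.sum(axis=0)

print(dropped.columns.tolist())  # ['A', 'B']
print(row_sums.tolist())         # [6, 15, 24]
print(col_sums.tolist())         # [12, 15, 18]

# String aliases: 'columns' is the same as 1, 'index' is the same as 0
print(df.sum(axis='columns').equals(row_sums))  # True
print(df.sum(axis='index').equals(col_sums))    # True
```

In both calls axis=1 refers to the same thing, the column axis. What differs is what the operation does to it: drop removes one label from it, sum reduces it away entirely.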

Summary Table

Operation           NumPy (np.array)        Pandas (pd.DataFrame)   Result
Sum along rows      np.sum(data, axis=1)    df.sum(axis=1)          A 1D array / Series with sum for each row
Sum along columns   np.sum(data, axis=0)    df.sum(axis=0)          A 1D array / Series with sum for each column

Key Takeaway: For aggregation (sum, mean, etc.), axis=1 means sum across the columns for each row. This is the standard in both NumPy and Pandas.
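As a quick cross-check of the takeaway, the following sketch builds a DataFrame from the same NumPy array and compares the two libraries' results. The column names 'A', 'B', 'C' are illustrative.

```python
import numpy as np
import pandas as pd

data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

np_row_sums = np.sum(data, axis=1)
pd_row_sums = df.sum(axis=1).to_numpy()  # Series -> plain NumPy array

np_col_sums = np.sum(data, axis=0)
pd_col_sums = df.sum(axis=0).to_numpy()

print(np_row_sums.tolist(), pd_row_sums.tolist())  # [6, 15, 24] [6, 15, 24]
print(np_col_sums.tolist(), pd_col_sums.tolist())  # [12, 15, 18] [12, 15, 18]
```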
