杰瑞科技汇

How do you add row numbers with pandas (pd) in Python?

In Python's pandas library, "row number" can mean a few different things. This answer covers the most common use cases, from the simplest to the most flexible.


The Short Answer: df.reset_index()

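If you just want a new column that numbers the rows, the quickest route is reset_index(), which turns the existing index into a column named index. With the default index that column is 0-based; for 1-based numbers, add 1 to it or assign range(1, len(df) + 1).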

import pandas as pd
import io
# Sample data
data = """Name,Score,Age
Alice,95,25
Bob,88,30
Charlie,92,28"""
df = pd.read_csv(io.StringIO(data))
print("Original DataFrame:")
print(df)
# Original DataFrame:
#       Name  Score  Age
# 0    Alice     95   25
# 1      Bob     88   30
# 2  Charlie     92   28
# --- Method 1: Add a new column with the row number (0-based) ---
df_with_row_num = df.reset_index()
print("\nDataFrame with new 'index' column (0-based):")
print(df_with_row_num)
# DataFrame with new 'index' column (0-based):
#    index     Name  Score  Age
# 0      0    Alice     95   25
# 1      1      Bob     88   30
# 2      2  Charlie     92   28
# --- Method 2: Add a 1-based row number column ---
# Option 1: reset the index, then add 1 to the new 'index' column
df_with_row_num_1_based = df.reset_index()
df_with_row_num_1_based['row_num'] = df_with_row_num_1_based['index'] + 1
# Option 2 (more direct): drop the old index and number the rows with range()
df_with_row_num_1_based_direct = df.reset_index(drop=True)
df_with_row_num_1_based_direct['row_num'] = range(1, len(df) + 1)
print("\nDataFrame with a 1-based 'row_num' column:")
print(df_with_row_num_1_based_direct)
# DataFrame with a 1-based 'row_num' column:
#       Name  Score  Age  row_num
# 0    Alice     95   25        1
# 1      Bob     88   30        2
# 2  Charlie     92   28        3
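
The range() approach can be wrapped in a small reusable helper. This is only a sketch: add_row_number is a hypothetical name, not a pandas function, and it assumes you want the numbers as the first column. It uses DataFrame.insert() to place the column and range() for the values, so the numbering follows row positions rather than index labels.

```python
import pandas as pd


def add_row_number(df: pd.DataFrame, name: str = "row_num", start: int = 1) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a consecutive row-number column first.

    The numbers come from row positions (range), not from index labels,
    so the result is the same whatever index df has.
    """
    out = df.copy()                                       # leave the caller's DataFrame untouched
    out.insert(0, name, range(start, start + len(out)))   # loc=0 puts the new column first
    return out


# Usage: a DataFrame whose index is not 0, 1, 2, ...
scores = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Score": [95, 88, 92]},
                      index=[10, 20, 30])
numbered = add_row_number(scores)
print(numbered)
```

Returning a copy keeps the original DataFrame unchanged; if you prefer to modify it in place, call insert() on it directly.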

Detailed Breakdown of Methods

Here are the different scenarios and how to handle them.

Adding a Simple Row Number Column (1-based or 0-based)

This is the most common request. You want a column that simply counts the rows from 1 or 0.

Method A: Using reset_index() (Recommended)

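This is the most idiomatic pandas approach. reset_index() returns a new DataFrame in which the old index becomes a column named index, replaced by a fresh 0-based index. Because the column holds the old index labels, it only counts 0, 1, 2, ... when the index is the default one.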

import pandas as pd
df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'], 'Price': [10, 20, 15, 25]})
# Add a 0-based index as a new column
df_with_index = df.reset_index()
print(df_with_index)
#    index Product  Price
# 0      0       A     10
# 1      1       B     20
# 2      2       C     15
# 3      3       D     25
# Add a 1-based row number as a new column
# range() supplies the numbers directly, whatever the current index is
df['RowNumber'] = range(1, len(df) + 1)
print(df)
#   Product  Price  RowNumber
# 0       A     10          1
# 1       B     20          2
# 2       C     15          3
# 3       D     25          4
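
To make the label-versus-position point concrete, here is a small sketch on the same made-up product data: after filtering, reset_index() copies the surviving labels, while reset_index(drop=True) followed by range() renumbers from 1. The filter threshold and variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'], 'Price': [10, 20, 15, 25]})

# Keep only products under 20; the surviving rows keep their labels 0 and 2
affordable = df[df['Price'] < 20]

# reset_index() copies those labels, so the 'index' column reads 0, 2 (not 0, 1)
with_labels = affordable.reset_index()
print(with_labels)

# drop=True discards the old labels; range() then supplies consecutive 1-based numbers
renumbered = affordable.reset_index(drop=True)
renumbered['RowNumber'] = range(1, len(renumbered) + 1)
print(renumbered)
```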

Method B: Using DataFrame.index

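You can also assign the index directly to a new column. This copies the index labels, so with the default RangeIndex you get 0-based numbers, and adding 1 gives 1-based numbers. If the index has gaps (e.g., after filtering) or holds other labels, the values will not be consecutive row numbers.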

import pandas as pd
df = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Price': [10, 20, 15]})
df['RowNumber_0_based'] = df.index
df['RowNumber_1_based'] = df.index + 1
print(df)
#   Product  Price  RowNumber_0_based  RowNumber_1_based
# 0       A     10                  0                  1
# 1       B     20                  1                  2
# 2       C     15                  2                  3
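
The same caveat matters for Method B when the index is not numeric. In this sketch (an assumed setup where Product has been moved into the index with set_index()), df.index holds strings, so building the numbers from the row count with range() is the safer option.

```python
import pandas as pd

# Assumed setup: Product has been moved into the index
df = pd.DataFrame({'Product': ['A', 'B', 'C'], 'Price': [10, 20, 15]}).set_index('Product')

# df.index now holds the strings 'A', 'B', 'C', so df.index + 1 would raise a TypeError.
# Positional numbers built from the row count work for any kind of index.
df['RowNumber_0_based'] = range(len(df))
df['RowNumber_1_based'] = range(1, len(df) + 1)
print(df)
```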

Adding a Row Number within Groups

This is a very common and powerful operation: you number rows independently within each category of a column (e.g., numbering each customer's orders 1, 2, 3, ...).

For this, groupby().cumcount() is the right tool.

  • groupby(): groups the DataFrame by one or more columns.
  • cumcount(): numbers the rows within each group in their current order, starting from 0.

import pandas as pd
data = {'CustomerID': [1, 1, 2, 1, 2, 3],
        'OrderDate': ['2025-01-01', '2025-01-15', '2025-01-10', '2025-02-01', '2025-02-05', '2025-01-20'],
        'Amount': [100, 150, 50, 200, 75, 300]}
df = pd.DataFrame(data)
# cumcount() numbers rows in their current order, so sort by date first
# (ISO-formatted date strings sort correctly as plain text)
df = df.sort_values(by=['CustomerID', 'OrderDate'])
# Add a row number for each customer's orders; `+ 1` makes it 1-based
df['OrderNumber'] = df.groupby('CustomerID').cumcount() + 1
print(df)
#    CustomerID   OrderDate  Amount  OrderNumber
# 0           1  2025-01-01     100            1
# 1           1  2025-01-15     150            2
# 3           1  2025-02-01     200            3
# 2           2  2025-01-10      50            1
# 4           2  2025-02-05      75            2
# 5           3  2025-01-20     300            1
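
If you would rather keep the original row order, a sketch using a grouped rank() on real datetimes produces the same per-customer numbering without sorting. This swaps cumcount() for rank(method='first'); the shuffled order of the rows below is an illustrative assumption.

```python
import pandas as pd

# Same orders as above, deliberately listed out of date order
df = pd.DataFrame({'CustomerID': [1, 2, 1, 3, 2, 1],
                   'OrderDate': ['2025-02-01', '2025-02-05', '2025-01-01',
                                 '2025-01-20', '2025-01-10', '2025-01-15'],
                   'Amount': [200, 75, 100, 300, 50, 150]})
df['OrderDate'] = pd.to_datetime(df['OrderDate'])

# Rank each customer's dates; method='first' breaks ties by row order so numbers stay unique
df['OrderNumber'] = df.groupby('CustomerID')['OrderDate'].rank(method='first').astype(int)
print(df)
```

Here the numbers come out as 3, 2, 1, 1, 1, 2 in the original row order; with method='first', two same-day orders from one customer would get distinct numbers in order of appearance.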

Adding a Row Number after Sorting (Ranking)

Sometimes you want a row number that reflects a specific order, like a rank. For example, ranking products by price.


For this, use Series.rank(). Unlike a plain row counter, it decides how tied values are numbered through its method argument.

import pandas as pd
df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'], 'Price': [25, 10, 20, 15]})
# Rank products by price in ascending order (cheapest is #1)
# method='dense' gives tied values the same rank with no gaps (e.g., if two products both cost 15, both would be rank 2 and the next price rank 3)
df['PriceRank'] = df['Price'].rank(method='dense').astype(int)
print(df.sort_values(by='PriceRank'))
#   Product  Price  PriceRank
# 1       B     10          1
# 3       D     15          2
# 2       C     20          3
# 0       A     25          4
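
To see how the method argument treats ties, the sketch below gives two products the same price (made-up data, not the example above) and compares 'dense', 'min', and 'first'. Only 'first' yields a unique, row-number-like sequence, breaking ties by order of appearance.

```python
import pandas as pd

# Two products share the price 15, so the tie-handling methods differ
df = pd.DataFrame({'Product': ['A', 'B', 'C', 'D'], 'Price': [25, 10, 15, 15]})

for method in ['dense', 'min', 'first']:
    # rank() returns floats; astype(int) makes the ranks whole numbers
    df[f'Rank_{method}'] = df['Price'].rank(method=method).astype(int)

print(df)
```

With these prices, 'dense' ranks A, B, C, D as 3, 1, 2, 2; 'min' as 4, 1, 2, 2; and 'first' as 4, 1, 2, 3.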

Adding a Row Number based on a Condition

This is more advanced. You might want to number rows only if they meet a certain criterion.

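Calling cumsum() on a boolean (True/False) Series does this, because True counts as 1 and False as 0. Each matching row gets its sequence number, but non-matching rows also carry the running total (T003 and T004 below both show 1).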

import pandas as pd
df = pd.DataFrame({'TransactionID': ['T001', 'T002', 'T003', 'T004', 'T005'],
                   'IsRefund': [False, True, False, False, True]})
# Number the refunds: 1 for the first refund, 2 for the second, and so on
df['RefundNumber'] = df['IsRefund'].cumsum()
print(df)
#   TransactionID  IsRefund  RefundNumber
# 0           T001     False             0
# 1           T002      True             1
# 2           T003     False             1
# 3           T004     False             1
# 4           T005      True             2
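
If you want the number to appear only on the refund rows themselves, one option (a sketch, not the only way) is to mask the running count with where(). Non-refund rows become missing values, and the nullable 'Int64' dtype keeps the refund numbers as integers instead of floats.

```python
import pandas as pd

df = pd.DataFrame({'TransactionID': ['T001', 'T002', 'T003', 'T004', 'T005'],
                   'IsRefund': [False, True, False, False, True]})

# Running count of refunds, then blank it out on rows that are not refunds.
# where() keeps values where the condition is True and inserts NaN elsewhere;
# astype('Int64') turns the result into nullable integers (<NA> for the gaps).
df['RefundNumber'] = df['IsRefund'].cumsum().where(df['IsRefund']).astype('Int64')
print(df)
```

Here T002 gets 1, T005 gets 2, and the other rows show <NA>.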

Summary: Which Method to Use?

| Goal | Method | Example |
| --- | --- | --- |
| Add a simple, static row number to the whole DataFrame | df['col'] = range(1, len(df) + 1) | df['RowNum'] = range(1, len(df) + 1) |
| Convert the current index into a new column | df.reset_index() | df_with_index = df.reset_index() |
| Number rows within distinct groups | df.groupby('col').cumcount() + 1 | df['OrderNum'] = df.groupby('CustomerID').cumcount() + 1 |
| Rank rows by the values in a column (e.g., price, score) | df['col'].rank(method='...') | df['Rank'] = df['Score'].rank(method='dense') |
| Number rows that meet a condition (e.g., only refunds) | df['col'].cumsum() | df['RefundNum'] = df['IsRefund'].cumsum() |