
How Do You Work with Rows and Columns in Python?

In Python, "rows and columns" almost always means tabular data: data that looks like a spreadsheet or a database table.


The most widely used library for this is Pandas. It is built on top of NumPy and provides easy-to-use data structures such as the DataFrame.

Let's break down how to work with rows and columns using Pandas.


Setup: Installing and Importing Pandas

First, you need to have Pandas installed. If you don't, open your terminal or command prompt and run:

pip install pandas

Then, in your Python script or notebook, you'll typically import it with the conventional alias pd:

import pandas as pd
import numpy as np # Often used for creating sample data

Creating a Sample DataFrame

To understand rows and columns, let's create a sample DataFrame. A DataFrame is the primary Pandas data structure, a 2D table with labeled axes (rows and columns).

# Data for our DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Score': [88, 92, 76, 85]
}
# Create the DataFrame
df = pd.DataFrame(data)
# Display the DataFrame
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
      Name  Age         City  Score
0    Alice   25     New York     88
1      Bob   30  Los Angeles     92
2  Charlie   35      Chicago     76
3    David   28      Houston     85

Notice the numbers on the left (0, 1, 2, 3). These are the index labels for the rows. The Name, Age, etc., are the column labels.
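As a quick check of those labels, the sketch below rebuilds the same sample data and inspects the row index, the column labels, and the shape. It relies only on the standard DataFrame attributes index, columns, and shape.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
    'Score': [88, 92, 76, 85],
})

print(df.index)             # row labels: a RangeIndex covering 0 to 3
print(df.columns.tolist())  # column labels as a plain Python list
print(df.shape)             # (number of rows, number of columns)
```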


Working with COLUMNS

Columns in a Pandas DataFrame are essentially Pandas Series (a 1D labeled array).

Selecting a Single Column

You can select a column using its label as a key, similar to a dictionary.

# Select the 'Name' column
names = df['Name']
print(names)
print(type(names))

Output:

0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object
<class 'pandas.core.series.Series'>

Selecting Multiple Columns

To select multiple columns, pass a list of column labels.

# Select the 'Name' and 'City' columns
name_and_city = df[['Name', 'City']]
print(name_and_city)

Output:

      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago
3    David      Houston
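The bracket style decides what comes back. As a small illustration (using a trimmed-down copy of the sample data), single brackets return a 1D Series, while a one-element list returns a 2D DataFrame that happens to have one column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
})

as_series = df['Name']    # single brackets: a Series
as_frame = df[['Name']]   # list of one label: a DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

This matters when a later step expects a DataFrame (for example, one that needs a columns attribute) rather than a Series.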

Adding a New Column

This is very straightforward. You can assign a new list or Series to a new column label.

# Add a new 'Country' column
df['Country'] = ['USA', 'USA', 'USA', 'USA']
print("\nDataFrame with new 'Country' column:")
print(df)

Output:

DataFrame with new 'Country' column:
      Name  Age         City  Score Country
0    Alice   25     New York     88     USA
1      Bob   30  Los Angeles     92     USA
2  Charlie   35      Chicago     76     USA
3    David   28      Houston     85     USA
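A new column does not have to come from a hand-written list. The sketch below, on a reduced copy of the sample data, shows two other common patterns: assigning a single scalar, which is repeated for every row, and deriving a column from an existing one. The 'Passed' column and the threshold of 80 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Score': [88, 92, 76, 85],
})

# A scalar is broadcast to every row
df['Country'] = 'USA'

# An element-wise comparison on an existing column yields a boolean column
df['Passed'] = df['Score'] >= 80

print(df)
```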

Deleting a Column

Use the .drop() method. The axis=1 argument tells Pandas to look for the label in the columns (axis 1). The inplace=True argument modifies the DataFrame directly instead of returning a new one.

# Delete the 'Score' column
df.drop('Score', axis=1, inplace=True)
print("\nDataFrame after deleting 'Score' column:")
print(df)

Output:

DataFrame after deleting 'Score' column:
      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
3    David   28      Houston     USA
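If you would rather keep the original DataFrame untouched, one option is to leave out inplace=True and keep the returned copy. The sketch below also uses the columns= keyword, which is equivalent to passing axis=1; the variable name trimmed is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Score': [88, 92],
})

# Without inplace=True, drop() returns a new DataFrame
trimmed = df.drop(columns=['Score'])

print(trimmed.columns.tolist())  # the copy no longer has 'Score'
print(df.columns.tolist())       # the original is unchanged
```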

Working with ROWS

Rows are a bit more complex because you can select them by their index label or their integer position.

Selecting a Single Row by Index Label

Use .loc[] for label-based indexing.

# Select the row with index label 1 (an integer label, not a string)
row_bob = df.loc[1]
print(row_bob)
print(type(row_bob))

Output:

Name               Bob
Age                 30
City       Los Angeles
Country            USA
Name: 1, dtype: object
<class 'pandas.core.series.Series'>

Selecting a Single Row by Integer Position

Use .iloc[] for integer position-based indexing (like standard Python lists). Remember, indexing starts at 0.

# Select the row at integer position 2 (which is the 3rd row)
row_charlie = df.iloc[2]
print(row_charlie)

Output:

Name       Charlie
Age             35
City       Chicago
Country        USA
Name: 2, dtype: object
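With the default 0, 1, 2, ... index, .loc[] and .iloc[] return the same row, which can hide the difference between them. The following sketch, on a reduced copy of the sample data, sorts the rows so that labels and positions no longer line up.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
})

# Sorting reorders the rows but each row keeps its original index label
by_age = df.sort_values('Age', ascending=False)
print(by_age.index.tolist())

first_by_position = by_age.iloc[0]  # the first row as displayed (oldest person)
first_by_label = by_age.loc[0]      # the row whose label is 0, wherever it sits

print(first_by_position['Name'])
print(first_by_label['Name'])
```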

Selecting Multiple Rows

You can pass a list of labels to .loc or a list of integers to .iloc.

# Select rows with index labels 0 and 2
rows_0_and_2 = df.loc[[0, 2]]
print(rows_0_and_2)

Output:

      Name  Age      City Country
0    Alice   25  New York     USA
2  Charlie   35   Chicago     USA
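Both indexers also accept slices, with one difference that is easy to miss: a .loc[] slice includes its end label, while an .iloc[] slice excludes its end position, as in ordinary Python slicing. A minimal sketch on a reduced copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
})

label_slice = df.loc[1:2]      # labels 1 through 2, end label included
position_slice = df.iloc[1:2]  # position 1 only, end position excluded

# .loc[] can also select rows and columns in a single call
names_only = df.loc[[0, 2], ['Name']]

print(label_slice)
print(position_slice)
print(names_only)
```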

Selecting Rows Based on a Condition (Filtering)

This is one of the most powerful features of Pandas. You create a "boolean mask" (a Series of True/False values) and pass it to .loc[].

# Select all rows where the 'Age' is greater than 28
older_than_28 = df.loc[df['Age'] > 28]
print(older_than_28)

Output:

      Name  Age         City Country
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA

You can also combine conditions using & (and) and | (or). Important: wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.

# Select rows where Age > 28 AND City is 'Chicago'
specific_row = df.loc[(df['Age'] > 28) & (df['City'] == 'Chicago')]
print(specific_row)

Output:

      Name  Age     City Country
2  Charlie   35  Chicago     USA
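The | (or) and ~ (not) operators follow the same pattern as &. The sketch below rebuilds part of the sample data and also shows Series.isin(), which is often tidier than chaining several == comparisons with |. The specific conditions are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# OR: younger than 28, or living in Chicago
young_or_chicago = df.loc[(df['Age'] < 28) | (df['City'] == 'Chicago')]

# NOT: everyone who does not live in Chicago
not_chicago = df.loc[~(df['City'] == 'Chicago')]

# Membership test against a list of values
in_cities = df.loc[df['City'].isin(['Houston', 'New York'])]

print(young_or_chicago)
print(not_chicago)
print(in_cities)
```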

Adding a New Row

Use .loc[] with a new index label.

# Add a new row at the end with index label 4
df.loc[4] = ['Eve', 40, 'Miami', 'USA']
print("\nDataFrame after adding a new row:")
print(df)

Output:

DataFrame after adding a new row:
      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
3    David   28      Houston     USA
4      Eve   40        Miami     USA
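Assigning through .loc[] is convenient for a single row. For several rows at once, one common alternative is pd.concat(). The sketch below uses a small hypothetical DataFrame named new_rows and passes ignore_index=True so that the result gets a fresh 0, 1, 2, ... index.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
})

# Hypothetical extra rows with the same columns
new_rows = pd.DataFrame({
    'Name': ['Eve', 'Frank'],
    'Age': [40, 33],
})

# Stack the two frames vertically and renumber the index
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```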

Deleting a Row

Again, use .drop(). This time, you don't need to specify axis because the default is axis=0 (rows).

# Delete the row with index label 3
df.drop(3, inplace=True)
print("\nDataFrame after deleting row with index 3:")
print(df)

Output:

DataFrame after deleting row with index 3:
      Name  Age         City Country
0    Alice   25     New York     USA
1      Bob   30  Los Angeles     USA
2  Charlie   35      Chicago     USA
4      Eve   40        Miami     USA
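Dropping a row leaves a gap in the index labels. If you want consecutive labels again, one option is reset_index(drop=True), sketched below on a reduced copy of the sample data. Here drop=True discards the old labels instead of keeping them as a new column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 28],
})

remaining = df.drop(1)                      # labels are now 0, 2, 3
renumbered = remaining.reset_index(drop=True)  # labels become 0, 1, 2

print(remaining.index.tolist())
print(renumbered.index.tolist())
```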

Summary Table

Task                         Code                       Description
Column Selection             df['column_name']          Select a single column.
                             df[['col1', 'col2']]       Select multiple columns.
Column Operations            df['new_col'] = [...]      Add a new column.
                             df.drop('column', axis=1)  Delete a column.
Row Selection (Label)        df.loc[index_label]        Select a single row by its index label.
                             df.loc[[label1, label2]]   Select multiple rows by their index labels.
Row Selection (Position)     df.iloc[integer_position]  Select a single row by its position.
                             df.iloc[[pos1, pos2]]      Select multiple rows by their positions.
Row Operations (Filtering)   df.loc[df['col'] > 10]     Select rows based on a condition.
Row Operations (Add/Delete)  df.loc[new_label] = [...]  Add a new row.
                             df.drop(index_label)       Delete a row.

Key Takeaways

  1. Pandas is the standard for tabular data manipulation in Python.
  2. Columns are selected like dictionary keys (df['col']). They are essentially Pandas Series.
  3. Rows are selected using .loc[] (for index labels) or .iloc[] (for integer positions).
  4. Filtering rows with a boolean mask (df.loc[df['col'] > value]) is an incredibly common and powerful operation.
  5. Always be mindful of the index. It's the label for your rows and can be more than just simple integers (e.g., dates, strings).
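
To illustrate the last point, here is a minimal sketch that turns a column into a string index with set_index(). The choice of 'Name' as the index is only an example; once it is the index, .loc[] looks rows up by name while .iloc[] still works by position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# 'Name' becomes the row index; set_index() returns a new DataFrame
by_name = df.set_index('Name')

bob_age = by_name.loc['Bob', 'Age']   # label-based lookup with a string label
first_age = by_name.iloc[0]['Age']    # position-based lookup is unaffected

print(by_name.index.tolist())
print(bob_age, first_age)
```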