杰瑞科技汇

How to Efficiently Drop Specified Columns in Python

Dropping columns is a very common operation in data analysis with Python, and the primary library for it is pandas.

Here’s a comprehensive guide covering the most common methods, from the simplest to more advanced use cases.

The Setup: First, Create a Sample DataFrame

All the examples below will use this sample DataFrame. It's good practice to create a small, reproducible example like this to follow along.

import pandas as pd
import numpy as np
# Create a sample DataFrame
data = {
    'student_id': [101, 102, 103, 104],
    'first_name': ['Alice', 'Bob', 'Charlie', 'David'],
    'last_name': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'age': [20, 21, 19, 22],
    'major': ['Physics', 'Math', 'Chemistry', 'Biology'],
    'grade_level': ['Sophomore', 'Junior', 'Freshman', 'Senior'],
    'tuition_fee': [10000, 10500, 9800, 11000],
    'has_scholarship': [True, False, True, False]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Original DataFrame:

   student_id first_name last_name  age     major grade_level  tuition_fee  has_scholarship
0         101      Alice     Smith   20    Physics    Sophomore         10000             True
1         102        Bob   Johnson   21      Math       Junior         10500            False
2         103    Charlie     Brown   19  Chemistry    Freshman          9800             True
3         104      David       Lee   22    Biology      Senior         11000            False

Method 1: df.drop() (The Most Common Method)

This is the standard and most flexible way to drop columns. The key is to tell pandas that the labels are column names, either with the columns= keyword or by passing labels together with axis=1.

  • axis=0 refers to rows.
  • axis=1 refers to columns.

Syntax

df.drop(columns=['column_name1', 'column_name2'], inplace=False)
# Equivalent spelling: df.drop(labels=['column_name1', 'column_name2'], axis=1)

Parameters

  • labels: A single label or a list-like object of the names to drop. How they are interpreted depends on axis.
  • axis: Set to 1 (or 'columns') so that labels refers to columns. The default, 0, drops rows.
  • columns: A single label or a list-like object of column names. Equivalent to labels with axis=1; when you use columns=, you do not need axis.
  • inplace: A boolean.
    • inplace=False (default): Returns a new DataFrame with the columns dropped. The original df is unchanged. This is safer and generally recommended.
    • inplace=True: Modifies the original DataFrame directly and returns None. Any memory saving is usually small, and it can lead to bugs if you accidentally assign the returned None back to a variable.
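
To make these parameters concrete, here is a minimal sketch showing that the columns= keyword and the labels= plus axis=1 spelling give the same result. It also shows the errors='ignore' option of drop(), which skips labels that are not present. The small DataFrame demo and the column name 'nickname' are made up for this illustration.

```python
import pandas as pd

# Hypothetical mini DataFrame, used only for this illustration
demo = pd.DataFrame({
    'student_id': [101, 102],
    'first_name': ['Alice', 'Bob'],
    'age': [20, 21],
})

# Two equivalent spellings: columns=... or labels=... with axis=1
by_columns = demo.drop(columns=['age'])
by_labels = demo.drop(labels=['age'], axis=1)
print(by_columns.equals(by_labels))

# By default, a label that does not exist raises KeyError
try:
    demo.drop(columns=['nickname'])
except KeyError as exc:
    print("KeyError:", exc)

# errors='ignore' drops the labels that exist and skips the rest
lenient = demo.drop(columns=['age', 'nickname'], errors='ignore')
print(list(lenient.columns))
```

errors='ignore' is convenient when the input schema varies, but it can also hide typos in column names, so the strict default is probably the better choice when the columns are expected to exist.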

Example 1: Dropping a Single Column

Let's drop the last_name column. We'll use inplace=False to show that the original DataFrame remains unchanged.

# Create a copy to demonstrate inplace=False
df_copy = df.copy()
# Drop the 'last_name' column
df_dropped = df_copy.drop(columns=['last_name'])
print("\nDataFrame after dropping 'last_name' (inplace=False):")
print(df_dropped)
print("\nOriginal DataFrame is unchanged:")
print(df_copy)

Output:

DataFrame after dropping 'last_name' (inplace=False):
   student_id first_name  age     major grade_level  tuition_fee  has_scholarship
0         101      Alice   20    Physics    Sophomore         10000             True
1         102        Bob   21      Math       Junior         10500            False
2         103    Charlie   19  Chemistry    Freshman          9800             True
3         104      David   22    Biology      Senior         11000            False
Original DataFrame is unchanged:
   student_id first_name last_name  age     major grade_level  tuition_fee  has_scholarship
0         101      Alice     Smith   20    Physics    Sophomore         10000             True
1         102        Bob   Johnson   21      Math       Junior         10500            False
2         103    Charlie     Brown   19  Chemistry    Freshman          9800             True
3         104      David       Lee   22    Biology      Senior         11000            False

Example 2: Dropping Multiple Columns

You can pass a list of column names to the columns argument.

# Drop 'first_name' and 'last_name' columns
df_dropped_multiple = df.drop(columns=['first_name', 'last_name'])
print("\nDataFrame after dropping multiple columns:")
print(df_dropped_multiple)

Output:

DataFrame after dropping multiple columns:
   student_id  age     major grade_level  tuition_fee  has_scholarship
0         101   20    Physics    Sophomore         10000             True
1         102   21      Math       Junior         10500            False
2         103   19  Chemistry    Freshman          9800             True
3         104   22    Biology      Senior         11000            False

Example 3: Using inplace=True

This modifies the DataFrame directly. Use with caution!

# Modifying the original df
df.drop(columns=['tuition_fee', 'has_scholarship'], inplace=True)
print("\nOriginal DataFrame after inplace=True:")
print(df)

Output:

Original DataFrame after inplace=True:
   student_id first_name last_name  age     major grade_level
0         101      Alice     Smith   20    Physics    Sophomore
1         102        Bob   Johnson   21      Math       Junior
2         103    Charlie     Brown   19  Chemistry    Freshman
3         104      David       Lee   22    Biology      Senior

Notice that the original df is now permanently changed.
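
A common pitfall with inplace=True is assigning the return value, which is None. The sketch below uses a throwaway DataFrame named demo (not part of the original example) to show the pitfall and the reassignment pattern that avoids it.

```python
import pandas as pd

# Throwaway DataFrame, used only for this illustration
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# With inplace=True the method returns None, so do not assign the result
result = demo.drop(columns=['c'], inplace=True)
print(result)              # None
print(list(demo.columns))  # ['a', 'b']

# The reassignment pattern has the same effect on the name `demo`
# and is harder to get wrong
demo = demo.drop(columns=['b'])
print(list(demo.columns))  # ['a']
```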


Method 2: Selecting Columns to Keep (Often More Robust)

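Instead of thinking about what to remove, you can think about what to keep. This is often safer in automated scripts: extra or unexpected columns in the input never reach your output, and you do not need to know the names of the columns you are discarding. Note that every column you list must exist; otherwise pandas raises a KeyError.

You select the columns you want and assign the result to a variable. Selection returns a new DataFrame; there is no inplace option.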

# Let's restore the original df first
df = pd.DataFrame(data)
# Select only the columns you want to keep
df_kept = df[['student_id', 'first_name', 'age', 'major']]
print("\nDataFrame keeping only selected columns:")
print(df_kept)

Output:

DataFrame keeping only selected columns:
   student_id first_name  age     major
0         101      Alice   20    Physics
1         102        Bob   21      Math
2         103    Charlie   19  Chemistry
3         104      David   22    Biology
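
If some of the columns you want to keep might be absent, one option is to filter the keep-list against the columns that actually exist before selecting. This is a minimal sketch; the demo DataFrame and the 'email' column are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical mini DataFrame, used only for this illustration
demo = pd.DataFrame({
    'student_id': [101, 102],
    'first_name': ['Alice', 'Bob'],
    'age': [20, 21],
})

wanted = ['student_id', 'age', 'email']  # 'email' does not exist in demo

# Plain selection raises KeyError when a requested column is missing
try:
    demo[wanted]
except KeyError as exc:
    print("KeyError:", exc)

# Keep only the wanted columns that exist, in the order given by `wanted`
present = [col for col in wanted if col in demo.columns]
kept = demo[present]
print(list(kept.columns))  # ['student_id', 'age']
```

Whether silently skipping a missing column is acceptable depends on the pipeline; where the column is required, letting the KeyError surface is usually the better signal.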

Method 3: Dropping Columns Based on a Condition

Sometimes you want to drop columns that meet a certain criterion, such as containing only NaN values or having a specific data type.

Example A: Dropping Columns with All NaN Values

This is useful for cleaning data after an operation that might have introduced empty columns.

# Add a column of all NaN values
df['empty_col'] = np.nan
print("\nDataFrame with an empty column:")
print(df)
# Drop columns where all values are NaN
df_dropped_nan = df.dropna(axis=1, how='all')
print("\nDataFrame after dropping all-NaN columns:")
print(df_dropped_nan)

Output:

DataFrame with an empty column:
   student_id first_name last_name  age     major grade_level  tuition_fee  has_scholarship  empty_col
0         101      Alice     Smith   20    Physics    Sophomore         10000             True        NaN
1         102        Bob   Johnson   21      Math       Junior         10500            False        NaN
2         103    Charlie     Brown   19  Chemistry    Freshman          9800             True        NaN
3         104      David       Lee   22    Biology      Senior         11000            False        NaN
DataFrame after dropping all-NaN columns:
   student_id first_name last_name  age     major grade_level  tuition_fee  has_scholarship
0         101      Alice     Smith   20    Physics    Sophomore         10000             True
1         102        Bob   Johnson   21      Math       Junior         10500            False
2         103    Charlie     Brown   19  Chemistry    Freshman          9800             True
3         104      David       Lee   22    Biology      Senior         11000            False
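
For columns that are mostly, rather than entirely, empty, dropna() also accepts a thresh argument: the minimum number of non-missing values a column needs in order to be kept. The sketch below assumes a 50% cutoff and uses made-up column names purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column with a single value, one half-filled column
demo = pd.DataFrame({
    'student_id': [101, 102, 103, 104],
    'mostly_empty': [np.nan, np.nan, np.nan, 1.0],
    'half_empty': [1.0, np.nan, 2.0, np.nan],
})

# Keep only columns with at least 50% non-missing values (assumed cutoff)
min_non_null = int(len(demo) * 0.5)  # 2 of 4 rows
trimmed = demo.dropna(axis=1, thresh=min_non_null)
print(list(trimmed.columns))  # ['student_id', 'half_empty']
```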

Example B: Dropping Columns by Data Type

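You can filter columns based on their dtype. The example assumes string columns are stored with the object dtype, as pandas has traditionally done by default; if your columns use a dedicated string dtype, adjust the include list accordingly.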

# Let's restore the original df
df = pd.DataFrame(data)
# Identify columns to drop (e.g., all object/string columns)
cols_to_drop = df.select_dtypes(include=['object']).columns
# Drop those columns
df_dropped_type = df.drop(columns=cols_to_drop)
print("\nDataFrame after dropping all object/string columns:")
print(df_dropped_type)

Output:

DataFrame after dropping all object/string columns:
   student_id  age  tuition_fee  has_scholarship
0         101   20         10000             True
1         102   21         10500            False
2         103   19          9800             True
3         104   22         11000            False

Summary: Which Method to Use?

Scenario | Recommended Method | Why?
Dropping a known list of columns | df.drop(columns=[...]) | Clear, explicit, and safe (returns a new DataFrame and leaves the original untouched).
You are sure which columns to remove and want to modify the existing DataFrame | df.drop(columns=[...], inplace=True) | Modifies the DataFrame in place and returns None. Any memory saving is usually small.
You know which columns to keep, and the unwanted columns may change or are dynamic | df_kept = df[[col1, col2, ...]] | Robust to extra or unexpected columns, but raises KeyError if a column you list is missing.
Dropping columns that are entirely empty (or that contain any NaN) | df.dropna(axis=1, how='all') (or how='any') | The idiomatic way to handle NaN-based filtering.
Dropping columns based on their data type (e.g., all strings) | df.drop(columns=df.select_dtypes(...).columns) | Powerful and flexible for type-based cleaning.