杰瑞科技汇

How Do You Check for NaN Values in Python Pandas?

Checking for missing values is a very common and important task in data analysis with pandas. Because pandas is built on NumPy, missing numbers are usually stored as NumPy's np.nan (Not a Number). However, the tools you should reach for are pandas' own functions, which also understand the other missing-value markers pandas uses.

Here’s a comprehensive guide covering the most common methods, from simple checks to practical replacements.

The Short Answer: The Best Methods

For everyday use, you will almost always use one of these two functions:

  1. pd.isna() (Recommended): The most flexible and modern function. It works on pandas.Series, pandas.DataFrame, and even single scalar values (like a number or a string).
  2. pd.notna(): The logical opposite of pd.isna().
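Here is a quick illustration of how the two relate. This minimal sketch uses a small, made-up Series and shows that pd.notna() gives the element-wise negation of pd.isna().

```python
import numpy as np
import pandas as pd

# Illustrative data only
s = pd.Series([1.0, np.nan, 3.0])

missing = pd.isna(s)    # True where the value is missing
present = pd.notna(s)   # True where the value is present

print(missing.tolist())  # [False, True, False]
print(present.tolist())  # [True, False, True]

# notna() is the element-wise negation of isna()
print(present.equals(~missing))  # True
```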

pd.isna() and pd.notna() - The Recommended Approach

These functions are your go-to tools. They are intuitive and readable, and they accept values of any type without raising an error, treating np.nan, None, and pd.NaT (a missing datetime) as missing.

A) Checking a Single Value

You can use these functions directly on any value.

import pandas as pd
import numpy as np
# With a NaN value
print(f"Is np.nan NaN? {pd.isna(np.nan)}")
# Output: Is np.nan NaN? True
# With a None value (pandas treats None as NaN in numeric columns)
print(f"Is None NaN? {pd.isna(None)}")
# Output: Is None NaN? True
# With a regular number
print(f"Is 42 NaN? {pd.isna(42)}")
# Output: Is 42 NaN? False
# With a string
print(f"Is 'hello' NaN? {pd.isna('hello')}")
# Output: Is 'hello' NaN? False
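pd.isna() also recognizes pandas' other missing-value markers. The sketch below assumes pandas 1.0 or later, where the pd.NA scalar exists. It also shows why a dedicated function is needed in the first place: NaN never compares equal to itself, so a plain == check cannot find it.

```python
import numpy as np
import pandas as pd

# pd.isna() also recognizes pandas' own missing-value markers
print(pd.isna(pd.NaT))  # True  (missing datetime, "Not a Time")
print(pd.isna(pd.NA))   # True  (pandas' nullable missing-value scalar; assumes pandas >= 1.0)

# NaN never compares equal to itself, so an == check cannot detect it
print(np.nan == np.nan)  # False
print(pd.isna(np.nan))   # True
```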

B) Checking a Pandas Series

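This is where it becomes powerful. The function returns a new Series of boolean values (True for NaN, False otherwise). The method form, s.isna(), is equivalent, and it is the form used in the counting examples further on.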

# Create a Series with some missing values
s = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original Series:")
print(s)
# Output:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
print("\nBoolean mask of NaN values:")
print(pd.isna(s))
# Output:
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False
# dtype: bool
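The method form s.isna() produces the same mask, and you can use that mask to locate the missing entries. Here is a minimal sketch reusing the Series above:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, None, 6])

# The method form gives the same mask as the function form
mask = s.isna()
print(mask.equals(pd.isna(s)))  # True

# Select the missing entries with the mask, then read off their index labels
missing_labels = s[mask].index.tolist()
print(missing_labels)  # [2, 4]
```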

C) Checking a Pandas DataFrame

When used on a DataFrame, pd.isna() returns a DataFrame of the same shape, filled with boolean values.

# Create a DataFrame with missing values
data = {'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Output:
#      A    B    C
# 0  1.0  5.0    a
# 1  NaN  NaN    b
# 2  3.0  7.0  NaN
print("\nBoolean mask of NaN values (DataFrame):")
print(pd.isna(df))
# Output:
#       A      B      C
# 0  False  False  False
# 1   True   True  False
# 2  False  False   True
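A boolean DataFrame is often reduced further. For example, .any(axis=1) collapses it to one value per row, answering "does this row contain any missing value?". Here is a short sketch using the same DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# One value per row: True if that row has at least one missing value
row_has_nan = df.isna().any(axis=1)
print(row_has_nan.tolist())  # [False, True, True]

# Number of rows that contain at least one NaN
print(row_has_nan.sum())  # 2
```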

Counting Missing Values

Once you have the boolean mask, you can easily count the number of NaN values.

Counting in a Series

s = pd.Series([1, 2, np.nan, 4, None, 6])
# Sum the boolean values (True is treated as 1, False as 0)
num_missing = s.isna().sum()
print(f"Number of missing values in Series: {num_missing}")
# Output: Number of missing values in Series: 2
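The complementary question is how many values are present. You can answer it with notna(), or with Series.count(), which counts only non-missing entries. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan, 4, None, 6])

# Count the values that are present rather than missing
num_present = s.notna().sum()
print(num_present)  # 4

# Series.count() also skips missing values
print(s.count())  # 4
```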

Counting in a DataFrame (Column-wise)

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})
# Sum down each column (axis=0, the default) to get one count per column
missing_per_column = df.isna().sum()
print("Missing values per column:")
print(missing_per_column)
# Output:
# Missing values per column:
# A    1
# B    1
# C    1
# dtype: int64
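True counts as 1 and False as 0, so taking the mean of the mask instead of the sum gives the fraction of missing values in each column. This is often easier to compare across DataFrames of different sizes. A sketch on the same example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# Mean of each boolean column = share of missing entries in that column
missing_share = df.isna().mean()
print(missing_share.round(3).tolist())  # [0.333, 0.333, 0.333]
```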

Counting in a DataFrame (Total)

# Sum the previous result
total_missing = df.isna().sum().sum()
print(f"Total missing values in DataFrame: {total_missing}")
# Output: Total missing values in DataFrame: 3
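Sometimes you only need a yes/no answer to "does this DataFrame contain any NaN at all?". Chaining .any() twice is a common idiom for this. Here is a sketch, with a second, hypothetical clean DataFrame for contrast:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# First .any() reduces per column, second .any() reduces to a single answer
has_any_nan = df.isna().any().any()
print(has_any_nan)  # True

# Equivalent check on the mask's underlying NumPy array
print(df.isna().to_numpy().any())  # True

# Hypothetical DataFrame with no missing values, for comparison
clean = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
print(clean.isna().any().any())  # False
```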

Filtering Out Missing Values

A common operation is to create a new DataFrame or Series that excludes rows with any NaN values.

Using .dropna()

This is the most direct way to remove missing data.

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})
print("Original DataFrame:")
print(df)
# Drop rows with ANY missing value (the default, how='any')
df_dropped = df.dropna()
print("\nDataFrame after dropna():")
print(df_dropped)
# Output:
#      A    B  C
# 0  1.0  5.0  a
# Rows 1 and 2 each contain at least one NaN, so only row 0 survives.
# Drop only rows where ALL values are missing
df_dropped_all = df.dropna(how='all')
print("\nDataFrame after dropna(how='all'):")
print(df_dropped_all)
# Output:
#      A    B    C
# 0  1.0  5.0    a
# 1  NaN  NaN    b
# 2  3.0  7.0  NaN
# No row is entirely NaN, so nothing is dropped.
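dropna() is not the only option. A notna() mask selects rows based on a single column, and the subset parameter of dropna() limits the check to the columns you name. Here is a sketch on the same example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# Keep only rows where column 'A' has a value, using a notna() mask
rows_with_a = df[df['A'].notna()]
print(rows_with_a.index.tolist())  # [0, 2]

# Only look for NaN in column 'C' when deciding which rows to drop
rows_with_c = df.dropna(subset=['C'])
print(rows_with_c.index.tolist())  # [0, 1]
```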

Handling Missing Values: Filling Them In

Sometimes you want to keep the data but replace NaN with a specific value.

Using .fillna()

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7]})
print("Original DataFrame:")
print(df)
# Fill all NaN values with a single number (e.g., 0)
df_filled_zero = df.fillna(0)
print("\nDataFrame filled with 0:")
print(df_filled_zero)
# Output:
#      A    B
# 0  1.0  5.0
# 1  0.0  0.0
# 2  3.0  7.0
# Fill with different values for each column
fill_values = {'A': 100, 'B': -999}
df_filled_specific = df.fillna(value=fill_values)
print("\nDataFrame filled with specific values per column:")
print(df_filled_specific)
# Output:
#        A      B
# 0    1.0    5.0
# 1  100.0 -999.0
# 2    3.0    7.0
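Filling with a constant is the simplest choice. Two other common strategies are filling each column with its own mean and forward-filling from the previous row. The sketch below is illustrative only, and the right strategy depends on your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7]})

# df.mean() skips NaN by default and returns one mean per column;
# passing that Series to fillna() fills each column with its own mean
df_filled_mean = df.fillna(df.mean())
print(df_filled_mean['A'].tolist())  # [1.0, 2.0, 3.0]
print(df_filled_mean['B'].tolist())  # [5.0, 6.0, 7.0]

# Forward fill: carry the last valid value down each column
df_ffilled = df.ffill()
print(df_ffilled['A'].tolist())  # [1.0, 1.0, 3.0]
```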

The numpy.isnan() Function and Why It's Less Convenient

You might see np.isnan() used. It works for numbers, but it has a major drawback: it only accepts numeric input and raises a TypeError when given a string, None, or any other non-numeric value.

import numpy as np
import pandas as pd
s = pd.Series([1, np.nan, 'text', None])
# This works fine for the float NaN
print(np.isnan(s[1]))
# Output: True
# This will FAIL with a string or None
try:
    np.isnan(s[2]) # 'text'
except TypeError as e:
    print(f"\nError with string: {e}")
try:
    np.isnan(s[3]) # None
except TypeError as e:
    print(f"Error with None: {e}")

Because pd.isna() handles np.nan, None, and pd.NaT (Not a Time) without errors, it is the superior and more "pandas-idiomatic" choice.
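To see the difference directly, here is a sketch that runs pd.isna() over the same mixed-type Series. It returns a result for every element instead of raising.

```python
import numpy as np
import pandas as pd

# The same mixed-type Series that trips up np.isnan()
s = pd.Series([1, np.nan, 'text', None])

# pd.isna() checks every element without raising
mask = pd.isna(s)
print(mask.tolist())  # [False, True, False, True]
```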

Summary Table

| Goal | Recommended Method | Example |
| --- | --- | --- |
| Check if a single value is NaN | pd.isna(value) | pd.isna(my_df.iloc[0, 1]) |
| Get a mask of NaN values | series.isna() or df.isna() | mask = df['column_name'].isna() |
| Count NaNs in a Series | series.isna().sum() | df['A'].isna().sum() |
| Count NaNs per column (DataFrame) | df.isna().sum() | missing_counts = df.isna().sum() |
| Count total NaNs (DataFrame) | df.isna().sum().sum() | total_missing = df.isna().sum().sum() |
| Remove rows with NaNs | df.dropna() | clean_df = df.dropna() |
| Fill NaNs with a value | df.fillna(value) | filled_df = df.fillna(0) |