How to Check for NaN Values in Python Pandas - 杰瑞科技汇

Checking for missing values is a common and important task in data analysis with pandas. Although pandas is built on NumPy, its own functions, pd.isna() and pd.notna(), are usually the better way to detect NaN (Not a Number) values, because they also handle None and NaT.

Here’s a comprehensive guide covering the most common methods, from simple checks to practical replacements.

The Short Answer: The Best Methods

For everyday use, you will almost always use one of these two functions:

pd.isna() (Recommended): The most flexible and modern function. It works on pandas.Series, pandas.DataFrame, and even single scalar values (like a number or a string).
pd.notna(): The logical opposite of pd.isna().
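
As a minimal sketch of how pd.notna() is typically used, the example below builds a boolean mask and keeps only the values that are present. The Series name and its contents are hypothetical sample data, not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
readings = pd.Series([3.5, np.nan, 7.25, None])

# pd.notna() returns True where a value is present
present_mask = pd.notna(readings)

# Boolean indexing keeps only the non-missing entries
valid_readings = readings[present_mask]
print(valid_readings.tolist())
# Output: [3.5, 7.25]
```

The same mask could be inverted with ~present_mask, which is equivalent to calling pd.isna() on the Series.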

`pd.isna()` and `pd.notna()` - The Recommended Approach

These functions are your go-to tools. They are intuitive, readable, and handle all data types gracefully (including None and NaT).

A) Checking a Single Value

You can use these functions directly on any value.

import pandas as pd
import numpy as np
# With a NaN value
print(f"Is np.nan NaN? {pd.isna(np.nan)}")
# Output: Is np.nan NaN? True
# With a None value (pd.isna() treats None as missing)
print(f"Is None NaN? {pd.isna(None)}")
# Output: Is None NaN? True
# With a regular number
print(f"Is 42 NaN? {pd.isna(42)}")
# Output: Is 42 NaN? False
# With a string
print(f"Is 'hello' NaN? {pd.isna('hello')}")
# Output: Is 'hello' NaN? False

B) Checking a Pandas Series

This is where it becomes powerful. The function returns a new Series of boolean values (True for NaN, False otherwise).

# Create a Series with some missing values
s = pd.Series([1, 2, np.nan, 4, None, 6])
print("Original Series:")
print(s)
# Output:
# 0    1.0
# 1    2.0
# 2    NaN
# 3    4.0
# 4    NaN
# 5    6.0
# dtype: float64
print("\nBoolean mask of NaN values:")
print(pd.isna(s))
# Output:
# 0    False
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False
# dtype: bool

C) Checking a Pandas DataFrame

When used on a DataFrame, pd.isna() returns a DataFrame of the same shape, filled with boolean values.

# Create a DataFrame with missing values
data = {'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Output:
#      A    B    C
# 0  1.0  5.0    a
# 1  NaN  NaN    b
# 2  3.0  7.0  NaN
print("\nBoolean mask of NaN values (DataFrame):")
print(pd.isna(df))
# Output:
#       A      B      C
# 0  False  False  False
# 1   True   True  False
# 2  False  False   True
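
The boolean DataFrame can also be reduced to one flag per row, which is one way to pull out the incomplete rows. The sketch below assumes the same sample data as above, under the hypothetical name df_demo.

```python
import numpy as np
import pandas as pd

# Same sample data as above, under a hypothetical name
df_demo = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# True for each row that contains at least one missing value
rows_with_missing = df_demo.isna().any(axis=1)

# Boolean indexing selects only the incomplete rows
incomplete = df_demo[rows_with_missing]
print(incomplete.index.tolist())
# Output: [1, 2]
```

Using .all(axis=1) instead of .any(axis=1) would flag only the rows in which every value is missing.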

Counting Missing Values

Once you have the boolean mask, you can easily count the number of NaN values.

Counting in a Series

s = pd.Series([1, 2, np.nan, 4, None, 6])
# Sum the boolean values (True is treated as 1, False as 0)
num_missing = s.isna().sum()
print(f"Number of missing values in Series: {num_missing}")
# Output: Number of missing values in Series: 2

Counting in a DataFrame (Column-wise)

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})
# Sum down each column (axis=0, the default) to get one count per column
missing_per_column = df.isna().sum()
print("Missing values per column:")
print(missing_per_column)
# Output:
# Missing values per column:
# A    1
# B    1
# C    1
# dtype: int64

Counting in a DataFrame (Total)

# Sum the previous result
total_missing = df.isna().sum().sum()
print(f"Total missing values in DataFrame: {total_missing}")
# Output: Total missing values in DataFrame: 3
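
Because True counts as 1, taking the mean of the mask instead of the sum gives the share of missing values per column. The following is a small sketch with hypothetical data chosen so the fractions are easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'A' has 1 of 4 missing, 'B' has 2 of 4 missing
df_demo = pd.DataFrame({'A': [1, np.nan, 3, 4], 'B': [5, None, None, 8]})

# The mean of a boolean column is the fraction of True values
missing_share = df_demo.isna().mean()
print(missing_share)
# Output:
# A    0.25
# B    0.50
# dtype: float64
```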

Filtering Out Missing Values

A common operation is to create a new DataFrame or Series that excludes rows with any NaN values.

Using `.dropna()`

This is the most direct way to remove missing data.

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})
print("Original DataFrame:")
print(df)
# Drop rows with ANY missing values (default behavior, how='any')
df_dropped = df.dropna()
print("\nDataFrame after dropna():")
print(df_dropped)
# Output:
#      A    B  C
# 0  1.0  5.0  a
# Note: Only row 0 is kept. Row 1 has NaN in 'A' and 'B', and row 2 has NaN in 'C'.
# Passing how='any' explicitly gives the same result
df_dropped_any = df.dropna(how='any')
print("\nDataFrame after dropna(how='any'):")
print(df_dropped_any)
# Output:
#      A    B  C
# 0  1.0  5.0  a
# Drop rows where ALL values are missing
df_dropped_all = df.dropna(how='all')
print("\nDataFrame after dropna(how='all'):")
print(df_dropped_all)
# Output:
#      A    B    C
# 0  1.0  5.0    a
# 1  NaN  NaN    b
# 2  3.0  7.0  NaN
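
Dropping every incomplete row is often too aggressive. As a sketch of two gentler options, the subset and thresh parameters of .dropna() limit which columns are checked or how many non-missing values a row needs to survive. The variable names below are hypothetical.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7], 'C': ['a', 'b', np.nan]})

# Only look at column 'C' when deciding which rows to drop
kept_by_subset = df_demo.dropna(subset=['C'])
print(kept_by_subset.index.tolist())
# Output: [0, 1]

# Keep rows that have at least 2 non-missing values
kept_by_thresh = df_demo.dropna(thresh=2)
print(kept_by_thresh.index.tolist())
# Output: [0, 2]
```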

Handling Missing Values: Filling Them In

Sometimes you want to keep the data but replace NaN with a specific value.

Using `.fillna()`

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [5, None, 7]})
print("Original DataFrame:")
print(df)
# Fill all NaN values with a single number (e.g., 0)
df_filled_zero = df.fillna(0)
print("\nDataFrame filled with 0:")
print(df_filled_zero)
# Output:
#      A    B
# 0  1.0  5.0
# 1  0.0  0.0
# 2  3.0  7.0
# Fill with different values for each column
fill_values = {'A': 100, 'B': -999}
df_filled_specific = df.fillna(value=fill_values)
print("\nDataFrame filled with specific values per column:")
print(df_filled_specific)
# Output:
#        A      B
# 0    1.0    5.0
# 1  100.0 -999.0
# 2    3.0    7.0
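
A constant is not the only possible fill value. The sketch below shows two other common choices, filling with each column's mean and carrying the previous row forward with .ffill(). Which strategy is appropriate depends on the data, so treat this as an illustration with hypothetical names rather than a recommendation.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [5.0, None, 7.0]})

# df_demo.mean() skips NaN by default; fillna() matches the means to columns by label
filled_mean = df_demo.fillna(df_demo.mean())
print(filled_mean.loc[1].tolist())
# Output: [2.0, 6.0]

# Forward fill: copy the last valid value from the row above
filled_ffill = df_demo.ffill()
print(filled_ffill.loc[1].tolist())
# Output: [1.0, 5.0]
```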

The `numpy.isnan()` Function and Why It's Less Convenient

You might also see np.isnan() used. It works for numeric data, but it has a major drawback: it raises a TypeError when given strings, None, or other non-numeric objects.

import numpy as np
import pandas as pd
s = pd.Series([1, np.nan, 'text', None])
# This works fine for the float NaN
print(np.isnan(s[1]))
# Output: True
# This will FAIL with a string or None
try:
    np.isnan(s[2]) # 'text'
except TypeError as e:
    print(f"\nError with string: {e}")
try:
    np.isnan(s[3]) # None
except TypeError as e:
    print(f"Error with None: {e}")

Because pd.isna() handles np.nan, None, and pd.NaT (Not a Time) without errors, it is the superior and more "pandas-idiomatic" choice.
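
A related pitfall is comparing against np.nan with ==. NaN is never equal to anything, including itself, so such a comparison silently finds nothing. The short sketch below, using hypothetical variable names, contrasts that with pd.isna().

```python
import numpy as np
import pandas as pd

value = np.nan

# Equality comparison with NaN is always False
print(value == np.nan)
# Output: False

# pd.isna() detects it correctly, and also recognizes pd.NaT
print(pd.isna(value))
# Output: True
print(pd.isna(pd.NaT))
# Output: True

s_demo = pd.Series([1.0, np.nan])
wrong_count = int((s_demo == np.nan).sum())  # the comparison matches nothing
right_count = int(s_demo.isna().sum())
print(wrong_count, right_count)
# Output: 0 1
```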

Summary Table

Goal	Recommended Method	Example
Check if a single value is NaN	`pd.isna(value)`	`pd.isna(my_df.iloc[0, 1])`
Get a mask of NaN values	`series.isna()` or `df.isna()`	`mask = df['column_name'].isna()`
Count NaNs in a Series	`series.isna().sum()`	`df['A'].isna().sum()`
Count NaNs per column (DF)	`df.isna().sum()`	`missing_counts = df.isna().sum()`
Count total NaNs (DF)	`df.isna().sum().sum()`	`total_missing = df.isna().sum().sum()`
Remove rows with NaNs	`df.dropna()`	`clean_df = df.dropna()`
Fill NaNs with a value	`df.fillna(value)`	`filled_df = df.fillna(0)`
