
How do you calculate the Spearman correlation coefficient in Python?

Let's take a detailed look at spearmanr() from Python's scipy.stats module.


What is spearmanr()?

spearmanr() is a function used to calculate the Spearman rank correlation coefficient. This is a non-parametric measure of the monotonic relationship between two datasets.

In simple terms, it tells you how well the relationship between two variables can be described by a monotonic function. A monotonic function is one that is either entirely non-increasing or non-decreasing.


Key Concepts: Pearson vs. Spearman

To understand spearmanr(), it's helpful to compare it with the more common Pearson correlation coefficient.

| Feature | Pearson Correlation (pearsonr) | Spearman Correlation (spearmanr) |
|---|---|---|
| Type of relationship | Measures linear relationships. | Measures monotonic relationships (linear or non-linear, as long as it is consistently increasing/decreasing). |
| Data type | Works on the raw data values. | Works on the ranks of the data values. It first converts the data into ranks (1st, 2nd, 3rd, etc.). |
| Robustness | Sensitive to outliers. A single extreme value can dramatically change the result. | Robust to outliers, since an outlier's rank is just its position in the sorted list, not its actual extreme value. |
| Assumptions | Assumes the data is roughly normally distributed and linearly related. | Makes no assumptions about the distribution of the data. It is non-parametric. |
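The rank-based nature of Spearman can be checked directly: computing Pearson on the ranks of the data reproduces the Spearman coefficient. A minimal sketch (the sample numbers are made up for illustration):

```python
import numpy as np
from scipy.stats import rankdata, pearsonr, spearmanr

x = np.array([10, 20, 35, 40, 500])  # note the extreme value 500
y = np.array([1, 3, 2, 4, 5])

# Spearman is Pearson applied to the ranks of the data
r_spearman, _ = spearmanr(x, y)
r_pearson_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))
print(r_spearman, r_pearson_on_ranks)  # the two values are identical
```

Because only ranks matter, the outlier 500 contributes exactly as much as any other largest value would.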

When to use spearmanr()?

  • When your data is not normally distributed.
  • When you have ordinal data (e.g., rankings like "low, medium, high").
  • When you suspect a non-linear but monotonic relationship (e.g., an exponential growth curve).
  • When your data contains significant outliers.
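For the ordinal-data case, a small sketch (the satisfaction and purchase figures below are invented for illustration): encoding the ordered categories as integers preserves their ordering, which is all Spearman uses.

```python
from scipy.stats import spearmanr

# Hypothetical ordinal ratings encoded as integers: 1=low, 2=medium, 3=high
satisfaction = [1, 2, 2, 3, 1, 3, 2, 3]
# A numeric outcome per customer (made-up data)
purchases = [0, 2, 1, 4, 1, 5, 2, 3]

corr, p_value = spearmanr(satisfaction, purchases)
print(f"corr={corr:.3f}, p={p_value:.4f}")
```

Pearson would be questionable here because the gap between "low" and "medium" is not a meaningful numeric distance; Spearman only requires that the categories be ordered.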

How to Use spearmanr()

Import the Function

First, you need to import it from scipy.stats.

from scipy.stats import spearmanr
import numpy as np
import matplotlib.pyplot as plt

Basic Syntax

The basic syntax is spearmanr(x, y), where x and y are arrays or lists of data.

# Two lists of data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Calculate the Spearman correlation coefficient
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")

Output:

Spearman correlation coefficient: 1.0000
P-value: 0.0000

This is a perfect positive monotonic relationship. As x increases, y also increases perfectly.
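A note on the return value: in recent SciPy releases (1.9+), spearmanr returns a result object whose coefficient attribute is named statistic, while older releases name it correlation; tuple unpacking works in both. A defensive sketch:

```python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

res = spearmanr(x, y)

# SciPy >= 1.9 exposes the coefficient as res.statistic;
# older versions call it res.correlation. Handle both.
corr = getattr(res, "statistic", None)
if corr is None:
    corr = res.correlation

print(f"corr={corr}, p={res.pvalue}")
```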


Understanding the Output

The spearmanr() function returns two values:

  1. corr (The Correlation Coefficient):

    • Ranges from -1 to +1.
    • +1: Perfect positive monotonic relationship (as one variable increases, the other always increases).
    • -1: Perfect negative monotonic relationship (as one variable increases, the other always decreases).
    • 0: No monotonic relationship.
  2. p_value (The P-value):

    • This tests the null hypothesis that the two datasets are uncorrelated.
    • A small p_value (typically < 0.05) indicates that you can reject the null hypothesis. In other words, there is a statistically significant correlation.
    • A large p_value (>= 0.05) suggests that there is not enough evidence to conclude that a correlation exists.
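The decision rule above can be written directly in code (the 0.05 threshold is a common convention, not a requirement; the data here is arbitrary sample input):

```python
from scipy.stats import spearmanr

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]

corr, p_value = spearmanr(x, y)
alpha = 0.05  # conventional significance level

if p_value < alpha:
    print(f"Significant monotonic correlation (rho={corr:.3f})")
else:
    print(f"No significant correlation detected (p={p_value:.3f})")
```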

Practical Examples

Let's look at different scenarios.

Example 1: Strong Positive Monotonic Relationship (Non-Linear)

This is a classic case where spearmanr() excels over pearsonr.

# Create a non-linear but monotonic relationship (exponential)
x = np.linspace(0, 10, 50)
y = np.exp(x) + np.random.normal(0, 5, 50) # Add some noise
# Calculate Spearman correlation
corr_s, p_s = spearmanr(x, y)
# For comparison, calculate Pearson correlation
from scipy.stats import pearsonr
corr_p, p_p = pearsonr(x, y)
print(f"Spearman Correlation: {corr_s:.4f} (p-value: {p_s:.4f})")
print(f"Pearson Correlation:  {corr_p:.4f} (p-value: {p_p:.4f})")
# Visualize the relationship
plt.figure(figsize=(10, 5))
plt.scatter(x, y)
plt.title("Exponential Relationship with Noise")
plt.xlabel("x")
plt.ylabel("y")
plt.show()

Output (will vary due to randomness):

Spearman Correlation: 0.9878 (p-value: 0.0000)
Pearson Correlation:  0.8902 (p-value: 0.0000)

The Spearman correlation (0.988) is much closer to 1 than the Pearson correlation (0.890), because Spearman correctly identifies the strong, consistent increasing trend, while Pearson is weakened by the non-linearity of the relationship.

Example 2: Strong Negative Monotonic Relationship

x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")

Output:

Spearman correlation coefficient: -1.0000
P-value: 0.0000

This is a perfect negative monotonic relationship.

Example 3: No Relationship

import random
x = [random.randint(1, 100) for _ in range(50)]
y = [random.randint(1, 100) for _ in range(50)]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")

Output (will vary due to randomness):

Spearman correlation coefficient: 0.0871
P-value: 0.5529

The coefficient is close to 0, and the p-value is high (> 0.05), indicating no significant correlation.

Example 4: The Effect of Outliers

This example shows why Spearman is more robust.

# Data with a strong linear trend
x1 = np.linspace(1, 10, 20)
y1 = 2 * x1 + np.random.normal(0, 1, 20)
# Add a massive outlier
x2 = np.append(x1, 15)
y2 = np.append(y1, 100) # This point is way off the trend
# Calculate correlations
corr_s_clean, _ = spearmanr(x1, y1)
corr_s_outlier, _ = spearmanr(x2, y2)
corr_p_clean, _ = pearsonr(x1, y1)
corr_p_outlier, _ = pearsonr(x2, y2)
print("--- Without Outlier ---")
print(f"Spearman: {corr_s_clean:.4f}")
print(f"Pearson:  {corr_p_clean:.4f}")
print("\n--- With Outlier ---")
print(f"Spearman: {corr_s_outlier:.4f}")
print(f"Pearson:  {corr_p_outlier:.4f}")

Output (will vary due to randomness):

--- Without Outlier ---
Spearman: 0.9878
Pearson:  0.9872
--- With Outlier ---
Spearman: 0.8353
Pearson:  0.5179

Notice how the Pearson correlation drops much more significantly (from 0.99 to 0.52) than the Spearman correlation (from 0.99 to 0.84) when the outlier is introduced. This demonstrates Spearman's robustness.
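The source of this robustness is visible in the ranks themselves: the outlier's extreme value collapses to just the next rank. A quick illustration using scipy.stats.rankdata, which performs the same ranking step that Spearman relies on:

```python
import numpy as np
from scipy.stats import rankdata

values = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])  # 1000 is an extreme outlier
ranks = rankdata(values)
print(ranks)  # [1. 2. 3. 4. 5.]
# To a rank-based method, 1000 is simply "the largest value" (rank 5);
# how far it sits from the rest of the data is irrelevant.
```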


Handling Ties

If your data has duplicate values (ties), spearmanr() handles them by assigning the average rank. For example, if two values are tied for 2nd and 3rd place, they both receive a rank of 2.5.

x = [1, 2, 2, 3, 4]
y = [5, 6, 7, 8, 9]
# In x, the two '2's are tied. Their ranks are (2+3)/2 = 2.5
# Ranks of x: [1, 2.5, 2.5, 4, 5]
# Ranks of y: [1, 2, 3, 4, 5]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation with ties: {corr:.4f}")

Output:

Spearman correlation with ties: 0.9747

The function handles ties automatically using average ranks; no extra option is needed. (Note that the nan_policy parameter controls how NaN values are treated, not ties.)
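You can verify this tie handling yourself: rankdata (whose default method is 'average') produces the average ranks, and Pearson applied to those ranks reproduces spearmanr's result:

```python
from scipy.stats import rankdata, pearsonr, spearmanr

x = [1, 2, 2, 3, 4]
y = [5, 6, 7, 8, 9]

print(rankdata(x))  # [1.  2.5 2.5 4.  5. ]  (the tied 2's share rank 2.5)

# Pearson on the average ranks equals the Spearman coefficient
corr_manual, _ = pearsonr(rankdata(x), rankdata(y))
corr_scipy, _ = spearmanr(x, y)
print(corr_manual, corr_scipy)  # identical results
```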

Summary

| Use Case | Best Function | Why? |
|---|---|---|
| You want to measure the strength of a linear relationship. | pearsonr() | It is specifically designed for linear correlation. |
| You want to measure the strength of a monotonic relationship (linear or non-linear). | spearmanr() | It is based on ranks and captures any consistent upward/downward trend. |
| Your data has outliers. | spearmanr() | It is robust to outliers because it uses ranks. |
| Your data is ordinal (ranked) or not normally distributed. | spearmanr() | It is non-parametric and makes no distributional assumptions. |