Of course! Let's dive deep into spearmanr() from Python's scipy.stats module.

What is spearmanr()?
spearmanr() is a function used to calculate the Spearman rank correlation coefficient. This is a non-parametric measure of the monotonic relationship between two datasets.
In simple terms, it tells you how well the relationship between two variables can be described by a monotonic function. A monotonic function is one that is either entirely non-increasing or non-decreasing.
Key Concepts: Pearson vs. Spearman
To understand spearmanr(), it's helpful to compare it with the more common Pearson correlation coefficient.
| Feature | Pearson Correlation (pearsonr) | Spearman Correlation (spearmanr) |
|---|---|---|
| Type of Relationship | Measures linear relationships. | Measures monotonic relationships (linear or non-linear, as long as it's consistently increasing/decreasing). |
| Data Type | Works on the raw data values. | Works on the rank of the data values. It first converts the data into ranks (1st, 2nd, 3rd, etc.). |
| Robustness | Sensitive to outliers. A single extreme value can dramatically change the result. | Robust to outliers, since an outlier's rank is just its position in the sorted list, not its actual extreme value. |
| Assumptions | Assumes the data is roughly normally distributed and has a linear relationship. | Makes no assumptions about the distribution of the data. It's non-parametric. |
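The "works on ranks" distinction can be checked directly: Spearman's coefficient is just the Pearson coefficient computed on the rank-transformed data. A quick sketch using scipy.stats.rankdata (which, like spearmanr(), assigns average ranks to ties):

```python
from scipy.stats import spearmanr, pearsonr, rankdata

x = [1, 5, 3, 9, 7]
y = [2, 8, 4, 6, 10]

# Spearman on the raw data...
rho, _ = spearmanr(x, y)

# ...equals Pearson on the ranks of the data
r_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(rho, r_on_ranks)  # both 0.7
```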
When to Use spearmanr()

- When your data is not normally distributed.
- When you have ordinal data (e.g., rankings like "low, medium, high").
- When you suspect a non-linear but monotonic relationship (e.g., an exponential growth curve).
- When your data contains significant outliers.
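For the ordinal case, categories can be encoded as ordered integers and passed straight to spearmanr(). A short sketch with made-up survey data (the category names and numbers are illustrative, not from any real dataset):

```python
from scipy.stats import spearmanr

# Hypothetical survey: ordinal satisfaction level vs. monthly spending
levels = {"low": 0, "medium": 1, "high": 2}
satisfaction = ["low", "medium", "high", "medium", "high"]
spending = [10, 20, 35, 18, 40]

# Encode the ordinal categories as ordered integers
codes = [levels[s] for s in satisfaction]

corr, p_value = spearmanr(codes, spending)
print(f"Spearman correlation: {corr:.4f}")  # ≈ 0.9487
```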
How to Use spearmanr()
Import the Function
First, you need to import it from scipy.stats.
from scipy.stats import spearmanr
import numpy as np
import matplotlib.pyplot as plt
Basic Syntax
The basic syntax is spearmanr(x, y), where x and y are arrays or lists of data.
# Two lists of data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Calculate the Spearman correlation coefficient
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")
Output:
Spearman correlation coefficient: 1.0000
P-value: 0.0000
This is a perfect positive monotonic relationship. As x increases, y also increases perfectly.
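spearmanr() also accepts a single 2-D array. With more than two columns, each column is treated as a variable (with the default axis=0) and the function returns a full correlation matrix instead of a scalar. A minimal sketch:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
data = rng.random((100, 3))  # 100 observations of 3 variables (columns)

# With axis=0 (the default), each column is a variable
rho, p = spearmanr(data)
print(rho.shape)  # (3, 3) correlation matrix; the diagonal is 1.0
```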
Understanding the Output
The spearmanr() function returns two values:
- corr (the correlation coefficient):
  - Ranges from -1 to +1.
  - +1: Perfect positive monotonic relationship (as one variable increases, the other always increases).
  - -1: Perfect negative monotonic relationship (as one variable increases, the other always decreases).
  - 0: No monotonic relationship.
- p_value (the p-value):
  - Tests the null hypothesis that the two datasets are uncorrelated.
  - A small p_value (typically < 0.05) indicates that you can reject the null hypothesis; in other words, there is a statistically significant correlation.
  - A large p_value (>= 0.05) suggests that there is not enough evidence to conclude that a correlation exists.
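Besides tuple unpacking, the return value is a result object whose fields can be accessed by name. In recent SciPy versions (1.9+) this is a SignificanceResult with statistic and pvalue attributes; older versions returned a named tuple with a correlation field instead. A sketch assuming a recent SciPy:

```python
from scipy.stats import spearmanr

res = spearmanr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(res.statistic)  # the correlation coefficient (0.8 here)
print(res.pvalue)     # the p-value

# Tuple unpacking still works on the result object
corr, p_value = res
```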
Practical Examples
Let's look at different scenarios.
Example 1: Strong Positive Monotonic Relationship (Non-Linear)
This is a classic case where spearmanr() excels over pearsonr().
# Create a non-linear but monotonic relationship (exponential)
x = np.linspace(0, 10, 50)
y = np.exp(x) + np.random.normal(0, 5, 50) # Add some noise
# Calculate Spearman correlation
corr_s, p_s = spearmanr(x, y)
# For comparison, calculate Pearson correlation
from scipy.stats import pearsonr
corr_p, p_p = pearsonr(x, y)
print(f"Spearman Correlation: {corr_s:.4f} (p-value: {p_s:.4f})")
print(f"Pearson Correlation: {corr_p:.4f} (p-value: {p_p:.4f})")
# Visualize the relationship
plt.figure(figsize=(10, 5))
plt.scatter(x, y)
plt.title("Exponential Relationship with Noise")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Output (will vary due to the random noise):
Spearman Correlation: 0.9878 (p-value: 0.0000)
Pearson Correlation: 0.8902 (p-value: 0.0000)
The Spearman correlation (0.988) is much closer to 1 than the Pearson correlation (0.890), because Spearman correctly identifies the strong, consistent increasing trend, while Pearson is weakened by the non-linearity of the relationship.
Example 2: Strong Negative Monotonic Relationship
x = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
y = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")
Output:
Spearman correlation coefficient: -1.0000
P-value: 0.0000
This is a perfect negative monotonic relationship.
Example 3: No Relationship
import random
x = [random.randint(1, 100) for _ in range(50)]
y = [random.randint(1, 100) for _ in range(50)]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation coefficient: {corr:.4f}")
print(f"P-value: {p_value:.4f}")
Output (will vary due to randomness):
Spearman correlation coefficient: 0.0871
P-value: 0.5529
The coefficient is close to 0, and the p-value is high (> 0.05), indicating no significant correlation.
Example 4: The Effect of Outliers
This example shows why Spearman is more robust.
# Data with a strong linear trend
x1 = np.linspace(1, 10, 20)
y1 = 2 * x1 + np.random.normal(0, 1, 20)
# Add a massive outlier
x2 = np.append(x1, 15)
y2 = np.append(y1, 100) # This point is way off the trend
# Calculate correlations
corr_s_clean, _ = spearmanr(x1, y1)
corr_s_outlier, _ = spearmanr(x2, y2)
corr_p_clean, _ = pearsonr(x1, y1)
corr_p_outlier, _ = pearsonr(x2, y2)
print("--- Without Outlier ---")
print(f"Spearman: {corr_s_clean:.4f}")
print(f"Pearson: {corr_p_clean:.4f}")
print("\n--- With Outlier ---")
print(f"Spearman: {corr_s_outlier:.4f}")
print(f"Pearson: {corr_p_outlier:.4f}")
Output (will vary due to the random noise):
--- Without Outlier ---
Spearman: 0.9878
Pearson: 0.9872
--- With Outlier ---
Spearman: 0.8353
Pearson: 0.5179
Notice how the Pearson correlation drops much more significantly (from 0.99 to 0.52) than the Spearman correlation (from 0.99 to 0.84) when the outlier is introduced. This demonstrates Spearman's robustness.
Handling Ties
If your data has duplicate values (ties), spearmanr() handles them by assigning the average rank. For example, if two values are tied for 2nd and 3rd place, they both receive a rank of 2.5.
x = [1, 2, 2, 3, 4]
y = [5, 6, 7, 8, 9]
# In x, the two '2's are tied. Their ranks are (2+3)/2 = 2.5
# Ranks of x: [1, 2.5, 2.5, 4, 5]
# Ranks of y: [1, 2, 3, 4, 5]
corr, p_value = spearmanr(x, y)
print(f"Spearman correlation with ties: {corr:.4f}")
Output:
Spearman correlation with ties: 0.9747
The function handles ties automatically by assigning average ranks; no extra arguments are needed. Note that the nan_policy parameter is unrelated to ties: it controls how NaN values are handled ('propagate' returns NaN, 'omit' ignores the affected values, 'raise' throws an error).
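You can verify the average-rank behavior with scipy.stats.rankdata, and deal with missing values via nan_policy. A short sketch:

```python
import numpy as np
from scipy.stats import spearmanr, rankdata

# Ties receive the average of the ranks they span
print(rankdata([1, 2, 2, 3, 4]))  # [1.  2.5 2.5 4.  5. ]

# nan_policy controls NaN handling (not ties)
x = [1.0, 2.0, np.nan, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
corr, p = spearmanr(x, y, nan_policy='omit')  # ignore pairs containing NaN
print(corr)  # perfect monotonic relationship on the remaining pairs
```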
Summary
| Use Case | Best Function | Why? |
|---|---|---|
| You want to measure the strength of a linear relationship. | pearsonr() | It's specifically designed for linear correlation. |
| You want to measure the strength of a monotonic relationship (linear or non-linear). | spearmanr() | It's based on ranks and captures any consistent upward/downward trend. |
| Your data has outliers. | spearmanr() | It's robust to outliers because it uses ranks. |
| Your data is ordinal (ranked) or not normally distributed. | spearmanr() | It's non-parametric and makes no distributional assumptions. |
