Of course! The rank() method in pandas is a powerful and versatile tool for ranking data within a DataFrame or Series. It's much more than a simple "1st, 2nd, 3rd" counter; it can handle ties, account for descending order, and use different ranking methodologies.

Here’s a comprehensive guide to using pandas.DataFrame.rank() and pandas.Series.rank().
The Basic Idea
At its core, rank() assigns a rank to each value in a group of values. The default behavior is to assign the rank 1 to the smallest value.
Let's start with a simple example.
import pandas as pd
import numpy as np
# Create a simple Series
data = pd.Series([10, 50, 20, 50, 40])
print("Original Data:")
print(data)
Original Data:

0 10
1 50
2 20
3 50
4 40
dtype: int64
Now, let's rank the values. The smallest value (10) gets rank 1, the next (20) gets rank 2, and so on.
# Rank the series (default is 'average' for ties)
ranked_data = data.rank()
print("\nRanked Data (default method):")
print(ranked_data)
Ranked Data (default method):
0 1.0
1 4.5
2 2.0
3 4.5
4 3.0
dtype: float64
Notice the result:
- 10 is the smallest, so it gets rank 1.0.
- 20 is next, so it gets rank 2.0.
- 40 is next, so it gets rank 3.0.
- The two 50s are tied for the highest value. The default method='average' gives them the average of the ranks they would have occupied (ranks 4 and 5): (4 + 5) / 2 = 4.5.
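One way to see where 4.5 comes from is to rebuild the 'average' ranks by hand. The sketch below is only an illustration (the names positions and manual_average are made up for this example): it takes the sorted positions from method='first' and averages them within each group of equal values.

```python
import pandas as pd

data = pd.Series([10, 50, 20, 50, 40])

# Position 1..n each value would occupy after sorting;
# method='first' breaks ties by order of appearance.
positions = data.rank(method="first")

# Average those positions within each group of equal values.
manual_average = positions.groupby(data).transform("mean")

print(manual_average.tolist())  # [1.0, 4.5, 2.0, 4.5, 3.0]
print(data.rank().tolist())     # [1.0, 4.5, 2.0, 4.5, 3.0]
```

The two lists match, which is what "average of the ranks they would have occupied" means in practice.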
Key Parameters of the rank() Method
The rank() method has several important parameters that control its behavior.
method: How to Handle Ties
This is the most crucial parameter. It determines how to assign ranks when values are identical.
| Method | Description | Example for [10, 50, 20, 50, 40] |
|---|---|---|
| 'average' (default) | Assigns each tied value the average of the ranks the tied group spans. | [1.0, 4.5, 2.0, 4.5, 3.0] |
| 'min' | Assigns each tied value the lowest rank in the tied group. | [1.0, 4.0, 2.0, 4.0, 3.0] |
| 'max' | Assigns each tied value the highest rank in the tied group. | [1.0, 5.0, 2.0, 5.0, 3.0] |
| 'first' | Breaks ties by the order in which the values appear in the data. | [1.0, 4.0, 2.0, 5.0, 3.0] (the first 50 gets rank 4) |
| 'dense' | Like 'min', but the rank always increases by 1 between groups, so no ranks are skipped. | [1.0, 4.0, 2.0, 4.0, 3.0] (same as 'min' here, because the tie is at the top) |
Demonstration:
print("Original Data:", data.values)
# Using different methods for ties
print("'min' method: ", data.rank(method='min').values)
print("'max' method: ", data.rank(method='max').values)
print("'first' method: ", data.rank(method='first').values)
print("'dense' method: ", data.rank(method='dense').values)
Output:
Original Data: [10 50 20 50 40]
'min' method: [1. 4. 2. 4. 3.]
'max' method: [1. 5. 2. 5. 3.]
'first' method: [1. 4. 2. 5. 3.]
'dense' method: [1. 4. 2. 4. 3.]
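The difference between 'min', 'max', and 'dense' only shows up when a tie sits below some other value. A minimal sketch, using a hypothetical scores Series that is not part of the example above:

```python
import pandas as pd

# Hypothetical data: the tie (20, 20) is in the middle, not at the top.
scores = pd.Series([10, 20, 20, 30])

min_ranks = scores.rank(method="min")
max_ranks = scores.rank(method="max")
dense_ranks = scores.rank(method="dense")

print(min_ranks.tolist())    # [1.0, 2.0, 2.0, 4.0]  rank 3 is skipped
print(max_ranks.tolist())    # [1.0, 3.0, 3.0, 4.0]  rank 2 is skipped
print(dense_ranks.tolist())  # [1.0, 2.0, 2.0, 3.0]  no rank is skipped
```

With 'min', the value after the tie jumps to rank 4 (standard competition ranking); with 'dense', it simply takes the next integer.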
ascending: Rank Order
By default (ascending=True), the smallest value gets rank 1. You can set ascending=False to rank in descending order (the largest value gets rank 1).
# Rank in descending order (largest value is rank 1)
print("Descending Rank (average method):")
print(data.rank(ascending=False))
Descending Rank (average method):
0 5.0
1 1.5
2 4.0
3 1.5
4 3.0
dtype: float64
- 50 is the largest value, so the two 50s share rank 1.5 (the average of ranks 1 and 2).
- 40 is next, so it gets rank 3.0.
- 20 is next, so it gets rank 4.0.
- 10 is the smallest, so it gets the highest rank, 5.0.
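As a quick sanity check on descending ranks, for purely numeric data ranking with ascending=False should give the same result as ranking the negated values in the default order. This is a small sketch of that check, not something taken from the pandas documentation:

```python
import pandas as pd

data = pd.Series([10, 50, 20, 50, 40])

descending = data.rank(ascending=False)
negated = (-data).rank()  # negate, then rank in the default ascending order

print(descending.tolist())  # [5.0, 1.5, 4.0, 1.5, 3.0]
print(negated.tolist())     # [5.0, 1.5, 4.0, 1.5, 3.0]
```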
axis: Rank Along Rows or Columns
- axis=0 (default): Ranks values within each column.
- axis=1: Ranks values within each row.
# Create a DataFrame
df = pd.DataFrame({
'Score_A': [88, 92, 85, 92],
'Score_B': [75, 92, 80, 85]
})
print("Original DataFrame:")
print(df)
# Rank within each column (axis=0)
print("\nRanking by columns (axis=0):")
print(df.rank(axis=0))
Original DataFrame:
Score_A Score_B
0 88 75
1 92 92
2 85 80
3 92 85
Ranking by columns (axis=0):
Score_A Score_B
0 2.0 1.0
1 3.5 4.0
2 1.0 2.0
3 3.5 3.0
- Score_A column: 85 (1st), 88 (2nd), and the two 92s tied for 3rd and 4th, so each gets 3.5.
- Score_B column: 75 (1st), 80 (2nd), 85 (3rd), 92 (4th). There are no ties in this column, so every rank is a whole number.
# Rank within each row (axis=1)
print("\nRanking by rows (axis=1):")
print(df.rank(axis=1))
Ranking by rows (axis=1):
Score_A Score_B
0 2.0 1.0
1 1.5 1.5
2 2.0 1.0
3 2.0 1.0
- Row 0: 75 < 88, so Score_B is 1st and Score_A is 2nd.
- Row 1: 92 == 92, so the two scores are tied. The default 'average' method gives both the average of ranks 1 and 2, which is 1.5.
- Rows 2 and 3: Score_B is lower than Score_A (80 < 85 and 85 < 92), so Score_B is 1st and Score_A is 2nd.
na_option: Handling Missing Values
What should rank() do with NaN (Not a Number) values?
| Option | Description |
|---|---|
| 'keep' (default) | Leaves NaN in place; the rank of a NaN is NaN. |
| 'top' | Assigns NaN values the lowest ranks (starting at rank 1). |
| 'bottom' | Assigns NaN values the highest ranks. |
df_with_nan = pd.DataFrame({'col': [10, np.nan, 20, np.nan, 5]})
print("DataFrame with NaNs:")
print(df_with_nan)
print("\nRank with 'keep' (default):")
print(df_with_nan.rank(na_option='keep'))
print("\nRank with 'top':")
print(df_with_nan.rank(na_option='top'))
DataFrame with NaNs:
col
0 10.0
1 NaN
2 20.0
3 NaN
4 5.0
Rank with 'keep' (default):
col
0 2.0
1 NaN
2 3.0
3 NaN
4 1.0
Rank with 'top':
col
0 4.0
1 1.5 # the two NaNs tie for ranks 1 and 2
2 5.0
3 1.5 # so each gets the average, 1.5
4 3.0
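The 'bottom' option is not shown above, so here is a minimal sketch of it on the same values, held in a hypothetical Series named values. Missing entries are treated as tied with each other, so under the default 'average' method they share one rank.

```python
import numpy as np
import pandas as pd

# Same values as df_with_nan['col'], as a standalone Series.
values = pd.Series([10, np.nan, 20, np.nan, 5])

bottom_ranks = values.rank(na_option="bottom")

# 5 -> 1, 10 -> 2, 20 -> 3; the two NaNs tie for ranks 4 and 5.
print(bottom_ranks.tolist())  # [2.0, 4.5, 3.0, 4.5, 1.0]
```

If the missing entries need distinct ranks, combining na_option with method='first' is one option to try.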
pct: Rank as a Percentage
If pct=True, each rank is divided by the number of ranked values, so the result is a fraction greater than 0.0 and at most 1.0 instead of an absolute rank. This is useful for computing percentile ranks.
# Rank as a percentage of the total number of values
print("Rank as percentage:")
print(data.rank(pct=True))
Rank as percentage:
0 0.20 # 1 out of 5 values
1 0.90 # 4.5 out of 5
2 0.40 # 2 out of 5
3 0.90 # 4.5 out of 5
4 0.60 # 3 out of 5
dtype: float64
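To make the pct=True arithmetic concrete, the sketch below compares it with dividing the ordinary rank by the number of non-missing values. The extra NaN and the names with_nan and manual_pct are assumptions added for illustration; they show that, with the default na_option='keep', missing values do not count toward the denominator.

```python
import numpy as np
import pandas as pd

# Same values as the earlier Series, plus one missing entry.
with_nan = pd.Series([10, 50, 20, 50, 40, np.nan])

pct_ranks = with_nan.rank(pct=True)

# Manual version: ordinary rank divided by the count of non-NaN values (5).
manual_pct = with_nan.rank() / with_nan.count()

print(pct_ranks.tolist())
print(manual_pct.tolist())
```

Both lines print 0.2, 0.9, 0.4, 0.9, 0.6 followed by nan for the missing entry.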
Practical Example: Ranking Students
Let's put it all together with a more realistic example.
# Student scores DataFrame
students = pd.DataFrame({
'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Math_Score': [88, 92, 85, 92, 78, 85],
'Physics_Score': [75, 92, 80, 85, 95, np.nan]
})
print("Student Scores:")
print(students)
# 1. Rank students by Math Score (descending, highest is rank 1)
students['Math_Rank'] = students['Math_Score'].rank(ascending=False, method='min')
print("\nRanking by Math Score (highest is 1):")
print(students[['Student', 'Math_Score', 'Math_Rank']].sort_values('Math_Rank'))
# 2. Rank students by Physics Score, treating NaNs as last
students['Physics_Rank'] = students['Physics_Score'].rank(ascending=False, na_option='bottom')
print("\nRanking by Physics Score (highest is 1, NaNs last):")
print(students[['Student', 'Physics_Score', 'Physics_Rank']].sort_values('Physics_Rank'))
# 3. Get the overall rank (average of Math and Physics ranks)
students['Overall_Rank'] = students[['Math_Rank', 'Physics_Rank']].mean(axis=1)
students['Overall_Rank'] = students['Overall_Rank'].rank(method='min').astype(int) # Rank the average ranks
print("\nFinal DataFrame with Overall Rank:")
print(students.sort_values('Overall_Rank'))
Output:
Student Scores:
Student Math_Score Physics_Score
0 Alice 88 75.0
1 Bob 92 92.0
2 Charlie 85 80.0
3 David 92 85.0
4 Eve 78 95.0
5 Frank 85 NaN
Ranking by Math Score (highest is 1):
Student Math_Score Math_Rank
1 Bob 92 1.0
3 David 92 1.0
0 Alice 88 3.0
2 Charlie 85 4.0
5 Frank 85 4.0
4 Eve 78 6.0
Ranking by Physics Score (highest is 1, NaNs last):
Student Physics_Score Physics_Rank
4 Eve 95.0 1.0
1 Bob 92.0 2.0
3 David 85.0 3.0
2 Charlie 80.0 4.0
0 Alice 75.0 5.0
5 Frank NaN 6.0
Final DataFrame with Overall Rank:
Student Math_Score Physics_Score Math_Rank Physics_Rank Overall_Rank
1 Bob 92 92.0 1.0 2.0 1
3 David 92 85.0 1.0 3.0 2
4 Eve 78 95.0 6.0 1.0 3
0 Alice 88 75.0 3.0 5.0 4
2 Charlie 85 80.0 4.0 4.0 4
5 Frank 85 NaN 4.0 6.0 6
Common Pitfalls & Best Practices
- Method Choice Matters: The method parameter is critical. 'average' is common, but 'min' or 'dense' might be more appropriate depending on your use case (e.g., competition rankings). Be deliberate about your choice.
- Axis Confusion: Remember that axis=0 ranks within each column and axis=1 ranks within each row. It's easy to mix them up.
- NaN Handling: Always decide what to do with missing values. The default 'keep' is often safe, but in some cases you might want to treat them as the best or worst performers.
- Data Types: rank() always returns a float, even if the ranks are whole numbers. This accommodates averaged ties and NaN ranks. You can convert the result to an integer when the method produces whole numbers and no NaN ranks remain (e.g., df['rank'].astype(int) after method='min').
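When NaN ranks are present, a plain astype(int) raises an error. One possible workaround, sketched here with a hypothetical scores Series, is pandas' nullable integer dtype 'Int64', which keeps missing ranks as <NA>. This assumes a tie method that yields whole numbers, such as 'min'.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value.
scores = pd.Series([88.0, np.nan, 92.0, 88.0])

# 'min' produces whole-number ranks; the NaN score keeps a NaN rank.
float_ranks = scores.rank(method="min", ascending=False)
print(float_ranks.dtype)  # float64

# Nullable integer dtype: whole numbers stay integers, NaN becomes <NA>.
int_ranks = float_ranks.astype("Int64")
print(int_ranks.dtype)  # Int64
print(int_ranks.dropna().tolist())  # [2, 1, 2]
```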
