Of course! The pandas.DataFrame.diff() method is a powerful tool for calculating the difference between elements in a DataFrame or Series. It's extremely useful for time-series analysis, financial data, and any situation where you need to understand the change between consecutive data points.

Here's a comprehensive guide to using diff() (available as both DataFrame.diff() and Series.diff()), from the basics to advanced examples.
What is diff()?
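The diff() method computes the first discrete difference of its elements: by default, each element minus the element in the previous row. The result has the same shape and index as the input, with NaN wherever there is no earlier element to subtract.
The formula is simple: result[i] = element[i] - element[i - periods], which with the default periods=1 becomes result[i] = element[i] - element[i-1].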
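To make the formula concrete, here is a minimal sketch that applies result[i] = x[i] - x[i - periods] with an explicit loop and checks it against diff() and against subtracting a shifted copy. The helper name manual_diff is hypothetical and for illustration only; in real code you would simply call diff().

```python
import numpy as np
import pandas as pd

def manual_diff(values, periods=1):
    """Hypothetical helper: apply result[i] = x[i] - x[i - periods] by hand."""
    x = np.asarray(values, dtype=float)
    result = np.full(len(x), np.nan)  # positions without a partner element stay NaN
    for i in range(len(x)):
        j = i - periods  # index of the element being subtracted
        if 0 <= j < len(x):
            result[i] = x[i] - x[j]
    return result

s = pd.Series([10, 12, 15, 14, 18, 20])
for p in (1, 2, -1):
    expected = manual_diff(s, periods=p)
    # diff() and "subtract the shifted copy" both match the hand-written loop
    assert np.allclose(s.diff(periods=p).to_numpy(), expected, equal_nan=True)
    assert np.allclose((s - s.shift(p)).to_numpy(), expected, equal_nan=True)

print(manual_diff(s))
```

The loop is only there to expose the arithmetic; s - s.shift(periods) is the vectorized equivalent.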
Basic Syntax
The diff() method can be called on a DataFrame or a Series.

# For a Series
Series.diff(periods=1)
# For a DataFrame
DataFrame.diff(periods=1, axis=0)
Key Parameters:
periods (int, default 1): The number of positions to shift when calculating the difference.
- periods=1 (default): difference with the previous row.
- periods=2: difference with the row two places back.
- periods=-1: difference with the next row (current minus next, so it looks forward).
axis ({0 or 'index', 1 or 'columns'}, default 0; DataFrame only): The axis to take the difference along.
- axis=0 or 'index': difference between rows (the default).
- axis=1 or 'columns': difference between columns.
Note that diff() has no inplace parameter: it always returns a new object, so assign the result if you want to keep it (for example, df['A_change'] = df['A'].diff()).
Examples on a Series
Let's start with a simple Series to understand the core functionality.
import pandas as pd
import numpy as np
# Create a sample Series
s = pd.Series([10, 12, 15, 14, 18, 20])
print("Original Series:")
print(s)
Original Series:
0 10
1 12
2 15
3 14
4 18
5 20
dtype: int64
Default Behavior (periods=1)
Calculates the difference from the previous element.
# Default: difference with the previous element
s_diff_default = s.diff()
print("\nDefault diff (periods=1):")
print(s_diff_default)
Output:

0 NaN # No previous element for the first item
1 2.0 # 12 - 10
2 3.0 # 15 - 12
3 -1.0 # 14 - 15
4 4.0 # 18 - 14
5 2.0 # 20 - 18
dtype: float64
Notice the first value is NaN (Not a Number) because there's no element before it to subtract from.
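If the leading NaN gets in the way, a minimal sketch of two common options follows: drop the row, or treat "no previous value" as zero change. Filling with 0 is a modelling assumption, not something diff() does for you. The NaN is also why the integer input came back as float64, so after dropping it you can cast back.

```python
import pandas as pd

s = pd.Series([10, 12, 15, 14, 18, 20])
changes = s.diff()

# Option 1: drop the first row, which has no predecessor
dropped = changes.dropna()

# Option 2: treat "no previous value" as zero change (an assumption you must be happy with)
zero_filled = changes.fillna(0)

# The NaN forced float64; once it is gone, casting back to an integer dtype is safe here
as_int = dropped.astype("int64")

print(dropped.tolist())
print(zero_filled.tolist())
print(as_int.dtype)
```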
Using periods
You can change how many steps back to look.
# Difference with the element two places back (periods=2)
s_diff_period2 = s.diff(periods=2)
print("\nDiff with periods=2:")
print(s_diff_period2)
# Difference with the next element (periods=-1)
s_diff_next = s.diff(periods=-1)
print("\nDiff with periods=-1 (looking forward):")
print(s_diff_next)
Output:
# Diff with periods=2:
0 NaN # Not enough history
1 NaN # Not enough history
2 5.0 # 15 - 10
3 2.0 # 14 - 12
4 3.0 # 18 - 15
5 6.0 # 20 - 14
dtype: float64
# Diff with periods=-1 (looking forward):
0 -2.0 # 10 - 12
1 -3.0 # 12 - 15
2 1.0 # 15 - 14
3 -4.0 # 14 - 18
4 -2.0 # 18 - 20
5 NaN # No next element
dtype: float64
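The forward-looking result is simply the backward one seen from the other side: s[i] - s[i + k] equals -(s[i + k] - s[i]). The sketch below checks that identity for k = 1 and k = 2 using pandas' own testing helper; it is a way to reason about negative periods, not a replacement for calling diff(periods=-k).

```python
import pandas as pd

s = pd.Series([10, 12, 15, 14, 18, 20])

for k in (1, 2):
    backward = s.diff(periods=k)   # s[i] - s[i - k]
    forward = s.diff(periods=-k)   # s[i] - s[i + k]
    # Negate the backward difference and move it up k rows to line it up with row i
    rebuilt = (-backward).shift(-k)
    pd.testing.assert_series_equal(forward, rebuilt)
```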
Examples on a DataFrame
diff() is even more useful on DataFrames. You can apply the difference operation either row-wise (axis=0) or column-wise (axis=1).
# Create a sample DataFrame
data = {'A': [100, 102, 105, 107, 110],
'B': [5, 7, 6, 8, 9],
'C': [50, 52, 51, 53, 55]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
A B C
0 100 5 50
1 102 7 52
2 105 6 51
3 107 8 53
4 110 9 55
Row-wise Difference (axis=0, the default)
This is the most common use case. It calculates the difference for each column between consecutive rows.
# Difference between rows for each column
df_diff_rows = df.diff()
print("\nDataFrame diff (axis=0):")
print(df_diff_rows)
Output:
# DataFrame diff (axis=0):
A B C
0 NaN NaN NaN
1 2.0 2.0 2.0
2 3.0 -1.0 -1.0
3 2.0 2.0 2.0
4 3.0 1.0 2.0
Each cell (i, j) contains the value df.iloc[i, j] - df.iloc[i-1, j].
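If you are coming from NumPy, it may help to compare with np.diff. A minimal sketch: pandas keeps the original shape and pads with NaN, while np.diff returns one row fewer. Also note that np.diff's n argument is the order of the difference (differencing repeatedly), not a lag like pandas' periods.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [100, 102, 105, 107, 110],
                   'B': [5, 7, 6, 8, 9],
                   'C': [50, 52, 51, 53, 55]})

pandas_diff = df.diff()                      # same shape as df, first row is NaN
numpy_diff = np.diff(df.to_numpy(), axis=0)  # one row shorter, no NaN padding

print(pandas_diff.shape, numpy_diff.shape)   # (5, 3) (4, 3)

# Dropping pandas' leading NaN row gives exactly the numbers np.diff produces
assert np.array_equal(pandas_diff.iloc[1:].to_numpy(), numpy_diff)
```

Keeping the index aligned is the main reason to prefer the pandas version: the result can be assigned straight back as a new column.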
Column-wise Difference (axis=1)
This calculates the difference between columns for each row.
# Difference between columns for each row
df_diff_cols = df.diff(axis=1)
print("\nDataFrame diff (axis=1):")
print(df_diff_cols)
Output:
# DataFrame diff (axis=1):
A B C
0 NaN -95.0 45.0
1 NaN -95.0 45.0
2 NaN -99.0 45.0
3 NaN -99.0 45.0
4 NaN -101.0 46.0
Each cell (i, j) contains the value df.iloc[i, j] - df.iloc[i, j-1]. The first column is NaN because there's no preceding column.
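Put differently, a column-wise diff is "each column minus the column to its left", following the DataFrame's current column order (not alphabetical order). This short sketch confirms that equivalence with plain column subtraction on the same sample data.

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 102, 105, 107, 110],
                   'B': [5, 7, 6, 8, 9],
                   'C': [50, 52, 51, 53, 55]})

col_diff = df.diff(axis=1)

# Column-wise diff is just "this column minus the column to its left"
assert (col_diff['B'] == df['B'] - df['A']).all()
assert (col_diff['C'] == df['C'] - df['B']).all()
assert col_diff['A'].isna().all()  # no column to the left of 'A'
```

Because the result depends on column order, reorder the columns first (for example with df[['C', 'A', 'B']]) if you need a different pairing.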
Handling Missing Data (NaN)
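diff() propagates NaN values. If a value in the original data is NaN, both the difference for that row and the difference for the following row (with periods=1) will be NaN, because each of those subtractions involves the missing value.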
# Create a DataFrame with a missing value
df_nan = pd.DataFrame({'A': [10, 12, np.nan, 18, 20]})
print("\nDataFrame with NaN:")
print(df_nan)
df_nan_diff = df_nan.diff()
print("\nDiff of DataFrame with NaN:")
print(df_nan_diff)
Output:
# DataFrame with NaN:
A
0 10.0
1 12.0
2 NaN
3 18.0
4 20.0
# Diff of DataFrame with NaN:
A
0 NaN
1 2.0
2 NaN # NaN - 12.0 = NaN
3 NaN # 18.0 - NaN = NaN
4 2.0
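If you would rather not lose two rows to one missing value, one option is to fill the gap before differencing. A minimal sketch of two choices: forward-fill (carry the last known value) or linear interpolation. Both are assumptions about what the missing value "should" be, and they change the result: forward-filling attributes the whole move to the later row.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'A': [10, 12, np.nan, 18, 20]})

# Option 1: carry the last known value forward, then diff
ffilled = df_nan['A'].ffill().diff()

# Option 2: linearly interpolate the gap, then diff
interpolated = df_nan['A'].interpolate().diff()

print(ffilled.tolist())
print(interpolated.tolist())
```

With forward-filling, the jump from 12 to 18 shows up entirely in row 3; with interpolation it is split evenly across rows 2 and 3.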
Practical Use Cases
Use Case 1: Time-Series Analysis (Daily Price Change)
This is a classic application. Imagine you have daily stock prices.
import pandas as pd
# Create a time-series DataFrame
dates = pd.to_datetime(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04', '2025-01-05'])
prices = {'Open': [150, 152, 151, 155, 160],
'Close': [152, 151, 155, 158, 162]}
df_prices = pd.DataFrame(prices, index=dates)
print("Daily Stock Prices:")
print(df_prices)
# Calculate the daily price change
df_prices['Daily_Change'] = df_prices['Close'].diff()
print("\nDaily Price Change:")
print(df_prices)
Output:
Daily Stock Prices:
Open Close
2025-01-01 150 152
2025-01-02 152 151
2025-01-03 151 155
2025-01-04 155 158
2025-01-05 160 162
Daily Price Change:
Open Close Daily_Change
2025-01-01 150 152 NaN
2025-01-02 152 151 -1.0
2025-01-03 151 155 4.0
2025-01-04 155 158 3.0
2025-01-05 160 162 4.0
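Real price data often stores several instruments in one long table. In that case a plain diff() would subtract one ticker's price from another's. A minimal sketch using groupby(...).diff() on hypothetical data (the tickers 'AAA' and 'BBB' and their prices are made up for illustration):

```python
import pandas as pd

# Hypothetical long-format data: two tickers interleaved by date
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02',
                            '2025-01-02', '2025-01-03', '2025-01-03']),
    'ticker': ['AAA', 'BBB', 'AAA', 'BBB', 'AAA', 'BBB'],
    'Close': [152, 40, 151, 42, 155, 41],
})

# diff() is positional, so sort to make "previous row" mean "previous date"
long_df = long_df.sort_values(['ticker', 'date'])

# groupby(...).diff() restarts the calculation for each ticker,
# so AAA's first change is never computed against BBB's last price
long_df['Daily_Change'] = long_df.groupby('ticker')['Close'].diff()
print(long_df)
```

The result is aligned on the original index, so it can be assigned back as a column even though the rows were reordered by the sort.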
Use Case 2: Calculating Percentage Change
While pandas has a dedicated .pct_change() method, you can also calculate it using diff() and shift().
# pct_change is simply the difference divided by the previous value
df_prices['Pct_Change_manual'] = df_prices['Close'].diff() / df_prices['Close'].shift(1)
# For comparison, let's use the built-in method
df_prices['Pct_Change_builtin'] = df_prices['Close'].pct_change()
print("\nPercentage Change Calculation:")
print(df_prices[['Close', 'Pct_Change_manual', 'Pct_Change_builtin']])
Output:
Percentage Change Calculation:
Close Pct_Change_manual Pct_Change_builtin
2025-01-01 152 NaN NaN
2025-01-02 151 -0.006579 -0.006579
2025-01-03 155 0.026490 0.026490
2025-01-04 158 0.019355 0.019355
2025-01-05 162 0.025316 0.025316
Notice how shift(1) is used to get the previous value to divide by: df['Close'].shift(1) moves every value down one row, so each row sees the previous close. In fact, df['Close'].diff() is equivalent to df['Close'] - df['Close'].shift(1).
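As a sanity check on the manual formula, the sketch below shows that diff() / shift(1) is algebraically the same as the ratio form x[i] / x[i-1] - 1, and that diff() matches subtracting the shifted series. It reuses the closing prices from the example above; keep in mind that a previous value of 0 would make either form divide by zero.

```python
import numpy as np
import pandas as pd

close = pd.Series([152, 151, 155, 158, 162])

prev = close.shift(1)            # previous close, aligned to the current row
via_diff = close.diff() / prev   # (x[i] - x[i-1]) / x[i-1]
via_ratio = close / prev - 1     # x[i] / x[i-1] - 1, algebraically identical

assert np.allclose(via_diff, via_ratio, equal_nan=True)
# Skip row 0, where both sides are NaN (and NaN != NaN)
assert (close.diff() == close - prev).iloc[1:].all()
```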
Summary
| Feature | Description | Example |
|---|---|---|
| Core Function | Calculates the difference between elements. | df['col'].diff() |
| periods | Controls how many steps to shift for the difference (negative values look forward). | df.diff(periods=2) |
| axis | Controls the direction of the difference (rows or columns); DataFrame only. | df.diff(axis=1) |
| NaN Handling | A NaN input makes its own row and the row that subtracts it NaN. | diff() on a Series with NaN |
| Common Use | Time-series analysis, financial calculations, change detection. | stock_prices['Close'].diff() |
