Pandas Series.diff() Function

In Pandas, the diff() function is used to compute the difference between consecutive elements in a Series. This can be useful for calculating the difference between consecutive time periods or identifying changes in data over time.

Syntax of Pandas Series.diff() Function

Following is the syntax of the pandas Series.diff() function.


# Syntax of the Series.diff() function
Series.diff(periods=1)

Parameters of the Series.diff()

Following are the parameters of the Series.diff() function.

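periods – An integer giving the number of positions to shift when calculating the difference. Default is 1. With a positive value, each element is compared with the element that many positions before it; with a negative value, it is compared with the element that many positions after it.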

Return Value

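It returns a new Series with the same length and index as the original Series. Positions that have no element to compare against are filled with NaN: the first element when periods is 1, the first two when periods is 2, and the last element when periods is -1. Because NaN is a floating-point value, integer input produces a float64 result.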
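As a quick check of the return value described above, the following minimal sketch counts the NaN entries and the length of the result for a few values of periods. The names values, nan_counts, and lengths are illustrative only and do not appear elsewhere in this article.

```python
import pandas as pd

values = pd.Series([10, 15, 20, 25, 30])

# Count the NaN entries and the result length for a few different shifts
nan_counts = {p: int(values.diff(periods=p).isna().sum()) for p in (1, 2, -1)}
lengths = {p: len(values.diff(periods=p)) for p in (1, 2, -1)}

print(nan_counts)           # {1: 1, 2: 2, -1: 1}
print(lengths)              # {1: 5, 2: 5, -1: 5}
print(values.diff().dtype)  # float64
```

The length never changes; only the number and position of the NaN entries depend on periods.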

Using Series.diff() to get the Difference between Consecutive Elements

To compute the difference between consecutive elements in a Series, call the diff() function with no arguments. Here’s an example that walks through it step by step.

First, let’s create a Pandas Series from a list.


import pandas as pd

# Create a sample Series
series = pd.Series([10, 15, 20, 25, 30])
print("Original Series:\n",series)

# Output:
# Original Series:
# 0    10
# 1    15
# 2    20
# 3    25
# 4    30
# dtype: int64

Now, let’s use the diff() function.


# Compute the difference between consecutive elements
diff_series = series.diff()
print("Difference between consecutive elements:\n",diff_series)

# Output:
# Difference between consecutive elements:
# 0    NaN
# 1    5.0
# 2    5.0
# 3    5.0
# 4    5.0
# dtype: float64

In the above example, diff_series contains the differences between consecutive elements of the original Series. The first element is NaN because there is no previous element to compute the difference with. Subsequent elements represent the differences between each element and its previous element.
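One way to see what diff() computes is to compare it with a manual subtraction against a shifted copy of the Series. This is a minimal sketch for illustration; the variable name manual is an assumption and not part of the original example.

```python
import pandas as pd

series = pd.Series([10, 15, 20, 25, 30])

# shift(1) moves every value down one position, so subtracting it
# gives "current element minus previous element"
manual = series - series.shift(1)

# Series.equals treats NaN values in the same position as equal
print(manual.equals(series.diff()))  # True
```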

Difference Between Elements Shifted by 2 Periods

Alternatively, you can specify the number of periods to shift for calculating the difference using the periods parameter.


# Calculate the difference between elements shifted by 2 periods
diff_series_shifted = series.diff(periods=2)
print("Difference between elements shifted by 2 periods:\n",diff_series_shifted)

# Output:
# Difference between elements shifted by 2 periods:
# 0     NaN
# 1     NaN
# 2    10.0
# 3    10.0
# 4    10.0
# dtype: float64

In the above example, diff_series_shifted contains the differences between elements shifted by 2 periods. The first two elements are NaN because there are not enough previous elements to compute the difference with. Subsequent elements represent the differences between each element and the element two periods prior.

Difference Between Elements Shifted by -1 Periods

To calculate the difference between elements shifted by -1 periods in a pandas Series, you can use the diff() function with a negative value for the periods parameter.


# Calculate the difference between elements shifted by -1 period
result = series.diff(periods=-1)
print("Difference between elements shifted by -1 periods:\n",result)

# Output:
# Difference between elements shifted by -1 periods:
# 0   -5.0
# 1   -5.0
# 2   -5.0
# 3   -5.0
# 4    NaN
# dtype: float64

In this example, result contains the differences computed with periods=-1, which looks ahead: each value is the element minus the element that follows it (for example, 10 - 15 = -5). The last element is NaN because there is no subsequent element to compute the difference with.
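The behavior with a negative period can be cross-checked against shift(-1), which pulls each following value up one position. The sketch below is illustrative only; look_ahead and manual are names chosen here, not part of the original example.

```python
import pandas as pd

series = pd.Series([10, 15, 20, 25, 30])

look_ahead = series.diff(periods=-1)

# shift(-1) aligns each element with the one that follows it
manual = series - series.shift(-1)

print(look_ahead.equals(manual))  # True
print(look_ahead.tolist())        # [-5.0, -5.0, -5.0, -5.0, nan]
```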

Compute the Difference Between Consecutive Dates

To compute the difference between consecutive dates in a given series, you can use the diff() function in pandas with a series of datetime objects.


import pandas as pd

# Create a sample series of dates
dates = pd.Series(pd.date_range(start='2022-01-01', periods=5))

# Compute the difference between consecutive dates
diff_dates = dates.diff()
print("Difference between consecutive dates:\n",diff_dates)

# Output:
# Difference between consecutive dates:
# 0      NaT
# 1   1 days
# 2   1 days
# 3   1 days
# 4   1 days
# dtype: timedelta64[ns]

In the above example, the resulting series contains the differences between consecutive dates. The first value is NaT (Not a Time) because there is no preceding date to compute the difference with. The subsequent values represent the differences between consecutive dates, measured in days.
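When the dates are not evenly spaced, the timedelta result can be converted to a plain number of days with the .dt.days accessor. The following is a minimal sketch using made-up dates; the names gaps and gap_days are assumptions for illustration.

```python
import pandas as pd

# Hypothetical, irregularly spaced dates
dates = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-10"]))

gaps = dates.diff()       # timedelta values, NaT in the first position
gap_days = gaps.dt.days   # numeric day counts, NaN where the gap is NaT

print(gap_days)
```

Here the second and third entries are 2 and 7 days, and the first stays missing because there is no earlier date.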

Compute the Difference Between Consecutive Boolean Values

Similarly, to compute the difference between consecutive boolean values in a pandas Series, you can use the diff() function.


import pandas as pd

# Create a sample series of boolean values
series = pd.Series([True, False, True, True, False])

# Compute the difference between consecutive boolean values
result = series.diff()
print("Difference between consecutive boolean values:\n",result)

# Output:
# Difference between consecutive boolean values:
# 0      NaN
# 1     True
# 2     True
# 3    False
# 4     True
# dtype: object

In the above example, the first value is NaN because there is no preceding element to compare with. Each subsequent value is True where the boolean differs from the previous one and False where it is unchanged. Because the result mixes NaN with boolean values, its dtype is object.
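If the direction of each change matters (False to True versus True to False), one option is to cast the booleans to integers before calling diff(). This is a sketch of that alternative, not something the example above does; flags and signed_change are illustrative names.

```python
import pandas as pd

flags = pd.Series([True, False, True, True, False])

# Cast to 0/1 so the difference carries a sign:
#  1.0 means False -> True, -1.0 means True -> False, 0.0 means unchanged
signed_change = flags.astype(int).diff()

print(signed_change.tolist())  # [nan, -1.0, 1.0, 0.0, -1.0]
```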

Compute the Difference Between Consecutive Elements with Missing Values

You can also use the diff() method on a pandas Series that contains missing values.


import pandas as pd
import numpy as np

# Create a sample series with missing values
series = pd.Series([1, np.nan, 3, 5, np.nan])

# Compute the difference between consecutive elements 
# With missing values
diff_series = series.diff()
print(diff_series)

# Output:
# 0    NaN
# 1    NaN
# 2    NaN
# 3    2.0
# 4    NaN
# dtype: float64

In the above example, the differences are computed as usual, but whenever either element in a pair is missing (NaN), the corresponding difference is also NaN. Only index 3 has two valid neighbors (5 - 3), so it is the only non-NaN result.
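If the NaN results are not wanted, the missing values can be handled before calling diff(). The sketch below shows two possible approaches, forward-filling and dropping; which one is appropriate depends on the data, and the names filled_diff and dropped_diff are assumptions for illustration.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, np.nan, 3, 5, np.nan])

# Option 1: carry the last valid value forward, then difference
filled_diff = series.ffill().diff()

# Option 2: drop missing values first; the original index labels are kept
dropped_diff = series.dropna().diff()

print(filled_diff.tolist())   # [nan, 0.0, 2.0, 2.0, 0.0]
print(dropped_diff.tolist())  # [nan, 2.0, 2.0]
```

Note that dropping values changes which elements count as consecutive, so the second result compares 3 with 1 rather than with the missing entry between them.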

Frequently Asked Questions on Pandas Series.diff() Function

What does the pandas Series.diff() function do?

The diff() function in pandas Series computes the difference between consecutive elements in the Series. It provides insight into the rate of change or increments between adjacent values.

How does the diff() function handle missing values?

The diff() function does not skip or fill missing values. If either of the two elements being compared is NaN, the resulting difference is NaN. In addition, with the default periods=1, the first element of the result is NaN because there’s no preceding element to compute the difference with.

Can I specify the number of periods to shift for computing the difference?

Yes. The periods parameter sets how many positions apart the compared elements are. A positive value subtracts an earlier element from each element (looking back), while a negative value subtracts a later element (looking ahead).

In what scenarios is the diff() function commonly used?

The diff() function is commonly used in time series analysis, financial data analysis, signal processing, and data preprocessing tasks. It helps identify trends, seasonality, and anomalies by quantifying changes between consecutive data points.
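As a small illustration of the time series use case, the sketch below computes day-to-day changes in a made-up price series and keeps the changes larger than a threshold. The data, the threshold of 3, and the names prices, daily_change, and large_moves are all hypothetical.

```python
import pandas as pd

# Hypothetical daily prices indexed by date
prices = pd.Series(
    [100.0, 102.5, 101.0, 105.0],
    index=pd.date_range("2022-01-01", periods=4),
)

daily_change = prices.diff()

# Keep only the days whose absolute change exceeds the chosen threshold;
# the leading NaN compares as False and is dropped
large_moves = daily_change[daily_change.abs() > 3]

print(daily_change.tolist())  # [nan, 2.5, -1.5, 4.0]
print(large_moves)
```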

Does the resulting Series maintain the same index as the original Series?

The resulting Series maintains the same index as the original Series, ensuring proper alignment of data. This allows for easy comparison and analysis of differences between corresponding elements.
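Because the index is preserved, the differences can be placed next to the original values without any realignment. The following minimal sketch uses a made-up Series with string labels; sales, change, and combined are illustrative names.

```python
import pandas as pd

sales = pd.Series([1, 4, 9], index=["a", "b", "c"])
change = sales.diff()

# The result carries the same labels as the input
print(change.index.equals(sales.index))  # True

# Matching labels let both Series sit side by side in one DataFrame
combined = pd.DataFrame({"sales": sales, "change": change})
print(combined)
```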

Conclusion

In this article, I have explained the diff() function in Pandas. It computes the difference between consecutive elements in a Series, subtracting from each element the one preceding it by default. The resulting Series has the same length and index as the original Series, with NaN wherever there is no element to compare against.

Happy Learning!!

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html