Pandas DataFrame diff() Method

Last modified: August 15, 2024

In Pandas, the diff() method is used to calculate the difference between two successive rows or columns in a DataFrame. It is useful for finding changes or variations between data points over a specified axis.

In this article, I will explain the pandas DataFrame diff() method, covering its syntax, parameters, and usage. The method returns a DataFrame or Series of the same shape as the input, holding the calculated differences. The first row or column (depending on the axis) contains NaN values because it has no preceding element to compare against.

Key Points –

  • The diff() method calculates the difference between consecutive elements in a DataFrame or Series, useful for identifying changes over a specified interval.
  • Returns a DataFrame or Series of the same shape with the differences calculated. The leading rows or columns (as many as the periods value, one by default) will be NaN due to the lack of preceding elements.
  • Resulting NaN values can be handled using methods like fillna() or dropna() to clean the DataFrame after computing differences.
  • Commonly used in time series analysis to compute changes over time, such as stock price movements or temperature changes.
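
As a rough illustration of the time series use case mentioned above, the sketch below computes day-over-day changes on a small Series. The dates and prices are made up for the example, and the name prices is only a placeholder.

```python
import pandas as pd

# Hypothetical daily closing prices (illustrative values, not real data)
prices = pd.Series(
    [100.0, 102.5, 101.0, 105.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="close",
)

# Day-over-day change; the first entry is NaN because it has no predecessor
daily_change = prices.diff()
print(daily_change)

# Drop the leading NaN when only the actual changes are needed
changes_only = daily_change.dropna()
print(changes_only.tolist())  # [2.5, -1.5, 4.0]
```

The same call works on a DataFrame, where it is applied to every column at once.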

Syntax of Pandas DataFrame diff() Method

Following is the syntax of the pandas DataFrame diff() method.


# Syntax of the DataFrame diff() method
DataFrame.diff(periods=1, axis=0)

Parameters of the DataFrame diff()

Following are the parameters of the DataFrame diff() method.

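  • periods – (int, default 1) – The number of periods to shift for calculating the difference. Negative values are accepted and compare each element with a later one.
  • axis – ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to calculate the difference. 0 takes differences between rows (down each column); 1 takes differences between columns (across each row).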
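
To make the two parameters more concrete, here is a small sketch that uses a negative periods value and the label form of axis. It reuses the example data from this article; the variable names forward and by_label are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# A negative periods value compares each row with the row after it,
# so the NaN row moves to the end of the result
forward = df.diff(periods=-1)
print(forward)

# axis also accepts the labels 'index' and 'columns'
by_label = df.diff(axis='columns')
print(by_label.equals(df.diff(axis=1)))  # True
```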

Return Value

It returns a DataFrame of the same shape as the input, containing the calculated differences.
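
A quick way to check the return value described above is to compare the result with the input. The following is only a sketch using the article's example data.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})
result = df.diff()

# Shape, index, and column labels are carried over unchanged
print(result.shape == df.shape)                   # True
print(result.index.equals(df.index))              # True
print(list(result.columns) == list(df.columns))   # True

# Integer input comes back as float64 so that NaN can be represented
print(result.dtypes)
```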

Usage of Pandas DataFrame diff() Method

The diff() method in Pandas is used to compute the difference between successive rows or columns in a DataFrame.

To run some examples of the Pandas DataFrame diff() method, let’s create a Pandas DataFrame using data from a dictionary.


import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

print("Original DataFrame:\n",df)

This yields the output below.

# Output:
# Original DataFrame:
#     A   B
# 0   5   2
# 1   8   6
# 2  13   9
# 3  15  12
# 4  19  16

Called without arguments, diff() calculates the difference between the current row and the previous row within each column. This is done along axis=0 with periods=1, so every value is compared with the value directly above it.


# Compute row-wise differences
df2 = df.diff()
print("Row-wise difference (default):\n", df2)

Here,

  • Row 0 – The result is NaN because there is no previous row to compute the difference from.
  • Row 1 – The difference is 8 - 5 = 3 for column A and 6 - 2 = 4 for column B.
  • Row 2 – The difference is 13 - 8 = 5 for column A and 9 - 6 = 3 for column B.
  • Row 3 – The difference is 15 - 13 = 2 for column A and 12 - 9 = 3 for column B.
  • Row 4 – The difference is 19 - 15 = 4 for column A and 16 - 12 = 3 for column B.

Specifying the Number of Periods

To change how far back the comparison reaches, use the periods parameter of the diff() method. It calculates the difference between the current row and the row that is the specified number of periods before it.


# Compute differences with periods=2
df2 = df.diff(periods=2)
print("Row-wise difference with periods=2:\n", df2)

# Output:
# Row-wise difference with periods=2:
#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  8.0  7.0
# 3  7.0  6.0
# 4  6.0  7.0

Here,

  • Row 0 and Row 1 – The result is NaN because there are not enough preceding rows to compute the difference for periods=2.
  • Row 2 – The difference is 13 - 5 = 8 for column A and 9 - 2 = 7 for column B.
  • Row 3 – The difference is 15 - 8 = 7 for column A and 12 - 6 = 6 for column B.
  • Row 4 – The difference is 19 - 13 = 6 for column A and 16 - 9 = 7 for column B.
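
The arithmetic above is the same as subtracting a shifted copy of the DataFrame from the original. The sketch below checks this for periods=2 on the article's example data; via_diff and via_shift are placeholder names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# diff(periods=2) compares each row with the row two positions earlier
via_diff = df.diff(periods=2)

# The same result built by hand: shift the rows down by 2 and subtract
via_shift = df - df.shift(2)

print(via_diff.equals(via_shift))  # True
```

Thinking of diff() as "value minus shifted value" can help when reasoning about where the NaN rows appear.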

Column-wise Difference

To calculate the column-wise difference in a DataFrame, you can use the diff() method with the axis=1 parameter. This will compute the difference between each element and the previous element in the same row across columns.


# Compute column-wise differences
df2 = df.diff(axis=1)
print("Column-wise difference:\n", df2)

# Output:
# Column-wise difference:
#     A    B
# 0 NaN -3.0
# 1 NaN -2.0
# 2 NaN -4.0
# 3 NaN -3.0
# 4 NaN -3.0

Here,

  • Column A – The result is NaN for all rows because there is no preceding column to compute the difference from.
  • Column B, Row 0 – The difference is 2 - 5 = -3.
  • Column B, Row 1 – The difference is 6 - 8 = -2.
  • Column B, Row 2 – The difference is 9 - 13 = -4.
  • Column B, Row 3 – The difference is 12 - 15 = -3.
  • Column B, Row 4 – The difference is 16 - 19 = -3.
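
Column-wise differences are easier to picture when the columns have a natural order. The sketch below uses a hypothetical table of quarterly figures; the labels and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical quarterly figures per region (illustrative values only)
sales = pd.DataFrame(
    {'Q1': [100, 80], 'Q2': [120, 75], 'Q3': [130, 90]},
    index=['north', 'south'],
)

# Each column is compared with the column to its left
quarterly_change = sales.diff(axis=1)
print(quarterly_change)

# Output:
#        Q1    Q2    Q3
# north NaN  20.0  10.0
# south NaN  -5.0  15.0
```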

Handling NaN Values

When working with the diff() method, NaN (Not a Number) values can appear for two reasons: at the beginning of the DataFrame, where there are no preceding values to compute the difference from, or wherever the DataFrame already contains NaN values.


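# Create a DataFrame with missing values (values consistent with the outputs in this section)
df_with_nan = pd.DataFrame({'A': [5, 8, None, 15, 19], 'B': [2, 6, 9, None, 16]})

# Compute row-wise differences
df2 = df_with_nan.diff()
print("Row-wise difference (default handling of NaN values):\n", df2)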

# Output:
# Row-wise difference (default handling of NaN values):
#      A    B
# 0  NaN  NaN
# 1  3.0  4.0
# 2  NaN  3.0
# 3  NaN  NaN
# 4  4.0  NaN

Here,

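  • The differences are calculated wherever both the current and the previous value are present.
  • NaN values propagate through the result: a missing input value yields NaN in its own row and in the following row.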
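
The propagation described above can be seen on a single column. This is a minimal sketch with one missing value, using data chosen for illustration.

```python
import pandas as pd

s = pd.Series([5, 8, None, 15, 19])
d = s.diff()

# Position 0 is NaN because it has no predecessor.
# Position 2 is NaN because its own value is missing (NaN - 8).
# Position 3 is NaN because the previous value is missing (15 - NaN).
print(d.isna().tolist())  # [True, False, True, True, False]
```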

Filling NaN Values Before Applying diff()

You can fill NaN values with a specific value before computing differences using the fillna() method.


# Fill NaN values with 0 and then compute differences
df2 = df_with_nan.fillna(0).diff()
print("Row-wise difference after filling NaN values with 0:\n", df2)

# Output:
# Row-wise difference after filling NaN values with 0:
#        A     B
# 0   NaN   NaN
# 1   3.0   4.0
# 2  -8.0   3.0
# 3  15.0  -9.0
# 4   4.0  16.0

Here,

  • NaN values are filled with 0 before computing differences.
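  • This approach can be useful if you have a specific value that makes sense for your context. Keep in mind that the fill value takes part in the subtraction, which is why column A shows jumps of -8.0 and 15.0 around the filled row.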

Dropping NaN Values Before Applying diff()

You can drop rows with NaN values before computing differences using the dropna() method.


# Drop NaN values and then compute differences
df2 = df_with_nan.dropna().diff()
print("Row-wise difference after dropping rows with NaN values:\n", df2)

# Output:
# Row-wise difference after dropping rows with NaN values:
#       A     B
# 0   NaN   NaN
# 1   3.0   4.0
# 4  11.0  10.0

Here,

  • Rows containing NaN values are dropped before computing differences.
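  • This approach ensures that differences are only computed on rows with valid data. Note that the difference at index 4 is taken against index 1, the nearest remaining row (19 - 8 = 11 and 16 - 6 = 10), not against the original previous row.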

Forward Filling NaN Values

You can forward-fill NaN values using the ffill() method before applying diff().


# Forward fill NaN values and then compute differences
df2 = df_with_nan.ffill().diff()
print("Row-wise difference after forward filling NaN values:\n", df2)

# Output:
# Row-wise difference after forward filling NaN values:
#      A    B
# 0  NaN  NaN
# 1  3.0  4.0
# 2  0.0  3.0
# 3  7.0  0.0
# 4  4.0  7.0

Here,

  • NaN values are forward-filled with the previous valid value before computing differences.
  • This method is useful when the missing values should be replaced by the preceding values.
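
The sections above clean the data before calling diff(). As a rough sketch, the same methods can also be applied to the result of diff(), and bfill() can be used in place of ffill(). The df_with_nan values below are an assumption, chosen so that they reproduce the outputs shown in this article.

```python
import pandas as pd

# Assumed input: values picked to match the outputs shown in this article
df_with_nan = pd.DataFrame({
    'A': [5, 8, None, 15, 19],
    'B': [2, 6, 9, None, 16]
})

diffs = df_with_nan.diff()

# Option 1: replace every NaN in the result with 0
filled_after = diffs.fillna(0)
print(filled_after)

# Option 2: keep only rows where every difference could be computed
complete_rows = diffs.dropna()
print(complete_rows)

# Option 3: backward-fill the input instead of forward-filling it
backfilled = df_with_nan.bfill().diff()
print(backfilled)
```

Filling after diff() hides the fact that a difference could not be computed, so which option fits depends on how the result is used downstream.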

Frequently Asked Questions on Pandas DataFrame diff() Method

What is the diff() method in Pandas?

The diff() method in Pandas is used to compute the difference between consecutive elements in a DataFrame or Series. By default, it calculates the difference between the current and the previous row.

What is the default behavior of the diff() method?

By default, diff() calculates the difference between each element and the previous element in the same column (row-wise difference) with a period of 1.

How do I specify the number of periods for the diff() method?

You can specify the number of periods by using the periods parameter. For example, df.diff(periods=2) will compute the difference between the current row and the row two periods before.

How can I fill or drop NaN values when using the diff() method?

You can handle NaN values before or after applying diff() using methods like fillna(), dropna(), ffill() (forward fill), and bfill() (backward fill).

Can the diff() method be used to calculate column-wise differences?

Yes. By setting the axis parameter to 1 (i.e., df.diff(axis=1)), you can compute the difference between each element and the previous element in the same row (column-wise difference).

Conclusion

In conclusion, the pandas.DataFrame.diff() method is a powerful tool for calculating differences between consecutive elements in a DataFrame or Series. This functionality is particularly useful in time series analysis, identifying trends, and detecting changes over specified periods. By understanding its syntax, parameters, and various applications, you can effectively leverage this method to gain deeper insights into your data.

Happy Learning!!
