In Pandas, the `diff()` method calculates the difference between each element and the element in the preceding row or column of a DataFrame. It is useful for finding changes between data points along a specified axis.

In this article, I will explain the pandas DataFrame `diff()` method, covering its syntax, parameters, and usage. The method returns a DataFrame or Series of the same shape as the input, holding the calculated differences. The first row or column (depending on the axis) will have NaN values because there are no preceding elements to compare against.

**Key Points –**

- The `diff()` method calculates the difference between consecutive elements in a DataFrame or Series, useful for identifying changes over a specified interval.
- Returns a DataFrame or Series of the same shape with the differences calculated, where the first `periods` rows or columns will be `NaN` due to the lack of preceding elements.
- Resulting NaN values can be handled using methods like `fillna()` or `dropna()` to clean the DataFrame after computing differences.
- Commonly used in time series analysis to compute changes over time, such as stock price movements or temperature changes.
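The key points above also apply to a Series. Here is a minimal sketch of the time series use case; the prices and dates are made up for illustration and are not from any real data set.

```python
import pandas as pd

# Hypothetical daily closing prices, used only for illustration
prices = pd.Series(
    [101.0, 103.5, 102.0, 106.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
    name="close",
)

# Day-over-day change; the first entry has no predecessor, so it is NaN
daily_change = prices.diff()
print(daily_change)

# Drop the leading NaN if only the valid changes are needed
print(daily_change.dropna())
```

The result keeps the original date index, so each change stays attached to the day on which it was observed.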

## Syntax of Pandas DataFrame diff() Method

Following is the syntax of the pandas DataFrame diff() method.

```
# Syntax of the DataFrame diff() method
DataFrame.diff(periods=1, axis=0)
```

### Parameters of the DataFrame diff()

Following are the parameters of the DataFrame diff() method.

- `periods` – (int, default 1) – The number of periods to shift for calculating the difference. Negative values are accepted.
- `axis` – ({0 or 'index', 1 or 'columns'}, default 0) – The axis along which to calculate the difference: `0` for row-wise, `1` for column-wise.
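As a quick illustration of both parameters, the sketch below uses a negative `periods` value and the string form of `axis`. The variable names `forward` and `by_label` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# A negative value compares each row with the row that follows it,
# so the NaN values end up in the last row instead of the first
forward = df.diff(periods=-1)
print(forward)

# axis also accepts the labels 'index' (same as 0) and 'columns' (same as 1)
by_label = df.diff(axis='columns')
print(by_label.equals(df.diff(axis=1)))
```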

### Return Value

It returns a DataFrame of the same shape as the input, containing the calculated differences.
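One detail worth checking yourself is the data type of the result. The sketch below assumes integer input columns and shows that the shape is preserved while the values become floats, since NaN cannot be stored in a plain integer column.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 8, 13], 'B': [2, 6, 9]})
result = df.diff()

print(df.dtypes)                 # integer columns in the input
print(result.dtypes)             # float columns in the result
print(result.shape == df.shape)  # the shape is unchanged
```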

## Usage of Pandas DataFrame diff() Method

The `diff()` method in Pandas is used to compute the difference between successive rows or columns in a DataFrame.

To run some examples of the Pandas DataFrame diff() method, let’s create a Pandas DataFrame using data from a dictionary.

```
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [5, 8, 13, 15, 19],
'B': [2, 6, 9, 12, 16]
})
print("Original DataFrame:\n",df)
```

By default, `diff()` calculates the difference between the current row and the previous row within each column. This is done along `axis=0`, which means the differences are computed row-wise.

```
# Compute row-wise differences
df2 = df.diff()
print("Row-wise difference (default):\n", df2)
```

Here,

- `Row 0` – The result is `NaN` because there is no previous row to compute the difference from.
- `Row 1` – The difference is `8 - 5 = 3` for column `A` and `6 - 2 = 4` for column `B`.
- `Row 2` – The difference is `13 - 8 = 5` for column `A` and `9 - 6 = 3` for column `B`.
- `Row 3` – The difference is `15 - 13 = 2` for column `A` and `12 - 9 = 3` for column `B`.
- `Row 4` – The difference is `19 - 15 = 4` for column `A` and `16 - 12 = 4` for column `B`.
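One way to check the row-by-row arithmetic above is to reproduce it with `shift()`. This is only a sketch of the equivalence; the name `manual` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# shift(1) moves every value down one row, so row i lines up with row i-1
manual = df - df.shift(1)

print(manual)
# Compare the manual calculation with the built-in method
print(manual.equals(df.diff()))
```

Thinking of `diff()` as "current value minus shifted value" also explains why the first row has nothing to subtract.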

## Specifying the Number of Periods

To control how far back each comparison reaches, use the `periods` parameter of the `diff()` method. It calculates the difference between the current row and the row that is the specified number of periods before it.

```
# Compute differences with periods=2
df2 = df.diff(periods=2)
print("Row-wise difference with periods=2:\n", df2)
# Output:
# Row-wise difference with periods=2:
#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  8.0  7.0
# 3  7.0  6.0
# 4  6.0  7.0
```

Here,

- `Row 0` and `Row 1` – The result is `NaN` because there are not enough preceding rows to compute the difference for `periods=2`.
- `Row 2` – The difference is `13 - 5 = 8` for column `A` and `9 - 2 = 7` for column `B`.
- `Row 3` – The difference is `15 - 8 = 7` for column `A` and `12 - 6 = 6` for column `B`.
- `Row 4` – The difference is `19 - 13 = 6` for column `A` and `16 - 9 = 7` for column `B`.
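A common source of confusion is that `periods=2` is not the same as applying `diff()` twice. The sketch below contrasts the two; the names `two_back` and `second_order` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# Row i minus row i-2
two_back = df.diff(periods=2)

# Difference of the first differences (a second-order difference)
second_order = df.diff().diff()

print(two_back)
print(second_order)
```

Both results start with two rows of NaN, but the remaining values differ because the second version subtracts changes from changes rather than values from values.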

## Column-wise Difference

To calculate the column-wise difference in a DataFrame, you can use the `diff()` method with the `axis=1` parameter. This will compute the difference between each element and the previous element in the same row across columns.

```
# Compute column-wise differences
df2 = df.diff(axis=1)
print("Column-wise difference:\n", df2)
# Output:
# Column-wise difference:
#     A    B
# 0 NaN -3.0
# 1 NaN -2.0
# 2 NaN -4.0
# 3 NaN -3.0
# 4 NaN -3.0
```

Here,

- `Column A` – The result is `NaN` for all rows because there is no preceding column to compute the difference from.
- `Column B` – Each value is column `B` minus column `A` in the same row:
  - **Row 0** – The difference is `2 - 5 = -3`.
  - **Row 1** – The difference is `6 - 8 = -2`.
  - **Row 2** – The difference is `9 - 13 = -4`.
  - **Row 3** – The difference is `12 - 15 = -3`.
  - **Row 4** – The difference is `16 - 19 = -3`.
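With only two columns, the column-wise result can be cross-checked against a plain subtraction. This is a small sketch; `col_diff` and `manual` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

col_diff = df.diff(axis=1)

# Column B of the result should equal B minus A, row by row
manual = df['B'] - df['A']

print((col_diff['B'] == manual).all())
print(col_diff['A'].isna().all())
```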

## Handling NaN Values

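When working with the `diff()` method, NaN (Not a Number) values can appear in several scenarios, such as at the beginning of the DataFrame where there are no preceding values to compute the difference, or if the DataFrame already contains NaN values. The examples in this section use `df_with_nan`, the earlier data with one missing value in each column.

```
import numpy as np
import pandas as pd
# Example data with one missing value per column
# (values chosen to be consistent with the outputs in this section)
df_with_nan = pd.DataFrame({
'A': [5, 8, np.nan, 15, 19],
'B': [2, 6, 9, np.nan, 16]
})
# Compute row-wise differences
df2 = df_with_nan.diff()
print("Row-wise difference (default handling of NaN values):\n", df2)
# Output:
# Row-wise difference (default handling of NaN values):
#      A    B
# 0  NaN  NaN
# 1  3.0  4.0
# 2  NaN  3.0
# 3  NaN  NaN
# 4  4.0  NaN
```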

Here,

- The differences are calculated where possible.
- NaN values propagate through the result.
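To see how far the NaN values spread, you can count them before and after calling `diff()`. The sketch below assumes `df_with_nan` holds the earlier data with one missing value per column, which is an assumption that matches the outputs shown in this section.

```python
import numpy as np
import pandas as pd

# Assumed input: the example data with one missing value per column
df_with_nan = pd.DataFrame({
    'A': [5, 8, np.nan, 15, 19],
    'B': [2, 6, 9, np.nan, 16]
})

result = df_with_nan.diff()

# Each missing input value affects its own row and the row after it,
# in addition to the NaN that diff() always places in the first row
print(df_with_nan.isna().sum())
print(result.isna().sum())
```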

### Filling NaN Values Before Applying diff()

You can fill NaN values with a specific value before computing differences using the `fillna()` method.

```
# Fill NaN values with 0 and then compute differences
df2 = df_with_nan.fillna(0).diff()
print("Row-wise difference after filling NaN values with 0:\n", df2)
# Output:
# Row-wise difference after filling NaN values with 0:
#       A     B
# 0   NaN   NaN
# 1   3.0   4.0
# 2  -8.0   3.0
# 3  15.0  -9.0
# 4   4.0  16.0
```

Here,

- NaN values are filled with 0 before computing differences.
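- Filling with 0 treats each gap as a real value of 0, which produces large artificial jumps such as `-8.0` and `15.0` in column `A`, so use this approach only when the fill value makes sense for your context.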
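Filling can also happen after `diff()` rather than before. A minimal sketch, assuming you want the first row to read as "no change" instead of NaN; the name `changes` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [5, 8, 13, 15, 19],
    'B': [2, 6, 9, 12, 16]
})

# Compute the differences first, then replace the leading NaN row with 0
changes = df.diff().fillna(0)
print(changes)
```

Filling after the calculation leaves the real differences untouched, whereas filling before it changes the inputs that the differences are computed from.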

### Dropping NaN Values Before Applying diff()

You can drop rows with NaN values before computing differences using the `dropna()` method.

```
# Drop NaN values and then compute differences
df2 = df_with_nan.dropna().diff()
print("Row-wise difference after dropping rows with NaN values:\n", df2)
# Output:
# Row-wise difference after dropping rows with NaN values:
#       A     B
# 0   NaN   NaN
# 1   3.0   4.0
# 4  11.0  10.0
```

Here,

- Rows containing NaN values are dropped before computing differences.
- Differences are then computed only between the remaining rows, so row `4` is compared with row `1` (`19 - 8 = 11` and `16 - 6 = 10`) rather than with its original neighbor.
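The order of `dropna()` and `diff()` matters. The sketch below compares both orders, assuming `df_with_nan` is the earlier data with one missing value per column (an assumption consistent with the outputs in this section).

```python
import numpy as np
import pandas as pd

# Assumed input: the example data with one missing value per column
df_with_nan = pd.DataFrame({
    'A': [5, 8, np.nan, 15, 19],
    'B': [2, 6, 9, np.nan, 16]
})

# Drop first: the surviving rows become neighbors, even across a gap
drop_first = df_with_nan.dropna().diff()

# Diff first: only differences between originally adjacent rows survive
diff_first = df_with_nan.diff().dropna()

print(drop_first)
print(diff_first)
```

Which order is appropriate depends on whether a difference across a gap is meaningful for your data.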

### Forward Filling NaN Values

You can forward-fill NaN values using the `ffill()` method before applying `diff()`.

```
# Forward fill NaN values and then compute differences
df2 = df_with_nan.ffill().diff()
print("Row-wise difference after forward filling NaN values:\n", df2)
# Output:
# Row-wise difference after forward filling NaN values:
#      A    B
# 0  NaN  NaN
# 1  3.0  4.0
# 2  0.0  3.0
# 3  7.0  0.0
# 4  4.0  7.0
```

Here,

- NaN values are forward-filled with the previous valid value before computing differences.
- This method is useful when the missing values should be replaced by the preceding values.
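Backward filling with `bfill()` works the same way but takes the next valid value instead of the previous one. A sketch, again assuming `df_with_nan` is the earlier data with one missing value per column:

```python
import numpy as np
import pandas as pd

# Assumed input: the example data with one missing value per column
df_with_nan = pd.DataFrame({
    'A': [5, 8, np.nan, 15, 19],
    'B': [2, 6, 9, np.nan, 16]
})

# Replace each NaN with the next valid value, then compute differences
df2 = df_with_nan.bfill().diff()
print("Row-wise difference after backward filling NaN values:\n", df2)
```

Compared with forward filling, the zero difference moves from the filled row to the row after it, because the filled value now equals its successor rather than its predecessor.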

## Frequently Asked Questions on Pandas DataFrame diff() Method

**What is the diff() method in Pandas?**

The `diff()` method in Pandas is used to compute the difference between consecutive elements in a DataFrame or Series. By default, it calculates the difference between the current and the previous row.

**What is the default behavior of the diff() method?**

By default, `diff()` calculates the difference between each element and the previous element in the same column (row-wise difference) with a period of 1.

**How do I specify the number of periods for the diff() method?**

You can specify the number of periods by using the `periods` parameter. For example, `df.diff(periods=2)` will compute the difference between the current row and the row two periods before.

**How can I fill or drop NaN values when using the diff() method?**

You can handle NaN values before or after applying `diff()` using methods like `fillna()`, `dropna()`, `ffill()` (forward fill), and `bfill()` (backward fill).

**Can the diff() method be used to calculate column-wise differences?**

Yes. By setting the `axis` parameter to `1` (i.e., `df.diff(axis=1)`), you can compute the difference between each element and the previous element in the same row (column-wise difference).

## Conclusion

In conclusion, the `pandas.DataFrame.diff()` method is a powerful tool for calculating differences between consecutive elements in a DataFrame or Series. This functionality is particularly useful in time series analysis, identifying trends, and detecting changes over specified periods. By understanding its syntax, parameters, and various applications, you can effectively leverage this method to gain deeper insights into your data.

Happy Learning!!
