  • Post category:Pandas
  • Post last modified:September 9, 2024

In Pandas, the cumprod() method calculates the cumulative product of a DataFrame or Series along a specified axis. By default, it works column-wise (down the rows of each column), but you can change the axis to work row-wise. Each value in the result is the product of the element at that position and all elements before it along the chosen axis.


In this article, I will explain the Pandas DataFrame cumprod() method, covering its syntax, parameters, and usage, and show how to generate a DataFrame that holds the cumulative product for each column or each row.

Key Points –

  • The cumprod() method returns the cumulative product of DataFrame or Series elements along a specified axis.
  • By default, the operation is performed down each column (axis=0), but it can also be applied across each row (axis=1).
  • With skipna=True (the default), missing (NaN) values are skipped: the result stays NaN at that position, and the running product resumes at the next valid value.
  • It returns a DataFrame or Series where each value is the product of all preceding values (including itself) along the chosen axis.
  • The method is useful in scenarios where you need to track the compounded effect of values over time or across a sequence.
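As a rough illustration of the "compounded effect" mentioned in the last point, the sketch below compounds a few periodic returns. The return values and the starting amount of 100 are made-up numbers chosen only for the example.

```python
import pandas as pd

# Hypothetical monthly returns (illustrative values, not real data)
returns = pd.Series([0.10, -0.05, 0.20], index=["Jan", "Feb", "Mar"])

# Turn each return into a growth factor, then take the running product
growth = (1 + returns).cumprod()

# Value of an assumed initial amount of 100 after each month
value = 100 * growth
print(value.round(2))
```

Each entry in value reflects every return up to and including that month, which is what a cumulative product provides.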

Pandas DataFrame cumprod() Introduction

Following is the syntax of the Pandas DataFrame cumprod() method.


# Syntax of Pandas dataframe cumprod() method
DataFrame.cumprod(axis=None, skipna=True, *args, **kwargs)

Parameters of the DataFrame cumprod()

Following are the parameters of the DataFrame cumprod() method.

  • axis – {0 or ‘index’, 1 or ‘columns’}, default 0
    • 0 or index – The cumulative product is calculated down the rows, separately for each column.
    • 1 or columns – The cumulative product is calculated across the columns, separately for each row.
  • skipna – bool, default True
    • If True, it skips any NA/null values in the calculations.
  • *args, **kwargs – Optional additional positional or keyword arguments that can be passed to the method.
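The two axis values can be confusing, so here is a minimal sketch on a small made-up DataFrame (the values are assumptions for illustration only). It uses the string labels, which behave the same as 0 and 1.

```python
import pandas as pd

# Small illustrative DataFrame (values chosen only for this sketch)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# axis="index" is the same as axis=0: running product down each column
down = df.cumprod(axis="index")

# axis="columns" is the same as axis=1: running product across each row
across = df.cumprod(axis="columns")

print(down)
print(across)
```

In down, column B becomes 4, 20, 120. In across, column B becomes 4, 10, 18, because each row multiplies A by B.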

Return Value

It returns a DataFrame or Series with the cumulative product of the values along the specified axis.

Usage of Pandas DataFrame cumprod() Method

The cumprod() method in Pandas computes the cumulative product of a DataFrame or Series along a specified axis. For each element, it multiplies the current value by all the previous values in the direction of the specified axis.

Now, let’s create a Pandas DataFrame using data from a dictionary, with columns labeled A, B, and C.


import pandas as pd

# Creating a sample DataFrame
data = {
    'A': [5, 8, 2, 4],
    'B': [3, 6, 9, 2],
    'C': [4, 7, 6, 5]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n",df)

# Output:
# Original DataFrame:
#    A  B  C
# 0  5  3  4
# 1  8  6  7
# 2  2  9  6
# 3  4  2  5

Cumulative Product for DataFrame Columns

To compute the cumulative product for each column in a DataFrame, use the cumprod() method. By default, this method calculates the cumulative product for each column across the rows (axis=0).


# Computing the cumulative product along columns
df2 = df.cumprod()
print("Cumulative Product DataFrame:\n", df2)

Here,

  • Column A – Cumulative product: [5, 5*8, 5*8*2, 5*8*2*4] which is [5, 40, 80, 320]
  • Column B – Cumulative product: [3, 3*6, 3*6*9, 3*6*9*2] which is [3, 18, 162, 324]
  • Column C – Cumulative product: [4, 4*7, 4*7*6, 4*7*6*5] which is [4, 28, 168, 840]
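One way to sanity-check the column-wise result is to compare its last row with the total product of each column. The sketch below rebuilds the same sample DataFrame so it can run on its own.

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({'A': [5, 8, 2, 4], 'B': [3, 6, 9, 2], 'C': [4, 7, 6, 5]})

cum = df.cumprod()
print(cum)
#      A    B    C
# 0    5    3    4
# 1   40   18   28
# 2   80  162  168
# 3  320  324  840

# The last row of the running product matches the overall product per column
matches = (cum.iloc[-1] == df.prod()).all()
print(matches)
```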

Cumulative Product Along Rows

Alternatively, to calculate the cumulative product along the rows of a DataFrame, apply the cumprod() method with axis=1. This computes the cumulative product for each row across the columns.


# Computing the cumulative product along rows (axis=1)
df2 = df.cumprod(axis=1)
print("Cumulative product along rows:\n", df2)

# Output:
# Cumulative product along rows:
#    A   B    C
# 0  5  15   60
# 1  8  48  336
# 2  2  18  108
# 3  4   8   40

Here,

  • Row 0 – Cumulative product: [5, 5*3, 5*3*4] which is [5, 15, 60]
  • Row 1 – Cumulative product: [8, 8*6, 8*6*7] which is [8, 48, 336]
  • Row 2 – Cumulative product: [2, 2*9, 2*9*6] which is [2, 18, 108]
  • Row 3 – Cumulative product: [4, 4*2, 4*2*5] which is [4, 8, 40]
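For readers more familiar with NumPy, the row-wise result can be cross-checked against np.cumprod() on the underlying array. This is only a comparison sketch; it makes no claim about how Pandas computes the result internally.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({'A': [5, 8, 2, 4], 'B': [3, 6, 9, 2], 'C': [4, 7, 6, 5]})

# Running product across the columns of each row
by_rows = df.cumprod(axis=1)

# The same computation on the raw NumPy array, for comparison
expected = np.cumprod(df.to_numpy(), axis=1)

same = np.array_equal(by_rows.to_numpy(), expected)
print(same)
```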

Handling Missing Values

How the cumprod() method treats NaN (missing) values in Pandas is controlled by the skipna parameter.

Cumulative Product with skipna=True

By default (skipna=True), the cumprod() method skips NaN values: the result is NaN at each missing position, and the running product continues with the next available numeric value. This allows the cumulative product to proceed without being interrupted by missing data.


import pandas as pd
import numpy as np

# Creating a DataFrame with missing values
data = {
    'A': [2, np.nan, 3, 4],
    'B': [1, 5, np.nan, 2],
    'C': [3, np.nan, 2, 4]
}

df = pd.DataFrame(data)

# Cumulative product with skipna=True (default)
cumprod_df = df.cumprod(skipna=True)
print("\nCumulative Product with skipna=True:\n", cumprod_df)

# Output:
# Cumulative Product with skipna=True:
#       A     B     C
# 0   2.0   1.0   3.0
# 1   NaN   5.0   NaN
# 2   6.0   NaN   6.0
# 3  24.0  10.0  24.0
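If you would rather carry the running product through the gaps instead of leaving NaN in the result, one option is to replace missing values with 1 (the neutral element for multiplication) before calling cumprod(). This is a sketch of that idea on a made-up Series; whether treating a missing value as 1 is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Illustrative Series with one missing value
s = pd.Series([2, np.nan, 3, 4])

# Default behaviour: the NaN position stays NaN, the product resumes after it
default = s.cumprod()

# Treat missing values as 1 so the running product is carried forward
neutral = s.fillna(1).cumprod()

print(default.tolist())
print(neutral.tolist())
```

Here default holds 2, NaN, 6, 24, while neutral holds 2, 2, 6, 24.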

Cumulative Product with skipna=False

When you use cumprod() with skipna=False, the method does not ignore NaN values. If a NaN is encountered, all subsequent results in that row or column will also be NaN.


# Computing the cumulative product with skipna=False
df2 = df.cumprod(skipna=False)
print("Cumulative Product with skipna=False:\n", df2)

# Output:
# Cumulative Product with skipna=False:
#      A    B    C
# 0  2.0  1.0  3.0
# 1  NaN  5.0  NaN
# 2  NaN  NaN  NaN
# 3  NaN  NaN  NaN

Cumulative Product for a Series

The cumulative product for a Pandas Series can be calculated using the cumprod() method, which returns a new Series where each value is the product of the current element and all preceding elements.


import pandas as pd

# Creating a Series
ser = pd.Series([2, 3, 4, 5])

# Calculating the cumulative product
result = ser.cumprod()
print("Cumulative Product of the Series:\n", result)

# Output:
# Cumulative Product of the Series:
# 0      2
# 1      6
# 2     24
# 3    120
# dtype: int64

FAQ on Pandas DataFrame cumprod() Method

What does the cumprod() method do?

The cumprod() method in Pandas calculates the cumulative product of values in a DataFrame or Series. It is helpful in scenarios where a running product of elements needs to be determined over time or across different categories.

How do I calculate the cumulative product along rows?

To compute the cumulative product along rows, call cumprod() with axis=1 (or axis='columns'). This multiplies each value by all values to its left in the same row, resulting in a cumulative product at each position.

How does cumprod() handle NaN values?

By default (skipna=True), cumprod() skips NaN values: the result is NaN at the missing position, and the calculation continues with the next valid number. If skipna=False, the result is NaN from the first NaN onward along that axis.

Can I use cumprod() on mixed data types (e.g., strings and numbers)?

cumprod() only works on numeric data. If the DataFrame contains mixed data types (e.g., strings), either select only the numeric columns first, or convert the affected column to a numeric type with pd.to_numeric(column, errors='coerce'), which turns values that cannot be parsed into NaN.
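A minimal sketch of both approaches follows. The column names and values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical mixed-type data: 'price' holds numbers stored as strings
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "qty": [2, 3, 4],
    "price": ["1.5", "bad", "2"],
})

# Option 1: keep only the numeric columns, then take the cumulative product
numeric_only = df.select_dtypes(include="number").cumprod()

# Option 2: coerce a string column to numbers; unparseable values become NaN
price = pd.to_numeric(df["price"], errors="coerce")
price_cum = price.cumprod()

print(numeric_only)
print(price_cum)
```

With the default skipna=True, the coerced NaN in price is skipped, so the last value of price_cum is 1.5 * 2.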

Can I use cumprod() with a Series?

cumprod() works on both DataFrames and Series. For a Series, the cumulative product is computed over the values in the Series.

Conclusion

In this article, I have explained the Pandas DataFrame cumprod() method, including its syntax, parameters, and usage. It computes the cumulative product along a DataFrame or Series axis and returns a DataFrame or Series of the same dimensions containing the cumulative product values.

Happy Learning!!
