In Pandas, the cumprod() method calculates the cumulative product of a DataFrame or Series along a specified axis. By default, it works down each column (axis=0), but you can set axis=1 to compute it across each row. Each value in the result is the product of the current element and all elements that precede it along that axis.
In this article, I will explain the Pandas DataFrame cumprod() method, covering its syntax, parameters, and usage, and show how to generate a DataFrame that holds the cumulative product for each column or each row.
Key Points –
- The cumprod() method returns the cumulative product of DataFrame or Series elements along a specified axis.
- By default, the operation runs down each column (axis=0), but it can also be applied across each row (axis=1).
- With the default skipna=True, missing (NaN) values are skipped: the result is NaN at that position, and the running product resumes at the next valid value.
- It returns a DataFrame or Series of the same shape, where each value is the product of the current element and all preceding elements along the chosen axis.
- The method is useful in scenarios where you need to track the compounded effect of values over time or across a sequence.
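As an illustration of the compounding use case mentioned in the last point, here is a minimal sketch that turns per-period returns into a cumulative return. The return values and the day labels are made up for the example and do not come from this article.

```python
import pandas as pd

# Hypothetical per-period returns (illustrative values only)
returns = pd.Series([0.10, -0.05, 0.02], index=["day1", "day2", "day3"])

# Convert each return to a growth factor, then take the running product
growth = (1 + returns).cumprod()

# Cumulative return relative to the starting value
cumulative_return = growth - 1
print(cumulative_return)
```

Each entry of growth is the compounded factor up to and including that period, so the last entry equals the product of all the individual growth factors.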
Pandas DataFrame cumprod() Introduction
Following is the syntax of the Pandas DataFrame cumprod() method.
# Syntax of Pandas dataframe cumprod() method
DataFrame.cumprod(axis=None, skipna=True, *args, **kwargs)
Parameters of the DataFrame cumprod()
Following are the parameters of the DataFrame cumprod() method.
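axis – {0 or ‘index’, 1 or ‘columns’}, default 0
- 0 or index – The cumulative product is computed down the rows, separately for each column.
- 1 or columns – The cumulative product is computed across the columns, separately for each row.
skipna – bool, default True
- If True, NA/null values are excluded from the calculation. If False, a NaN makes every later result along that axis NaN as well.
args, kwargs – Additional arguments and keywords. They have no effect on the result and are accepted for compatibility with NumPy.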
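The string aliases for the axis parameter can be checked with a small sketch. The DataFrame below is a made-up example, separate from the data used later in this article.

```python
import pandas as pd

# Small illustrative DataFrame (not the article's sample data)
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# 'index' is an alias for 0: the product runs down each column
down_rows = df.cumprod(axis="index")

# 'columns' is an alias for 1: the product runs across each row
across_cols = df.cumprod(axis="columns")

print(down_rows.equals(df.cumprod(axis=0)))
print(across_cols.equals(df.cumprod(axis=1)))
```

Both comparisons should report that the alias and the integer form give the same result.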
Return Value
It returns a DataFrame or Series with the cumulative product of the values along the specified axis.
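A short sketch can confirm what the return value looks like: the result keeps the shape and labels of the input, and calling the method on a single column gives a Series. The labels r1, r2, x, and y are hypothetical names chosen for the example.

```python
import pandas as pd

# Illustrative DataFrame with a custom index (hypothetical labels)
df = pd.DataFrame({"x": [2, 3], "y": [4, 5]}, index=["r1", "r2"])

result = df.cumprod()

# Same shape and same row/column labels as the input
print(result.shape == df.shape)
print(result.index.equals(df.index))
print(result.columns.equals(df.columns))

# Calling cumprod() on one column returns a Series
print(type(df["x"].cumprod()).__name__)
```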
Usage of Pandas DataFrame cumprod() Method
The cumprod() method in Pandas computes the cumulative product of a DataFrame or Series along a specified axis. For each element, it multiplies the current value by all the previous values in the direction of the specified axis.
Now, let’s create a Pandas DataFrame using data from a dictionary, with columns labeled A, B, and C.
import pandas as pd
# Creating a sample DataFrame
data = {
'A': [5, 8, 2, 4],
'B': [3, 6, 9, 2],
'C': [4, 7, 6, 5]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Output:
# Original DataFrame:
#    A  B  C
# 0  5  3  4
# 1  8  6  7
# 2  2  9  6
# 3  4  2  5
Cumulative Product for DataFrame Columns
To compute the cumulative product for each column in a DataFrame, use the cumprod() method. By default, this method calculates the cumulative product for each column across the rows (axis=0).
# Computing the cumulative product along columns
df2 = df.cumprod()
print("Cumulative Product DataFrame:\n", df2)
Here,
- Column A – Cumulative product: [5, 5*8, 5*8*2, 5*8*2*4], which is [5, 40, 80, 320]
- Column B – Cumulative product: [3, 3*6, 3*6*9, 3*6*9*2], which is [3, 18, 162, 324]
- Column C – Cumulative product: [4, 4*7, 4*7*6, 4*7*6*5], which is [4, 28, 168, 840]
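The column-wise arithmetic above can be cross-checked against NumPy. This is only a sketch: it rebuilds the sample DataFrame and compares the Pandas result with np.cumprod applied along axis 0.

```python
import numpy as np
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "A": [5, 8, 2, 4],
    "B": [3, 6, 9, 2],
    "C": [4, 7, 6, 5],
})

by_pandas = df.cumprod()

# NumPy's cumprod along axis 0 multiplies down each column
by_numpy = np.cumprod(df.to_numpy(), axis=0)

print(np.array_equal(by_pandas.to_numpy(), by_numpy))

# The last row of the cumulative product is the full product of each column
print(by_pandas.iloc[-1].tolist())
```

The last row should match the final values listed above for columns A, B, and C, which are also what df.prod() returns.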
Cumulative Product Along Rows
Alternatively, to calculate the cumulative product along the rows of a DataFrame, apply the cumprod() method with axis=1. This computes the cumulative product for each row across the columns.
# Computing the cumulative product along rows (axis=1)
df2 = df.cumprod(axis=1)
print("Cumulative product along rows:\n", df2)
# Output:
# Cumulative product along rows:
# A B C
# 0 5 15 60
# 1 8 48 336
# 2 2 18 108
# 3 4 8 40
Here,
- Row 0 – Cumulative product: [5, 5*3, 5*3*4] → [5, 15, 60]
- Row 1 – Cumulative product: [8, 8*6, 8*6*7] → [8, 48, 336]
- Row 2 – Cumulative product: [2, 2*9, 2*9*6] → [2, 18, 108]
- Row 3 – Cumulative product: [4, 4*2, 4*2*5] → [4, 8, 40]
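One way to see what axis=1 does is to compare it with transposing the DataFrame, taking the default column-wise cumulative product, and transposing back. The sketch below rebuilds the sample data and checks that both routes agree.

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({
    "A": [5, 8, 2, 4],
    "B": [3, 6, 9, 2],
    "C": [4, 7, 6, 5],
})

# Row-wise cumulative product
row_wise = df.cumprod(axis=1)

# Equivalent route: transpose, accumulate down columns, transpose back
via_transpose = df.T.cumprod().T

print(row_wise.equals(via_transpose))
```

In practice axis=1 is the more direct choice; the transpose version is shown only to make the direction of the calculation explicit.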
Handling Missing Values
How cumprod() treats missing (NaN) values is controlled by the skipna parameter.
Cumulative Product with skipna=True
By default (skipna=True), cumprod() skips NaN values: the output is NaN at each missing position, and the running product resumes with the next valid value. Missing data therefore does not interrupt the cumulative product.
import pandas as pd
import numpy as np
# Creating a DataFrame with missing values
data = {
'A': [2, np.nan, 3, 4],
'B': [1, 5, np.nan, 2],
'C': [3, np.nan, 2, 4]
}
df = pd.DataFrame(data)
# Cumulative product with skipna=True (default)
cumprod_df = df.cumprod(skipna=True)
print("\nCumulative Product with skipna=True:\n", cumprod_df)
# Output:
# Cumulative Product with skipna=True:
# A B C
# 0 2.0 1.0 3.0
# 1 NaN 5.0 NaN
# 2 6.0 NaN 6.0
# 3 24.0 10.0 24.0
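A way to reason about the default behaviour is to treat each NaN as a factor of 1 and then blank out the positions that were missing in the input. The sketch below is an illustration of that idea, not a description of how Pandas implements it internally.

```python
import numpy as np
import pandas as pd

# Same data with missing values as in the article
df = pd.DataFrame({
    "A": [2, np.nan, 3, 4],
    "B": [1, 5, np.nan, 2],
    "C": [3, np.nan, 2, 4],
})

default_result = df.cumprod()  # skipna=True is the default

# Replace NaN with the neutral factor 1, accumulate,
# then restore NaN where the input was missing
manual = df.fillna(1).cumprod().where(df.notna())

print(default_result.equals(manual))
```

If the gaps in the result should carry the last running product forward instead of showing NaN, calling ffill() on the result is one option.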
Cumulative Product with skipna=False
When you use cumprod() with skipna=False, the method does not ignore NaN values. If a NaN is encountered, all subsequent results in that row or column will also be NaN.
# Computing the cumulative product with skipna=False
df2 = df.cumprod(skipna=False)
print("Cumulative Product with skipna=False:\n", df2)
# Output:
# Cumulative Product with skipna=False:
# A B C
# 0 2.0 1.0 3.0
# 1 NaN 5.0 NaN
# 2 NaN NaN NaN
# 3 NaN NaN NaN
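To make the propagation concrete, the following sketch counts how many results stay valid in each column when skipna=False. With the sample data, that count equals the number of values before the first NaN in the column.

```python
import numpy as np
import pandas as pd

# Same data with missing values as in the article
df = pd.DataFrame({
    "A": [2, np.nan, 3, 4],
    "B": [1, 5, np.nan, 2],
    "C": [3, np.nan, 2, 4],
})

strict = df.cumprod(skipna=False)

# Number of non-NaN results per column after NaN propagation
valid_counts = strict.notna().sum()
print(valid_counts)
```

Columns A and C lose everything after their first row, while column B keeps two valid results because its first NaN appears in the third row.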
Cumulative Product for a Series
The cumulative product for a Pandas Series can be calculated using the cumprod() method, which returns a new Series where each value is the product of the current element and all preceding elements.
import pandas as pd
# Creating a Series
ser = pd.Series([2, 3, 4, 5])
# Calculating the cumulative product
result = ser.cumprod()
print("Cumulative Product of the Series:\n", result)
# Output:
# Cumulative Product of the Series:
# 0 2
# 1 6
# 2 24
# 3 120
# dtype: int64
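As a further illustration, a Series holding the integers 1 through 5 yields the factorials as its cumulative product. The sketch below checks this against math.factorial and also confirms that the last cumulative value equals the overall product of the Series.

```python
import math
import pandas as pd

# Integers 1..5 (illustrative data)
ser = pd.Series([1, 2, 3, 4, 5])

running = ser.cumprod()

# The running product of 1..n is n!
expected = [math.factorial(n) for n in ser.tolist()]
print(running.tolist() == expected)

# The final cumulative value is the product of the whole Series
print(running.iloc[-1] == ser.prod())
```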
FAQ on Pandas DataFrame cumprod() Method
The cumprod() method in Pandas calculates the cumulative product of values in a DataFrame or Series. It is helpful in scenarios where a running product of elements needs to be determined over time or across different categories.
To compute the cumulative product for a Series, you can use the cumprod() method directly on the Series. This method multiplies each element by all previous elements, resulting in a cumulative product at each position.
By default (skipna=True), cumprod() skips NaN values and continues the calculation with the next valid number. If skipna=False, any encounter with NaN will return NaN for the rest of the values along that axis.
cumprod() is meant for numeric data. If the DataFrame contains mixed data types (e.g., numbers stored as strings), convert the columns to numeric types first, for example with df.apply(pd.to_numeric, errors='coerce'), which turns values that cannot be parsed into NaN.
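The conversion step can be sketched as follows. The DataFrame named raw and its contents are hypothetical; pd.to_numeric works on one column at a time, so it is applied to each column with DataFrame.apply.

```python
import pandas as pd

# Hypothetical mixed-type data: column A holds strings, one of them not a number
raw = pd.DataFrame({"A": ["2", "x", "3"], "B": [1, 2, 3]})

# Convert each column to numeric; unparseable values become NaN
numeric = raw.apply(pd.to_numeric, errors="coerce")

# With the default skipna=True, the NaN is skipped and the product continues
result = numeric.cumprod()
print(result)
```

Coercing silently discards the invalid entries, so it is worth checking how many values became NaN before relying on the result.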
cumprod() works on both DataFrames and Series. For a Series, the cumulative product is computed over the values in the Series.
Conclusion
In this article, I have explained the Pandas DataFrame cumprod() method, covering its syntax, parameters, and usage. It computes the cumulative product along a DataFrame or Series axis and returns a DataFrame or Series of the same dimensions holding the cumulative product values.
Happy Learning!!