In Pandas, the cumsum() method computes the cumulative sum of a DataFrame or Series along a specified axis. For each element, cumsum() returns the sum of that element and all elements before it in the specified direction (down each column or across each row).
In this article, I will explain the syntax, parameters, and usage of the Pandas DataFrame cumsum() method, and show how it returns an object of the same shape as the original DataFrame or Series, with each value replaced by its cumulative sum.
Key Points –
- The cumsum() method calculates the cumulative sum of DataFrame or Series elements along a particular axis.
- The cumulative sum can be computed either column-wise (axis=0, the default) or row-wise (axis=1).
- The skipna parameter (default True) determines whether to ignore NA/null values during computation.
- The output is a DataFrame or Series with the same dimensions and index/column labels as the input, containing cumulative sums.
- The cumsum() method does not modify the original data structure; it returns a new DataFrame or Series with cumulative sums.
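The last two points can be checked with a minimal sketch. The small DataFrame and the names `original` and `result` below are made up for illustration and are not part of the article's examples.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
original = df.copy()  # snapshot taken before calling cumsum()

result = df.cumsum()

# The result has the same shape and labels as the input
print(result.shape == df.shape)
print(result.index.equals(df.index))
print(list(result.columns) == list(df.columns))

# The input DataFrame is left unchanged
print(df.equals(original))
```

If you want to keep the running totals, assign the returned object to a variable, as done with `result` here.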
Syntax of Pandas DataFrame cumsum() Method
The syntax of the cumsum() method is as follows.
# Syntax of DataFrame cumsum() method
DataFrame.cumsum(axis=0, skipna=True, *args, **kwargs)
Parameters of the DataFrame cumsum()
Following are the parameters of the DataFrame cumsum() method.
- axis – {0 or 'index', 1 or 'columns'}, default 0. Determines the axis along which to compute the cumulative sum.
  - 0 or 'index': Compute column-wise (down each column).
  - 1 or 'columns': Compute row-wise (across each row).
- skipna – bool, default True. If True, NaN values are skipped during computation. If False, NaN values propagate in the cumulative sum.
- *args, **kwargs – Additional arguments accepted for compatibility with NumPy; they have no effect on the result.
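As a quick sketch of the axis parameter, the integer and string forms listed above should give identical results. The DataFrame and variable names here are illustrative assumptions, not taken from the article.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Column-wise: 0 and 'index' are two spellings of the same axis
by_index_int = df.cumsum(axis=0)
by_index_str = df.cumsum(axis="index")

# Row-wise: 1 and 'columns' are two spellings of the same axis
by_columns_int = df.cumsum(axis=1)
by_columns_str = df.cumsum(axis="columns")

print(by_index_int.equals(by_index_str))
print(by_columns_int.equals(by_columns_str))
```

The string forms can make code easier to read, since they name the axis that the running total moves along.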
Return Value
It returns a DataFrame or Series of the same size as the input, containing the cumulative sums along the specified axis.
Usage of Pandas DataFrame cumsum() Method
The cumsum() method in Pandas calculates the cumulative sum of elements in a DataFrame or Series along a specified axis. Each element in the result is the sum of the current element and all elements before it, either row-wise or column-wise.
To run some examples of the Pandas DataFrame cumsum() method, let's create a Pandas DataFrame using data from a dictionary.
import pandas as pd
# Creating a sample DataFrame
data = {
'A': [15, 38, 12, 24],
'B': [52, 31, 49, 11],
'C': [13, 22, 36, 18]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n",df)
# Output:
# Original DataFrame:
#     A   B   C
# 0  15  52  13
# 1  38  31  22
# 2  12  49  36
# 3  24  11  18
To compute the cumulative sum for the given DataFrame, you can use the cumsum() method without specifying the axis parameter. By default (axis=0), it computes the cumulative sum down each column, accumulating from one row to the next.
# Calculating the column-wise cumulative sum
df2 = df.cumsum()
print("Column-wise cumulative sum:\n", df2)
Here,
- Column A shows cumulative sums: 15, 53 (15+38), 65 (53+12), and 89 (65+24).
- Column B shows cumulative sums: 52, 83 (52+31), 132 (83+49), and 143 (132+11).
- Column C shows cumulative sums: 13, 35 (13+22), 71 (35+36), and 89 (71+18).
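One way to sanity-check a column-wise cumulative sum is to compare its last row with the plain column totals from sum(). The sketch below rebuilds the article's DataFrame; the names `running`, `last_row`, and `totals` are chosen here for illustration.

```python
import pandas as pd

# Same data as the article's sample DataFrame
df = pd.DataFrame({
    "A": [15, 38, 12, 24],
    "B": [52, 31, 49, 11],
    "C": [13, 22, 36, 18],
})

running = df.cumsum()        # column-wise running totals
last_row = running.iloc[-1]  # final running total of each column
totals = df.sum()            # ordinary column totals

# With no missing values, the two should agree for every column
print((last_row == totals).all())
```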
Cumulative Sum Across Each Row
Alternatively, to compute the cumulative sum across each row, you can set the axis parameter to 1 (or 'columns') when using the cumsum() method.
# Calculating the row-wise cumulative sum
df2 = df.cumsum(axis=1)
print("Row-wise Cumulative Sum:\n", df2)
# Output:
# Row-wise Cumulative Sum:
# A B C
# 0 15 67 80
# 1 38 69 91
# 2 12 61 97
# 3 24 35 53
Here,
- Row 0: The cumulative sums are 15 (just the value in column A), 67 (15+52), and 80 (67+13).
- Row 1: The cumulative sums are 38 (just the value in column A), 69 (38+31), and 91 (69+22).
- Row 2: The cumulative sums are 12 (just the value in column A), 61 (12+49), and 97 (61+36).
- Row 3: The cumulative sums are 24 (just the value in column A), 35 (24+11), and 53 (35+18).
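Another way to think about the row-wise result is as a column-wise cumulative sum on the transposed DataFrame, transposed back. This is only a sketch to illustrate the relationship between the two axes; the names `direct` and `via_transpose` are assumptions made for this example.

```python
import pandas as pd

# Same data as the article's sample DataFrame
df = pd.DataFrame({
    "A": [15, 38, 12, 24],
    "B": [52, 31, 49, 11],
    "C": [13, 22, 36, 18],
})

direct = df.cumsum(axis=1)        # accumulate across each row
via_transpose = df.T.cumsum().T   # transpose, accumulate down, transpose back

# Both approaches should produce the same values
print((direct.values == via_transpose.values).all())

# The last column holds each row's total
print(direct["C"].tolist())
```

In practice axis=1 is the simpler choice; the transposed form is shown only to make the direction of accumulation explicit.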
Cumulative Sum with Missing Values
To demonstrate the cumulative sum with missing values, we will modify the DataFrame to include some NaN (missing) values. By default (skipna=True), the cumsum() method skips NaN values when accumulating: each NaN position stays NaN in the result, and the running total continues after it. You can instead propagate NaN values by setting skipna=False.
Column-wise Cumulative Sum (Default Behavior, skipna=True)
To compute the column-wise cumulative sum while skipping NaN values, you can call the cumsum() method without specifying the skipna parameter, since skipna=True is the default setting.
import pandas as pd
import numpy as np
# Creating a sample DataFrame with missing values (NaN)
data= {
'A': [15, np.nan, 12, 24],
'B': [52, 31, np.nan, 11],
'C': [13, 22, 36, np.nan]
}
df_with_nan = pd.DataFrame(data)
# Calculating the column-wise cumulative sum
# With default skipna=True
df2 = df_with_nan.cumsum()
print("Column-wise Cumulative Sum with skipna=True:\n", df2)
# Output:
# Column-wise Cumulative Sum with skipna=True:
# A B C
# 0 15.0 52.0 13.0
# 1 NaN 83.0 35.0
# 2 27.0 NaN 71.0
# 3 51.0 94.0 NaN
Column-wise Cumulative Sum with Propagation of NaN (skipna=False)
You can also use the cumsum() method with the skipna parameter set to False to calculate the column-wise cumulative sum while propagating NaN values. Once a NaN value is encountered in a column, all subsequent cumulative sums in that column will also be NaN.
# Calculating the column-wise cumulative sum
# With skipna=False
df2 = df_with_nan.cumsum(skipna=False)
print("Column-wise Cumulative Sum with skipna=False:\n", df2)
# Output:
# Column-wise Cumulative Sum with skipna=False:
# A B C
# 0 15.0 52.0 13.0
# 1 NaN 83.0 35.0
# 2 NaN NaN 71.0
# 3 NaN NaN NaN
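The two skipna settings can be compared side by side on a single column. The sketch below also shows a third option that the article does not cover: replacing NaN with 0 via fillna(0) before calling cumsum(), so that the gap contributes nothing and the running total carries forward. Whether that is appropriate depends on what a missing value means in your data.

```python
import numpy as np
import pandas as pd

# Column A from the article's DataFrame with missing values
col = pd.Series([15, np.nan, 12, 24], name="A")

skipped = col.cumsum()                 # default skipna=True: NaN stays NaN, total continues
propagated = col.cumsum(skipna=False)  # NaN and everything after it becomes NaN
as_zero = col.fillna(0).cumsum()       # assumption: treat missing values as 0

# Put the three results next to each other for comparison
comparison = pd.DataFrame({
    "skipna=True": skipped,
    "skipna=False": propagated,
    "fillna(0)": as_zero,
})
print(comparison)
```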
Cumulative Sum on a Series
Similarly, to calculate the cumulative sum on a Pandas Series, you can call the cumsum() method directly on the Series. Each value in the result is the sum of the current value and all the values before it.
import pandas as pd
# Creating a sample Series
ser = pd.Series([5, 3, 8, None, 7, 2])
# Calculating the cumulative sum
ser2 = ser.cumsum()
print("Cumulative Sum of the Series:\n", ser2)
# Output:
# Cumulative Sum of the Series:
# 0 5.0
# 1 8.0
# 2 16.0
# 3 NaN
# 4 23.0
# 5 25.0
# dtype: float64
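The float64 dtype in the output above comes from the missing value: None is stored as NaN, which forces the Series to a floating-point type. A minimal sketch comparing it with a Series that has no gaps (the variable names are illustrative):

```python
import pandas as pd

# Series without missing values: stays an integer type
complete = pd.Series([5, 3, 8, 7, 2])
complete_total = complete.cumsum()

# Series with a missing value: None becomes NaN, so the dtype is float
with_gap = pd.Series([5, 3, 8, None, 7, 2])
gap_total = with_gap.cumsum()

print(complete_total.dtype)
print(gap_total.dtype)
```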
FAQ on Pandas DataFrame cumsum() Method
The cumsum() method in Pandas computes the cumulative sum of DataFrame or Series elements along a specified axis. Each element in the output is the sum of the current element and all elements before it, either row-wise or column-wise.
To use the cumsum() method on a DataFrame, simply call it on the DataFrame object. By default, it computes the cumulative sum down each column (axis=0).
cumsum() can be used directly on a Pandas Series. It calculates the cumulative sum of the Series elements.
The cumsum() method returns a DataFrame if called on a DataFrame, or a Series if called on a Series. The output retains the same shape and index/column labels as the input, with the values replaced by their cumulative sums.
While cumsum() calculates the cumulative sum, cumprod() calculates the cumulative product. Instead of summing the values, cumprod() multiplies them cumulatively along the specified axis.
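The difference between the two methods is easy to see on a short Series. This is a small sketch with made-up values, not an example from the article.

```python
import pandas as pd

# Hypothetical values, used only for this illustration
ser = pd.Series([1, 2, 3, 4])

running_sum = ser.cumsum()       # 1, 1+2, 1+2+3, 1+2+3+4
running_product = ser.cumprod()  # 1, 1*2, 1*2*3, 1*2*3*4

print(running_sum.tolist())
print(running_product.tolist())
```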
Conclusion
In conclusion, the Pandas cumsum() method calculates cumulative sums within a DataFrame or Series. It works on numeric data, letting you compute running totals down each column or across each row, with the skipna parameter controlling how missing values are handled.
Happy Learning!!