Pandas DataFrame cumsum() Method

In Pandas, the cumsum() method computes the cumulative sum of a DataFrame or Series along a specified axis. For each position, cumsum() returns the sum of the current element and all elements before it in the chosen direction (down each column or across each row).

Syntax of Pandas DataFrame cumsum() Method

The syntax of the cumsum() method is as follows.


# Syntax of DataFrame cumsum() method
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)

Parameters of the DataFrame cumsum()

Following are the parameters of the DataFrame cumsum() method.

axis – {0 or ‘index’, 1 or ‘columns’}, default 0. Determines the axis along which to compute the cumulative sum.
- 0 or 'index': Compute column-wise (down each column).
- 1 or 'columns': Compute row-wise (across each row).
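skipna – bool, default True. If True, NaN values are excluded from the running total: a NaN stays NaN at its own position, and the sum continues from the next valid value. If False, every position from the first NaN onward is NaN.
*args, **kwargs – Additional arguments accepted for compatibility with NumPy; they have no effect on the result.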
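
As a quick illustration of the axis parameter, the sketch below checks that the integer and string forms give the same result. The DataFrame df_demo and its values are made up for this example and are not part of the article's data.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df_demo = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})

# axis=0 and axis='index' both accumulate down each column
down_int = df_demo.cumsum(axis=0)
down_name = df_demo.cumsum(axis='index')

# axis=1 and axis='columns' both accumulate across each row
across_int = df_demo.cumsum(axis=1)
across_name = df_demo.cumsum(axis='columns')

print(down_int.equals(down_name))      # True
print(across_int.equals(across_name))  # True
```

Which form to use is a matter of taste; the string names tend to read more clearly in longer scripts.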

Return Value

It returns a Series or DataFrame of the same size as the caller, containing the cumulative sums along the specified axis.

Usage of Pandas DataFrame cumsum() Method

The cumsum() method in Pandas calculates the cumulative sum of elements in a DataFrame or Series along a specified axis. This means that each element in the resulting DataFrame or Series represents the sum of all preceding elements, including the current one, either row-wise or column-wise.

To run some examples of pandas DataFrame cumsum() method, let’s create a Pandas DataFrame using data from a dictionary.


import pandas as pd

# Creating a sample DataFrame
data = {
    'A': [15, 38, 12, 24],
    'B': [52, 31, 49, 11],
    'C': [13, 22, 36, 18]
}

df = pd.DataFrame(data)
print("Original DataFrame:\n", df)

# Output:
# Original DataFrame:
#     A   B   C
# 0  15  52  13
# 1  38  31  22
# 2  12  49  36
# 3  24  11  18

To compute the cumulative sum for this DataFrame, call cumsum() without the axis parameter. By default (axis=0), it accumulates down the rows of each column, which gives a column-wise cumulative sum.


# Calculating the column-wise cumulative sum
df2 = df.cumsum()
print("Column-wise cumulative sum:\n", df2)

Here,

Column A shows cumulative sums: 15, 53 (15+38), 65 (53+12), and 89 (65+24).
Column B shows cumulative sums: 52, 83 (52+31), 132 (83+49), and 143 (132+11).
Column C shows cumulative sums: 13, 35 (13+22), 71 (35+36), and 89 (71+18).
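
One way to sanity-check these values is to compare the last row of the cumulative sum with the plain column totals, since the final running total of a column should equal the sum of that column. A minimal sketch using the same sample data (the variable names running, last_row, and totals are chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [15, 38, 12, 24],
    'B': [52, 31, 49, 11],
    'C': [13, 22, 36, 18]
})

# Column-wise running totals
running = df.cumsum()

# The last row of the running totals should match the column sums
last_row = running.iloc[-1]
totals = df.sum()

print(last_row.tolist())  # [89, 143, 89]
print(totals.tolist())    # [89, 143, 89]
```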

Cumulative Sum Across Each Row

Alternatively, to compute the cumulative sum across each row, set the axis parameter to 1 (or 'columns') when calling cumsum().


# Calculating the row-wise cumulative sum
df2 = df.cumsum(axis=1)
print("Row-wise Cumulative Sum:\n", df2)

# Output:
# Row-wise Cumulative Sum:
#     A   B   C
# 0  15  67  80
# 1  38  69  91
# 2  12  61  97
# 3  24  35  53

Here,

Row 0 – The cumulative sums are 15 (just the value in column A), 67 (15+52), and 80 (67+13).
Row 1 – The cumulative sums are 38 (just the value in column A), 69 (38+31), and 91 (69+22).
Row 2 – The cumulative sums are 12 (just the value in column A), 61 (12+49), and 97 (61+36).
Row 3 – The cumulative sums are 24 (just the value in column A), 35 (24+11), and 53 (35+18).
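
The same kind of check works row-wise: the last column of the row-wise cumulative sum should equal each row's total. A short sketch with the sample data (row_running and row_totals are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [15, 38, 12, 24],
    'B': [52, 31, 49, 11],
    'C': [13, 22, 36, 18]
})

# Row-wise running totals
row_running = df.cumsum(axis=1)

# Per-row totals computed directly
row_totals = df.sum(axis=1)

# The last column of the running totals matches the row sums
print(row_running['C'].tolist())  # [80, 91, 97, 53]
print(row_totals.tolist())        # [80, 91, 97, 53]
```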

Cumulative Sum with Missing Values

To demonstrate the cumulative sum with missing values, we will modify the DataFrame to include some NaN (missing) values. By default, cumsum() skips NaN values (skipna=True), but you can propagate them instead by setting skipna=False.

Column-wise Cumulative Sum (Default Behavior, skipna=True)

With the default skipna=True, NaN values are skipped, so you can call cumsum() without specifying the skipna parameter. Each NaN remains NaN in the result, but the running total continues from the next valid value.


import pandas as pd
import numpy as np

# Creating a sample DataFrame with missing values (NaN)
data = {
    'A': [15, np.nan, 12, 24],
    'B': [52, 31, np.nan, 11],
    'C': [13, 22, 36, np.nan]
}

df_with_nan = pd.DataFrame(data)

# Calculating the column-wise cumulative sum 
# With default skipna=True
df2 = df_with_nan.cumsum()
print("Column-wise Cumulative Sum with skipna=True:\n", df2)

# Output:
# Column-wise Cumulative Sum with skipna=True:
#       A     B     C
# 0  15.0  52.0  13.0
# 1   NaN  83.0  35.0
# 2  27.0   NaN  71.0
# 3  51.0  94.0   NaN

Column-wise Cumulative Sum with Propagation of NaN (skipna=False)

Setting skipna=False makes cumsum() propagate NaN values in the column-wise cumulative sum: once a NaN is encountered in a column, all subsequent cumulative sums in that column are also NaN.


# Calculating the column-wise cumulative sum 
# With skipna=False
df2 = df_with_nan.cumsum(skipna=False)
print("Column-wise Cumulative Sum with skipna=False:\n", df2)

# Output:
# Column-wise Cumulative Sum with skipna=False:
#       A     B     C
# 0  15.0  52.0  13.0
# 1   NaN  83.0  35.0
# 2   NaN   NaN  71.0
# 3   NaN   NaN   NaN
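
If neither behavior suits your data, one possible alternative (not covered by the skipna parameter itself) is to replace missing values with 0 before accumulating, so that the running total has no NaN gaps. Whether treating a missing value as zero is appropriate depends on what the data represents, so take this as a sketch under that assumption:

```python
import pandas as pd
import numpy as np

df_with_nan = pd.DataFrame({
    'A': [15, np.nan, 12, 24],
    'B': [52, 31, np.nan, 11],
    'C': [13, 22, 36, np.nan]
})

# Assumption: a missing value can be treated as 0 for this data
filled = df_with_nan.fillna(0).cumsum()

print(filled['A'].tolist())  # [15.0, 15.0, 27.0, 51.0]
print(filled['B'].tolist())  # [52.0, 83.0, 83.0, 94.0]
print(filled['C'].tolist())  # [13.0, 35.0, 71.0, 71.0]
```

Compared with skipna=True, the totals at valid positions are the same; only the positions that held NaN now carry the previous running total instead.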

Cumulative Sum on a Series

Similarly, you can call cumsum() directly on a Pandas Series. Each value in the result is the sum of the current element and all elements before it. In the example below, None is stored as NaN, which is skipped by default.


import pandas as pd

# Creating a sample Series
ser = pd.Series([5, 3, 8, None, 7, 2])

# Calculating the cumulative sum
ser2 = ser.cumsum()
print("Cumulative Sum of the Series:\n", ser2)

# Output:
# Cumulative Sum of the Series:
# 0     5.0
# 1     8.0
# 2    16.0
# 3     NaN
# 4    23.0
# 5    25.0
# dtype: float64
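
The skipna parameter applies to a Series in the same way. As a brief sketch with the same values (ser_strict is an illustrative name), setting skipna=False makes every position from the missing value onward NaN:

```python
import pandas as pd

ser = pd.Series([5, 3, 8, None, 7, 2])

# Propagate the missing value instead of skipping it
ser_strict = ser.cumsum(skipna=False)

# Values before the NaN are unaffected
print(ser_strict.iloc[:3].tolist())  # [5.0, 8.0, 16.0]

# Positions 3, 4, and 5 are all NaN
print(ser_strict.isna().tolist())    # [False, False, False, True, True, True]
```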

FAQ on Pandas DataFrame cumsum() Method

What is the purpose of the cumsum() method in Pandas?

The cumsum() method in Pandas is used to compute the cumulative sum of DataFrame or Series elements along a specified axis. This means that each element in the output DataFrame or Series represents the sum of all previous elements, including the current one, either row-wise or column-wise.

How do I use the cumsum() method on a DataFrame?

To use the cumsum() method on a DataFrame, simply call it on the DataFrame object. By default (axis=0), it accumulates down each column.

Can cumsum() be used on a Series, and how?

Yes, cumsum() can be called directly on a Pandas Series. It calculates the cumulative sum of the Series elements.

What type of object does cumsum() return?

The cumsum() method returns a DataFrame if called on a DataFrame or a Series if called on a Series. The output retains the same shape and index/column labels as the input, but with the values replaced by their cumulative sums.
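
A small sketch can confirm this. The DataFrame df_demo and its labels r1 and r2 are hypothetical, chosen only to make the preserved index visible:

```python
import pandas as pd

# Hypothetical data with custom row labels
df_demo = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

out_df = df_demo.cumsum()        # called on a DataFrame
out_ser = df_demo['x'].cumsum()  # called on a Series

print(type(out_df).__name__, out_df.shape)  # DataFrame (2, 2)
print(type(out_ser).__name__)               # Series
print(out_df.index.tolist())                # ['r1', 'r2']
print(out_df.columns.tolist())              # ['x', 'y']
```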

How is cumsum() different from cumprod()?

While cumsum() calculates the cumulative sum, cumprod() calculates the cumulative product. Instead of summing the values, cumprod() multiplies them cumulatively along the specified axis.
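
To see the two side by side, here is a minimal sketch on a made-up Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical values for comparison
ser_demo = pd.Series([1, 2, 3, 4])

sums = ser_demo.cumsum()       # running total
products = ser_demo.cumprod()  # running product

print(sums.tolist())      # [1, 3, 6, 10]
print(products.tolist())  # [1, 2, 6, 24]
```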

Conclusion

In conclusion, the Pandas cumsum() method is a powerful tool for calculating cumulative sums within a DataFrame or Series. It works effectively with numeric data types, allowing you to compute cumulative totals down each column or across each row.

Happy Learning!!

Reference

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.cumsum.html