  • Post last modified:March 27, 2024
pandas rolling() Mean, Average, Sum Examples

The pandas.DataFrame.rolling() function computes rolling window statistics such as the mean, sum, median, max, and min for one or multiple columns. The rolling mean is also known as the moving average.

Rolling and moving averages are used to analyze the data for a specific time series and to spot trends in that data.

Key Points

  • The rolling() function can be used with various aggregation functions, such as mean(), sum(), min(), max(), etc. This flexibility enables you to perform different types of rolling calculations based on the specific analysis requirements.
  • The rolling() function is often applied to time-series data, and it works well when the DataFrame has a time-based index. This allows for meaningful calculations over consecutive time intervals, such as days, months, or years, depending on the frequency of the time index.
  • The rolling() function takes a window size, which determines the number of data points included in each calculation. The statistic (for example, the mean) is computed for each window as it moves through the data.
  • With an integer window, rolling() produces NaN for the first window - 1 entries because there are not enough data points to fill the window. The min_periods parameter sets the minimum number of non-missing observations a window needs to produce a value. Setting min_periods=1 ensures that the rolling mean is calculated as long as there is at least one non-missing value in the window.
  • One key feature of the rolling() function is the ability to adjust the size of the rolling window. A smaller window captures short-term variations with greater sensitivity, whereas a larger window provides a more smoothed representation of the data. Choosing an appropriate window size depends on the nature of the data and the analysis goals, allowing users to balance sensitivity to changes and noise reduction based on their specific requirements.
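The points above about time-based indexes and window size can be sketched as follows. This is a minimal illustration, not part of the original examples: the dates and values are made up, and the names `s`, `fixed`, and `by_time` are hypothetical.

```python
import pandas as pd

# Hypothetical daily series; dates and values are invented for illustration.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0, 32.0], index=idx)

# Integer window: the last 3 rows. The first 2 results are NaN
# because min_periods defaults to the window size.
fixed = s.rolling(window=3).mean()

# Offset window: all rows within the last 3 calendar days.
# For offset windows min_periods defaults to 1, so there is no leading NaN.
by_time = s.rolling(window="3D").mean()

print(pd.DataFrame({"value": s, "fixed_3": fixed, "offset_3D": by_time}))
```

On evenly spaced daily data the two columns agree wherever the integer window is full. The offset form tends to be the safer choice when timestamps are irregular, since the window is then defined by elapsed time rather than row count.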

1. Syntax of DataFrame.rolling()

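Following is the syntax of the DataFrame.rolling() function. It returns a Rolling object (or a Window object when win_type is set), on which you then call an aggregation such as mean() or sum().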


# Syntax of DataFrame.rolling()
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')

Use the window param to specify the size of the moving window. min_periods specifies the minimum number of observations a window must contain to produce a value; if a window has fewer, the result is NaN. For an integer window, min_periods defaults to the window size.
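As a small sketch of how min_periods changes the leading rows, the snippet below compares the default with min_periods=1. The Series reuses the values of column A from this article's DataFrame; the names `default` and `relaxed` are only illustrative.

```python
import pandas as pd

# Same values as column A in this article's examples.
s = pd.Series([0, 1, 2, 4, 6, 10, 4])

# Default: min_periods equals the window size, so rows 0 and 1 are NaN.
default = s.rolling(window=3).mean()

# min_periods=1: a value is produced as soon as one observation is available,
# averaging over however many rows the window currently holds.
relaxed = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"default": default, "min_periods=1": relaxed}))
```

From row 2 onward both columns are identical, because every window is full by then.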

Leave win_type=None (the default) to weight all points in the window equally.

First, let’s create a pandas DataFrame to explain rolling() with examples.


# Create DataFrame
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [0,1,2,4,6,10,4],
                   'B': [0,1,3,6,9,np.nan,4]})
print(df)

# Outputs:
#    A    B
# 0   0  0.0
# 1   1  1.0
# 2   2  3.0
# 3   4  6.0
# 4   6  9.0
# 5  10  NaN
# 6   4  4.0

2. pandas rolling() Example

The rolling() function does not compute anything by itself. rolling(window=2) creates a rolling window object with a window size of 2, which is an instance of pandas.core.window.rolling.Rolling. This object is not the result of the rolling calculation; you call an aggregation such as sum() or mean() on it to perform the calculation over the specified window.


# Returns a Rolling object
rolling=df.rolling(window=2)
print(rolling)

# Outputs:
# Rolling [window=2,center=False,axis=0,method=single]

Now, let’s do the rolling sum with window=3. By default, the result is set to the right edge of the window. You can change this to the center of the window by setting center=True.


# Rolling() of sum with window length 3
df2=df.rolling(window=3).sum()
print(df2)

# Outputs:
#      A     B
# 0   NaN   NaN
# 1   NaN   NaN
# 2   3.0   4.0
# 3   7.0  10.0
# 4  12.0  18.0
# 5  20.0   NaN
# 6  20.0   NaN
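To illustrate the center=True option mentioned above, here is a short sketch on the same data. With a window of 3, each result is labeled at the middle row of its window, so the NaN rows move from the top to both ends. The name `centered` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4],
                   'B': [0, 1, 3, 6, 9, np.nan, 4]})

# Each window covers the previous row, the current row, and the next row.
centered = df.rolling(window=3, center=True).sum()
print(centered)

# Column A becomes: NaN, 3, 7, 12, 20, 20, NaN
# (the right-aligned result shifted up by one row).
```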

3. pandas rolling() Mean

You can also calculate the mean with the pandas.DataFrame.rolling() function. The rolling mean is also known as the moving average. This example uses win_type=None, meaning all points in the window are evenly weighted.


# Rolling() of mean with window length 3
df2=df.rolling(window=3).mean()
print(df2)

# Outputs:
#          A         B
# 0       NaN       NaN
# 1       NaN       NaN
# 2  1.000000  1.333333
# 3  2.333333  3.333333
# 4  4.000000  6.000000
# 5  6.666667       NaN
# 6  6.666667       NaN

4. pandas rolling() Triangular Weighted Mean

If you want a triangular weighted rolling mean, set the win_type parameter of the rolling() function to 'triang'. This applies triangular weights to each window, so values near the middle of the window count more than values at its edges. Weighted window types require SciPy to be installed. In the example below, the resulting DataFrame (df2) contains the triangular weighted rolling mean.


# Rolling() of mean with win_type triang
df2=df.rolling(window=3, win_type='triang').mean()
print("Rolling triangular weighted mean:\n",df2)

# Outputs:
# Rolling triangular weighted mean:
#      A     B
# 0   NaN   NaN
# 1   NaN   NaN
# 2  1.00  1.25
# 3  2.25  3.25
# 4  4.00  6.00
# 5  6.50   NaN
# 6  7.50   NaN

In this output, df2 contains the triangular weighted rolling mean for each column. The first two rows are NaN because the window of 3 is not yet full, and column B is NaN in rows 5 and 6 because those windows include the missing value. Adjust the window size and other parameters based on your specific analysis requirements.
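To see where these numbers come from, the sketch below reproduces column A without win_type. It assumes the weights of a 3-point triangular window are 0.5, 1, 0.5 (which is what SciPy's triangular window yields for three points) and applies them by hand with NumPy. The names `weights` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 4, 6, 10, 4], dtype="float64")  # column A

# Assumed triangular weights for a window of 3.
weights = np.array([0.5, 1.0, 0.5])

# Weighted mean per window: sum(w * x) / sum(w).
manual = s.rolling(window=3).apply(
    lambda w: np.average(w, weights=weights), raw=True
)
print(manual.tolist())
# [nan, nan, 1.0, 2.25, 4.0, 6.5, 7.5]
```

For row 3, for example, the window is 1, 2, 4, so the result is (0.5*1 + 1*2 + 0.5*4) / 2 = 2.25, which matches the value shown for column A.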

5. pandas rolling() Gaussian Weighted Mean

To use a Gaussian window for the rolling mean, specify win_type='gaussian' in the rolling() function. The Gaussian window also needs a standard deviation, which you pass as the std argument of the aggregation method, for example mean(std=3), not of rolling(). Adjust the std value based on your requirements.

The following example computes the rolling mean with a window length of 3 using the ‘gaussian’ window type. If you omit the std param, you get TypeError: gaussian() missing 1 required positional argument: ‘std’


# Rolling() of mean with window type gaussian
df2=df.rolling(window=3, win_type='gaussian').mean(std=3)
print(df2)

# Outputs:
#          A         B
# 0       NaN       NaN
# 1       NaN       NaN
# 2  1.000000  1.327104
# 3  2.327104  3.327104
# 4  4.000000  6.000000
# 5  6.654209       NaN
# 6  6.728956       NaN
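The effect of std can be checked by hand. The sketch below assumes the Gaussian weight for a point at offset k from the window center is exp(-0.5 * (k / std)^2), which is the formula SciPy's Gaussian window uses, and applies it with NumPy instead of win_type. The helper names `gaussian_weights` and `gaussian_rolling_mean` are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 4, 6, 10, 4], dtype="float64")  # column A

def gaussian_weights(n, std):
    # Offsets from the window center, e.g. -1, 0, 1 for n=3.
    k = np.arange(n) - (n - 1) / 2
    return np.exp(-0.5 * (k / std) ** 2)

def gaussian_rolling_mean(series, window, std):
    w = gaussian_weights(window, std)
    # Weighted mean per window: sum(w * x) / sum(w).
    return series.rolling(window).apply(
        lambda x: np.average(x, weights=w), raw=True
    )

narrow = gaussian_rolling_mean(s, 3, std=0.5)   # center point dominates
medium = gaussian_rolling_mean(s, 3, std=3)     # setting used in this section
wide = gaussian_rolling_mean(s, 3, std=100)     # nearly equal weights

print(pd.DataFrame({"std=0.5": narrow, "std=3": medium, "std=100": wide}))
```

With a large std the weights are almost equal, so the result approaches the plain rolling mean; with a small std the result stays close to the middle value of each window.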

6. Rolling Sum & Min

You can use the rolling() function with the agg() method to apply a different aggregation to each column. This is a concise way to perform multiple rolling aggregations on different columns simultaneously.

The example below passes a dictionary to agg(). It uses a window length of 2 and performs sum on column A and min on column B.


# Rolling agg on multiple columns
df2 = df.rolling(2).agg({"A": "sum", "B": "min"})
print(df2)

# Outputs:
#      A    B
# 0   NaN  NaN
# 1   1.0  0.0
# 2   3.0  1.0
# 3   6.0  3.0
# 4  10.0  6.0
# 5  16.0  NaN
# 6  14.0  NaN

In this output, you can see the rolling sum for column ‘A’ and the rolling minimum for column ‘B’ calculated with a window size of 2. Adjust the window size or apply different aggregation functions based on your specific analysis requirements.

Frequently Asked Questions

What is the purpose of the rolling() function in pandas?

The rolling() function in pandas is used for rolling window calculations on time-series data or sequential data. It allows you to perform operations, such as mean, sum, min, or max, on a specified window of data that “rolls” or moves through the dataset. This is useful for analyzing trends and patterns in data over time.

How do I use the rolling() function for calculating the rolling mean in pandas?

Call rolling() with a window size on the column and chain mean(). For example, df['value'].rolling(window=3).mean() calculates the rolling mean of the ‘value’ column with a window size of 3. Adjust the window size according to your analysis requirements. The result is a new series with NaN values for the first two rows due to insufficient data for the specified window size. You can use the resulting series for further analysis or visualization.
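A minimal sketch of this answer, using a made-up ‘value’ column (the data is hypothetical and chosen so the means are easy to verify by hand):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Rolling mean over the current row and the two rows before it.
rolling_mean = df["value"].rolling(window=3).mean()
print(rolling_mean.tolist())
# [nan, nan, 20.0, 30.0, 40.0]
```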

How can I calculate the rolling sum using the rolling() function in pandas?

Calculating the rolling sum using the rolling() function in pandas is similar to calculating the rolling mean: chain sum() instead of mean(). For example, df['value'].rolling(window=3).sum() calculates the rolling sum of the ‘value’ column with a window size of 3. Adjust the window size according to your analysis requirements.

What happens when there are missing values in my data while using the rolling() function?

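Any window that contains fewer than min_periods non-missing values produces NaN in the output. Because min_periods defaults to the window size for an integer window, a single missing value in a window makes that result NaN by default. If you want a result from the remaining non-missing values, lower the min_periods parameter.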
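The following sketch shows the difference on the values of column B from this article, which contain one NaN. The names `strict` and `lenient` are only illustrative.

```python
import numpy as np
import pandas as pd

# Same values as column B in this article's examples.
b = pd.Series([0, 1, 3, 6, 9, np.nan, 4])

# Default: every window needs 3 non-missing values.
strict = b.rolling(window=3).sum()

# min_periods=2: a window with one missing value still produces a sum
# of the two values that are present.
lenient = b.rolling(window=3, min_periods=2).sum()

print(pd.DataFrame({"B": b, "strict": strict, "lenient": lenient}))
```

With min_periods=2, rows 5 and 6 become 15.0 (6 + 9) and 13.0 (9 + 4) instead of NaN, while row 0 stays NaN because its window holds only one observation.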

Can I apply multiple rolling calculations simultaneously on different columns?

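Yes. To apply different aggregations to different columns with the same window, pass a dictionary to agg(), for example df.rolling(2).agg({"A": "sum", "B": "min"}). If each column needs its own window size, call the rolling() function independently on each column of interest.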
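As a sketch of the per-column approach, the snippet below gives each column its own window size and collects the results in one DataFrame. The column names `A_sum_2` and `B_min_3` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4],
                   'B': [0, 1, 3, 6, 9, np.nan, 4]})

# Each column gets its own window size and aggregation.
result = pd.DataFrame({
    "A_sum_2": df["A"].rolling(window=2).sum(),
    "B_min_3": df["B"].rolling(window=3).min(),
})
print(result)
```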

Conclusion

In this article, you have learned the syntax of the rolling() function and how to calculate the rolling sum, mean, weighted mean, and min by using different parameters with examples. The rolling object also supports other aggregations, including max, count, median, and std.


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.