The pandas.DataFrame.rolling() function can be used to compute the rolling mean (average), sum, median, max, min, etc. for one or multiple columns. The rolling mean is also known as the moving average: it is the mean computed over a window that slides across the data.
Rolling and moving averages are used to analyze the data for a specific time series and to spot trends in that data.
Key Points –
- The rolling() function can be used with various aggregation functions, such as mean(), sum(), min(), max(), etc. This flexibility enables you to perform different types of rolling calculations based on the specific analysis requirements.
- The rolling() function is often applied to time-series data, and it works well when the DataFrame has a time-based index. This allows for meaningful calculations over consecutive time intervals, such as days, months, or years, depending on the frequency of the time index.
- The rolling() function takes a window size, which determines the number of data points included in each calculation. The rolling mean is computed for each window as it moves through the time-series data.
- By default, the rolling() function produces NaN values for the first few entries where there are not enough data points to fill the specified window size. You can control how missing values are handled using the min_periods parameter. Setting min_periods=1 ensures that the rolling mean is calculated as long as there is at least one non-missing value in the window.
- One key feature of the rolling() function is the ability to adjust the size of the rolling window. A smaller window captures short-term variations with greater sensitivity, whereas a larger window provides a more smoothed representation of the data. Choosing an appropriate window size depends on the nature of the data and the analysis goals, allowing users to balance sensitivity to changes and noise reduction based on their specific requirements.
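As a minimal sketch of the time-based index point above, the example below compares an integer window with an offset window on a date-indexed Series. The dates and values are made up for illustration and are not part of the article's data.

```python
import pandas as pd

# Hypothetical daily readings (illustrative values only)
idx = pd.date_range("2024-01-01", periods=5, freq="D")
s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], index=idx)

# Integer window: each result uses the current row and the two rows before it
mean_3_rows = s.rolling(window=3).mean()

# Offset window: each result uses every row within the trailing 3 calendar days
mean_3_days = s.rolling(window="3D").mean()

print(mean_3_rows)
print(mean_3_days)
```

With an offset window, min_periods defaults to 1, so mean_3_days has no leading NaN values, while mean_3_rows starts with two.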
1. Syntax of DataFrame.rolling()
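Following is the syntax of the DataFrame.rolling() function. It returns a Rolling object (or a Window object when win_type is set), on which you then call an aggregation such as sum() or mean().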
# Syntax of DataFrame.rolling()
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')
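Use the window param to specify the size of the moving window. min_periods specifies the minimum number of observations a window must contain to produce a value; if a window has fewer, the result is NaN.
Use win_type=None (the default) to weight all points in the window evenly.
Newer pandas releases also accept a step parameter and deprecate axis, so check the signature for your installed version.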
First, let’s create a pandas DataFrame to explain rolling() with examples.
# Create DataFrame
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [0,1,2,4,6,10,4],
'B': [0,1,3,6,9,np.nan,4]})
print(df)
# Outputs:
# A B
# 0 0 0.0
# 1 1 1.0
# 2 2 3.0
# 3 4 6.0
# 4 6 9.0
# 5 10 NaN
# 6 4 4.0
2. pandas rolling() Example
The rolling() function does not compute anything by itself. Calling rolling(window=2) creates a rolling window object with a window size of 2. This object is not the result of the rolling calculation but an instance of pandas.core.window.Rolling, on which you call an aggregation to perform the actual calculation.
# Returns a Rolling object
rolling = df.rolling(window=2)
print(rolling)
# Outputs:
# Rolling [window=2,center=False,axis=0,method=single]
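Now, let’s do the rolling sum with window=3. By default, the result is set to the right edge of the window, so the first two rows are NaN because they do not yet have three observations. You can change this to the center of the window by setting center=True. Any window that includes the missing value in column B also results in NaN.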
# Rolling() of sum with window length 3
df2=df.rolling(window=3).sum()
print(df2)
# Outputs:
# A B
# 0 NaN NaN
# 1 NaN NaN
# 2 3.0 4.0
# 3 7.0 10.0
# 4 12.0 18.0
# 5 20.0 NaN
# 6 20.0 NaN
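To illustrate the center=True option mentioned above, here is a small sketch that places the right-edge and centered rolling sums of column A side by side. It rebuilds the article's DataFrame so that it runs on its own; the column names right_edge and centered are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as the article's DataFrame
df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4],
                   'B': [0, 1, 3, 6, 9, np.nan, 4]})

# Default: the result is labeled at the right edge of each window
right = df['A'].rolling(window=3).sum()

# center=True: the result is labeled at the middle row of each window
centered = df['A'].rolling(window=3, center=True).sum()

print(pd.DataFrame({'right_edge': right, 'centered': centered}))
```

The centered result holds the same sums shifted up by one row, so the NaN values appear at the first and last rows instead of the first two.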
3. pandas rolling() Mean
You can also calculate the mean (average) with the pandas.DataFrame.rolling() function; the rolling mean is also known as the moving average. This example uses the default win_type=None, meaning all points in the window are evenly weighted.
# Rolling() of mean with window length 3
df2=df.rolling(window=3).mean()
print(df2)
# Outputs:
# A B
# 0 NaN NaN
# 1 NaN NaN
# 2 1.000000 1.333333
# 3 2.333333 3.333333
# 4 4.000000 6.000000
# 5 6.666667 NaN
# 6 6.666667 NaN
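The same pattern works on a single column and with other aggregations. The sketch below, which assumes only the article's column A, selects the column first and then computes the rolling mean, median, and max from one Rolling object.

```python
import pandas as pd

# Column A from the article's DataFrame
df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4]})

# Select the column, then build one Rolling object and reuse it
col_a = df['A'].rolling(window=3)

summary = pd.DataFrame({
    'mean': col_a.mean(),
    'median': col_a.median(),
    'max': col_a.max(),
})
print(summary)
```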
4. Using Triangular Weighted Mean
To compute a triangular weighted rolling mean, set win_type='triang' in the rolling() function. This weights the points in each window with a triangular window, so the middle of the window counts more than its edges. Window types other than None require SciPy to be installed. The resulting DataFrame (df2) contains the triangular weighted rolling mean.
# Rolling() of mean with win_type triang
df2=df.rolling(window=3, win_type='triang').mean()
print("Rolling triangular weighted mean:\n",df2)
# Outputs:
# Rolling triangular weighted mean:
# A B
# 0 NaN NaN
# 1 NaN NaN
# 2 1.00 1.25
# 3 2.25 3.25
# 4 4.00 6.00
# 5 6.50 NaN
# 6 7.50 NaN
In this output, df2 contains the triangular weighted rolling mean for each column. The first two rows are NaN because they do not have enough data points to fill a window of size 3, and the last two rows of column B are NaN because their windows include the missing value. Adjust the window size and other parameters based on your specific analysis requirements.
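To see where a value such as 2.25 comes from, the weighted mean can be reproduced by hand. This is a sketch that assumes a 3-point triangular window has the weights [0.5, 1.0, 0.5]; it uses only NumPy and recomputes the entry for column A at index 3.

```python
import numpy as np

# Assumed weights of a 3-point triangular window
weights = np.array([0.5, 1.0, 0.5])

# Values of column A in the window that ends at index 3
window_values = np.array([1, 2, 4])

# Weighted mean: sum of weight * value divided by the sum of weights
weighted_mean = np.sum(weights * window_values) / np.sum(weights)
print(weighted_mean)  # 2.25
```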
5. Using Gaussian Window
To use the Gaussian window for calculating the rolling mean, specify win_type='gaussian' in the rolling() function and pass the std parameter, the standard deviation of the Gaussian window, to the aggregation method. Adjust the std value based on your requirements.
The following example calculates the rolling mean with a window length of 3, using the 'gaussian' window type. With the Gaussian window type, you have to provide the std param. Not using this results in TypeError: gaussian() missing 1 required positional argument: 'std'
# Rolling() of mean with window type gaussian
df2=df.rolling(window=3, win_type='gaussian').mean(std=3)
print(df2)
# Outputs:
# A B
# 0 NaN NaN
# 1 NaN NaN
# 2 1.000000 1.327104
# 3 2.327104 3.327104
# 4 4.000000 6.000000
# 5 6.654209 NaN
# 6 6.728956 NaN
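The effect of std can be checked with a short hand calculation. The sketch below assumes the Gaussian weights follow exp(-0.5 * (n / std)^2) for offsets n measured from the window center, and recomputes the entry for column A at index 3 using only NumPy.

```python
import numpy as np

std = 3
# Offsets of the three window positions from the window center
offsets = np.array([-1, 0, 1])

# Assumed Gaussian weights for a 3-point window
weights = np.exp(-0.5 * (offsets / std) ** 2)

# Values of column A in the window that ends at index 3
window_values = np.array([1, 2, 4])

weighted_mean = np.sum(weights * window_values) / np.sum(weights)
print(round(weighted_mean, 6))
```

Because std=3 is large relative to the window length, the three weights are nearly equal and the result stays close to the plain rolling mean of 2.333333.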
6. Rolling Sum & Min
You can use the rolling() function with the agg() method to calculate the rolling sum for column ‘A’ and the rolling minimum for column ‘B’. This is a concise way to perform multiple rolling aggregations on different columns simultaneously.
The example below uses a window length of 2 and performs sum on column A and min on column B.
# Rolling agg on multiple columns
df2 = df.rolling(2).agg({"A": "sum", "B": "min"})
print(df2)
# Outputs:
# A B
# 0 NaN NaN
# 1 1.0 0.0
# 2 3.0 1.0
# 3 6.0 3.0
# 4 10.0 6.0
# 5 16.0 NaN
# 6 14.0 NaN
In this output, you can see the rolling sum for column ‘A’ and the rolling minimum for column ‘B’ calculated with a window size of 2. Adjust the window size or apply different aggregation functions based on your specific analysis requirements.
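agg() also accepts a list of function names, which applies several aggregations to the same column. The following sketch assumes only the article's column A and computes the rolling sum and max together.

```python
import pandas as pd

# Column A from the article's DataFrame
df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4]})

# A list of names produces one output column per aggregation
stats = df['A'].rolling(2).agg(['sum', 'max'])
print(stats)
```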
FAQ on Pandas rolling() Mean, Average, Sum
The rolling() function in pandas is used for rolling window calculations on time-series data or sequential data. It allows you to perform operations, such as mean, sum, etc., on a specified window of data that “rolls” or moves through the dataset. This is useful for analyzing trends and patterns in data over time.
To calculate the rolling mean, call rolling() with a window size on the column and then call mean(). For example, the rolling mean of a ‘value’ column with a window size of 3 is a new Series (rolling_mean) with NaN values for the first two rows, because those rows do not have enough data to fill the window. Adjust the window size according to your analysis requirements. You can use the resulting Series for further analysis or visualization.
Calculating the rolling sum using the rolling() function in pandas is similar to calculating the rolling mean: call sum() instead of mean(). For example, the rolling sum can be calculated for the ‘value’ column with a window size of 3. Adjust the window size according to your analysis requirements.
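A minimal sketch of both calculations on a ‘value’ column follows. The DataFrame and its numbers are hypothetical, made up for this illustration.

```python
import pandas as pd

# Hypothetical data with a single 'value' column
data = pd.DataFrame({'value': [2, 4, 6, 8, 10]})

# Rolling mean and rolling sum over a window of 3 rows
rolling_mean = data['value'].rolling(window=3).mean()
rolling_sum = data['value'].rolling(window=3).sum()

print(rolling_mean)
print(rolling_sum)
```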
By default, any window that contains a missing value or has fewer observations than the window size produces NaN in the output. To change this, use the min_periods parameter, which sets the minimum number of non-missing observations required to produce a result.
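The difference is easiest to see side by side. This sketch reuses the values of the article's column B, which contains one NaN, and compares the default behavior with min_periods=1.

```python
import numpy as np
import pandas as pd

# Values of column B from the article's DataFrame
s = pd.Series([0, 1, 3, 6, 9, np.nan, 4])

# Default: every window needs 3 non-missing observations
strict = s.rolling(window=3).mean()

# min_periods=1: one non-missing observation is enough
relaxed = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'strict': strict, 'relaxed': relaxed}))
```

With min_periods=1, the mean is taken over whatever non-missing values each window holds, so the last two rows become 7.5 and 6.5 instead of NaN.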
You can apply multiple rolling calculations simultaneously on different columns in pandas, either by passing a dictionary to agg() or by using the rolling() function independently on each column of interest.
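Calling rolling() on each column separately also lets every column use its own window size. The sketch below rebuilds the article's DataFrame; the output column names A_sum_2 and B_mean_3 are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Same data as the article's DataFrame
df = pd.DataFrame({'A': [0, 1, 2, 4, 6, 10, 4],
                   'B': [0, 1, 3, 6, 9, np.nan, 4]})

# Each column gets its own window size and aggregation
result = pd.DataFrame({
    'A_sum_2': df['A'].rolling(2).sum(),
    'B_mean_3': df['B'].rolling(3).mean(),
})
print(result)
```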
Conclusion
In this article, you have learned the syntax of the rolling() function and how to calculate the rolling sum and mean (average), including weighted means with different window types, with examples. rolling() also supports other aggregations such as max, min, count, median, and std.