  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Window Functions Explained

What are window functions in Pandas? Windowing operations are common in statistical calculations and data analysis. The Pandas library provides several windowing functions that make complicated windowing operations simpler. In this article, you will see the different types of windowing operations available on a Pandas DataFrame.


Just like SQL window functions, the Pandas library provides several types of windowing functions that many programmers overlook. Using these window functions gives you more power and saves time when working with Pandas.

Key Points –

  • Pandas window functions provide a flexible framework for performing rolling and expanding calculations on data.
  • They allow for efficient computation of statistics over sliding or expanding windows of data.
  • Window functions can be applied to various types of data, including time series and numerical data.
  • These functions offer versatility in specifying window sizes, types of aggregation, and handling of missing values.

1. What is a Window Function?

Pandas window functions take their input values from a “window” of one or more rows in a Series or DataFrame and perform a calculation over them. The window is the set of rows between two boundaries, including the boundary rows, over which the calculation is performed.

1.1 Example of Window Function

Let’s say you want to perform a calculation over every 3 consecutive rows; then the window size is 3. If you want to apply the calculation over every 5 rows instead, the window size becomes 5.

For a concrete example, suppose we want the mean of every 3 consecutive rows of a specific column. The window size is then 3, and the mean computed over each window is a window function.

[Figure: windowing function example, with a window size of 3]
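
The idea can be sketched on a tiny, made-up Series (the values below are illustrative only and do not come from the stock dataset). With a window of 3, each result is the mean of the current row and the two rows before it:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([10, 20, 30, 40, 50])

# Mean over a sliding window of 3 rows
rolled = s.rolling(window=3).mean()
print(rolled.tolist())  # [nan, nan, 20.0, 30.0, 40.0]
```

The first two positions are NaN because a full window of 3 rows is not yet available there.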

1.2 Example of Window Function in Pandas Library

We could compute the average low price manually, but the pandas library does it for us. Let’s say I need another column that holds the average Low price over each 3-day period. The following method does exactly that.

To run the following code, you will need to download the dataset from Kaggle. The code should work with any recent version of pandas; for reference, I am using Pandas version 1.5.1.


import pandas as pd
df = pd.read_csv('stockdata.csv')

# Convert the "Low" column to a numeric Series
target_col_df = pd.to_numeric(df["Low"])

# Create another column with the average Low price over the last 3 days
df['average_low'] = target_col_df.rolling(window=3).mean()
df.head(10)

Yields below output. The “average_low” column holds the average of the last 3 days’ Low values. Note that the first 2 rows are NaN because fewer than 3 rows are available to compute the mean.
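
If the leading NaN values are unwanted, rolling() accepts a min_periods parameter that lets a window produce a result before it is full. A minimal sketch on made-up values (not the stock dataset), assuming partial windows are acceptable for your analysis:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([10, 20, 30, 40])

# Default: a result needs all 3 rows of the window
strict = s.rolling(window=3).mean()

# min_periods=1: a result is produced as soon as 1 row is available
relaxed = s.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 20.0, 30.0]
print(relaxed.tolist())  # [10.0, 15.0, 20.0, 30.0]
```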

[Figure: DataFrame with one extra column, “average_low”]

2. How to Use Window Functions in Pandas

With the Python Pandas library, almost any type of window operation is possible; however, Pandas provides dedicated functions for only 4 types of windowing operations. Other kinds of window operations can still be built, provided you know the other relevant pandas functions.

Anyway, our job is to make things easier by providing examples. According to the official Pandas documentation, Pandas provides window functions for the following 4 types of windowing operations.

  1. Rolling window operations
  2. Weighted window operations
  3. Expanding window operations
  4. Exponentially Weighted window

3. Pandas Rolling Window Function

The rolling window operation is the most common type of window operation that statisticians and data analysts perform. Most of the time, this operation is enough to fulfil our needs. Pandas provides functions with different parameters to perform rolling window operations.

We will look into each one step by step, but first you need a clear picture of the rolling window operation in general.

3.1 What is Rolling Window Operation

As the word “rolling” (that is, sliding) suggests, a rolling window operation applies a formula to a fixed-size range of rows and slides that range one row at a time, from the first value all the way to the last.

As in the section 1 example, let’s say we want the sum of every 3 rows while sliding to the end of the column. At each step we consider the values from the 2 previous rows plus the one newly added row. This is called a rolling window operation.

The Pandas library provides a built-in function for rolling window operations. These window functions help calculate different values and reveal patterns and trends in the data.

In the examples here, the Pandas rolling window function uses a window of a fixed number of rows. This means that you need to specify the window size before you apply the rolling operation.
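
To make the sliding concrete, here is a rough sketch that computes a rolling sum by hand and compares it with rolling(). The values are made up for illustration, and the hand-written loop is not how pandas implements the operation internally:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
values = [1, 2, 3, 4, 5]
window = 3

# Slide a fixed-size window by hand; positions with too few rows get None
manual = [
    sum(values[i - window + 1 : i + 1]) if i >= window - 1 else None
    for i in range(len(values))
]
print(manual)  # [None, None, 6, 9, 12]

# The same calculation with the built-in rolling window
rolled = pd.Series(values).rolling(window=window).sum()
print(rolled.tolist())  # [nan, nan, 6.0, 9.0, 12.0]
```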


3.2 Example #1: Rolling window function predefined methods

In the following example, all we do is find the sum of every 2 consecutive days and create another column for it. If this does not make sense right now, please continue; it will make sense by the end, when we create our own custom function.

In the following example, we use the predefined method sum(). You can also use mean(), median() and many more. In the next section, we will learn to apply our own custom-built functions.


import pandas as pd
df = pd.read_csv('stockdata.csv')

# Convert the "High" column to a numeric Series
target_col_df = pd.to_numeric(df["High"])

# Create another column with the sum of the current and previous day's High price
df['sum_of_previous_2_day_high_price'] = target_col_df.rolling(window=2).sum()
df.head(10)

Yields below output. See the highlighted values

[Figure: rolling window function output, a DataFrame with one extra column]
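
Other predefined methods work the same way as sum(). The sketch below reuses one Rolling object for several aggregations; the Series values and the name summary are made up for illustration:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([4, 8, 6, 10])
roll = s.rolling(window=2)

# Each predefined method aggregates the same 2-row windows
summary = pd.DataFrame({
    "sum": roll.sum(),        # NaN, 12, 14, 16
    "max": roll.max(),        # NaN, 8, 8, 10
    "median": roll.median(),  # NaN, 6, 7, 8
})
print(summary)
```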

3.3 Example #2: Rolling window function with apply() function

In the previous example, we saw that the Rolling object offers only a fixed set of predefined functions. In this example, you will see how to call your own custom function.

We can pass a function to apply() through its func argument to perform more general rolling calculations. The function receives the values of each window and must return a single number. See the following example, where we find the difference between a value and the index of that value in the column.


import pandas as pd
df = pd.read_csv('stockdata.csv')

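# Custom function: subtract the row's index label from its value.
# With the default raw=False, x is a Series holding the current window,
# and the function must return a single number.
def sub_t(x):
    return x.iloc[-1] - x.index[-1]

# Convert the "Open" column to a numeric Series
target_col_df = pd.to_numeric(df["Open"])

# Create a "Raw" column holding the Open value minus the index of that value
df['Raw'] = target_col_df.rolling(window=1).apply(sub_t)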
df.head(10)

Yields below output.

[Figure: rolling function output; the Raw column holds the result]
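
A custom function becomes more useful with a window larger than 1. As a small sketch on made-up values, the hypothetical helper window_range below returns the spread (maximum minus minimum) of each 2-row window. Passing raw=True hands the function a NumPy array instead of a Series:

```python
import pandas as pd

def window_range(window_values):
    # With raw=True, window_values is a NumPy array of the window's values
    return window_values.max() - window_values.min()

# Hypothetical sample values, chosen only for illustration
s = pd.Series([3, 8, 5, 12])

spread = s.rolling(window=2).apply(window_range, raw=True)
print(spread.tolist())  # [nan, 5.0, 3.0, 7.0]
```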

4. Weighted Window Function

There might be cases where you want some values in the window to carry more weight than others; in these cases, the Pandas weighted window functions make the operation much easier.

Where an equally weighted rolling window is not enough, you can try the weighted window function. This is a bit more advanced than the plain rolling window function, but not complicated.

The weighted window operation in Pandas is achieved with the win_type argument of the .rolling() function. It depends on the SciPy library (another Python library for scientific calculations).

4.1 Example: Using Weighted Window Function in Pandas

The following example uses the Pandas weighted window function with the “triang” window type, on the same DataFrame as in the examples above. We have set the window size to 4, but you can change it according to your needs.

Feel free to replace “triang” with another window type such as “gaussian”; note that the “gaussian” window needs an extra std argument in the aggregation call, for example .mean(std=2).


import pandas as pd
df = pd.read_csv('stockdata.csv')

# Convert the "Open" column to a numeric Series just for processing
target_col_df = pd.to_numeric(df["Open"])

# Create another column with the weighted mean, using the "triang" window type
df['Weighted_mean'] = target_col_df.rolling(window=4, center=True, win_type="triang").mean()
df.head(10)

Yields below output.
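
To see what the weighting does, the sketch below computes a triangular weighted mean by hand with apply(), without the win_type argument. The weights 0.25, 0.75, 0.75, 0.25 are an assumption: they are what SciPy's triangular window of length 4 should produce. The Series values and the helper weighted_mean are made up for illustration:

```python
import numpy as np
import pandas as pd

# Assumed triangular weights for a window of 4 (the middle rows count the most)
weights = np.array([0.25, 0.75, 0.75, 0.25])

def weighted_mean(window_values):
    # Weighted sum of the window divided by the sum of the weights
    return np.dot(window_values, weights) / weights.sum()

# Hypothetical sample values, chosen only for illustration
s = pd.Series([2, 4, 4, 10, 6])

manual = s.rolling(window=4).apply(weighted_mean, raw=True)
print(manual.tolist())  # [nan, nan, nan, 4.5, 6.5]
```

For comparison, the plain mean of the first window (2, 4, 4, 10) is 5.0, while the weighted mean is 4.5 because the outer values count less. If SciPy is installed, s.rolling(window=4, win_type="triang").mean() should give the same numbers, which is a quick way to check the assumed weights.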

5. Expanding Window Function

After the rolling and weighted windows, next on the list are expanding window functions. The expanding window operation is the same as the rolling window except that the window grows with each step instead of sliding. It yields the value of the aggregation function over all rows seen so far.

In simpler words, at each row an expanding window covers every row from the start of the data up to the current one. To better understand, we must look at an example of the expanding window function.

The most important thing about the expanding window operation is that the window has a fixed lower bound. Only the upper bound of the window is rolled forward (the window gets bigger).

5.1 Example: Using Expanding window function

There are many functions for expanding windows; we will use one for the explanation, and then you can apply the one that suits your needs.


import pandas as pd
df = pd.read_csv('stockdata.csv')

# Convert the "Low" column to a numeric Series just for processing
target_col_df = pd.to_numeric(df["Low"])

# Expanding sum(); the argument 3 is min_periods, so the first 2 rows are NaN
df['Expanding'] = target_col_df.expanding(3).sum()
df.head(10)

Yields below output.

[Figure: pandas DataFrame with the “Expanding” column]
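
Because the lower bound stays fixed at the first row, an expanding sum is a running total. A short sketch on made-up values compares it with cumsum(); the only difference here comes from min_periods:

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([1, 2, 3, 4])

# Expanding sum that needs at least 2 rows before producing a result
expanding_sum = s.expanding(min_periods=2).sum()

# Plain running total for comparison
running_total = s.cumsum()

print(expanding_sum.tolist())  # [nan, 3.0, 6.0, 10.0]
print(running_total.tolist())  # [1, 3, 6, 10]
```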

6. Exponentially Weighted window functions

The exponentially weighted window function is much like the expanding window function: each result draws on all rows up to the current one. However, in the exponentially weighted window, older points are weighted down exponentially relative to the current value.

This is a somewhat more computationally expensive window function; however, it can be used in different types of statistical calculations. In pandas, we call the .ewm() method on a Series or DataFrame to get the exponentially weighted window object. We can then call a number of methods on it, such as mean().

6.1 Example: Using Exponentially Weighted functions

In the following example, we find the exponentially weighted mean of the “Low” price column using the exponentially weighted window functions.


import pandas as pd
df = pd.read_csv('stockdata.csv')

# Convert the "Low" column to a numeric Series just for processing
target_col_df = pd.to_numeric(df["Low"])

# Exponentially weighted mean() with center of mass com=3
df['EWM'] = target_col_df.ewm(com=3).mean()
df.head(10)

Yields below output.

[Figure: pandas DataFrame with the “EWM” column]
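
The weighting can be checked by hand. With com=3 the smoothing factor is alpha = 1 / (1 + com) = 0.25, and with the default adjust=True a value i steps back gets the weight (1 - alpha) ** i. The sketch below, on made-up values and with a hypothetical helper manual_ewm_mean, rebuilds the mean from those weights and compares it with ewm():

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only for illustration
s = pd.Series([1.0, 2.0, 3.0])
com = 3
alpha = 1 / (1 + com)  # 0.25

def manual_ewm_mean(values, alpha):
    out = []
    for t in range(len(values)):
        # Weight (1 - alpha) ** i for the value i steps back from position t
        w = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(w, values[: t + 1]) / w.sum())
    return np.array(out)

manual = manual_ewm_mean(s.to_numpy(), alpha)
pandas_result = s.ewm(com=com).mean().to_numpy()
print(np.allclose(manual, pandas_result))  # True
```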

Frequently Asked Questions on Pandas Window Functions

What are Pandas window functions?

Pandas window functions are tools within the Pandas library that enable users to perform rolling and expanding calculations on data, allowing for the computation of statistics over sliding or expanding windows.

What types of data can Pandas window functions be applied to?

Pandas window functions are typically applied to numerical data, including time series data. Aggregations such as mean() and sum() need numeric values, so categorical data generally has to be encoded as numbers first.

What are some common use cases for Pandas window functions?

Common use cases for Pandas window functions include calculating moving averages, computing cumulative sums or aggregations over time, detecting trends or anomalies, and performing feature engineering tasks in data analysis and machine learning workflows.

Are Pandas window functions efficient for large datasets?

Pandas window functions are designed to efficiently handle large datasets. However, the efficiency may vary depending on factors such as the size of the window, the complexity of the computation, and the hardware resources available.

Summary and Conclusion

Now you have a clear idea of Pandas window functions. You can now translate windowing operations from SQL, Spark, or another programming language into Pandas. If you want to add something or have a question, please feel free to comment. We will instantly answer your questions.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium