What are window functions in Pandas? Windowing operations are common in statistical calculations and data analysis. The Python Pandas library provides several windowing functions that make complicated windowing operations simpler. In this article, you will see the different types of windowing operations on a Pandas DataFrame.
Just like SQL, the Pandas library provides several types of window functions, which a lot of programmers overlook. Using these window functions gives you more power and saves time while working with the Pandas library.
Table of contents
- 1. What is a Window Function?
- 2. How to Use Window Functions in Pandas
- 3. Pandas Rolling Window Function
- 4. Weighted Window Function
- 5. Expanding Window Function
- 6. Exponentially Weighted window functions
- 7. Summary and Conclusion
- Related Articles
1. What is a Window Function?
Pandas window functions are functions whose input values are taken from a “window” of one or more rows in a Series or a DataFrame, and a calculation is performed over those rows. The window is the set of rows between two boundaries, including the boundary rows, and its size is the number of rows it contains.
1.1 Example of Window Function
Let’s say you want to do a calculation over every 3 consecutive rows; then the window size is 3. If you want to apply the calculation over every 5 consecutive rows instead, the window size becomes 5.
For a concrete example, let’s say we want the mean value of every 3 consecutive rows of a specific column. The window size is then 3, and the mean calculated over each window is a window function.
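As a minimal sketch of this idea, the snippet below computes the mean of every 3 consecutive rows by hand, without Pandas. The values are made up for illustration, and the helper name `window_means` is hypothetical.

```python
# Illustrative values only; they do not come from the article's dataset.
values = [10.0, 20.0, 30.0, 40.0, 50.0]
window = 3


def window_means(data, size):
    """Return the mean of each run of `size` consecutive items."""
    means = []
    for end in range(size, len(data) + 1):
        chunk = data[end - size:end]  # the rows inside the current window
        means.append(sum(chunk) / size)
    return means


print(window_means(values, window))  # [20.0, 30.0, 40.0]
```

Changing `window` to 5 leaves a single window that covers all five rows.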
1.2 Example of Window Function in Pandas Library
We could find the average low price by doing the calculations manually, but the Pandas library does this for us. Let’s say I need another column that holds the average low price of every 3 days. The following method produces it.
To run the following code, you will need to download the dataset from Kaggle. The code should work with any recent version of Pandas; for reference, I am using Pandas version 1.5.1.
import pandas as pd

df = pd.read_csv('stockdata.csv')

# Make sure the "Low" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["Low"])

# Creating another column with the average Low price of the last 3 days
df['average_low'] = target_col_df.rolling(window=3).mean()

df.head(10)
Yields below output. The “average_low” column holds the average of the last 3 days’ Low values. Note that the first 2 rows are NaN because fewer than 3 rows are available at that point to calculate the mean.
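The leading NaN values are easy to reproduce on a small hand-made Series. The sketch below uses made-up prices rather than stockdata.csv, and also shows the `min_periods` parameter of rolling(), which lets a window produce a result before it is full.

```python
import pandas as pd

# Made-up prices for illustration; not taken from stockdata.csv.
low = pd.Series([10.0, 12.0, 14.0, 16.0])

# Default behaviour: a result needs all 3 rows of the window.
strict = low.rolling(window=3).mean()

# min_periods=1 accepts partial windows at the start of the data.
relaxed = low.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 12.0, 14.0]
print(relaxed.tolist())  # [10.0, 11.0, 12.0, 14.0]
```

Whether partial windows are acceptable depends on the analysis; an average over a single day is not really a 3-day average.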
2. How to Use Window Functions in Pandas
With the Python Pandas library, almost any type of window operation is possible; however, Pandas provides built-in functions for only 4 types of windowing operations. Other kinds of window operations can still be built, but only if you combine these with other relevant Pandas functions.
Our job here is to make things easier by providing examples. According to the official Pandas documentation, Pandas provides window functions for the following 4 types of windowing operations.
- Rolling window operations
- Weighted window operations
- Expanding window operations
- Exponentially weighted window operations
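As a rough orientation, the sketch below shows the entry point for each of the four types on a small made-up Series. The parameter values (window size 3, com=1) are arbitrary choices for illustration. The weighted window is wrapped in a try block because it needs SciPy installed.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values

# 1. Rolling window: a fixed-size window that slides over the rows.
rolling_mean = s.rolling(window=3).mean()

# 2. Weighted window: a rolling window with a win_type (requires SciPy).
try:
    weighted_mean = s.rolling(window=3, win_type="triang").mean()
except ImportError:
    weighted_mean = None  # SciPy is not available in this environment

# 3. Expanding window: starts at the first row and keeps growing.
expanding_mean = s.expanding().mean()

# 4. Exponentially weighted window: older rows count less and less.
ewm_mean = s.ewm(com=1).mean()

print(rolling_mean.tolist())
print(expanding_mean.tolist())
print(ewm_mean.tolist())
```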
3. Pandas Rolling Window Function
The rolling window operation is the most common type of window operation that statisticians and data analysts perform. Most of the time, this operation is enough to fulfil our needs. Pandas provides functions with different parameters to perform rolling window operations.
We will look into each one step by step, but first you need a clear picture of the rolling window operation in general.
3.1 What is Rolling Window Operation
As the word “rolling” suggests, a rolling window operation applies a formula to a fixed number of rows and then slides that window forward one row at a time, from the first value all the way to the last.
As in the section 1 example, let’s say we want the sum of every 3 rows while sliding to the end of the column. At each step we take the values from the 2 previous rows and the newly added row. This is called a rolling window operation.
The Pandas library provides a built-in function for rolling window operations. These window functions help in calculating different statistics and in finding patterns and trends in the data.
In its basic form, the Pandas rolling() function works on a window of a fixed number of rows, so you need to specify the window size before you apply the rolling operation. (With a datetime-like index, rolling() also accepts a time offset such as '2D' instead of a row count.)
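To make the sliding concrete, the sketch below compares rolling(window=3).sum() with sums rebuilt by slicing the rows that each window covers. The data is made up for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])  # made-up values
window = 3

rolled = s.rolling(window=window).sum()

# Rebuild the same numbers by slicing the rows each full window covers.
manual = []
for i in range(window - 1, len(s)):
    rows = s.iloc[i - window + 1:i + 1]  # 2 previous rows plus the current row
    manual.append(float(rows.sum()))
    print(i, rows.tolist(), rows.sum())

print(rolled.tolist())  # [nan, nan, 6.0, 9.0, 12.0]
```

The first two positions have no full window, which is why pandas reports NaN there.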
3.2 Example #1: Rolling window function predefined methods
In the following example, all we do is find the sum of the High price over every 2 consecutive days and create another column for it. If this does not make sense right now, please continue; it will become clearer at the end, when we create our own custom function.
import pandas as pd

df = pd.read_csv('stockdata.csv')

# Make sure the "High" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["High"])

# Creating another column with the sum of the High price over 2 days
# (the current day and the previous day)
df['sum_of_previous_2_day_high_price'] = target_col_df.rolling(window=2).sum()

df.head(10)
Yields below output. Each value in the new column is the sum of the current and the previous day’s High price, so the first row is NaN.
3.3 Example #2: Rolling window function with apply() function
In the previous example, we used one of the predefined methods of the Rolling object. In this example, you will see how to call your own custom function.
The apply() method takes a function (the func argument) that receives each window and returns a single value, which allows more general rolling calculations. In the following example, with a window size of 1, we find the difference between each value and the index of that value in the column.
import pandas as pd

df = pd.read_csv('stockdata.csv')

# For example, it just subtracts the index from the value.
# apply() expects one number per window, so return a scalar.
def sub_t(x):
    return x.iloc[0] - x.index[0]

# Make sure the "Open" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["Open"])

# Creating a "Raw" column holding Open value - index of the value
df['Raw'] = target_col_df.rolling(window=1).apply(sub_t)

df.head(10)
Yields below output.
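A custom function becomes more useful once the window holds several rows. As a hedged sketch on made-up data, the hypothetical helper `price_range` below returns the spread between the highest and lowest value in each 3-row window. Passing raw=True hands the function a NumPy array instead of a Series, which is usually faster.

```python
import pandas as pd


def price_range(window_values):
    """Spread between the largest and smallest value in one window."""
    return window_values.max() - window_values.min()


opens = pd.Series([5.0, 7.0, 6.0, 10.0, 9.0])  # made-up prices

# raw=True passes each window to price_range as a NumPy array.
spread = opens.rolling(window=3).apply(price_range, raw=True)

print(spread.tolist())  # [nan, nan, 2.0, 4.0, 4.0]
```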
4. Weighted Window Function
There might be cases where some values in the window should count more than others. In these cases, the Pandas weighted window functions make the operation much easier.
Where a plain rolling window, which treats every row in the window equally, is not enough, you can try a weighted window function. This is a bit more advanced than the plain rolling window function, but not complicated.
The weighted window operation in Pandas is achieved by passing the win_type argument to the .rolling() function. This feature depends on the SciPy library (another Python library for scientific calculations), so SciPy must be installed.
4.1 Example: Using Weighted Window Function in Pandas
Here we use the Pandas weighted window function with the “triang” window type, on the same DataFrame as in the examples above. Keep in mind that we have set the window size to 4, but you can change it according to your needs.
Feel free to replace “triang” with the “gaussian” window type. Note that the Gaussian window needs its standard deviation passed to the aggregation method, for example .mean(std=2).
import pandas as pd

df = pd.read_csv('stockdata.csv')

# Make sure the "Open" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["Open"])

# Creating another column with the weighted mean using "triang"
df['Weighted_mean'] = target_col_df.rolling(window=4, center=True, win_type="triang").mean()

df.head(10)
Yields below output.
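To see what a weighted mean over a window does, the sketch below writes the weighting by hand with rolling().apply(), so it runs without SciPy. The weights are an assumption chosen to mirror the shape of a triangular window of size 4 (small at the edges, large in the middle); the data and the helper name `weighted_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed triangular-shaped weights for a window of 4 rows.
weights = np.array([0.25, 0.75, 0.75, 0.25])


def weighted_mean(window_values):
    """Weighted average of one window: sum(w * x) / sum(w)."""
    return np.sum(weights * window_values) / np.sum(weights)


opens = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])  # made-up prices

result = opens.rolling(window=4, center=True).apply(weighted_mean, raw=True)

# Only positions with a full 4-row window have a value.
print(result.dropna().tolist())  # [25.0, 35.0]
```

The middle rows of each window dominate the result, which is the point of using a weighted window instead of a plain mean.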
5. Expanding Window Function
After the rolling and weighted windows, next in the list are expanding window functions. The expanding window operation is the same as the rolling window operation, except that the window expands with each iteration. At each row it yields the value of the chosen aggregation function, computed over all rows from the start of the data up to that row.
In simpler words, the window starts at the first row and grows by one row at every step. To understand this better, let’s look at an example of an expanding window function.
The most important thing about the expanding window operation is that an expanding window has a fixed lower bound. Only the upper bound of the window is rolled forward (the window gets bigger).
5.1 Example: Using Expanding window function
Expanding window objects offer many aggregation functions. We will use sum() for the explanation, requiring at least 3 rows before a result is produced, and then you can apply whichever one suits your need.
import pandas as pd

df = pd.read_csv('stockdata.csv')

# Make sure the "Low" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["Low"])

# Using the expanding function sum(); the 3 is min_periods,
# so the first 2 rows are NaN
df['Expanding'] = target_col_df.expanding(3).sum()

df.head(10)
Yields below output.
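Because the lower bound never moves, an expanding sum is simply a running total. The sketch below, on made-up data, checks this against cumsum() and shows the effect of min_periods=3.

```python
import pandas as pd

low = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0])  # made-up prices

# Expanding sum that needs at least 3 rows before reporting a value.
grown = low.expanding(min_periods=3).sum()

# A plain running total over the same data.
running_total = low.cumsum()

print(grown.tolist())          # [nan, nan, 8.0, 9.0, 14.0]
print(running_total.tolist())  # [3.0, 4.0, 8.0, 9.0, 14.0]
```

From the third row onward the two columns agree; only the first two rows differ, because of min_periods.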
6. Exponentially Weighted window functions
The exponentially weighted window function is similar to the expanding window function in that every row up to the current one takes part in the calculation. The difference is that the rows are not weighted equally: the weight of each earlier point decays exponentially the further it lies from the current value.
This window function is a bit more involved than the others; however, it is used in many types of statistical calculations. In Pandas, we call the ewm() method on a Series or DataFrame (there is no top-level pd.ewm() function) to get the exponentially weighted window object. We can then call aggregation methods such as mean(), std() and var() on it.
6.1 Example: Using Exponentially Weighted functions
In the following example, we find the exponentially weighted mean of the “Low” price column, using a center of mass (com) of 3.
import pandas as pd

df = pd.read_csv('stockdata.csv')

# Make sure the "Low" column is numeric (the result is a Series)
target_col_df = pd.to_numeric(df["Low"])

# Using the exponentially weighted function mean()
df['EWM'] = target_col_df.ewm(com=3).mean()

df.head(10)
Yields below output.
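To show where the numbers come from, the sketch below rebuilds ewm(com=3).mean() by hand on made-up data. It assumes the default adjust=True setting, where the smoothing factor is alpha = 1 / (1 + com) and the row that lies i steps back from the current one gets the weight (1 - alpha) ** i.

```python
import numpy as np
import pandas as pd

low = pd.Series([10.0, 12.0, 11.0, 15.0])  # made-up prices
com = 3
alpha = 1 / (1 + com)  # 0.25 for com=3

manual = []
for t in range(len(low)):
    # Values up to row t, newest first.
    newest_first = low.to_numpy()[:t + 1][::-1]
    # Weight 1 for the newest row, then 0.75, 0.5625, ... for older rows.
    weights = (1 - alpha) ** np.arange(t + 1)
    manual.append(np.sum(weights * newest_first) / np.sum(weights))

built_in = low.ewm(com=com).mean()

print(manual)
print(built_in.tolist())
```

A smaller com gives a larger alpha, so the result follows the newest values more closely.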
7. Summary and Conclusion
Now you have a clear idea of Pandas window functions. You can now convert windowing operations from SQL, Spark or any other language to Pandas. If you want to add something or have a question, please feel free to comment. We will answer your questions as soon as we can.