In Pandas, the shift() function is used to shift the values in a DataFrame or Series along a particular axis. This is useful for creating lagged versions of data, which is often needed in time series analysis.
In this article, I will explain the Pandas DataFrame shift() function, covering its syntax, parameters, and usage, and show how it returns a Series or DataFrame with data shifted by the designated number of periods.
Key Points –
- The shift() function shifts the values in a DataFrame or Series by a specified number of periods along the desired axis, and is often used for time-series data manipulation.
- When working with time series data, the freq parameter can be used to shift the index by a specific frequency (e.g., days, months) instead of moving the values.
- By default, shift() introduces NaN values in the positions vacated by the moved data, but these can be replaced with a specified fill_value.
- The function returns a DataFrame or Series with the same shape as the caller, but with the data shifted. Integer columns become float when NaN values are introduced.
Pandas DataFrame shift() Introduction
Let’s look at the syntax of the Pandas DataFrame shift() function.
# Syntax of Pandas dataframe shift()
DataFrame.shift(periods=1, freq=None, axis=0, fill_value=None)
Parameters of the DataFrame shift()
Following are the parameters of the DataFrame shift() function.
- periods – int. Number of periods to shift. Positive values shift data downward or to the right, and negative values shift data upward or to the left. Default is 1.
- freq – DateOffset, timedelta, or str. Optional frequency string or DateOffset object to shift the index by a specific frequency increment. Only applicable to time series data with a datetime-like index such as a DatetimeIndex.
- axis – {0 or ‘index’, 1 or ‘columns’}. Axis along which to shift. 0 or ‘index’ shifts values up or down the rows, 1 or ‘columns’ shifts values across the columns. Default is 0.
- fill_value – scalar, optional. The scalar value to use for newly introduced missing values. If it is not specified, NaN is used for numeric data.
Return Value
It returns a DataFrame or Series with the same shape as the caller, but with the values shifted. The data type can change: integer columns are converted to float when NaN values are introduced.
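As a quick check of the return value, the sketch below compares the object type, shape, and dtypes before and after a default shift. It assumes the same sample data used in the rest of this article; the names shifted_df and shifted_series are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

shifted_df = df.shift()                  # DataFrame in, DataFrame out
shifted_series = df['Column1'].shift()   # Series in, Series out

print(type(shifted_df).__name__, shifted_df.shape)
print(type(shifted_series).__name__, shifted_series.shape)

# The original columns hold integers; the shifted ones hold floats,
# because NaN cannot be stored in a plain integer column.
print(df.dtypes)
print(shifted_df.dtypes)
```

The shape is unchanged, which is what makes the shifted result easy to align with the original data row by row.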
Usage of Pandas DataFrame shift() Function
The shift() function in Pandas is used to shift the values in a DataFrame or Series by a specified number of periods along a particular axis. It is particularly useful in time series analysis, data preprocessing, and creating lagged variables.
Now, let’s create a Pandas DataFrame from a Python dictionary, with the columns Column1 and Column2.
# Create DataFrame
import pandas as pd
data = {'Column1': [2, 4, 6, 8, 10],
        'Column2': [3, 5, 7, 9, 11]}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
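This yields the output below.
# Output:
# Original DataFrame:
# Column1 Column2
# 0 2 3
# 1 4 5
# 2 6 7
# 3 8 9
# 4 10 11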
Shifting Rows Downward by 1 Period (Default)
To shift rows downward by 1 period using the default settings, call the shift() method on your DataFrame or Series without any additional arguments.
# Shifting rows downward by 1 period
df2 = df.shift()
print("Shifted DataFrame:\n", df2)
Here,
- The shift() function is called on the DataFrame df.
- By default, shift() shifts the rows downward by 1 period (periods=1).
- The top row is filled with NaN values because there are no preceding values to fill these positions.
- All other rows are shifted down by one position, preserving their original order.
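A common use of the default downward shift is building a lagged column next to the original one. The following is a minimal sketch assuming the sample data above; Column1_lag1 is a hypothetical column name used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

# Each row of Column1_lag1 holds the Column1 value from the previous row.
# The first row has no previous value, so it is NaN.
lagged = df.assign(Column1_lag1=df['Column1'].shift())
print(lagged)
```

Using assign() leaves df itself unchanged, which keeps the original data available for comparison.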
Shifting Rows Upward by 2 Periods
Alternatively, to shift rows upward by 2 periods in a DataFrame, you can use the shift() function with the periods parameter set to -2.
# Shifting rows upward by 2 periods
df2 = df.shift(periods=-2)
print("Shifted DataFrame:\n", df2)
# Output:
# Shifted DataFrame:
# Column1 Column2
# 0 6.0 7.0
# 1 8.0 9.0
# 2 10.0 11.0
# 3 NaN NaN
# 4 NaN NaN
Here,
- The periods parameter is set to -2 to shift rows upward by two periods.
- The last two rows are filled with NaN values because there are no succeeding values to fill these positions.
- All other rows are shifted up by two positions, preserving their original order.
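An upward shift is often used to pair each row with a value that occurs later in the data, for example as a prediction target. The sketch below assumes the sample data above; Column1_in_2 is a hypothetical column name, and dropping the incomplete rows with dropna() is one possible choice rather than a required step.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

# Pair every row with the Column1 value found two rows later.
with_target = df.assign(Column1_in_2=df['Column1'].shift(periods=-2))

# The last two rows have no value two rows ahead, so they contain NaN.
complete_rows = with_target.dropna()
print(complete_rows)
```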
Shifting Columns to the Right
To shift columns to the right in a DataFrame, you can use the shift() function with the axis parameter set to 1.
# Shifting columns to the right by 1 period
df2 = df.shift(periods=1, axis=1)
print("Shifted DataFrame:\n", df2)
# Output:
# Shifted DataFrame:
# Column1 Column2
# 0 NaN 2.0
# 1 NaN 4.0
# 2 NaN 6.0
# 3 NaN 8.0
# 4 NaN 10.0
Here,
- The periods parameter is set to 1 to shift by one period.
- The axis parameter is set to 1 to indicate that the shift should be along the columns.
- The first column (Column1) is filled with NaN values because there are no preceding columns to fill these positions.
- All other columns are shifted to the right by one position, preserving their original order.
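The same idea works in the opposite direction. As a small sketch assuming the sample data above, a negative periods value combined with axis=1 moves the columns to the left, so the last column is the one left empty. Depending on the pandas version, the moved column may be displayed as integers or as floats, so no expected output is listed here.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

# Shifting columns to the left by 1 period
shifted_left = df.shift(periods=-1, axis=1)
print(shifted_left)

# Column1 now holds the values that were in Column2,
# and Column2 has no column to its right to take values from.
print(shifted_left['Column2'].isna().all())
```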
Shifting with a Fill Value
When using the shift() function in Pandas, you can specify a fill_value parameter to replace missing values that are introduced as a result of shifting.
# Shifting rows downward by 1 period
# with a fill value of 5
df2 = df.shift(fill_value=5)
print("Shifted DataFrame:\n", df2)
# Equivalent call with periods given explicitly:
# df2 = df.shift(periods=1, fill_value=5)
# Output:
# Shifted DataFrame:
# Column1 Column2
# 0 5 5
# 1 2 3
# 2 4 5
# 3 6 7
# 4 8 9
Here,
- The shift() function is called on the DataFrame df.
- By default, shift() shifts rows downward by 1 period (periods=1).
- The fill_value=5 parameter ensures that any NaN values introduced by the shift operation are replaced with 5.
- As a result, the first row of the shifted DataFrame, which would otherwise be NaN, is filled with 5 because there are no preceding rows to fill these positions.
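One practical reason to pass fill_value rather than fill the gaps afterwards is the data type of the result. The sketch below assumes the sample data above and compares shift(fill_value=5) with calling fillna(5) on a default shift; the names with_fill_value and shift_then_fillna are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

# Filling during the shift: no NaN is ever created, so integers stay integers.
with_fill_value = df.shift(fill_value=5)

# Filling after the shift: NaN is created first, which converts the
# columns to float, and fillna() does not convert them back.
shift_then_fillna = df.shift().fillna(5)

print(with_fill_value.dtypes)
print(shift_then_fillna.dtypes)
```

Both results contain the same numbers; only the dtype differs, which can matter when the column is later used for indexing or integer-only operations.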
Shifting a Time Series with Frequency
Similarly, when working with time series data, the shift() function can be used with the freq parameter to shift the index by a specific frequency increment. This is particularly useful for aligning time series data or creating lagged features with a clear temporal interpretation.
import pandas as pd
# Creating a sample time series DataFrame
dates = pd.date_range('20220202', periods=5)
df_time = pd.DataFrame({'Value': [1, 2, 3, 4, 5]}, index=dates)
# Shifting the time series by 1 day
df2 = df_time.shift(periods=1, freq='D')
print("Shifting the time series:\n", df2)
# Output:
# Shifting the time series:
# Value
# 2022-02-03 1
# 2022-02-04 2
# 2022-02-05 3
# 2022-02-06 4
# 2022-02-07 5
Here,
- A sample time series DataFrame df_time is created with a date range as the index.
- The shift() function is called on the DataFrame df_time.
- The periods parameter is set to 1 to shift by one unit of the given frequency.
- The freq parameter is set to 'D' so that each unit is one day.
- As a result, each row’s index is moved one day forward, preserving the values in their original order, and no NaN values are introduced.
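To make the difference concrete, the sketch below runs shift() on the same time series with and without freq. It assumes the df_time data from the example above; the names by_position and by_freq are chosen only for this comparison.

```python
import pandas as pd

dates = pd.date_range('20220202', periods=5)
df_time = pd.DataFrame({'Value': [1, 2, 3, 4, 5]}, index=dates)

# Without freq: the index stays put and the values move down one row.
by_position = df_time.shift(periods=1)

# With freq: the values stay put and every index label moves one day ahead.
by_freq = df_time.shift(periods=1, freq='D')

print(by_position)
print(by_freq)
```

In the first result the first row is NaN and the last value is dropped; in the second, all five values are kept and only the dates change.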
FAQ on Pandas DataFrame shift() Function
The shift() function in Pandas shifts the values in a DataFrame or Series by a specified number of periods along a given axis. This can be useful for creating lagged versions of data, aligning data, and calculating differences between consecutive data points.
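The point about differences between consecutive data points can be sketched as follows, assuming the sample data used earlier in this article. Subtracting the shifted column from the original gives the row-to-row change, which is the same result that the diff() method produces; the names manual_diff and builtin_diff are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [2, 4, 6, 8, 10],
                   'Column2': [3, 5, 7, 9, 11]})

# Current value minus the previous row's value.
manual_diff = df['Column1'] - df['Column1'].shift()

# The built-in equivalent for a one-period difference.
builtin_diff = df['Column1'].diff()

print(manual_diff)
print(manual_diff.equals(builtin_diff))
```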
To shift columns to the right by 1 period in a DataFrame, you can use the shift() function with the axis parameter set to 1.
By default, shift() introduces NaN values in positions where data is shifted. You can replace these missing values with a specified scalar using the fill_value parameter.
To shift rows downward by 1 period in a DataFrame, you can use the shift() function with its default parameters.
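You can use shift() with a MultiIndex DataFrame. The shift is applied by position along the chosen axis, regardless of the index levels, so values can move across group boundaries. To shift within each group separately, combine groupby() on the relevant index level with shift().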
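The MultiIndex case can be illustrated with a small sketch. The data, the level names group and step, and the variable names below are all made up for this example. It compares a plain shift() with a shift applied per group through groupby(level=...).

```python
import pandas as pd

# Hypothetical two-level index: two groups with three steps each.
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2, 3]],
                                   names=['group', 'step'])
df_multi = pd.DataFrame({'Value': [10, 20, 30, 40, 50, 60]}, index=index)

# Plain shift: purely positional, so the first row of group B
# receives the last value of group A.
plain = df_multi.shift()

# Grouped shift: each group is shifted on its own, so the first row
# of every group is NaN.
within_group = df_multi.groupby(level='group').shift()

print(plain)
print(within_group)
```

Which of the two is appropriate depends on whether the rows of different groups form one continuous sequence or several independent ones.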
Conclusion
In this article, you have explored the Pandas DataFrame shift() function, including its syntax, parameters, and usage. You also learned how it returns a DataFrame or Series with the same shape as the original, but with values shifted based on the specified parameters.
Happy Learning!!