The Pandas DataFrame.reindex() function is used to change the row index and the column labels of a DataFrame. It conforms the DataFrame to a new index with optional filling logic and places NA/NaN in locations that have no value in the previous index.
This function takes several parameters, such as labels, index, columns, axis, method, copy, level, fill_value, limit, and tolerance, and returns a reindexed DataFrame.
In this article, I will explain the syntax and usage of reindex(), with examples of reindexing single and multiple rows or columns of a DataFrame.
Key Points –
- The reindex() function conforms a DataFrame to a new index, allowing you to reorder, add, or remove index labels.
- The new row labels are passed through labels (the first positional argument) or index. Additional parameters include columns for reindexing columns and fill_value for filling in missing values.
- You can choose which axis the labels argument targets using the axis parameter (0 or ‘index’ for rows, 1 or ‘columns’ for columns).
- reindex() does not sort anything; rows and columns come back in the order of the labels you pass, which makes it useful for reordering them.
- reindex() returns a new object and has no inplace parameter, so assign the result (for example, df = df.reindex(...)) to keep the change.
- When reindexing, existing data aligns with the new index labels. If an index label does not exist in the original DataFrame, its values are filled with NaN or a specified fill_value.
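A minimal sketch of the points above, using a small hypothetical DataFrame (the labels r1 to r3 and r9 are made up for illustration): a label missing from the new list is dropped, a new label gets NaN, and the original DataFrame only changes if you assign the result back.

```python
import pandas as pd

# Hypothetical three-row DataFrame used only for this illustration
df = pd.DataFrame({"Fee": [22000, 25000, 24000]}, index=["r1", "r2", "r3"])

# 'r2' is not in the new labels, so it is dropped;
# 'r9' is not in the original index, so its row is filled with NaN
df2 = df.reindex(["r1", "r3", "r9"])
print(df2)

# reindex() returned a new DataFrame; the original is unchanged
print(df.index.tolist())  # ['r1', 'r2', 'r3']

# There is no inplace=True option, so assign the result to keep it
df = df.reindex(["r1", "r3", "r9"])
print(df.index.tolist())  # ['r1', 'r3', 'r9']
```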
1. Quick Examples of DataFrame reindex() Function
If you are in a hurry, below are some quick examples of how to use the reindex() function on a DataFrame.
# Quick examples of DataFrame reindex() function
# (df is the sample DataFrame created later in this article)
# Example 1: Reindex the DataFrame rows
df2 = df.reindex(['r1', 'index', 'r3', 'r4'])
# Example 2: Fill the missing values with "Java"
df2 = df.reindex(['r1', 'index', 'r3', 'r4'], fill_value="Java")
# Example 3: Reindex the column axis
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"])
# Example 4: Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2', 'r3', 'r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
# Example 5: Reindex the columns and fill in the missing values
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"], fill_value="30%")
# Example 6: Change the order of the rows
df2 = df.reindex(['r3', 'r4', 'r2', 'r1'])
2. Syntax of DataFrame.reindex()
Following is the syntax of the DataFrame.reindex() function.
# Syntax of Pandas DataFrame.reindex()
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)
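As the signature shows, labels can be combined with axis instead of using the index or columns keywords. The sketch below, built on a small hypothetical DataFrame, checks that both spellings produce the same result for the column axis.

```python
import pandas as pd

# Hypothetical two-row DataFrame for the comparison
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]},
    index=["r1", "r2"],
)

# labels + axis="columns" targets the column axis ...
by_axis = df.reindex(["Fee", "Courses", "Duration"], axis="columns")

# ... which is equivalent to passing the same list through columns=
by_keyword = df.reindex(columns=["Fee", "Courses", "Duration"])

print(by_axis)
print(by_axis.equals(by_keyword))  # True
```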
2.1 Parameters of reindex()
Following are the parameters of the pandas reindex() function.
labels – Optional. New labels/index to conform the axis specified by axis to.
index, columns – Optional. New labels for the row index and the columns, respectively. Passing an Index object is preferred to avoid duplicating data.
axis – Optional. The axis to target with labels. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).
method – {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}, optional, default None. Method used to fill the holes in the reindexed DataFrame. Only applicable when the index is monotonically increasing or decreasing.
copy – bool, default True: Whether to return a new object (a copy), even if the passed indexes are the same.
level – int or name: Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value – scalar, default np.nan: Value used for labels that are introduced by the reindex and have no value in the original DataFrame. Values that were already NaN before reindexing are not filled.
limit – int, optional: Maximum number of consecutive elements to forward or backward fill.
tolerance – Optional: Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.
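The method, limit, and tolerance parameters are not used in the rest of this article, so here is a minimal sketch of how they interact, assuming a hypothetical numeric price table. Inexact matching only works when the original index is monotonic, which is why the index here is sorted integers.

```python
import pandas as pd

# Hypothetical price table on a monotonically increasing integer index;
# method= requires a monotonic index and raises ValueError otherwise
prices = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=[0, 10, 20])
new_index = [0, 1, 2, 3, 10, 11, 20]

# No method: labels 1, 2, 3 and 11 do not exist, so they become NaN
plain = prices.reindex(new_index)

# method="ffill": each new label takes the last row at or before it
ffilled = prices.reindex(new_index, method="ffill")

# limit=2: forward-fill at most 2 consecutive new labels (label 3 stays NaN)
limited = prices.reindex(new_index, method="ffill", limit=2)

# method="nearest" with tolerance=1: fill only from a label at most 1 away
# (labels 2 and 3 are too far from both 0 and 10, so they stay NaN)
nearest = prices.reindex(new_index, method="nearest", tolerance=1)

# Show the four variants side by side
print(pd.concat(
    {"plain": plain, "ffill": ffilled, "limit=2": limited, "nearest": nearest},
    axis=1,
))
```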
2.2 Return value of reindex()
It returns a DataFrame with the changed (conformed) index.
Now, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.
# Create a DataFrame with a custom row index
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)
Yields below output.
# Output:
    Courses    Fee Duration Discount
r1    Spark  22000   30days     1000
r2  PySpark  25000   50days     2300
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
3. Pandas Reindex DataFrame Rows
We can use the Pandas reindex() function to reindex the DataFrame rows. When a label in the new list (here 'index') does not exist in the original index, reindex() assigns NaN values to that row by default, and a row whose label is not in the new list (here r2) is dropped. Missing values can be filled by passing a value to the fill_value keyword.
# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)
Yields below output.
# Output:
      Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index      NaN    NaN      NaN      NaN
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400
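Because the sample DataFrame stores every value as a string, the NaN row does not change any column types. With a numeric column the situation is different; this sketch uses a hypothetical integer Fee column to show the upcast to float64 and how an integer fill_value avoids it.

```python
import pandas as pd

# Hypothetical DataFrame with a real integer column
df = pd.DataFrame({"Fee": [22000, 25000, 24000]}, index=["r1", "r2", "r3"])
print(df["Fee"].dtype)   # int64

# The new label 'index' has no value, so pandas inserts NaN; an integer
# column cannot hold NaN, so the whole column is upcast to float64
df2 = df.reindex(["r1", "index", "r3"])
print(df2["Fee"].dtype)  # float64

# Supplying an integer fill_value avoids the NaN and keeps the integer dtype
df3 = df.reindex(["r1", "index", "r3"], fill_value=0)
print(df3["Fee"].dtype)  # int64
```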
4. Fill Missing Values with reindex()
Notice that the new row contains NaN values after reindexing. To use a value other than NaN, pass the fill_value argument to the function; here, fill_value="Java" is passed to the DataFrame.reindex() function.
# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)
Yields below output.
# Output:
      Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index    Java   Java     Java     Java
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400
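Note that fill_value only applies to labels introduced by reindex(). A sketch with a hypothetical DataFrame that already contains a NaN shows the difference, plus one way (chaining fillna()) to replace both kinds of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame where 'r2' already holds a missing value
df = pd.DataFrame({"Courses": ["Spark", np.nan]}, index=["r1", "r2"])

# fill_value only applies to the row introduced by reindex ('r5');
# the pre-existing NaN in 'r2' is left untouched
df2 = df.reindex(["r1", "r2", "r5"], fill_value="Java")
print(df2)

# To replace the existing NaN as well, reindex first and then call fillna()
df3 = df.reindex(["r1", "r2", "r5"]).fillna("Java")
print(df3)
```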
5. Pandas Reindex DataFrame Columns
We can also reindex the columns of a Pandas DataFrame using the DataFrame.reindex() function. Any specified column label that is not in the original DataFrame is filled with NaN values automatically, and existing columns that are not listed (here Discount) are dropped. For example:
# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)
Both calls yield the same output, shown below, because the row labels passed to index= are the existing ones.
# Output:
    Courses    Fee Duration Percentage
r1    Spark  22000   30days        NaN
r2  PySpark  25000   50days        NaN
r3   Hadoop  24000   40days        NaN
r4   Pandas  26000   60days        NaN
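Since reindex() never raises for an unknown column name, a misspelled label simply becomes a column of NaN. A small check like the following, where the typo Fees and the list wanted are hypothetical, makes such labels visible before they cause confusion.

```python
import pandas as pd

# Hypothetical DataFrame for the check
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]},
    index=["r1", "r2"],
)

wanted = ["Courses", "Fees", "Percentage"]  # "Fees" is a deliberate typo for "Fee"

# reindex() does not raise for unknown labels; it silently adds NaN columns
df2 = df.reindex(columns=wanted)
print(df2)

# Listing the requested labels that were not in the original columns
# exposes both the intended new column and the typo
missing = [c for c in wanted if c not in df.columns]
print(missing)  # ['Fees', 'Percentage']
```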
We can fill the NaN/null values using the parameter fill_value="30%" in the DataFrame.reindex() function. Because the new Percentage column has no values in the original DataFrame, every cell in it is filled with 30%.
# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)
Yields below output.
# Output:
    Courses    Fee Duration Percentage
r1    Spark  22000   30days        30%
r2  PySpark  25000   50days        30%
r3   Hadoop  24000   40days        30%
r4   Pandas  26000   60days        30%
6. Change the Row Order using reindex()
We can use the reindex() function to change the order of the Pandas DataFrame rows. Pass the list of index labels in the desired order, and it returns a DataFrame with the rows in that order.
# Change the row order with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)
Yields below output.
# Output:
    Courses    Fee Duration Discount
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
r2  PySpark  25000   50days     2300
r1    Spark  22000   30days     1000
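Two related sketches on a hypothetical three-row DataFrame: passing the current index in reverse order reverses the rows, and, unlike .loc[] (which raises KeyError for list labels that do not exist, in pandas 1.0 and later), reindex() turns an unknown label into a NaN row.

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]}, index=["r1", "r2", "r3"])

# Reverse the current row order by passing the index back in reverse
reversed_df = df.reindex(df.index[::-1])
print(reversed_df.index.tolist())  # ['r3', 'r2', 'r1']

# reindex() tolerates a label that does not exist ('r9' becomes a NaN row) ...
with_missing = df.reindex(["r3", "r9"])
print(with_missing)

# ... whereas .loc with a list containing a missing label raises KeyError
loc_failed = False
try:
    df.loc[["r3", "r9"]]
except KeyError as err:
    loc_failed = True
    print("KeyError:", err)
```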
7. Complete Example of reindex() Function
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)
# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)
# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)
# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)
# Change the row order with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)
FAQ on Pandas DataFrame reindex() Function
What does the reindex() function do?
The reindex() function changes the index or columns of a DataFrame to match a specified set of labels: new labels are added (filled with NaN or fill_value), and existing labels that are not in the new set are removed.
How do I reindex the rows of a DataFrame?
To reindex rows in a Pandas DataFrame, use the reindex() method and specify the desired sequence of row labels.
Can I reindex rows and columns at the same time?
Yes. You can reindex both rows and columns simultaneously by specifying values for the index and columns parameters in a single reindex() call.
How do I fill the missing values created by reindex()?
Use the reindex() method with the fill_value parameter to specify a value (other than NaN) to be used for missing labels.
What is the difference between reindex() and set_index()?
reindex() changes or adjusts the existing index/columns to match a specified sequence, while set_index() replaces the index with one or more columns from the DataFrame.
What does reindex_like() do?
The reindex_like() method reindexes a DataFrame to match the index and columns of another DataFrame. It is equivalent to calling reindex(index=other.index, columns=other.columns), which is convenient when you want to copy another DataFrame's structure.
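The last two FAQ answers can be illustrated with a short sketch. The template DataFrame and its labels are hypothetical; the check confirms that reindex_like() gives the same result as passing the other DataFrame's index and columns to reindex(), while set_index() instead promotes an existing column to the index.

```python
import pandas as pd

# Hypothetical source DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [22000, 25000, 24000]},
    index=["r1", "r2", "r3"],
)

# Hypothetical DataFrame whose row and column labels df should match
template = pd.DataFrame(index=["r3", "r1", "r4"], columns=["Fee", "Discount"])

# reindex_like() conforms df to template's index and columns ...
like = df.reindex_like(template)

# ... which matches an explicit reindex() with the same labels
explicit = df.reindex(index=template.index, columns=template.columns)
print(like)
print(like.equals(explicit))  # True

# set_index(), by contrast, moves an existing column into the index
by_course = df.set_index("Courses")
print(by_course.index.tolist())  # ['Spark', 'PySpark', 'Hadoop']
```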
Conclusion
In this article, I have explained how to reindex a Pandas DataFrame using the DataFrame.reindex() function with examples. reindex() conforms a DataFrame to a new index with optional filling logic and places NA/NaN in locations that have no value in the previous index.
Happy Learning !!