Pandas DataFrame.reindex() function is used to change the row index and the column labels of a DataFrame. It conforms the DataFrame to a new index with optional filling logic, placing NA/NaN in locations that have no value in the previous index.
This function takes several parameters, such as labels, index, columns, axis, method, copy, level, fill_value, limit, and tolerance, and returns a reindexed DataFrame.
In this article, I will explain the syntax and usage of reindex(), with examples that reindex single and multiple rows or columns of a DataFrame.
1. Quick Examples of DataFrame reindex() Function
If you are in a hurry, below are some quick examples of how to use the reindex() function on a DataFrame.
# Below are some quick examples
# (df is the sample DataFrame created later in this article)
# Example 1: Reindex rows; the new label 'index' gets NaN values
df2 = df.reindex(['r1', 'index', 'r3', 'r4'])
# Example 2: Fill the missing values with "Java"
df2 = df.reindex(['r1', 'index', 'r3', 'r4'], fill_value="Java")
# Example 3: Reindex the column axis
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"])
# Example 4: Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2', 'r3', 'r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
# Example 5: Reindex the columns and fill in the missing values
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"], fill_value="30%")
# Example 6: Change the row order
df2 = df.reindex(['r3', 'r4', 'r2', 'r1'])
2. Syntax of DataFrame.reindex()
Following is the syntax of DataFrame.reindex() function.
# Syntax of Pandas DataFrame.reindex()
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)
2.1 Parameters of reindex()
Following are the parameters of the pandas reindex() function.
- labels: array-like, optional. New labels/index to conform the axis specified by axis to.
- index, columns: Optional. New labels for the row index or the columns. Preferably an Index object, to avoid duplicating data.
- axis: Optional. The axis to target, given either by name ('index', 'columns') or by number (0, 1).
- method: {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional, default None. The method used to fill holes in the reindexed DataFrame. Only applicable when the index is monotonically increasing or decreasing.
- copy: bool, default True. Whether to return a new object (a copy), even if the passed indexes are the same.
- level: int or name. Broadcast across a level, matching Index values on the passed MultiIndex level.
- fill_value: scalar, default np.nan. The value used for labels that did not exist in the original DataFrame and must be filled in for alignment. NaN values that already exist in the data are not replaced.
- limit: int, optional. The maximum number of consecutive elements to forward or backward fill.
- tolerance: Optional. The maximum distance between original and new labels for inexact matches. The index values at the matching locations must satisfy abs(index[indexer] - target) <= tolerance.
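The method, limit, and tolerance parameters are not used in the examples later in this article, so here is a minimal sketch of how they behave. It assumes small hypothetical DataFrames named prices and floats with a sorted numeric index (the names and values are made up for illustration), since the filling methods only work on a monotonic index.

```python
import pandas as pd

# Hypothetical prices on a sorted integer index; method= needs a monotonic index
prices = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=[0, 2, 4])
new_labels = list(range(7))  # 0, 1, ..., 6

# method="ffill": each new label takes the value of the previous existing label
ffilled = prices.reindex(new_labels, method="ffill")
print(ffilled["price"].tolist())
# [10.0, 10.0, 20.0, 20.0, 30.0, 30.0, 30.0]

# limit=1: at most one consecutive row is forward-filled, so label 6
# (the second label after 4) stays NaN
limited = prices.reindex(new_labels, method="ffill", limit=1)
print(limited["price"].tolist())
# [10.0, 10.0, 20.0, 20.0, 30.0, 30.0, nan]

# method="nearest" with tolerance=0.5: a new label only takes the nearest
# existing value when abs(index[indexer] - target) <= 0.5, otherwise NaN
floats = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=[0.0, 1.0, 2.0])
near = floats.reindex([0.2, 1.6, 3.0], method="nearest", tolerance=0.5)
print(near["price"].tolist())
# [10.0, 30.0, nan]
```

Here 3.0 is 1.0 away from the closest existing label (2.0), which exceeds the tolerance, so it is left as NaN.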
2.2 Return value of reindex()
It returns a DataFrame with the changed (reindexed) row index and/or columns.
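As a quick illustration of the return value and of the labels/axis form described in the parameter list, the sketch below uses a small hypothetical courses DataFrame (not the article's dataset). It shows that passing labels together with axis is equivalent to the index= or columns= keywords, and that the original DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical, trimmed data for illustration only
courses = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [22000, 25000]},
                       index=["r1", "r2"])

# labels + axis=0 targets the rows, just like index=
by_axis = courses.reindex(["r2", "r1"], axis=0)
by_index = courses.reindex(index=["r2", "r1"])
print(by_axis.equals(by_index))    # True

# labels + axis="columns" targets the columns, just like columns=
fee_only = courses.reindex(["Fee"], axis="columns")
print(fee_only.columns.tolist())   # ['Fee']

# The original DataFrame is not modified; reindex() returns a new DataFrame
print(courses.index.tolist())      # ['r1', 'r2']
```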
Now, let's create a pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.
# Create a sample DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4'])
print(df)
Yields below output.
# Output:
    Courses    Fee Duration Discount
r1    Spark  22000   30days     1000
r2  PySpark  25000   50days     2300
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
3. Pandas Reindex DataFrame Rows
We can use the pandas reindex() function to reindex the rows of a DataFrame. When a label that does not exist in the original index is passed, reindex() assigns NaN values to that new row by default, and any existing label left out of the list (here r2) is dropped. The missing values can be filled by passing a value to the fill_value keyword.
# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)
Yields below output.
# Output:
       Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index      NaN    NaN      NaN      NaN
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400
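One side effect of adding a new label is worth knowing: NaN is a float, so an integer column is upcast to float64 when reindex() inserts a missing row. In the DataFrame above every value is a string, so the dtype stays object. The sketch below assumes a hypothetical fees DataFrame with an integer Fee column to make the difference visible; it also shows that the omitted label r2 is dropped.

```python
import pandas as pd

# Hypothetical: Fee stored as integers instead of strings
fees = pd.DataFrame({"Fee": [22000, 25000, 24000]}, index=["r1", "r2", "r3"])
print(fees["Fee"].dtype)               # int64

# "index" is a new label, so its row is NaN and Fee becomes float64;
# "r2" is not in the new list, so that row is dropped
fees2 = fees.reindex(["r1", "index", "r3"])
print(fees2["Fee"].dtype)              # float64
print(fees2.index.tolist())            # ['r1', 'index', 'r3']
print(fees2["Fee"].isna().tolist())    # [False, True, False]
```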
4. Fill Missing Values with reindex()
Notice that the new row contains NaN values after reindexing. To fill it with something other than NaN, pass the fill_value argument to the function. Here we fill the NaN/null values using fill_value="Java" in the DataFrame.reindex() function.
# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)
Yields below output.
# Output:
       Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index     Java   Java     Java     Java
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400
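Note that fill_value is only used for the new labels; NaN values that already exist in the DataFrame are left as they are. A numeric fill_value also keeps an integer column from being upcast to float. The following is a minimal sketch with a hypothetical fees DataFrame whose Discount column already contains a NaN (the label r5 is chosen for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: Discount already has a NaN before reindexing
fees = pd.DataFrame({"Fee": [22000, 25000], "Discount": [1000.0, np.nan]},
                    index=["r1", "r2"])

# Only the new row r5 receives fill_value=0; the existing NaN in r2 stays NaN
fees2 = fees.reindex(["r1", "r2", "r5"], fill_value=0)
print(fees2["Fee"].tolist())        # [22000, 25000, 0]
print(fees2["Discount"].tolist())   # [1000.0, nan, 0.0]

# An integer fill value lets Fee keep its integer dtype (no float upcast)
print(fees2["Fee"].dtype)           # int64
```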
5. Pandas Reindex DataFrame Columns
We can also reindex the columns of a pandas DataFrame using the DataFrame.reindex() function. Any column label that is not in the original DataFrame is added and filled with NaN values automatically, while existing columns that are not listed (here Discount) are dropped. For example:
# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)
Both calls yield the same output, since the row labels r1 to r4 are unchanged.
# Output:
    Courses    Fee Duration  Percentage
r1    Spark  22000   30days         NaN
r2  PySpark  25000   50days         NaN
r3   Hadoop  24000   40days         NaN
r4   Pandas  26000   60days         NaN
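Because the columns list replaces the whole column axis, any existing column you leave out (Discount above) is dropped. If the goal is to keep every column and only append a new one, one option is to build the list from the existing columns. A minimal sketch on a trimmed, hypothetical copy of the data:

```python
import pandas as pd

# Hypothetical, trimmed copy of the article's data
courses = pd.DataFrame({"Courses": ["Spark", "PySpark"],
                        "Fee": ["22000", "25000"],
                        "Discount": ["1000", "2300"]}, index=["r1", "r2"])

# Listing only some columns drops the others (Discount disappears here)
subset = courses.reindex(columns=["Courses", "Fee", "Percentage"])
print(subset.columns.tolist())     # ['Courses', 'Fee', 'Percentage']

# Keep every existing column and append the new one at the end
extended = courses.reindex(columns=[*courses.columns, "Percentage"])
print(extended.columns.tolist())   # ['Courses', 'Fee', 'Discount', 'Percentage']
```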
We can fill the NaN/null values by passing fill_value="30%" to the DataFrame.reindex() function. The new Percentage column, which would otherwise contain NaN, is then filled with the value 30%.
# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)
Yields below output.
# Output:
    Courses    Fee Duration Percentage
r1    Spark  22000   30days        30%
r2  PySpark  25000   50days        30%
r3   Hadoop  24000   40days        30%
r4   Pandas  26000   60days        30%
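fill_value is a single scalar, so when rows and columns are reindexed in the same call, every new cell on both axes receives the same value. The sketch below uses a hypothetical courses DataFrame (the labels r5 and Percentage are chosen for illustration) and shows that a new row r5 gets "30%" even in the Courses column:

```python
import pandas as pd

# Hypothetical, trimmed data for illustration only
courses = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": ["22000", "25000"]},
                       index=["r1", "r2"])

# One scalar fill_value is used for every new cell on both axes
filled = courses.reindex(index=["r1", "r5"], columns=["Courses", "Percentage"],
                         fill_value="30%")
print(filled.loc["r1"].tolist())   # ['Spark', '30%']
print(filled.loc["r5"].tolist())   # ['30%', '30%']
```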
6. Change the Row Order using reindex()
We can use the reindex() function to change the order of the DataFrame rows. To do so, pass the list of existing index labels in the desired order; the function returns a DataFrame with its rows arranged in that order.
# Change the row order with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)
Yields below output.
# Output:
    Courses    Fee Duration Discount
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
r2  PySpark  25000   50days     2300
r1    Spark  22000   30days     1000
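Two related patterns may be handy here. Passing df.index[::-1] reverses the current order without typing the labels, and, unlike .loc, reindex() does not raise an error when a label is missing but inserts a NaN row instead. A minimal sketch with a hypothetical three-row courses DataFrame (the label r9 is made up to trigger the missing-label case):

```python
import pandas as pd

# Hypothetical, trimmed data for illustration only
courses = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]},
                       index=["r1", "r2", "r3"])

# Reverse the current row order without listing the labels by hand
reversed_df = courses.reindex(courses.index[::-1])
print(reversed_df.index.tolist())   # ['r3', 'r2', 'r1']

# reindex() tolerates the unknown label r9 and fills its row with NaN
partial = courses.reindex(["r3", "r9"])
print(partial["Courses"].tolist())  # ['Hadoop', nan]

# .loc with a list that contains a missing label raises KeyError instead
try:
    courses.loc[["r3", "r9"]]
    loc_raised = False
except KeyError as err:
    loc_raised = True
    print("KeyError:", err)
```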
7. Complete Example of reindex() Function
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4'])
print(df)
# Reindex rows; the new label 'index' gets NaN values
df2 = df.reindex(['r1', 'index', 'r3', 'r4'])
print(df2)
# Fill the missing values with "Java"
df2 = df.reindex(['r1', 'index', 'r3', 'r4'], fill_value="Java")
print(df2)
# Reindex the column axis
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex rows and columns together
df2 = df.reindex(index=['r1', 'r2', 'r3', 'r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)
# Reindex the columns and fill in the missing values
df2 = df.reindex(columns=["Courses", "Fee", "Duration", "Percentage"], fill_value="30%")
print(df2)
# Change the row order
df2 = df.reindex(['r3', 'r4', 'r2', 'r1'])
print(df2)
8. Conclusion
In this article, I have explained how to reindex a pandas DataFrame using the DataFrame.reindex() function, with examples. reindex() conforms a DataFrame to a new index with optional filling logic, placing NA/NaN in locations that have no value in the previous index.
Happy Learning!!
Related Articles
- Pandas Set Value to Particular Cell in DataFrame Using Index
- Pandas Set Column as Index in DataFrame
- Pandas Set Index to Column in DataFrame
- Get Count of Each Row of Pandas DataFrame
- Pandas Filter by Index
- Pandas Drop Rows by Index
- Pandas Get Index from DataFrame?
- Series.reindex()-Change the Index Order in Pandas Series
- How to Get Index of Series in Pandas
- Convert Pandas DatetimeIndex to String
- Pandas Get Column Name by Index or Position
- Pandas.Index.drop_duplicates() Explained
- How to Rename Column by Index in Pandas
- Pandas set index name to DataFrame
- Convert Pandas Index to List