Pandas DataFrame reindex() Function

  • Post last modified:January 10, 2024

The Pandas DataFrame.reindex() function conforms a DataFrame to a new set of row index labels, column labels, or both. Labels that already exist keep their data, existing labels left out of the new index are dropped, and labels that have no value in the previous index are filled with NA/NaN unless you supply filling logic.

Related: Pandas Reset Index from starting zero (0)

This function takes several parameters, such as labels, index, columns, axis, method, copy, level, fill_value, limit, and tolerance, and returns a new DataFrame that conforms to the requested index.

In this article, I will explain the syntax and usage of reindex(), with examples of how to apply it to the rows or columns of a DataFrame.

1. Quick Examples of DataFrame reindex() Function

If you are in a hurry, below are some quick examples of how to use reindex() function in DataFrame.


# Below are the quick examples

# Example 1: Use reindex() to reindex the rows of a pandas DataFrame
df2 = df.reindex(['r1', 'index', 'r3','r4'])

# Example 2: Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")

# Example 3: Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])

# Example 4: Reindex both the rows and the columns in one call
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])

# Example 5: Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")

# Example 6: Change the order of the rows with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])

2. Syntax of DataFrame.reindex()

Following is the syntax of DataFrame.reindex() function.


# Syntax of Pandas DataFrame.reindex()
DataFrame.reindex(labels=None, *, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)

2.1 Parameters of reindex()

Following are the parameters of pandas reindex() function.

  • labels – New labels/index to conform to the axis that is specified by ‘axis’.
  • index, columns – Optional. New labels for the row index and for the columns, respectively. Preferably an Index object, to avoid duplicating data.
  • axis – Optional. The axis to target when labels is used. Can be either the axis name (‘index’, ‘columns’) or number (0, 1).
  • method – {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}, optional, default None. Method used for filling the holes in the reindexed DataFrame. Applies only to a DataFrame with a monotonically increasing or decreasing index.
  • copy – bool, default True: Whether to return a new object (a copy), even if the passed indexes are the same.
  • level – int or name: It is used to broadcast across the level, matching Index values on the passed MultiIndex level.
  • fill_value – Value to use for the missing values that reindexing introduces, that is, for labels not present in the original index. Defaults to np.nan, but can be any compatible value. It does not replace NaN values that already exist in the data.
  • limit – It defines the maximum number of consecutive elements to forward or backward fill.
  • tolerance – Optional. The maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy abs(index[indexer] - target) <= tolerance.
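The method and tolerance parameters are easier to follow on a small numeric index. The snippet below is a minimal sketch, not part of the original examples: the prices DataFrame and its integer index are hypothetical and were chosen only because method requires a monotonically increasing or decreasing index.

```python
import pandas as pd

# Hypothetical data with a monotonically increasing integer index.
prices = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=[0, 5, 10])

# method="ffill": each new label takes the row of the closest earlier label.
ffilled = prices.reindex([0, 2, 5, 7, 10], method="ffill")

# method="nearest" with tolerance=1: a new label is matched only when an
# original label lies within a distance of 1; otherwise the row stays NaN.
nearest = prices.reindex([1, 3, 9], method="nearest", tolerance=1)

print(ffilled)
print(nearest)
```

Here label 3 is two units away from its nearest original label (5), so it falls outside the tolerance and stays NaN, while labels 1 and 9 pick up the rows at 0 and 10.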

2.2 Return value of reindex()

It returns a DataFrame with a changed index/reindexed.

Now, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.


# Create a DataFrame with custom row labels
import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)

Yields below output.


# Output:
    Courses    Fee Duration Discount
r1    Spark  22000   30days     1000
r2  PySpark  25000   50days     2300
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400

3. Pandas Reindex DataFrame Rows

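We can use the Pandas reindex() function to reindex the rows of a DataFrame. When the new index contains a label that does not exist in the original, reindex() assigns NaN to every column of that row by default. Existing labels that are left out of the new index (r2 in the example below) are dropped. The NaN values can be replaced by passing a value to the fill_value keyword.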


# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)

Yields below output.


# Output:
       Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index     NaN    NaN      NaN      NaN
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400
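The output above can also be checked programmatically. The following is a small sketch that uses a two-column version of the article's DataFrame to confirm three things: the omitted label r2 is dropped, the new label holds only NaN, and the original DataFrame is left unchanged because reindex() returns a new object.

```python
import pandas as pd

# Trimmed-down copy of the article's DataFrame (two columns for brevity).
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas"],
        "Fee": ["22000", "25000", "24000", "26000"],
    },
    index=["r1", "r2", "r3", "r4"],
)

df2 = df.reindex(["r1", "index", "r3", "r4"])

# 'r2' was not requested, so it is gone from the result.
r2_kept = "r2" in df2.index

# The new label 'index' has no source row, so every cell is NaN.
new_row_all_nan = bool(df2.loc["index"].isna().all())

# The original DataFrame still has its four rows.
original_labels = df.index.tolist()

print(r2_kept, new_row_all_nan, original_labels)
```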

4. Fill Missing Values with reindex()

Notice that the new row contains NaN values after reindexing. We can pass the fill_value argument to use another value instead of NaN. For example, fill_value="Java" in the DataFrame.reindex() function fills every cell of the newly added row with the string Java.


# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)

Yields below output.


# Output:
      Courses    Fee Duration Discount
r1      Spark  22000   30days     1000
index    Java   Java     Java     Java
r3     Hadoop  24000   40days     2500
r4     Pandas  26000   60days     1400
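One detail worth keeping in mind is that fill_value applies only to the labels that reindexing introduces. The sketch below uses a hypothetical scores DataFrame (not from the article) that already contains a NaN, to show that the existing NaN is left alone and needs a separate fillna() call.

```python
import numpy as np
import pandas as pd

# Hypothetical data: label 'b' already holds a missing value.
scores = pd.DataFrame({"score": [1.0, np.nan, 3.0]}, index=["a", "b", "c"])

# 'd' is new, so it receives fill_value; the NaN at 'b' is untouched.
out = scores.reindex(["a", "b", "c", "d"], fill_value=0.0)
print(out)

# To also replace the pre-existing NaN, chain fillna() afterwards.
filled = out.fillna(0.0)
print(filled)
```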

5. Pandas Reindex DataFrame Columns

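We can also reindex the columns of a Pandas DataFrame using the DataFrame.reindex() function. Any column label that is not in the original DataFrame is added and filled with NaN automatically, and any existing column left out of the list (Discount here) is dropped. The index and columns parameters can also be passed together to reindex both axes in one call. For example: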


# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex both the rows and the columns in one call
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)

Both calls yield the output below, because the second call passes the existing row labels unchanged.


# Output:
    Courses    Fee Duration  Percentage
r1    Spark  22000   30days         NaN
r2  PySpark  25000   50days         NaN
r3   Hadoop  24000   40days         NaN
r4   Pandas  26000   60days         NaN
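The columns keyword is one of two ways to target the column axis. As a brief sketch, the snippet below rebuilds the article's DataFrame and checks that passing the labels positionally with axis="columns" gives the same result, and that the omitted Discount column is no longer present.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Pandas"],
        "Fee": ["22000", "25000", "24000", "26000"],
        "Duration": ["30days", "50days", "40days", "60days"],
        "Discount": ["1000", "2300", "2500", "1400"],
    },
    index=["r1", "r2", "r3", "r4"],
)

cols = ["Courses", "Fee", "Duration", "Percentage"]

# Two equivalent spellings for reindexing the column axis.
by_keyword = df.reindex(columns=cols)
by_axis = df.reindex(cols, axis="columns")

same_result = by_keyword.equals(by_axis)
discount_kept = "Discount" in by_keyword.columns

print(same_result, discount_kept)
```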

We can replace these NaN values by passing fill_value="30%" to the DataFrame.reindex() function. Every cell in the newly added Percentage column is then set to 30% instead of NaN.


# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration Percentage
r1    Spark  22000   30days        30%
r2  PySpark  25000   50days        30%
r3   Hadoop  24000   40days        30%
r4   Pandas  26000   60days        30%

6. Change the Rows Order using reindex()

We can also use the reindex() function to change the order of the DataFrame rows. Pass the existing index labels in the desired order, and the function returns a new DataFrame whose rows follow that order.


# Change the order of the rows with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration Discount
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
r2  PySpark  25000   50days     2300
r1    Spark  22000   30days     1000
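Reordering with labels that all exist can also be done with df.loc, so it may help to see where the two differ. The sketch below uses a one-column version of the article's DataFrame and a made-up label, "missing", to show that reindex() tolerates unknown labels while a list-based .loc lookup raises a KeyError in current pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Pandas"]},
    index=["r1", "r2", "r3", "r4"],
)

order = ["r3", "r4", "r2", "r1"]

# When every label exists, .loc and reindex() give the same reordered rows.
same_order = df.loc[order].equals(df.reindex(order))

# With an unknown label (hypothetical 'missing'), .loc refuses the lookup...
try:
    df.loc[["r3", "missing"]]
    strict_failed = False
except KeyError:
    strict_failed = True

# ...while reindex() returns a row of NaN for it.
lenient = df.reindex(["r3", "missing"])

print(same_order, strict_failed)
print(lenient)
```

This makes reindex() the safer choice when the requested labels may not all be present, and .loc the stricter choice when a missing label should be treated as an error.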

7. Complete Example of reindex() Function


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)

# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)

# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)

# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex both the rows and the columns in one call
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)

# Change the order of the rows with reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)

8. Conclusion

In this article, I have explained how to reindex a Pandas DataFrame using the DataFrame.reindex() function, with examples. reindex() conforms a DataFrame to a new index with optional filling logic and places NA/NaN in locations that have no value in the previous index.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
