  • Post category:Pandas
  • Post last modified:December 4, 2024
Pandas DataFrame reindex() Function

The Pandas DataFrame.reindex() function changes the row index and the column labels of a DataFrame. It conforms the DataFrame to a new index with optional filling logic, placing NA/NaN in locations that had no value in the previous index.

Related: Pandas Reset Index from starting zero (0)

This function takes several parameters, including labels, index, columns, axis, method, copy, level, fill_value, limit, and tolerance, and returns a reindexed DataFrame.

In this article, I will explain the syntax and usage of reindex(), with examples covering single and multiple rows or columns of a DataFrame.

Key Points –

  • The reindex() function is used to conform a DataFrame to a new index, allowing for reordering, adding, or removing index labels.
  • The primary parameter is index, which specifies the new index labels. Additional parameters include columns for reindexing columns, and fill_value for filling in missing values.
  • You can specify which axis to reindex using the axis parameter (0 for rows, 1 for columns).
  • The reindex() function does not sort and has no sort parameter. Rows and columns come back in the order of the labels you pass; call sort_index() on the result if you need sorted output.
  • reindex() returns a new object and has no inplace parameter. To keep the result, assign it back, for example df = df.reindex(...).
  • When reindexing, existing data aligns with the new index labels. If an index label does not exist in the original DataFrame, it will be filled with NaN or a specified fill_value.
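
As a minimal sketch of the points above: the small DataFrame below is hypothetical sample data, not the one used later in this article. It shows reindex() returning a new object in the requested label order while the original stays unchanged, with sorting done as a separate step.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
df = pd.DataFrame({"Fee": [22000, 25000, 24000]}, index=["r1", "r2", "r3"])

# Reorder r3 and r1, drop r2, and add a new label r9 (filled with NaN)
df2 = df.reindex(["r3", "r1", "r9"])
print(df2)

# The original DataFrame is unchanged
print(df.index.tolist())

# reindex() keeps the order you pass; sort afterwards if needed
df3 = df2.sort_index()
print(df3.index.tolist())
```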

1. Quick Examples of DataFrame reindex() Function

If you are in a hurry, below are some quick examples of how to use reindex() function in DataFrame.


# Quick examples of DataFrame reindex() function

# Example 1: Use reindex() function 
# To reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])

# Example 2: Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")

# Example 3: Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])

# Example 4: Reindex both rows and columns
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])

# Example 5: Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")

# Example 6: Change the order of rows using reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])

2. Syntax of DataFrame.reindex()

Following is the syntax of DataFrame.reindex() function.


# Syntax of Pandas DataFrame.reindex()
DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=None, level=None, fill_value=nan, limit=None, tolerance=None)

2.1 Parameters of reindex()

Following are the parameters of pandas reindex() function.

  • labels – New labels/index to conform the axis specified by ‘axis’ to.
  • index, columns – Optional. New labels for the rows and columns, respectively. Preferably an Index object, to avoid duplicating data.
  • axis – Optional. The axis to target; either the axis name (‘index’, ‘columns’) or number (0, 1).
  • method – {None, ‘backfill’/‘bfill’, ‘pad’/‘ffill’, ‘nearest’}, optional, default None. Method used for filling holes in the reindexed DataFrame. Applicable only when the index is monotonically increasing or decreasing.
  • copy – bool. Whether to return a new object (a copy), even if the passed indexes are the same. Older pandas versions default to True; the signature above shows copy=None, as in recent releases.
  • level – int or name. Broadcast across a level, matching Index values on the passed MultiIndex level.
  • fill_value – Value to use for missing values introduced by reindexing, that is, for labels that are not in the original index. Defaults to np.nan. It does not replace NaN values that already exist in the data.
  • limit – Maximum number of consecutive elements to forward or backward fill.
  • tolerance – Optional. Maximum distance between original and new labels for inexact matches. The index values at the matching locations must satisfy abs(index[indexer] - target) <= tolerance.
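
The method, limit, and tolerance parameters are easier to see on a numeric index. The sketch below uses a hypothetical price column with a monotonically increasing index (both are assumptions made for illustration) and compares forward fill, forward fill with a limit, and nearest-label matching with a tolerance.

```python
import pandas as pd

# Hypothetical data with a monotonically increasing integer index
df = pd.DataFrame({"price": [10.0, 20.0, 30.0]}, index=[0, 5, 10])

new_index = [0, 2, 4, 6, 10, 12]

# Forward fill: each new label takes the value of the closest earlier label
filled = df.reindex(new_index, method="ffill")
print(filled)

# limit=1 caps how many consecutive new labels may be filled
limited = df.reindex(new_index, method="ffill", limit=1)
print(limited)

# Nearest label, but only if it is at most 1 away; otherwise NaN
nearest = df.reindex(new_index, method="nearest", tolerance=1)
print(nearest)
```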

2.2 Return value of reindex()

It returns a new DataFrame conformed to the specified index and/or columns.

Now, let’s create a Pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.


# Create a pandas DataFrame with custom row labels
import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)

Yields below output.


# Output:
    Courses    Fee Duration Discount
r1    Spark  22000   30days     1000
r2  PySpark  25000   50days     2300
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400

3. Pandas Reindex DataFrame Rows

We can use the Pandas reindex() function to reindex the rows of a DataFrame. When the new index contains a label that is not in the original index, reindex() assigns NaN values to that row by default. These missing values can be filled by passing a value to the fill_value keyword.


# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)

Yields below output.


# Output:
       Courses    Fee Duration Discount
r1       Spark  22000   30days     1000
index     NaN    NaN      NaN      NaN
r3      Hadoop  24000   40days     2500
r4      Pandas  26000   60days     1400

4. Fill Missing Values with reindex()

Notice that the new row contains NaN values after reindexing. To use a different value instead of NaN, pass the fill_value argument, for example fill_value="Java", to the DataFrame.reindex() function.


# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)

Yields below output.


# Output:
      Courses    Fee Duration Discount
r1      Spark  22000   30days     1000
index    Java   Java     Java     Java
r3     Hadoop  24000   40days     2500
r4     Pandas  26000   60days     1400
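
One detail worth checking is that fill_value applies only to labels introduced by reindexing. The sketch below uses a hypothetical DataFrame that already contains a NaN (an assumption for illustration) to show that the existing NaN is left as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN already present in the original frame
df = pd.DataFrame({"Fee": [22000.0, np.nan]}, index=["r1", "r2"])

# "r9" is a new label, so it receives fill_value;
# "r2" already exists, so it keeps its original NaN
df2 = df.reindex(["r1", "r2", "r9"], fill_value=0)
print(df2)
```

To replace NaN values that were already in the data, use fillna() on the result instead.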

5. Pandas Reindex DataFrame Columns

We can also reindex the columns of a Pandas DataFrame using the DataFrame.reindex() function. Any specified column label that is not in the original DataFrame is filled with NaN values automatically. Columns that are left out of the list, such as Discount below, are dropped from the result. For example:


# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex both rows and columns of the DataFrame
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration  Percentage
r1    Spark  22000   30days         NaN
r2  PySpark  25000   50days         NaN
r3   Hadoop  24000   40days         NaN
r4   Pandas  26000   60days         NaN

We can fill these NaN/null values by passing fill_value="30%" to the DataFrame.reindex() function. Every cell in a column that did not exist in the original DataFrame is then filled with the value 30% instead of NaN.


# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration Percentage
r1    Spark  22000   30days        30%
r2  PySpark  25000   50days        30%
r3   Hadoop  24000   40days        30%
r4   Pandas  26000   60days        30%
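
Columns can also be targeted through the labels and axis parameters listed in the syntax section. As a small sketch on hypothetical two-row data, the two calls below are expected to give the same result.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame(
    {"Courses": ["Spark", "Pandas"], "Fee": ["22000", "26000"]},
    index=["r1", "r2"],
)

cols = ["Fee", "Courses", "Percentage"]

# Keyword form: name the axis through the columns parameter
by_keyword = df.reindex(columns=cols)

# Positional labels plus axis: same target axis, different spelling
by_axis = df.reindex(cols, axis="columns")

print(by_keyword.equals(by_axis))
```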

6. Change the Row Order using reindex()

We can use the reindex() function to change the order of the rows in a Pandas DataFrame. To do that, pass the existing index labels in the desired order; the function returns a new DataFrame with its rows arranged in that order.


# Change the order of rows using reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)

Yields below output.


# Output:
    Courses    Fee Duration Discount
r3   Hadoop  24000   40days     2500
r4   Pandas  26000   60days     1400
r2  PySpark  25000   50days     2300
r1    Spark  22000   30days     1000
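
For comparison, label-based selection with .loc[] can also reorder rows. The sketch below, on hypothetical sample data, illustrates one difference: when a requested label does not exist, .loc[] raises a KeyError, while reindex() adds a NaN row.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"]},
    index=["r1", "r2", "r3"],
)

order = ["r3", "r1", "r2"]

# When every label exists, both approaches return the same reordered rows
same = df.reindex(order).equals(df.loc[order])
print(same)

# With a missing label, .loc[] raises KeyError
try:
    df.loc[["r3", "r5"]]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)

# reindex() accepts the missing label and fills its row with NaN
with_missing = df.reindex(["r3", "r5"])
print(with_missing)
```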

7. Complete Example of reindex() Function


import pandas as pd
import numpy as np
technologies= ({
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies, index = ['r1', 'r2', 'r3', 'r4'])
print(df)

# Use reindex() function to reindex pandas dataframe
df2 = df.reindex(['r1', 'index', 'r3','r4'])
print(df2)

# Filling the missing values with Java
df2 = df.reindex(['r1', 'index', 'r3','r4'], fill_value = "Java")
print(df2)

# Use reindex() function to reindex column axis
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex both rows and columns of the DataFrame
df2 = df.reindex(index=['r1', 'r2','r3','r4'], columns=["Courses", "Fee", "Duration", "Percentage"])
print(df2)

# Reindex the columns and fill in the missing values
df2 = df.reindex(columns =["Courses", "Fee", "Duration", "Percentage"], fill_value = "30%")
print(df2)

# Change the order of rows using reindex()
df2 = df.reindex(['r3', 'r4','r2','r1'])
print(df2)

FAQ on Pandas DataFrame reindex() Function

What is the purpose of the reindex() function in Pandas?

The reindex() function is used to change the index or columns of a DataFrame to match a specified set of labels, adding missing labels or removing existing ones as needed.

How do I reindex rows in a DataFrame?

To reindex rows in a Pandas DataFrame, use the reindex() method and specify the desired sequence of row labels.

Can I reindex both rows and columns simultaneously?

You can reindex both rows and columns simultaneously in a Pandas DataFrame by specifying values for the index and columns parameters in the reindex() method.

How can I avoid NaN values for missing labels?

Pass the fill_value parameter to reindex() to specify the value (other than NaN) to use for labels that are missing from the original DataFrame.

How does reindex() differ from set_index()?

reindex() changes or adjusts the existing index/columns to match a specified sequence, while set_index() replaces the index with one or more columns from the DataFrame.
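
A short sketch of that difference, using hypothetical sample data: set_index() turns an existing column into the row labels, while reindex() looks rows up by label and fills unknown labels with NaN.

```python
import pandas as pd

# Hypothetical sample data with the default integer index
df = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [22000, 26000]})

# set_index(): the values of an existing column become the row labels
by_course = df.set_index("Courses")
print(by_course)

# reindex(): rows are matched by label; "Hadoop" is unknown, so it gets NaN
subset = by_course.reindex(["Pandas", "Hadoop"])
print(subset)
```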

How is reindex_like() different from reindex()?

The reindex_like() method reindexes a DataFrame to match the structure (index and columns) of another DataFrame, simplifying the process when copying structures.
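
As an illustration, the sketch below conforms one hypothetical DataFrame to the index and columns of another (the name template is made up for this example) and checks that the result matches an explicit reindex() call.

```python
import pandas as pd

# Hypothetical source data
df = pd.DataFrame({"Fee": [22000, 25000, 24000]}, index=["r1", "r2", "r3"])

# Hypothetical template whose index and columns we want to match
template = pd.DataFrame({"Fee": [0, 0], "Discount": [0, 0]}, index=["r3", "r1"])

# Labels come from `template`, values come from `df`
aligned = df.reindex_like(template)
print(aligned)

# Equivalent explicit call
explicit = df.reindex(index=template.index, columns=template.columns)
same = aligned.equals(explicit)
print(same)
```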

Conclusion

In this article, I have explained how to reindex a Pandas DataFrame using the DataFrame.reindex() function, with examples. reindex() conforms a DataFrame to a new index with optional filling logic, placing NA/NaN in locations that had no value in the previous index.

Happy Learning !!
