Pandas – Convert Index to Column in DataFrame

You can use reset_index() to create/convert the index/multi-index to a column of pandas DataFrame. Besides this, there are other ways as well. If you are not aware by default, pandas add an index to each row of the pandas DataFrame. This index value starts with zero for the first row and increments by 1 for each row (sequence index value for each row). You can also assign a custom index to DataFrame according to your need.

In this article, I will explain different ways to convert the index to the DataFrame column with examples like adding default index and custom index as a column to DataFrame.

1. Quick Examples of Convert Index to Column in pandas DataFrame

Below are some useful examples of how to convert an index to a column in pandas DataFrame.


# Create a new column with index values
df['index_column'] = df.index

#using reset_index() to convert index into column
df2=df.reset_index()

#Using reset_index, inplace=True
df.reset_index(inplace=True)

# When you have multiple index 
# Specify which index you wanted to convert to column
df.reset_index(inplace=True, level=['Fee'])

# Preserve Index as-is and add one index as column
df2=df.assign(Fee=df.index.get_level_values('Fee'))

# Preserve Index as-is and add one index as column
# Using assign operator.
df['Fee'] = df.index.get_level_values('Fee') 

Let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day','40days','35days','40days','60days','50days','55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Hadoop  26000   35days      1500
3   Python  22000   40days      1200
4   pandas  24000   60days      2500
5   Oracle  21000   50days      2100
6     Java  22000   55days      2000

2. Create a New Index Column in DataFrame

One simple way to convert an index to a column is by assigning an index as a new column to DataFrame. DataFrame.index property returns a Series object of an index, assign this to DataFrame as shown below which creates a new column with index value on DataFrame.


# Create a new column with index values
df['index'] = df.index
print(df)

Yields below output. It adds a new column index_column with index values to DataFrame.


    Courses    Fee Duration  Discount index_column
0     Spark  20000    30day      1000            0
1   PySpark  25000   40days      2300            1
2    Hadoop  26000   35days      1500            2
3    Python  22000   40days      1200            3
4    pandas  24000   60days      2500            4
5    Oracle  21000   50days      2100            5
6      Java  22000   55days      2000            6

You can also add a new custom column as an index and convert it to a column


# Create DataFrame with custom index
index=['r0','r1','r2','r3','r4','r5','r6']
df = pd.DataFrame(technologies,index=index)
df['index_column'] = df.index
print(df)

Results


    Courses    Fee Duration  Discount index_column
r0    Spark  20000    30day      1000           r0
r1  PySpark  25000   40days      2300           r1
r2   Hadoop  26000   35days      1500           r2
r3   Python  22000   40days      1200           r3
r4   pandas  24000   60days      2500           r4
r5   Oracle  21000   50days      2100           r5
r6     Java  22000   55days      2000           r6

3. Convert the Index to Column Using DataFrame.reset_index()

Another way is by using the Dataframe.reset_index() function to convert the index as a column in Pandas DataFrame.


# Using reset_index to convert index to column
df = pd.DataFrame(technologies,index=index)
df2=df.reset_index()
print(df2)

Yields below output. This adds a new column index to DataFrame and returns a copy of the DataFrame instead of updating the existing DataFrame.


  index  Courses    Fee Duration  Discount
0    r0    Spark  20000    30day      1000
1    r1  PySpark  25000   40days      2300
2    r2   Hadoop  26000   35days      1500
3    r3   Python  22000   40days      1200
4    r4   pandas  24000   60days      2500
5    r5   Oracle  21000   50days      2100
6    r6     Java  22000   55days      2000

Use inplace=True parameter to reflects the change in the DataFrame to stay permanent. If you have multiple indexes, this converts all index levels to columns.


# Convert the index to column using reset_index
df.reset_index(inplace=True)
print(df)

Yields same output as above.

4. Convert the Index to Column Using Rename_Axis and Reset_Index

You can also convert the index to the column using rename_axis() and reset_index() function. You should first rename your index to the custom label and then apply reset_index().


# Using Rename_Axis and Reset_Index
df = pd.DataFrame(technologies,index=index)
df2 = df.rename_axis('index_column').reset_index()
print(df2)

Yields below output.


  index_column  Courses    Fee Duration  Discount
0           r0    Spark  20000    30day      1000
1           r1  PySpark  25000   40days      2300
2           r2   Hadoop  26000   35days      1500
3           r3   Python  22000   40days      1200
4           r4   pandas  24000   60days      2500
5           r5   Oracle  21000   50days      2100
6           r6     Java  22000   55days      2000

5. Convert Multi_Index to Multiple Columns

Let’s look at another DataFrame with two levels in its index or multi-index.


# Create new DataFrame with multi index
import pandas as pd
new_index = pd.MultiIndex.from_tuples([("Spark", 20000),
                                       ("PySpark", 25000),
                                       ("Hadoop", 26000),
                                       ("Python", 22000),
                                       ("pandas", 24000),
                                       ("Oracle",21000),
                                       ("Java", 23000)],
                                       names=['Courses','Fee'])
Data = {
    'Duration':['30day','40days','35days','40days','60days','50days','55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000],
    'Address':["Mumbai","Delhi","Hyderabad","Bangalore","Chennai","Kolkata","Lucknow"]
       }
df = pd.DataFrame(Data, columns = ['Duration','Discount','Address'],
                  index=new_index)
print(df)

Yields below results with two Indexes Courses and Fee


              Duration  Discount    Address
Courses Fee                                
Spark   20000    30day      1000     Mumbai
PySpark 25000   40days      2300      Delhi
Hadoop  26000   35days      1500  Hyderabad
Python  22000   40days      1200  Bangalore
pandas  24000   60days      2500    Chennai
Oracle  21000   50days      2100    Kolkata
Java    23000   55days      2000    Lucknow

Let’s convert these multi-indexes as columns to DataFrame. The reset_index() method with the default parameters converts all index levels to columns and uses a simple RangeIndex as a new index.


# Convert all multi indexes to columns
df.reset_index(inplace=True)

Yields below output. As you see it converted indexes Courses and Fee as DataFrame columns.


   Courses    Fee Duration  Discount    Address
0    Spark  20000    30day      1000     Mumbai
1  PySpark  25000   40days      2300      Delhi
2   Hadoop  26000   35days      1500  Hyderabad
3   Python  22000   40days      1200  Bangalore
4   pandas  24000   60days      2500    Chennai
5   Oracle  21000   50days      2100    Kolkata
6     Java  23000   55days      2000    Lucknow

6. Using Multiple Columns to Convert Multi_Index and Level

Use the level parameter to control which index to convert when you have multiple indexes on DataFrame. If possible, use the level name, which is more explicit.


# Using multiple columns to convert multi_index and level
df = pd.DataFrame(data,index=new_index)
df.reset_index(inplace=True, level=['Fee'])
print(df)

Yields below output


           Fee Duration  Discount    Address
Courses                                     
Spark    20000    30day      1000     Mumbai
PySpark  25000   40days      2300      Delhi
Hadoop   26000   35days      1500  Hyderabad
Python   22000   40days      1200  Bangalore
pandas   24000   60days      2500    Chennai
Oracle   21000   50days      2100    Kolkata
Java     23000   55days      2000    Lucknow

In case if you wanted to persist the index as-is and add one of the indexes as a column


df = pd.DataFrame(data,index=new_index)
df2=df.assign(Fee=df.index.get_level_values('Fee'))
print(df2)

# Also use
df['Fee'] = df.index.get_level_values('Fee')
print(df) 

Yields below output.


              Duration  Discount    Address    Fee
Courses Fee                                       
Spark   20000    30day      1000     Mumbai  20000
PySpark 25000   40days      2300      Delhi  25000
Hadoop  26000   35days      1500  Hyderabad  26000
Python  22000   40days      1200  Bangalore  22000
pandas  24000   60days      2500    Chennai  24000
Oracle  21000   50days      2100    Kolkata  21000
Java    23000   55days      2000    Lucknow  23000

7. Compelte Example


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day','40days','35days','40days','60days','50days','55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
              }

df = pd.DataFrame(technologies)
print(df)

# Create a new index
df['index_column'] = df.index
print(df)

index=['r0','r1','r2','r3','r4','r5','r6']
df = pd.DataFrame(technologies,index=index)
df['index_column'] = df.index
print(df)

#using reset_index
df = pd.DataFrame(technologies,index=index)
df2=df.reset_index()
print(df2)

#Using reset_index, inplace=True
df = pd.DataFrame(technologies,index=index)
df.reset_index(inplace=True)
print(df)


# Using rename_axis() and reset_index()
df = pd.DataFrame(technologies,index=index)
df = df.rename_axis('index_column').reset_index()
print(df)

# Create new DataFrame with multi index
import pandas as pd
new_index = pd.MultiIndex.from_tuples([("Spark", 20000),
                                       ("PySpark", 25000),
                                       ("Hadoop", 26000),
                                       ("Python", 22000),
                                       ("pandas", 24000),
                                       ("Oracle",21000),
                                       ("Java", 23000)],
                                       names=['Courses','Fee'])
data = {
    'Duration':['30day','40days','35days','40days','60days','50days','55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000],
    'Address':["Mumbai","Delhi","Hyderabad","Bangalore","Chennai","Kolkata","Lucknow"]
       }
df = pd.DataFrame(data, columns = ['Duration','Discount','Address'],
                  index=new_index)
print(df)

# convert multi_index
df.reset_index(inplace=True)
print(df)

# Using multiple columns to convert multi_index and level
df = pd.DataFrame(data,index=new_index)
df.reset_index(inplace=True, level=['Fee'])
print(df)

Conclusion

In this article, you have learned how to convert an index to a column in pandas DataFrame using index(), rename_axis() and reset_index() function. Also, you have learned how to convert multiple indexes into columns use these methods.

Happy Learning !!

You May Also Like

References

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

Pandas – Convert Index to Column in DataFrame