Working with MultiIndex in pandas DataFrame

  • Post author:
  • Post category:Pandas
  • Post last modified:October 5, 2023

MiltiIndex is also referred to as Hierarchical/multi-level index/advanced indexing in pandas enables us to create an index on multiple columns and store data in an arbitrary number of dimensions. MultiIndex gives us a way to see and process data that we have never seen before and opens the door to sophisticated data analysis and manipulation with Series and DataFrame.

In this article, I will explain working on MultiIndex pandas DataFrame with several examples like creating Multi index DataFrame, converting Multi index to columns, dropping level from multi-index e.t.c

pandas MultiIndex Key Points –

  • MultiIndex is an array of tuples where each tuple is unique.
  • You can create MultiIndex from list of arrays, arry of tuples, dataframe e.t.c
  • The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples.
  • You can have Multi-level for both Index and Column labels.
  • Multi-level columns are used when you wanted to group columns together.

1. Create MultiIndex pandas DataFrame (Multi level Index)

A multi-level index DataFrame is a type of DataFrame that contains multiple level or hierarchical indexing. You can create a MultiIndex (multi-level index) in the following ways.

  • From a list of arrays using  MultiIndex.from_arrays()
  • From an array of tuples using MultiIndex.from_tuples()
  • From a crossed set of iterables using MultiIndex.from_product()
  • From DataFrame using MultiIndex.from_frame()

The following example demonstrates steps to create MultiIndexes DataFrame for both index and columns using pandas.MultiIndex.from_tuples().

Step 1: Create MultiIndex for Index


# Create MultiIndex pandas DataFrame (Multi level Index)
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['Courses','Fee'])

Step 2: Create Create MultiIndex for Column


cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

Step 3: Create DataFrame


data=[[100,300, 900,400 ], [200,500, 300,600]]

df = pd.DataFrame(data, columns=cols,index=multi_index)
print(df)

Yields below DataFrame with Multilevel index for rows and columns.

pandas multiIndex columns
pandas MultiIndex

2. pandas MultiIndex to Columns

Use pandas DataFrame.reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero.


# Convert Multi-index to Columns
df2=df.reset_index()
print(df2)

Yields below output.


# Output:
  indx1 indx2 Gasoline      Electric     
                Toyoto Ford    Tesla  Nio
0    r0    rA      100  300      900  400
1    r1    rB      200  500      300  600

If you have column names the same as Index, you will get an error. You can get over this by changing the multi-index names first.


df.index = df.index.set_names(['new_index1', 'new_index2'])

3. MultiIndex to Single Index

Sometimes you may require to convert MultiIndex (multi-level) to a single Index. You can do this either by keeping one Index and converting the rest to columns or dropping them.


# MultiIndex to Single Index
df2 = df.reset_index(level=[1])
print(df2)

Yields below output. This example keeps one Index indx1 and transforms the indx2 to column. If you have many indexes, you can pass them as a list to level param.


# Output:
      indx2 Gasoline      Electric     
              Toyoto Ford    Tesla  Nio
indx1                                  
r0       rA      100  300      900  400
r1       rB      200  500      300  600

You can also drop the Index.


# MultiIndex to Single Index by dropping
df2 = df.reset_index(level=[1], drop=True)
print(df2)

4. pandas Flatten MultiIndex Columns

If you noticed, our pandas DataFrame contains MultiIndex columns, you can flatten this to a single level by accessing the level and assigning it to columns.


# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

Yields below output.


# Output:
             Toyoto  Ford  Tesla  Nio
indx1 indx2                          
r0    rA        100   300    900  400
r1    rB        200   500    300  600

5. Drop Multilevel Index

You can drop levels from multi-level row or column index using DataFrame.columns.droplevel() and MultiIndex.droplevel() methods.

Using MultiIndex.droplevel() you can drop single or more levels from multi-level rows/column index. Use axis=1 param to drop columns. To drop row-level use axis=0. The below example drops first index from DataFrame.


# Drop Index from MultiIndex
df=df.droplevel(0, axis=0) 
print(df)

Yields below output.


# Output:
       Toyoto  Ford  Tesla  Nio
indx2                          
rA        100   300    900  400
rB        200   500    300  600

6. Complete Example of pandas MultiIndex


import pandas as pd
# Create Row Level MultiIndex 
new_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['indx1','indx2'])

# Create Column Level MultiIndex 
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

# Create MultiIndex DataFrame
data=[[100,300, 900,400 ], [200,500, 300,600]]
df = pd.DataFrame(data, columns=cols,index=new_index)
print(df)

# Convert MultiIndex to Columns
df2=df.reset_index()
print(df2)

# Convert MuliIndex to Single index
df2 = df.reset_index(level=[1])
print(df2)

# Drop Index
df2 = df.reset_index(level=[1], drop=True)
print(df2)

# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

# Drop Index from MultiIndex
df=df.droplevel(0, axis=0) 
print(df)

Conclusion

In this article, you have learned what is pandas MultiIndex, how to create it, converting muli index to columns, flatten MultiIndex columns, drop the index, and transform to a Single index with examples.

Happy Learning !!

References

Naveen

I am a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, I have honed my expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. My journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. I have started this SparkByExamples.com to share my experiences with the data as I come across. You can learn more about me at LinkedIn

Leave a Reply

You are currently viewing Working with MultiIndex in pandas DataFrame