• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:15 mins read
You are currently viewing Working with MultiIndex in Pandas DataFrame

MiltiIndex is also referred to as Hierarchical/multi-level index/advanced indexing in Pandas enables us to create an index on multiple columns and store data in an arbitrary number of dimensions. MultiIndex gives us a way to see and process data that we have never seen before and opens the door to sophisticated data analysis and manipulation with Series and DataFrame.

Advertisements

In this article, I will explain working on MultiIndex Pandas DataFrame with several examples like creating Multi index DataFrame, converting Multi index to columns, dropping level from multi-index e.t.c

Pandas MultiIndex Key Points –

  • MultiIndex is an array of tuples where each tuple is unique.
  • You can create MultiIndex from a list of arrays, array of tuples, DataFrame e.t.c
  • The Index constructor will attempt to return a MultiIndex when it is passed a list of tuples.
  • You can have Multi-level for both Index and Column labels.
  • Multi-level columns are used when you want to group columns together.

1. Create MultiIndex pandas DataFrame (Multi level Index)

A multi-level index DataFrame is a type of DataFrame that contains multiple levels or hierarchical indexing. You can create a MultiIndex (multi-level index) in the following ways.

  • From a list of arrays using  MultiIndex.from_arrays()
  • From an array of tuples using MultiIndex.from_tuples()
  • From a crossed set of iterables using MultiIndex.from_product()
  • From DataFrame using MultiIndex.from_frame()

The following example demonstrates steps to create MultiIndexes DataFrame for both index and columns using pandas.MultiIndex.from_tuples().

Step 1: Create MultiIndex for Index


# Create MultiIndex Pandas DataFrame (Multi level Index)
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['Courses','Fee'])

Step 2: Create Create MultiIndex for Column


cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

Step 3: Create DataFrame


data=[[100,300, 900,400 ], [200,500, 300,600]]

df = pd.DataFrame(data, columns=cols,index=multi_index)
print("Create DataFrame:\n", df)

Yields below DataFrame with Multilevel index for rows and columns.

Pandas multiindex

2. Pandas MultiIndex to Columns

Use Pandas DataFrame.reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False which will keep the index values as columns and set the new index to DataFrame starting from zero.


# Convert Multi-index to Columns
df2=df.reset_index()
print("Convert multi level indexes to columns:\n", df2)

Yields below output.

Pandas multiindex

If you have column names the same as Index, you will get an error. You can get over this by changing the multi-index names first.


df.index = df.index.set_names(['new_index1', 'new_index2'])

3. MultiIndex to Single Index

Sometimes you may be required to convert MultiIndex (multi-level) to a single Index. You can do this either by keeping one Index and converting the rest to columns or dropping them.


# MultiIndex to Single Index
df2 = df.reset_index(level=[1])
print("Convert multi level indexes to single index:\n", df2)

Yields below output. This example keeps one Index indx1 and transforms the indx2 into a column. If you have many indexes, you can pass them as a list to level param.


# Output:
# Convert multi level indexes to single index:
      indx2 Gasoline      Electric     
              Toyoto Ford    Tesla  Nio
indx1                                  
r0       rA      100  300      900  400
r1       rB      200  500      300  600

You can also drop the Index.


# MultiIndex to Single Index by dropping
df2 = df.reset_index(level=[1], drop=True)
print(df2)

4. Pandas Flatten MultiIndex Columns

If you noticed, our Pandas DataFrame contains MultiIndex columns, you can flatten this to a single level by accessing the level and assigning it to columns.


# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

Yields below output.


# Output:
             Toyoto  Ford  Tesla  Nio
indx1 indx2                          
r0    rA        100   300    900  400
r1    rB        200   500    300  600

5. Drop Multilevel Index

You can drop levels from multi-level row or column indexes using DataFrame.columns.droplevel() and MultiIndex.droplevel() methods.

Using MultiIndex.droplevel() you can drop single or multiple levels from multi-level rows/column index. Use axis=1 param to drop columns. To drop row-level use axis=0. The below example drops the first index from DataFrame.


# Drop Index from MultiIndex
df=df.droplevel(0, axis=0) 
print(df)

Yields below output.


# Output:
       Toyoto  Ford  Tesla  Nio
indx2                          
rA        100   300    900  400
rB        200   500    300  600

6. Complete Example of Pandas MultiIndex


import pandas as pd
# Create Row Level MultiIndex 
new_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['indx1','indx2'])

# Create Column Level MultiIndex 
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

# Create MultiIndex DataFrame
data=[[100,300, 900,400 ], [200,500, 300,600]]
df = pd.DataFrame(data, columns=cols,index=new_index)
print(df)

# Convert MultiIndex to Columns
df2=df.reset_index()
print(df2)

# Convert MuliIndex to Single index
df2 = df.reset_index(level=[1])
print(df2)

# Drop Index
df2 = df.reset_index(level=[1], drop=True)
print(df2)

# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

# Drop Index from MultiIndex
df=df.droplevel(0, axis=0) 
print(df)

Frequently Asked Questions of Pandas MultiIndex in Pandas?

What is a MultiIndex in Pandas?

A MultiIndex, also known as a hierarchical index, is a powerful feature in Pandas that allows you to have multiple levels of index or column labels for a DataFrame. You can use a multiindex structure to represent higher-dimensional data in a more structured way.

How do I create a MultiIndex in a DataFrame?

You can create a MultiIndex using the MultiIndex.from_arrays, MultiIndex.from_tuples, or MultiIndex.from_product methods. Alternatively, you can set a MultiIndex when creating the DataFrame using the set_index method.

How do I select data from a DataFrame with MultiIndex?

You can use the .loc attribute to select the data based on the values of the MultiIndex. For example,
df.loc[(‘Index1’, ‘Index2’)] (Select a specific row with both indices ‘Index1’ and ‘Index2)’

How do I reset the index of a DataFrame with MultiIndex?

You can use the reset_index() method to reset the index of a DataFrame with MultiIndex. This will move the index levels back to columns and generate a default integer index. For example, df_reset = df.reset_index()

How do I perform group operations on a DataFrame with MultiIndex?

You can use the groupby() method, specifying the level(s) on which you want to group the data. For example, group_data = df.groupby(level='first').sum()

Conclusion

In this article, you have learned what is Pandas MultiIndex, how to create it, how to convert the muli index to columns, flatten MultiIndex columns, drop the index, and transform it to a Single index with examples.

Happy Learning !!

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium