Spark By {Examples}

Working with MultiIndex in Pandas DataFrame

MultiIndex, also referred to as a hierarchical or multi-level index, is a pandas feature that lets you build an index from multiple levels and so represent higher-dimensional data in a Series or DataFrame. It lets you view and process data grouped by several keys at once, which opens the door to more sophisticated data analysis and manipulation.

In this article, I will explain how to work with a MultiIndex pandas DataFrame through several examples, such as creating a MultiIndex DataFrame, converting a MultiIndex to columns, and dropping a level from a MultiIndex.

Pandas MultiIndex Key Points –

1. Create MultiIndex pandas DataFrame (Multi level Index)

A multi-level index DataFrame is a DataFrame whose row index, column index, or both have multiple hierarchical levels. You can create a MultiIndex with the MultiIndex.from_arrays(), MultiIndex.from_tuples(), or MultiIndex.from_product() methods.

The following example demonstrates the steps to create a DataFrame with a MultiIndex on both the index and the columns using pandas.MultiIndex.from_tuples().

Step 1: Create MultiIndex for Index


# Create MultiIndex Pandas DataFrame (Multi level Index)
# The level names indx1/indx2 match the outputs shown in later sections
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['indx1','indx2'])

Step 2: Create MultiIndex for Columns


cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

Step 3: Create DataFrame


data = [[100, 300, 900, 400], [200, 500, 300, 600]]

df = pd.DataFrame(data, columns=cols, index=multi_index)
print("Create DataFrame:\n", df)

This yields a DataFrame with a multi-level index on both rows and columns.
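
As a complement to from_tuples(), here is a minimal sketch of the other two constructors. The column labels "Sedan" and "SUV" and the level names "fuel" and "body" are hypothetical values chosen for illustration; they are not part of the example above.

```python
import pandas as pd

# from_arrays: one list per level, aligned position by position
row_index = pd.MultiIndex.from_arrays(
    [["r0", "r1"], ["rA", "rB"]],
    names=["indx1", "indx2"],
)

# from_product: every combination of the given level values
# ("Sedan"/"SUV" and the level names are hypothetical)
col_index = pd.MultiIndex.from_product(
    [["Gasoline", "Electric"], ["Sedan", "SUV"]],
    names=["fuel", "body"],
)

data = [[100, 300, 900, 400], [200, 500, 300, 600]]
df_alt = pd.DataFrame(data, index=row_index, columns=col_index)
print(df_alt)
```

from_product() is convenient when every outer label has the same set of inner labels, while from_arrays() fits data that already lives in parallel lists.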

2. Pandas MultiIndex to Columns

Use the pandas DataFrame.reset_index() function to convert the levels of a MultiIndex (multi-level index) into columns. By default, drop=False, which keeps the index values as columns and gives the DataFrame a new integer index starting from zero.


# Convert Multi-index to Columns
df2=df.reset_index()
print("Convert multi level indexes to columns:\n", df2)

Both index levels now appear as ordinary columns, and the DataFrame gets a default integer index.

If a column already has the same name as one of the index levels, reset_index() raises a ValueError. You can avoid this by renaming the index levels first.


df.index = df.index.set_names(['new_index1', 'new_index2'])
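
To make the name collision concrete, the sketch below builds a small frame with flat columns in which a column is deliberately named like an index level. The frame `clash` and its "value" column are hypothetical and exist only for this illustration.

```python
import pandas as pd

# A frame whose column "indx1" collides with the index level "indx1"
idx = pd.MultiIndex.from_tuples([("r0", "rA"), ("r1", "rB")],
                                names=["indx1", "indx2"])
clash = pd.DataFrame({"indx1": [1, 2], "value": [10, 20]}, index=idx)

try:
    clash.reset_index()
    raised = False
except ValueError as err:
    raised = True
    print("reset_index failed:", err)

# Renaming the index levels first removes the collision
clash.index = clash.index.set_names(["new_index1", "new_index2"])
fixed = clash.reset_index()
print(fixed.columns.tolist())
```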

3. MultiIndex to Single Index

Sometimes you may need to convert a MultiIndex (multi-level index) to a single-level index. You can do this either by keeping one level as the index and converting the rest to columns, or by dropping the other levels.


# MultiIndex to Single Index
df2 = df.reset_index(level=[1])
print("Convert multi level indexes to single index:\n", df2)

Yields the output below. This example keeps indx1 as the index and turns indx2 into a column. To move several levels at once, pass them as a list to the level parameter.


# Output:
# Convert multi level indexes to single index:
      indx2 Gasoline      Electric     
              Toyoto Ford    Tesla  Nio
indx1                                  
r0       rA      100  300      900  400
r1       rB      200  500      300  600

You can also drop the level entirely instead of converting it to a column.


# MultiIndex to Single Index by dropping
df2 = df.reset_index(level=[1], drop=True)
print(df2)

4. Pandas Flatten MultiIndex Columns

As you may have noticed, our pandas DataFrame has MultiIndex columns. You can flatten them to a single level by selecting one level's values and assigning them back to columns.


# Flatten MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

Yields the output below.


# Output:
             Toyoto  Ford  Tesla  Nio
indx1 indx2                          
r0    rA        100   300    900  400
r1    rB        200   500    300  600
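
Keeping only one level discards the other level's labels. If both should survive, one common alternative is to join each column tuple into a single string. This is a sketch rather than the only way to do it, and the "_" separator is an arbitrary choice.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
                                  ("Gasoline", "Ford"),
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])
df_flat = pd.DataFrame([[100, 300, 900, 400], [200, 500, 300, 600]],
                       columns=cols)

# Each column label is a tuple such as ("Gasoline", "Ford");
# joining its parts keeps both levels in one flat name
df_flat.columns = ["_".join(col) for col in df_flat.columns]
print(df_flat.columns.tolist())
```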

5. Drop Multilevel Index

You can drop levels from a multi-level row or column index using DataFrame.droplevel(), or by calling MultiIndex.droplevel() on df.index or df.columns directly.

Using DataFrame.droplevel() you can drop a single level or multiple levels from the row or column index. Use axis=0 (the default) to drop a row level and axis=1 to drop a column level. The example below drops the first row level from the DataFrame.


# Drop the first level from the row MultiIndex
df = df.droplevel(0, axis=0)
print(df)

Yields the output below.


# Output:
       Toyoto  Ford  Tesla  Nio
indx2                          
rA        100   300    900  400
rB        200   500    300  600
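
The same method works on the column axis. The sketch below rebuilds the two-level columns from section 1 on a separate frame (named `df_cols` here only for illustration) and drops the top level with axis=1.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
                                  ("Gasoline", "Ford"),
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])
df_cols = pd.DataFrame([[100, 300, 900, 400], [200, 500, 300, 600]],
                       columns=cols)

# Drop column level 0 ("Gasoline"/"Electric"), keeping the brand names
df_cols = df_cols.droplevel(0, axis=1)
print(df_cols)
```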

6. Complete Example of Pandas MultiIndex


import pandas as pd
# Create Row Level MultiIndex 
new_index = pd.MultiIndex.from_tuples([("r0", "rA"),
                                       ("r1", "rB")],
                                       names=['indx1','indx2'])

# Create Column Level MultiIndex 
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"), 
                                  ("Gasoline", "Ford"), 
                                  ("Electric", "Tesla"),
                                  ("Electric", "Nio")])

# Create MultiIndex DataFrame
data=[[100,300, 900,400 ], [200,500, 300,600]]
df = pd.DataFrame(data, columns=cols,index=new_index)
print(df)

# Convert MultiIndex to Columns
df2=df.reset_index()
print(df2)

# Convert MultiIndex to Single index
df2 = df.reset_index(level=[1])
print(df2)

# Drop Index
df2 = df.reset_index(level=[1], drop=True)
print(df2)

# Flatten MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)

# Drop Index from MultiIndex
df=df.droplevel(0, axis=0) 
print(df)

Frequently Asked Questions on Pandas MultiIndex

What is a MultiIndex in Pandas?

A MultiIndex, also known as a hierarchical index, is a powerful feature in Pandas that allows you to have multiple levels of index or column labels for a DataFrame. You can use a multiindex structure to represent higher-dimensional data in a more structured way.

How do I create a MultiIndex in a DataFrame?

You can create a MultiIndex using the MultiIndex.from_arrays(), MultiIndex.from_tuples(), or MultiIndex.from_product() methods. Alternatively, you can call set_index() with a list of column names on an existing DataFrame to turn those columns into a MultiIndex.
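
A minimal sketch of the set_index() route, assuming a flat DataFrame whose "Fee" column and values are hypothetical:

```python
import pandas as pd

# A flat frame; "Fee" and its values are made up for illustration
flat = pd.DataFrame({
    "indx1": ["r0", "r1"],
    "indx2": ["rA", "rB"],
    "Fee": [100, 200],
})

# Passing a list of column names builds a MultiIndex from those columns
indexed = flat.set_index(["indx1", "indx2"])
print(indexed)
```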

How do I select data from a DataFrame with MultiIndex?

You can use the .loc attribute to select data based on the values of the MultiIndex. For example, df.loc[('Index1', 'Index2')] selects the row whose two index levels are 'Index1' and 'Index2'.
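
The sketch below shows three common selection patterns on a small hypothetical frame (`df_sel`, with a made-up "Fee" column): a full key, a partial key on the outer level, and xs() for an inner level.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("r0", "rA"), ("r0", "rB"), ("r1", "rA")],
    names=["indx1", "indx2"],
)
df_sel = pd.DataFrame({"Fee": [100, 200, 300]}, index=idx)

# Full key: the tuple names one row; ":" keeps all columns
row = df_sel.loc[("r0", "rB"), :]

# Partial key: every row under the outer label "r0"
sub = df_sel.loc["r0"]

# xs() selects on an inner level by name
inner = df_sel.xs("rA", level="indx2")

print(row, sub, inner, sep="\n\n")
```

Writing the column indexer explicitly, as in `df_sel.loc[("r0", "rB"), :]`, avoids any ambiguity about whether the tuple refers to index levels or to a row and column pair.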

How do I reset the index of a DataFrame with MultiIndex?

You can use the reset_index() method to reset the index of a DataFrame with MultiIndex. This will move the index levels back to columns and generate a default integer index. For example, df_reset = df.reset_index()

How do I perform group operations on a DataFrame with MultiIndex?

You can use the groupby() method, specifying the level(s) on which you want to group the data. For example, group_data = df.groupby(level='first').sum()
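
As a sketch of grouping by level, the frame below uses the level names "first" and "second" to match the call above; the data values and the "Fee" column are hypothetical.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("r0", "rA"), ("r0", "rB"), ("r1", "rA"), ("r1", "rB")],
    names=["first", "second"],
)
df_grp = pd.DataFrame({"Fee": [100, 200, 300, 400]}, index=idx)

# Sum all rows that share the same label in the "first" level
totals = df_grp.groupby(level="first").sum()
print(totals)
```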

Conclusion

In this article, you have learned what a pandas MultiIndex is, how to create one, and how to convert a MultiIndex to columns, flatten MultiIndex columns, drop a level, and convert it to a single index, with examples.

Happy Learning !!
