MiltiIndex is also referred to as Hierarchical/multi-level index/advanced indexing in Pandas enables us to create an index on multiple columns and store data in an arbitrary number of dimensions. MultiIndex gives us a way to see and process data that we have never seen before and opens the door to sophisticated data analysis and manipulation with Series and DataFrame.
In this article, I will explain working on MultiIndex Pandas DataFrame with several examples like creating Multi index DataFrame, converting Multi index to columns, dropping level from multi-index e.t.c
Pandas MultiIndex Key Points –
MultiIndex
is an array of tuples where each tuple is unique.- You can create MultiIndex from a list of arrays, array of tuples, DataFrame e.t.c
- The
Index
constructor will attempt to return aMultiIndex
when it is passed a list of tuples. - You can have Multi-level for both Index and Column labels.
- Multi-level columns are used when you want to group columns together.
1. Create MultiIndex pandas DataFrame (Multi level Index)
A multi-level index DataFrame is a type of DataFrame that contains multiple levels or hierarchical indexing. You can create a MultiIndex (multi-level index) in the following ways.
- From a list of arrays using
MultiIndex.from_arrays()
- From an array of tuples using
MultiIndex.from_tuples()
- From a crossed set of iterables using
MultiIndex.from_product()
- From DataFrame using
MultiIndex.from_frame()
The following example demonstrates steps to create MultiIndexes DataFrame for both index and columns using pandas.MultiIndex.from_tuples()
.
Step 1: Create MultiIndex for Index
# Create MultiIndex Pandas DataFrame (Multi level Index)
import pandas as pd
multi_index = pd.MultiIndex.from_tuples([("r0", "rA"),
("r1", "rB")],
names=['Courses','Fee'])
Step 2: Create Create MultiIndex for Column
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
("Gasoline", "Ford"),
("Electric", "Tesla"),
("Electric", "Nio")])
Step 3: Create DataFrame
data=[[100,300, 900,400 ], [200,500, 300,600]]
df = pd.DataFrame(data, columns=cols,index=multi_index)
print("Create DataFrame:\n", df)
Yields below DataFrame with Multilevel index for rows and columns.
2. Pandas MultiIndex to Columns
Use Pandas DataFrame.reset_index() function to convert/transfer MultiIndex (multi-level index) indexes to columns. The default setting for the parameter is drop=False
which will keep the index values as columns and set the new index to DataFrame starting from zero.
# Convert Multi-index to Columns
df2=df.reset_index()
print("Convert multi level indexes to columns:\n", df2)
Yields below output.
If you have column names the same as Index, you will get an error. You can get over this by changing the multi-index names first.
df.index = df.index.set_names(['new_index1', 'new_index2'])
3. MultiIndex to Single Index
Sometimes you may be required to convert MultiIndex (multi-level) to a single Index. You can do this either by keeping one Index and converting the rest to columns or dropping them.
# MultiIndex to Single Index
df2 = df.reset_index(level=[1])
print("Convert multi level indexes to single index:\n", df2)
Yields below output. This example keeps one Index indx1
and transforms the indx2
into a column. If you have many indexes, you can pass them as a list to level
param.
# Output:
# Convert multi level indexes to single index:
indx2 Gasoline Electric
Toyoto Ford Tesla Nio
indx1
r0 rA 100 300 900 400
r1 rB 200 500 300 600
You can also drop the Index.
# MultiIndex to Single Index by dropping
df2 = df.reset_index(level=[1], drop=True)
print(df2)
4. Pandas Flatten MultiIndex Columns
If you noticed, our Pandas DataFrame contains MultiIndex columns, you can flatten this to a single level by accessing the level and assigning it to columns.
# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)
Yields below output.
# Output:
Toyoto Ford Tesla Nio
indx1 indx2
r0 rA 100 300 900 400
r1 rB 200 500 300 600
5. Drop Multilevel Index
You can drop levels from multi-level row or column indexes using DataFrame.columns.droplevel()
and MultiIndex.droplevel()
methods.
Using MultiIndex.droplevel()
you can drop single or multiple levels from multi-level rows/column index. Use axis=1
param to drop columns. To drop row-level use axis=0
. The below example drops the first index from DataFrame.
# Drop Index from MultiIndex
df=df.droplevel(0, axis=0)
print(df)
Yields below output.
# Output:
Toyoto Ford Tesla Nio
indx2
rA 100 300 900 400
rB 200 500 300 600
6. Complete Example of Pandas MultiIndex
import pandas as pd
# Create Row Level MultiIndex
new_index = pd.MultiIndex.from_tuples([("r0", "rA"),
("r1", "rB")],
names=['indx1','indx2'])
# Create Column Level MultiIndex
cols = pd.MultiIndex.from_tuples([("Gasoline", "Toyoto"),
("Gasoline", "Ford"),
("Electric", "Tesla"),
("Electric", "Nio")])
# Create MultiIndex DataFrame
data=[[100,300, 900,400 ], [200,500, 300,600]]
df = pd.DataFrame(data, columns=cols,index=new_index)
print(df)
# Convert MultiIndex to Columns
df2=df.reset_index()
print(df2)
# Convert MuliIndex to Single index
df2 = df.reset_index(level=[1])
print(df2)
# Drop Index
df2 = df.reset_index(level=[1], drop=True)
print(df2)
# Flattern MultiIndex columns
df.columns = df.columns.get_level_values(1)
print(df)
# Drop Index from MultiIndex
df=df.droplevel(0, axis=0)
print(df)
Frequently Asked Questions of Pandas MultiIndex in Pandas?
A MultiIndex, also known as a hierarchical index, is a powerful feature in Pandas that allows you to have multiple levels of index or column labels for a DataFrame. You can use a multiindex structure to represent higher-dimensional data in a more structured way.
You can create a MultiIndex using the MultiIndex.from_arrays
, MultiIndex.from_tuples
, or MultiIndex.from_product
methods. Alternatively, you can set a MultiIndex when creating the DataFrame using the set_index
method.
You can use the .loc
attribute to select the data based on the values of the MultiIndex. For example,
df.loc[(‘Index1’, ‘Index2’)] (Select a specific row with both indices ‘Index1’ and ‘Index2)’
You can use the reset_index()
method to reset the index of a DataFrame with MultiIndex. This will move the index levels back to columns and generate a default integer index. For example, df_reset = df.reset_index()
You can use the groupby()
method, specifying the level(s) on which you want to group the data. For example, group_data = df.groupby(level='first').sum()
Conclusion
In this article, you have learned what is Pandas MultiIndex, how to create it, how to convert the muli index to columns, flatten MultiIndex columns, drop the index, and transform it to a Single index with examples.
Happy Learning !!
Related Articles
- Drop Rows From Pandas DataFrame Examples
- Drop Single & Multiple Columns From Pandas DataFrame
- Change the Order of Pandas DataFrame Columns
- Pandas groupby() and sum() With Examples
- Difference Between loc and iloc in Pandas DataFrame
- How to Plot Columns of Pandas DataFrame
- Pandas Window Functions Explained
- Pandas Handle Missing Data in Dataframe