You can set a pandas column as the index by using the DataFrame.set_index() method or the DataFrame.index property. What is an Index in pandas? The row labels of a DataFrame form its Index. In this article, I will explain how to use a column as an index with some simple examples. By default, pandas creates an integer index (0, 1, 2, ...) for a DataFrame, but you can set the index while creating the DataFrame or turn a specific existing column into the index.
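Here is a minimal sketch of setting the index while creating a DataFrame. The `index` parameter of the `pd.DataFrame` constructor accepts the row labels directly. The course data mirrors the example used later in this article. Naming the index `"Courses"` is an illustrative choice, not something pandas requires.

```python
import pandas as pd

# Pass the row labels through the constructor's index= parameter.
df = pd.DataFrame(
    {"Fee": [20000, 22000, 25000], "Duration": ["40days", "30days", "35days"]},
    index=pd.Index(["Spark", "PySpark", "Python"], name="Courses"),
)
print(df)

# Rows can now be looked up by course name instead of by position.
print(df.loc["PySpark", "Fee"])  # 22000
```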
Key Points –
- The set_index() function designates one or more columns as the index of a DataFrame, replacing the default integer index.
- The set_index() function can modify the DataFrame directly when the inplace parameter is True, or return a new DataFrame and leave the original unchanged (the default).
- By default, the columns used for the index are dropped from the DataFrame's regular columns unless the drop parameter is set to False.
- You can pass a list of column names to set_index() to create a MultiIndex (hierarchical index), which allows more complex data slicing.
- Setting a column as the index does not sort the DataFrame by that column. Use sort_index() if you need a sorted index.
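The last point is easy to check. This small sketch uses a cut-down version of the article's data. It shows that set_index() keeps the original row order, and that sort_index() is the call that actually sorts the rows by label.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                   "Fee": [20000, 22000, 25000]})

indexed = df.set_index("Courses")
# set_index() keeps the existing row order ...
print(list(indexed.index))                 # ['Spark', 'PySpark', 'Python']
print(indexed.index.is_monotonic_increasing)  # False

# ... so call sort_index() when the rows must be ordered by label.
sorted_df = indexed.sort_index()
print(list(sorted_df.index))               # ['PySpark', 'Python', 'Spark']
```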
1. Quick Examples to Set DataFrame Column as Row Index
If you are in a hurry, below are some quick examples.
# Quick examples to set dataframe column as row index
# Using set_index() method
df.set_index('Fee', inplace=True)
# Using set_index() with transpose (.T)
df2=df.set_index('Courses').T
# Set Column as index
df.index = df['Courses']
# Drop column after setting it as index
df2=df.drop('Courses', axis=1)
Now, let's create a pandas DataFrame, run these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
# Create a Pandas DataFrame.
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],
["PySpark",22000,'30days'],
["Python",25000,'35days']],
columns=['Courses','Fee','Duration'])
print(df)
Yields below output. This DataFrame was created with a default index.
# Output:
   Courses    Fee Duration
0    Spark  20000   40days
1  PySpark  22000   30days
2   Python  25000   35days
2. Using set_index() Method
Use the DataFrame.set_index() method to set an existing column of a DataFrame as its index. On a DataFrame, the row labels are the Index. If your DataFrame already has an index, set_index() replaces it by default, or keeps it and adds the new column as an extra index level when append=True.
You can set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length).
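Here is a sketch of the array form. The `keys` argument can be a `pd.Index`, a Series, or a NumPy array whose length equals the number of rows. The `"C1"`/`"C2"`/`"C3"` codes and the `"Code"` name below are hypothetical. A plain Python list is read as a list of column names, not as index values, which is why the labels are wrapped in `pd.Index` here.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                   "Fee": [20000, 22000, 25000]})

# Hypothetical course codes, one per row, used as the new row labels.
codes = pd.Index(["C1", "C2", "C3"], name="Code")
by_code = df.set_index(codes)
print(by_code)

# An array whose length differs from the number of rows is rejected.
try:
    df.set_index(pd.Index(["C1", "C2"]))
except ValueError as err:
    print("ValueError:", err)
```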
Syntax:
# Syntax for set_index() method.
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
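The `append` and `verify_integrity` parameters are easy to miss. This sketch uses a made-up duplicate fee so the integrity check has something to catch. It shows `append=True` adding a level next to the existing RangeIndex, and `verify_integrity=True` rejecting duplicate keys with a ValueError.

```python
import pandas as pd

# Hypothetical data: the fee 20000 appears twice on purpose.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                   "Fee": [20000, 22000, 20000]})

# append=True keeps the default RangeIndex and adds 'Courses' as a second level.
appended = df.set_index("Courses", append=True)
print(appended.index.nlevels)      # 2
print(list(appended.index.names))  # [None, 'Courses']

# verify_integrity=True raises if the new index would contain duplicates.
try:
    df.set_index("Fee", verify_integrity=True)
except ValueError as err:
    print("ValueError:", err)
```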
Usage with Example:
In the example below, I set the Fee column as the index.
# Using set_index() method.
df.set_index('Fee', inplace=True)
print(df)
Yields below output. Because inplace=True is passed, set_index() modifies df directly and returns None. Without it, set_index() returns a new DataFrame and leaves df unchanged. By default (drop=True), the Fee column is removed from the regular columns after it becomes the index.
# Output:
       Courses Duration
Fee
20000    Spark   40days
22000  PySpark   30days
25000   Python   35days
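To see the inplace behavior side by side, here is a minimal sketch that rebuilds the same DataFrame. It compares the default call, which returns a new DataFrame, with `inplace=True`, which changes df itself and returns None.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Default inplace=False: a new DataFrame is returned and df keeps its columns.
df2 = df.set_index("Fee")
print(list(df.columns))   # ['Courses', 'Fee', 'Duration']
print(list(df2.columns))  # ['Courses', 'Duration']

# inplace=True modifies df itself and the call returns None.
result = df.set_index("Fee", inplace=True)
print(result)             # None
print(df.index.name)      # Fee
```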
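3. Using set_index() with Transpose (DataFrame.T)
set_index() does not transpose anything by itself, but you can chain it with the DataFrame.T property (the short form of DataFrame.transpose()) to swap rows and columns. Setting Courses as the index first means its values become the column headers after transposing.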
# Using set_index() with transpose (.T).
df2=df.set_index('Courses').T
print(df2)
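Yields below output. Because df already has Fee as its index from the previous section, set_index('Courses') replaces that index, so only the Duration column is left to transpose.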
# Output:
Courses    Spark PySpark  Python
Duration  40days  30days  35days
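If you run the same chain on a freshly created DataFrame, before Fee is moved into the index, both Fee and Duration become rows. This sketch assumes that fresh df. Each resulting column mixes an integer and a string, so pandas stores them with the object dtype.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Course names become column headers; Fee and Duration become row labels.
wide = df.set_index("Courses").T
print(wide)
print(list(wide.columns))  # ['Spark', 'PySpark', 'Python']
print(list(wide.index))    # ['Fee', 'Duration']

# Every column now holds an int and a str, so each has dtype object.
print(wide.dtypes)
```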
4. Set Column as Index by DataFrame.index Property
You can also set a pandas column as the index by using the DataFrame.index property. To use a column as the index, select the column from the DataFrame and assign it to the DataFrame.index property.
# Set Column as index.
df.index = df['Courses']
print(df)
Yields below output.
# Output:
         Courses Duration
Courses
Spark      Spark   40days
PySpark  PySpark   30days
Python    Python   35days
Note that in the above example, Courses is set as the index but the column is still present in the DataFrame. Use the DataFrame.drop() method to drop the column if you don't want it.
DataFrame.drop() removes specified labels from rows or columns. You can remove rows or columns by specifying label names together with the corresponding axis, or by passing the labels directly to the index or columns parameter.
# Using DataFrame.drop() method.
df2=df.drop('Courses', axis=1)
print(df2)
Yields below output.
# Output:
        Duration
Courses
Spark     40days
PySpark   30days
Python    35days
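Assigning the index and then dropping the column produces the same result as a single set_index() call. The sketch below checks this with `DataFrame.equals()`, starting from a fresh copy of the example DataFrame. `drop(columns='Courses')` is the keyword form of `drop('Courses', axis=1)`.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Two steps: assign the column to df.index, then drop the redundant column.
two_step = df.copy()
two_step.index = two_step["Courses"]
two_step = two_step.drop(columns="Courses")

# One step: set_index() moves the column into the index directly.
one_step = df.set_index("Courses")

print(two_step.equals(one_step))  # True
```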
Complete Example of Setting a Pandas Column as Index
# Create a Pandas DataFrame
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],["PySpark",22000,'30days'],["Python",25000,'35days']],
columns=['Courses','Fee','Duration'])
print(df)
# Using set_index() method
df.set_index('Fee', inplace=True)
print(df)
# Using set_index() with transpose (.T)
df2=df.set_index('Courses').T
print(df2)
# Set Column as index
df.index = df['Courses']
print(df)
# Using DataFrame.drop() method
df2=df.drop('Courses', axis=1)
print(df2)
FAQ on Pandas Set Column as Index in DataFrame
How do I set a column as the index in a Pandas DataFrame?
To set a column as the index in a Pandas DataFrame, you can use the set_index() method. This method allows you to set one or more columns as the index.
Can I set multiple columns as the index?
Yes. When you set multiple columns as the index in a Pandas DataFrame, you create a MultiIndex, which is an index with more than one level. To do this, pass a list of column names to the set_index() method.
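Here is a minimal sketch using hypothetical data in which Spark appears twice, so that both levels carry information:

```python
import pandas as pd

# Hypothetical data: two Spark offerings with different durations.
df = pd.DataFrame({"Courses": ["Spark", "Spark", "Python"],
                   "Duration": ["40days", "30days", "35days"],
                   "Fee": [20000, 22000, 25000]})

# A list of column names builds a two-level MultiIndex.
multi = df.set_index(["Courses", "Duration"])
print(multi)
print(multi.index.nlevels)      # 2
print(list(multi.index.names))  # ['Courses', 'Duration']
```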
Will the column used as the index still appear as a regular column?
By default, the column used as the index will no longer appear as a regular column. If you want to retain it as a column, set the drop=False parameter.
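For example, with `drop=False` the Courses values are copied into the index and the column is kept. The sample data is a cut-down version of the article's example.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python"],
                   "Fee": [20000, 22000, 25000]})

# drop=False copies 'Courses' into the index and keeps the column as well.
kept = df.set_index("Courses", drop=False)
print(kept)
print(list(kept.columns))  # ['Courses', 'Fee']
print(kept.index.name)     # Courses
```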
Can I set a Datetime column as the index?
Yes. You can set the index using an existing Datetime column in a Pandas DataFrame. This is a common operation when working with time series data, because it lets you use time-based functionality such as slicing by date, resampling, or plotting.
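Here is a minimal sketch with hypothetical dates and fees. The Date strings are converted with `pd.to_datetime()` first, so `set_index()` produces a DatetimeIndex. That index supports partial-string date slicing with `.loc` and `resample()`.

```python
import pandas as pd

# Hypothetical time series data stored as strings.
df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-02-01", "2024-02-15"],
    "Fee": [20000, 22000, 25000, 24000],
})

# Convert to datetime64 first, then use the column as the index.
df["Date"] = pd.to_datetime(df["Date"])
ts = df.set_index("Date")
print(type(ts.index).__name__)  # DatetimeIndex

# Partial-string indexing: every row in January 2024.
print(ts.loc["2024-01"])

# Monthly totals; "MS" is the month-start frequency alias.
monthly = ts.resample("MS")["Fee"].sum()
print(monthly)
```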
What is a MultiIndex in Pandas?
A MultiIndex in Pandas is an index that contains multiple levels. It lets you represent hierarchical data structures within a DataFrame, which is particularly useful when working with grouped or multidimensional data. Instead of a single index column, a MultiIndex uses multiple columns as the index, enabling more advanced indexing and selection operations.
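Here is a sketch of the selection operations a MultiIndex enables. It reuses hypothetical data in which Spark appears twice and sorts the index with `sort_index()` first. Sorting first is common practice because some label-based slicing on a MultiIndex requires a sorted index. It then selects rows with `.loc` and `DataFrame.xs()`.

```python
import pandas as pd

# Hypothetical data: two Spark offerings with different durations.
df = pd.DataFrame({"Courses": ["Spark", "Spark", "Python"],
                   "Duration": ["40days", "30days", "35days"],
                   "Fee": [20000, 22000, 25000]})
multi = df.set_index(["Courses", "Duration"]).sort_index()

# All rows under one outer label; the outer level is dropped from the result.
spark = multi.loc["Spark"]
print(spark)

# A single cell addressed by a full (outer, inner) key.
print(multi.loc[("Spark", "30days"), "Fee"])  # 22000

# Cross-section on the inner level.
by_duration = multi.xs("35days", level="Duration")
print(by_duration)
```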
Conclusion
In this article, you have learned how to set a pandas column as the row index using the DataFrame.set_index() method and the DataFrame.index property, with simple examples.