Pandas Set Column as Index in DataFrame

Post last modified: December 12, 2024

You can set a pandas column as the index by using the DataFrame.set_index() method or the DataFrame.index property. What is an index in pandas? The index holds the row labels of a DataFrame. In this article, I will explain how to use a column as an index with some simple examples. By default, pandas creates an integer index for a DataFrame, but you can supply index values while creating a DataFrame or set an existing column as the index.

Key Points –

  • The set_index() method designates one or more columns as the index of a DataFrame, replacing the default integer index.
  • By default, set_index() returns a new DataFrame and leaves the original unchanged; pass inplace=True to modify the DataFrame directly.
  • By default, the original columns used for the index are dropped from the DataFrame unless the drop parameter is set to False.
  • You can pass multiple column names to set_index() to create a MultiIndex (hierarchical index), allowing for more complex data slicing.
  • Setting a column as the index does not automatically sort the DataFrame by that column; you may need to use sort_index() if a sorted index is required.
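
The points about the return value and sorting can be sketched as follows. This is a minimal illustration; the sample values are made up and deliberately unsorted so that the effect of sort_index() is visible.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration (Fee is unsorted)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python"],
    "Fee": [22000, 20000, 25000],
    "Duration": ["30days", "40days", "35days"],
})

# Without inplace=True, set_index() returns a new DataFrame;
# df itself keeps its default integer index and its 'Fee' column
by_fee = df.set_index("Fee")
print(list(df.columns))

# The new index keeps the original row order; sort explicitly if needed
sorted_by_fee = by_fee.sort_index()
print(sorted_by_fee)
```

Assigning the result to a new name, as done here, keeps the original DataFrame available for later steps.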

1. Quick Examples to Set DataFrame Column as Row Index

If you are in a hurry, below are some quick examples.


# Quick examples to set dataframe column as row index

# Using set_index() method
df.set_index('Fee', inplace=True)

# Using set_index() with transpose (.T)
df2=df.set_index('Courses').T

# Set Column as index
df.index = df['Courses']

# Drop column after setting it as index
df2=df.drop('Courses', axis=1)

Now, let’s create a pandas DataFrame, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.


# Create a Pandas DataFrame.
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],
                   ["PySpark",22000,'30days'],
                   ["Python",25000,'35days']],
                columns=['Courses','Fee','Duration'])
print(df)

Yields below output. This DataFrame was created with a default index.


# Output:
   Courses    Fee Duration
0    Spark  20000   40days
1  PySpark  22000   30days
2   Python  25000   35days

2. Using set_index() Method

Use the DataFrame.set_index() method to set an existing column of the DataFrame as its index (the row labels). If the DataFrame already has an index, set_index() replaces it by default; pass append=True to add the new column as an additional index level instead.

You can set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length).
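
As a small sketch of the array case, the example below uses a pd.Index of row labels that is not a column of the DataFrame. The labels and the name RowId are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Python"],
    "Fee": [20000, 22000, 25000],
})

# Hypothetical row labels that are not a column of df
labels = pd.Index(["r1", "r2", "r3"], name="RowId")

# An Index, Series, or NumPy array of matching length can serve as the key
df_labeled = df.set_index(labels)
print(df_labeled)
```

Note that a plain Python list passed to set_index() is interpreted as a list of column names, so wrap raw values in a pd.Index, Series, or NumPy array first.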

Syntax:


# Syntax for set_index() method.
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
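
The append and verify_integrity parameters are not used elsewhere in this article, so here is a brief sketch of what they do. The data is hypothetical and contains a duplicate course name on purpose.

```python
import pandas as pd

# Hypothetical data with a repeated value in 'Courses'
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark"],
    "Fee": [20000, 22000, 25000],
})

# append=True keeps the existing index and adds 'Courses' as a further level
appended = df.set_index("Courses", append=True)
print(appended.index.nlevels)

# verify_integrity=True rejects a new index that contains duplicate labels
try:
    df.set_index("Courses", verify_integrity=True)
except ValueError as err:
    print("Duplicate labels found:", err)
```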

Usage with Example:

In the below example, I am setting Fee column as an index.


# Using set_index() method.
df.set_index('Fee', inplace=True)
print(df)

Yields below output. Without inplace=True, set_index() leaves df unchanged and returns a new DataFrame with the new index. By default (drop=True), the column is removed from the DataFrame's columns once it becomes the index.


# Output:
       Courses Duration
Fee                    
20000    Spark   40days
22000  PySpark   30days
25000   Python   35days

3. Using set_index() with Transpose

set_index() can be combined with the transpose property (DataFrame.T), which swaps rows and columns. After setting Courses as the index and transposing, the course names become the column labels.


# Using set_index() with transpose (.T).
df2=df.set_index('Courses').T
print(df2)

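Yields below output. Fee does not appear because it was the index of df (set in the previous section), and set_index() replaced it with Courses.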


# Output:
Courses    Spark PySpark  Python
Duration  40days  30days  35days

4. Set Column as Index by DataFrame.index Property

You can also set a pandas column as the index by using the DataFrame.index property. To use a column as the index, select the column from the DataFrame and assign it to DataFrame.index.


# Set Column as index.
df.index = df['Courses']
print(df)

 Yields below output.


# Output:
         Courses Duration
Courses                  
Spark      Spark   40days
PySpark  PySpark   30days
Python    Python   35days

Note that in the above example, Courses is set as the index, but the column is still present in the DataFrame. Use the DataFrame.drop() method to drop the column if you don’t want it.

DataFrame.drop() function is used to drop specified labels from rows or columns. Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names.


# Using DataFrame.drop() method.
df2=df.drop('Courses', axis=1)
print(df2)

 Yields below output.


# Output:
        Duration
Courses         
Spark     40days
PySpark   30days
Python    35days
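
For comparison, the sketch below starts from a fresh copy of the sample DataFrame and checks that the two-step approach (assign to .index, then drop) gives the same result as a single set_index() call. The variable names two_step and one_step are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Two steps: assign the column to .index, then drop the column
two_step = df.copy()
two_step.index = two_step["Courses"]
two_step = two_step.drop("Courses", axis=1)

# One step: set_index() drops the column by default
one_step = df.set_index("Courses")

# Compare the two results
print(two_step.equals(one_step))
```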

Complete Example of Setting a Pandas Column as Index


# Create a Pandas DataFrame
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],["PySpark",22000,'30days'],["Python",25000,'35days']],
                columns=['Courses','Fee','Duration'])
print(df)

# Using set_index() method
df.set_index('Fee', inplace=True)
print(df)

# Using set_index() with transpose (.T)
df2=df.set_index('Courses').T
print(df2)

# Set Column as index
df.index = df['Courses']
print(df)

# Using DataFrame.drop() method
df2=df.drop('Courses', axis=1)
print(df2)

FAQ on Pandas Set Column as Index in DataFrame

How do I set a column as the index in a DataFrame?

To set a column as the index in a Pandas DataFrame, you can use the set_index() method. This method allows you to set one or more columns as the index.

Can I set multiple columns as the index?

You can set multiple columns as the index in a Pandas DataFrame. When you do this, you create a MultiIndex, which is an index with more than one level. To set multiple columns as the index, you pass a list of column names to the set_index() method.
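
A minimal sketch using the article's sample data, with Courses and Fee chosen arbitrarily as the two index levels:

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Passing a list of column names builds a two-level MultiIndex
df_multi = df.set_index(["Courses", "Fee"])
print(df_multi.index.names)
print(df_multi)
```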

What happens to the original column when it’s set as the index?

By default, the column used as the index will no longer appear as a regular column. If you want to retain it as a column, set the drop=False parameter.
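
The difference can be sketched with the article's sample data; the names df_dropped and df_kept are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "40days"],
                   ["PySpark", 22000, "30days"],
                   ["Python", 25000, "35days"]],
                  columns=["Courses", "Fee", "Duration"])

# Default behaviour (drop=True): 'Courses' moves out of the columns
df_dropped = df.set_index("Courses")
print(list(df_dropped.columns))

# drop=False: 'Courses' is both the index and a regular column
df_kept = df.set_index("Courses", drop=False)
print(list(df_kept.columns))
```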

Can I set an index using an existing Datetime column?

You can set an index using an existing Datetime column in a Pandas DataFrame. This is a common operation when working with time series data, as it allows you to leverage time-based functionalities, such as slicing by date, resampling, or plotting.
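
Here is a minimal sketch of a datetime index. The Date column and its values are assumptions made up for this example and are not part of the article's sample data.

```python
import pandas as pd

# Hypothetical enrollment data; the 'Date' column and its values are assumptions
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-15", "2024-01-20", "2024-02-10"]),
    "Fee": [20000, 22000, 25000],
})

# Using the datetime column as the index produces a DatetimeIndex
ts = df.set_index("Date")

# Partial-string selection: all rows from January 2024
january = ts.loc["2024-01"]
print(january)

# Resample to monthly totals ("MS" labels each bin by its month start)
monthly = ts["Fee"].resample("MS").sum()
print(monthly)
```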

What is a MultiIndex, and how do I create one?

A MultiIndex in Pandas is an index that contains multiple levels. It allows you to represent more complex hierarchical data structures within a DataFrame, which is particularly useful when working with grouped or multidimensional data. Instead of a single index column, a MultiIndex lets you use multiple columns as the index, enabling more advanced indexing and selection operations.
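
As a small sketch of selection on a MultiIndex, the example below uses hypothetical data in which one course appears twice, so the first index level groups several rows.

```python
import pandas as pd

# Hypothetical data with a repeated course; values are for illustration only
df = pd.DataFrame({
    "Courses": ["Spark", "Spark", "Python"],
    "Duration": ["30days", "40days", "35days"],
    "Fee": [20000, 22000, 25000],
})

# Build the MultiIndex and sort it so label-based lookups are efficient
df_multi = df.set_index(["Courses", "Duration"]).sort_index()

# Select every row under the first-level label 'Spark'
spark_rows = df_multi.loc["Spark"]
print(spark_rows)

# Select a single value by giving one label per level as a tuple
fee = df_multi.loc[("Spark", "40days"), "Fee"]
print(fee)
```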

Conclusion

In this article, you have learned how to set a pandas column as the row index using the DataFrame.set_index() method and the DataFrame.index property, with simple examples.
