Pandas Set Column as Index in DataFrame

• Post category: Pandas
• Post last modified: March 27, 2024

You can set a pandas column as the index by using the DataFrame.set_index() method or the DataFrame.index property. What is an Index in pandas? The Index holds the row labels of a DataFrame. In this article, I will explain how to use a column as an index with some simple examples. By default, pandas creates an integer index (0, 1, 2, ...) for a DataFrame. However, you can supply index values while creating a DataFrame, or set an existing column of the DataFrame as the index.
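As a minimal sketch of supplying index values at creation time, the snippet below passes the index argument to the DataFrame constructor. The name df_labeled and the labels r1, r2, r3 are made up for illustration and do not appear elsewhere in this article.

```python
import pandas as pd

# Same course data as used later in the article, but with row labels
# supplied up front. The labels "r1".."r3" are hypothetical.
df_labeled = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
    index=["r1", "r2", "r3"],
)

# Rows can now be selected by label instead of by position.
print(df_labeled.loc["r2"])
```

The list passed as index must have one entry per row.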

1. Quick Examples to Set DataFrame Column as Row Index

If you are in a hurry, below are some quick examples.


# Below are some quick examples.

# Using set_index() method.
df.set_index('Fee', inplace=True)

# Using set_index() with transpose (.T).
df2=df.set_index('Courses').T

# Set Column as index.
df.index = df['Courses']

# Drop column after setting it as Index.
df2=df.drop('Courses', axis=1)

Now, let’s create a pandas DataFrame, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.


# Create a Pandas DataFrame.
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],
                   ["PySpark",22000,'30days'],
                   ["Python",25000,'35days']],
                columns=['Courses','Fee','Duration'])
print(df)

This yields the output below. The DataFrame was created with the default index.


# Output:
   Courses    Fee Duration
0    Spark  20000   40days
1  PySpark  22000   30days
2   Python  25000   35days

2. Using set_index() Method

Use the DataFrame.set_index() method to set an existing column of a DataFrame as the index. The index holds the row labels of the DataFrame. If your DataFrame already has an index, set_index() replaces it by default, or adds the new level to it when append=True.

You can set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length).
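To illustrate the "one or more columns or arrays" wording, here is a small sketch. The names df_demo, by_two, codes, and by_codes, and the labels c1 to c3, are assumptions made for this example only.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
)

# A list of column names produces a MultiIndex with one level per column.
by_two = df_demo.set_index(["Courses", "Fee"])
print(by_two)

# An Index object of the correct length can be used instead of a column.
# No column is removed, because the labels do not come from the DataFrame.
codes = pd.Index(["c1", "c2", "c3"], name="Code")
by_codes = df_demo.set_index(codes)
print(by_codes)
```

Note that a plain Python list of strings is interpreted as a list of column names, so external labels are wrapped in pd.Index here.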

Syntax:


# Syntax for set_index() method.
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
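The following sketch shows what the drop, append, and verify_integrity parameters do on a small DataFrame. The variable names (df_demo, kept, appended, dupes, raised) are hypothetical and used only here.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
)

# drop=False keeps Courses as a regular column as well as the index.
kept = df_demo.set_index("Courses", drop=False)

# append=True adds Courses as a second level next to the existing index.
appended = df_demo.set_index("Courses", append=True)

# verify_integrity=True raises ValueError when the new index has duplicates.
dupes = pd.DataFrame({"Courses": ["Spark", "Spark"], "Fee": [20000, 21000]})
try:
    dupes.set_index("Courses", verify_integrity=True)
    raised = False
except ValueError:
    raised = True

print(kept.columns.tolist())
print(appended.index.nlevels)
print(raised)
```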

Usage with Example:

In the example below, I set the Fee column as the index.


# Using set_index() method.
df.set_index('Fee', inplace=True)
print(df)

This yields the output below. With inplace=True, set_index() modifies df in place and returns None; without it, set_index() returns a new DataFrame and leaves the original unchanged. Because drop=True is the default, the column is removed from the DataFrame's columns once it becomes the index.


# Output:
       Courses Duration
Fee                    
20000    Spark   40days
22000  PySpark   30days
25000   Python   35days
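The difference between the two calling styles can be checked with a short sketch. It uses a separate DataFrame named df_demo (a hypothetical name) so that it does not disturb the df used in the rest of the article.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
)

# Without inplace: a new DataFrame is returned, df_demo is untouched.
result = df_demo.set_index("Fee")
columns_before = df_demo.columns.tolist()

# With inplace=True: df_demo itself changes and the call returns None.
ret = df_demo.set_index("Fee", inplace=True)
columns_after = df_demo.columns.tolist()

print(columns_before)
print(columns_after)
print(ret)
```

Assigning the returned DataFrame to a variable is often easier to follow than inplace=True, since it avoids accidentally reassigning a variable to None.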

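3. Using set_index() with Transpose (.T)

You can combine set_index() with a transpose so that the values of the chosen column become the column labels. set_index() itself does not transpose anything; the DataFrame.T property (or the DataFrame.transpose() method) does that. Note that df already has Fee as its index from the previous step, so set_index('Courses') replaces it and only Duration remains as a row.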


# Using set_index() with transpose (.T).
df2=df.set_index('Courses').T
print(df2)

This yields the output below.


# Output:
Courses    Spark PySpark  Python
Duration  40days  30days  35days
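For comparison, here is a sketch of the same transpose applied to a fresh DataFrame that still has Fee as a column. The names df_demo and wide are assumptions for this example.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
)

# Courses becomes the index, then .T swaps rows and columns, so each
# course is a column and Fee and Duration are the rows.
wide = df_demo.set_index("Courses").T
print(wide)

# Cells are addressed as (row label, column label).
print(wide.loc["Fee", "Spark"])
```

Because each transposed column now mixes numbers and strings, pandas stores them with the generic object dtype.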

4. Set Column as Index by DataFrame.index Property

You can also set a pandas column as the index by using the DataFrame.index property. To use a column as the index, select the column from the DataFrame and assign it to the DataFrame.index property.


# Set Column as index.
df.index = df['Courses']
print(df)

This yields the output below.


# Output:
         Courses Duration
Courses                  
Spark      Spark   40days
PySpark  PySpark   30days
Python    Python   35days

Note that in the above example, Courses is set as the index but the column is still present in the DataFrame. Use the DataFrame.drop() method to drop the column if you don’t want it.

The DataFrame.drop() function drops the specified labels from rows or columns. You can remove rows or columns by specifying label names together with the corresponding axis, or by passing the index or columns parameters directly.


# Using DataFrame.drop() method.
df2=df.drop('Courses', axis=1)
print(df2)

This yields the output below.


# Output:
        Duration
Courses         
Spark     40days
PySpark   30days
Python    35days
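The assign-then-drop sequence above should give the same result as a single set_index() call with its default drop=True. The sketch below compares the two on a fresh DataFrame. The names df_demo, two_step, and one_step are hypothetical.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["Spark", 20000, "40days"],
     ["PySpark", 22000, "30days"],
     ["Python", 25000, "35days"]],
    columns=["Courses", "Fee", "Duration"],
)

# Two steps: assign the column to .index, then drop the column.
two_step = df_demo.copy()
two_step.index = two_step["Courses"]
two_step = two_step.drop("Courses", axis=1)

# One step: set_index() removes the column by default (drop=True).
one_step = df_demo.set_index("Courses")

# equals() compares labels and values of both DataFrames.
print(two_step.equals(one_step))
```

The property assignment is mainly useful when the labels come from somewhere other than an existing column, or when you want to keep the column as well.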

5. Complete Example of Setting a Pandas Column as Index


# Create a Pandas DataFrame.
import pandas as pd
df = pd.DataFrame([["Spark",20000,'40days'],["PySpark",22000,'30days'],["Python",25000,'35days']],
                columns=['Courses','Fee','Duration'])
print(df)

# Using set_index() method.
df.set_index('Fee', inplace=True)
print(df)

# Using set_index() with transpose (.T).
df2=df.set_index('Courses').T
print(df2)

# Set Column as index.
df.index = df['Courses']
print(df)

# Using DataFrame.drop() method.
df2=df.drop('Courses', axis=1)
print(df2)

Conclusion

In this article, you have learned how to set a pandas column as the row index using the DataFrame.set_index() method and the DataFrame.index property, with simple examples.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.