pandas.DataFrame.set_index() sets the index of a pandas DataFrame. With the set_index() method you can use an existing DataFrame column, a list of values, or a Series as the index, and you can also set multiple columns as a multi-level index. Use pandas.DataFrame.reset_index() to reset the index to the default numeric values.
An index is a set of labels that identifies the rows or columns of a DataFrame or Series. Both rows and columns have one: the row labels are called the index, and the column labels are usually called column names.
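As a minimal sketch of the two kinds of labels described above, the snippet below builds a small DataFrame and inspects its row and column labels. The name df_axes and its sample values are hypothetical and only used for illustration.

```python
import pandas as pd

# Hypothetical two-row DataFrame used only for this illustration
df_axes = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

# Row labels: without an explicit index, pandas numbers the rows from 0
row_labels = df_axes.index.tolist()

# Column labels: the column names
column_labels = df_axes.columns.tolist()

print(row_labels)     # [0, 1]
print(column_labels)  # ['Courses', 'Fee']
```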
pandas.DataFrame.set_index() Key Points
- The index can be set while creating a pandas DataFrame; use the set_index() method to set the index on an existing DataFrame.
- You can also set the index from a list, a Series, or an existing DataFrame column. By passing several columns at once, you can give the DataFrame a multi-level index.
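The key points above can be sketched roughly as follows. The sample data and the names df_created, labels, and df_from_series are assumptions made for illustration and are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical sample data for this sketch
data = {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]}

# Set the row labels while creating the DataFrame
df_created = pd.DataFrame(data, index=["r1", "r2"])

# Set the row labels of an existing DataFrame from a Series of labels
labels = pd.Series(["a", "b"], name="label")
df_from_series = pd.DataFrame(data).set_index(labels)

print(df_created.index.tolist())      # ['r1', 'r2']
print(df_from_series.index.tolist())  # ['a', 'b']
print(df_from_series.index.name)      # label
```

When a Series is passed as the key, its name becomes the name of the new index.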
1. Quick Examples of pandas Set Index
Below are quick examples and usage of pandas.DataFrame.set_index() method.
# Below are the quick examples.
# Set list to index
index_labels=['r1','r2','r3']
df.index = index_labels
# Set single column as index
df2 = df.set_index('Courses')
# Append index
df2 = df.set_index('Courses', append=True)
# Set multiple columns as Index
df2 = df.set_index(['Courses','Duration'])
# Set date time as index
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
2. pandas.DataFrame.set_index() Syntax
Below is the syntax of the set_index() method.
# Pandas DataFrame set_index() syntax
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
This method takes the parameters below and returns a new DataFrame with the index set. If you use inplace=True, it returns None and sets the index on the existing DataFrame object.
- keys: Accepts a single column name as a string, a list of column names, etc.
- drop: Deletes the column after setting it as the index. Defaults to True.
- append: Appends the new index to the existing index. Defaults to False.
- inplace: Modifies the existing DataFrame object in place. Defaults to False.
- verify_integrity: Checks the new index for duplicates. Defaults to False. Setting it to True degrades the performance of the method.
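To make the inplace and verify_integrity parameters more concrete, here is a small sketch. The DataFrame df_demo deliberately contains a duplicate course name; it and the other variable names are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical data with a duplicate value ('Spark') in the Courses column
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark"],
    "Fee": [20000, 25000, 22000],
})

# inplace=True modifies df_copy itself and returns None
df_copy = df_demo.copy()
result = df_copy.set_index("Courses", inplace=True)
print(result)              # None
print(df_copy.index.name)  # Courses

# verify_integrity=True raises ValueError when the new index has duplicates
try:
    df_demo.set_index("Courses", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)

print(duplicate_error is not None)  # True
```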
Let’s create a pandas DataFrame, run the above examples, and validate results.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop"],
'Fee' :[20000,25000,26000],
'Duration':['30day','40days','35days'],
'Discount':[1000,np.nan,1200],
'Start_Date' : ['2021-02-04 05:30:00','01-09-2021 06:30:00',
'2021-03-06 07:30:00']
}
df = pd.DataFrame(technologies)
print(df)
# Output:
# Courses Fee Duration Discount Start_Date
# 0 Spark 20000 30day 1000.0 2021-02-04 05:30:00
# 1 PySpark 25000 40days NaN 01-09-2021 06:30:00
# 2 Hadoop 26000 35days 1200.0 2021-03-06 07:30:00
3. pandas Set Index Example
Since we did not provide an index when creating the DataFrame above, pandas assigns incremental sequence numbers to the rows as the default index. You can change the index by assigning a list of values to the DataFrame.index attribute.
# Set list to index
index_labels=['r1','r2','r3']
df.index = index_labels
print(df)
# Outputs:
# Courses Fee Duration Discount Start_Date
# r1 Spark 20000 30day 1000.0 2021-02-04 05:30:00
# r2 PySpark 25000 40days NaN 01-09-2021 06:30:00
# r3 Hadoop 26000 35days 1200.0 2021-03-06 07:30:00
If you want, you can also give the index a name using rename_axis().
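A minimal sketch of naming the index with rename_axis() could look like this. The DataFrame df_named and the index name row_id are hypothetical choices for illustration.

```python
import pandas as pd

# Hypothetical DataFrame with custom row labels
df_named = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# rename_axis() returns a new DataFrame whose index carries the given name
df_named = df_named.rename_axis("row_id")

print(df_named.index.name)      # row_id
print(df_named.index.tolist())  # ['r1', 'r2', 'r3']
```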
4. Setting Single Column as Index by using set_index()
Sometimes you need to set one of the existing DataFrame columns as the index; you can do this with the set_index() method. After setting the index, set_index() drops the column from the DataFrame. To retain it, use the drop=False param.
# Set single column as index
df2 = df.set_index('Courses')
print(df2)
# Output:
# Fee Duration Discount Start_Date
# Courses
# Spark 20000 30day 1000.0 2021-02-04 05:30:00
# PySpark 25000 40days NaN 01-09-2021 06:30:00
# Hadoop 26000 35days 1200.0 2021-03-06 07:30:00
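The drop=False variant mentioned above can be sketched as follows, using a hypothetical, cut-down DataFrame named df_small rather than the full example data.

```python
import pandas as pd

# Hypothetical cut-down version of the example data
df_small = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})

# drop=False keeps 'Courses' as a regular column and also uses it as the index
df_keep = df_small.set_index("Courses", drop=False)

print(df_keep.columns.tolist())  # ['Courses', 'Fee']
print(df_keep.index.tolist())    # ['Spark', 'PySpark', 'Hadoop']
```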
Note that setting the index replaces the existing index of the DataFrame. If you want to retain the existing index and add the new one as an additional level, use append=True.
# Append index
df2 = df.set_index('Courses', append=True)
print(df2)
# Output:
# Fee Duration Discount Start_Date
# Courses
# r1 Spark 20000 30day 1000.0 2021-02-04 05:30:00
# r2 PySpark 25000 40days NaN 01-09-2021 06:30:00
# r3 Hadoop 26000 35days 1200.0 2021-03-06 07:30:00
5. pandas Set Index with Multiple Columns
You can also set multiple columns as the index in pandas. To do so, pass the column names as a list to the DataFrame.set_index() method.
# Set multiple columns as Index
df2 = df.set_index(['Courses','Duration'])
print(df2)
# Output:
# Fee Discount Start_Date
# Courses Duration
# Spark 30day 20000 1000.0 2021-02-04 05:30:00
# PySpark 40days 25000 NaN 01-09-2021 06:30:00
# Hadoop 35days 26000 1200.0 2021-03-06 07:30:00
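One way to use a multi-column index, sketched here under the assumption that each (Courses, Duration) pair is unique, is to look up a row by passing a tuple of labels to .loc. The names df_pairs and df_multi are hypothetical.

```python
import pandas as pd

# Hypothetical cut-down version of the example data
df_pairs = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Duration": ["30day", "40days", "35days"],
    "Fee": [20000, 25000, 26000],
})

# Both columns become levels of the row index
df_multi = df_pairs.set_index(["Courses", "Duration"])
level_names = list(df_multi.index.names)

# Look up one row by giving a label for each level
spark_fee = df_multi.loc[("Spark", "30day"), "Fee"]

print(level_names)  # ['Courses', 'Duration']
print(spark_fee)    # 20000
```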
6. pandas Set Index to datetime
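When you work with dates and times and want to filter on them, it is good practice to set the datetime column as the index. Before you do this, make sure your date column is in datetime format, for example by converting it with pd.to_datetime(). Then pass the result to pandas.DatetimeIndex() to turn it into an index. Note that the sample Start_Date strings mix two date formats; pandas 2.0 and later may raise a ValueError when parsing such a column unless you pass format='mixed' to pd.to_datetime().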
# Set date time as index
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
print(df2)
# Output:
# Courses Fee Duration Discount Start_Date
# Start_Date
# 2021-02-04 05:30:00 Spark 20000 30day 1000.0 2021-02-04 05:30:00
# 2021-01-09 06:30:00 PySpark 25000 40days NaN 01-09-2021 06:30:00
# 2021-03-06 07:30:00 Hadoop 26000 35days 1200.0 2021-03-06 07:30:00
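As a rough illustration of the datetime filtering mentioned above, the sketch below slices rows by date once the DatetimeIndex is in place. It assumes a hypothetical DataFrame df_dates whose date strings all share one format, and it sorts the index first so that label-based slicing is well defined.

```python
import pandas as pd

# Hypothetical data with consistently formatted date strings
df_dates = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Start_Date": [
        "2021-02-04 05:30:00",
        "2021-01-09 06:30:00",
        "2021-03-06 07:30:00",
    ],
})

# Convert the strings to datetimes and use them as the index
df_dates = df_dates.set_index(pd.DatetimeIndex(pd.to_datetime(df_dates["Start_Date"])))

# Sort chronologically, then keep rows from 1 February 2021 onward
df_dates = df_dates.sort_index()
feb_onward = df_dates.loc["2021-02-01":]

print(feb_onward["Courses"].tolist())  # ['Spark', 'Hadoop']
```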
Running print(df2.info()) produces the output below.
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3 entries, 2021-02-04 05:30:00 to 2021-03-06 07:30:00
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Courses 3 non-null object
1 Fee 3 non-null int64
2 Duration 3 non-null object
3 Discount 2 non-null float64
4 Start_Date 3 non-null object
dtypes: float64(1), int64(1), object(3)
memory usage: 144.0+ bytes
None
If you want to convert the index back into a column, use DataFrame.reset_index(). There are also several other ways to set indices.
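A minimal sketch of moving the index back into a column with reset_index() follows. The names df_idx and df_reset, and the sample values, are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame indexed by the Courses column
df_idx = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
}).set_index("Courses")

# reset_index() turns the index back into a column and restores 0..n-1 labels
df_reset = df_idx.reset_index()

print(df_reset.columns.tolist())  # ['Courses', 'Fee']
print(df_reset.index.tolist())    # [0, 1, 2]
```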
7. Complete Example of pandas Set Index
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop"],
'Fee' :[20000,25000,26000],
'Duration':['30day','40days','35days'],
'Discount':[1000,np.nan,1200],
'Start_Date' : ['2021-02-04 05:01:21','01-09-2021 06:03:41',
'2021-03-06 07:06:21']
}
df = pd.DataFrame(technologies)
print(df)
# Set list to index
index_labels=['r1','r2','r3']
df.index = index_labels
print(df)
# Set single column as index
df2 = df.set_index('Courses')
print(df2)
# Append index
df2 = df.set_index('Courses', append=True)
print(df2)
# Set multiple columns as Index
df2 = df.set_index(['Courses','Duration'])
print(df2)
# Set date time as index
df2 = df.set_index(pd.DatetimeIndex(pd.to_datetime(df['Start_Date'])))
print(df2)
print(df2.info())
8. Conclusion
In this article, you have learned the syntax and usage of pandas.DataFrame.set_index(), with examples such as setting a list or a DataFrame column as the index. You have also learned how to set multiple columns and a datetime column as the index of a DataFrame.