In Pandas, you can get the index (integer position) of a column from its name using the get_loc() method of the Index object that holds the column labels. DataFrame.columns returns all column labels of the DataFrame as an Index, and get_loc() is a method of Index that returns the integer position of a given column label. In this article, I will explain different ways to get the column index from column names, with examples.
Key Points –
- Converting DataFrame.columns to a list allows accessing the index of a column using standard list methods.
- Attempting to access a non-existent column with get_loc() raises a KeyError, so handling missing column names is crucial.
- Using get_loc() is generally more efficient than converting columns to a list and finding the index manually, especially with large DataFrames.
- For MultiIndex columns, get_loc() can be used with tuples; the tuple must specify every level to get a single integer position.
- get_loc() works on both integer and string-based column names, making it versatile for various DataFrame configurations.
- The Index object returned by DataFrame.columns is immutable, so individual column names cannot be modified in place through it.
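The MultiIndex point above can be sketched as follows. This is a minimal illustration, and the two-level column labels used here are hypothetical; they are not part of the DataFrame used in the rest of this article.

```python
import pandas as pd

# Hypothetical two-level column labels, used only for illustration
columns = pd.MultiIndex.from_tuples(
    [("Course", "Fee"), ("Course", "Name"), ("Offer", "Discount")]
)
df_multi = pd.DataFrame([[20000, "Spark", 1000]], columns=columns)

# A tuple naming every level identifies exactly one column
full_pos = df_multi.columns.get_loc(("Course", "Name"))
print(full_pos)  # 1

# A partial key (first level only) matches several columns,
# so the result is a slice rather than a single integer
partial_pos = df_multi.columns.get_loc("Course")
print(list(df_multi.columns[partial_pos]))
```

With a partial key the result describes a range of columns, so code that expects a single integer should pass the full tuple.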
Quick Examples of Column Index From Column Name
If you are in a hurry, below are some quick examples of how to get the column index from the column name in Pandas DataFrame.
# Quick examples of column index from column name
# Example 1: Get column index
# From column name, i.e. the third column (position 2)
idx = df.columns.get_loc("Duration")
print("Column Index : " + str(idx))
# Example 2: Dictionary of column name
# With associated index
idx_dic = {}
for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)
# Example 3: Get index for multiple column labels/names
query_cols = ['Fee', 'Courses']
cols_index = [df.columns.get_loc(col) for col in query_cols]
print(cols_index)
# Example 4: Column index from column name
# Using get_indexer().
cols_index = df.columns.get_indexer(query_cols)
print(cols_index)
Now, let’s create a Pandas DataFrame to run these examples on. Our DataFrame contains the column names Courses, Fee, Duration, and Discount.
# Create DataFrame
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
Get Column Index From Column Name by get_loc()
DataFrame.columns returns all column labels of the DataFrame as an Index, and Index.get_loc() returns the integer position of a given column label.
Syntax of Index.get_loc()
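Following is the syntax of Index.get_loc(). The method and tolerance parameters were deprecated in pandas 1.4 and removed in pandas 2.0, so current versions accept only key.
# Syntax of the Index.get_loc() method (pandas 2.0 and later)
Index.get_loc(key)
# Syntax in pandas versions before 2.0
Index.get_loc(key, method=None, tolerance=None)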
In the below example, the get_loc() method is called on the columns attribute of the DataFrame (df). It takes the column name “Duration” as an argument and returns the index of that column. The resulting index is then assigned to the variable idx.
str(idx) converts the index to a string so it can be concatenated with the rest of the print statement.
# Get column index from column name, i.e. the third column (position 2)
idx = df.columns.get_loc("Duration")
print("Column Index : " + str(idx))
Yields below output.
# Output:
# Column Index : 2
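Because get_loc() raises a KeyError for a label that does not exist, it can help to wrap the lookup. The sketch below is one possible approach, not a pandas feature: safe_get_loc is a hypothetical helper name, and returning -1 for a missing column is an assumed convention. It also shows the list-based lookup for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

def safe_get_loc(frame, name, default=-1):
    """Hypothetical helper: position of `name`, or `default` if it is missing."""
    try:
        return frame.columns.get_loc(name)
    except KeyError:
        return default

found = safe_get_loc(df, "Duration")
missing = safe_get_loc(df, "Missing")
print(found)    # 2
print(missing)  # -1

# List-based alternative: list.index() raises ValueError for a missing name
list_pos = list(df.columns).index("Fee")
print(list_pos)  # 1
```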
Using Dictionary of Column Name With Associated Index
If you need to look up several columns, you can build a dictionary with each column name as the key and its index as the value by assigning into idx_dic inside a loop. For example:
# Dictionary of column name with associated index
idx_dic = {}
for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)
Yields below output.
# Output:
{'Courses': 0, 'Fee': 1, 'Duration': 2, 'Discount': 3}
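The same mapping can also be built without calling get_loc() for every column. The following is a minimal sketch that relies on enumerate() and assumes the column names are unique; with duplicate names, later positions would overwrite earlier ones.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# enumerate() yields (position, label) pairs in column order
idx_dic = {col: pos for pos, col in enumerate(df.columns)}
print(idx_dic)
# {'Courses': 0, 'Fee': 1, 'Duration': 2, 'Discount': 3}
```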
Get Index for Multiple Column Labels/Names
The get_loc() method accepts a single label, not a list, so to get the index for multiple column labels/names you call it once per label.
The example below uses a list comprehension to iterate through the specified columns (query_cols) and retrieves the index of each one using the get_loc() method.
# Get Index for Multiple Column Labels/Names
query_cols = ['Fee', 'Courses']
cols_index = [df.columns.get_loc(col) for col in query_cols]
print(cols_index)
# Output:
# [1, 0]
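Once you have the positions, a common next step is to select those columns by position. The sketch below assumes you want them in the order given in query_cols and uses DataFrame.iloc for the selection.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

query_cols = ["Fee", "Courses"]
cols_index = [df.columns.get_loc(col) for col in query_cols]

# Select all rows and only the columns at the looked-up positions
subset = df.iloc[:, cols_index]
print(list(subset.columns))  # ['Fee', 'Courses']
```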
Get Column Index From Column Name Using get_indexer()
In Pandas, you can use the get_indexer() method to get the indices for multiple column names in a single call. It returns a NumPy array of integer positions, one per requested label, that can be used to index into an array or list-like structure.
In the below example, query_cols is a list of column names for which you want to get the indices, and get_indexer() returns all of them at once.
# Column index from column name
# Using get_indexer()
query_cols=['Fee','Courses']
cols_index = df.columns.get_indexer(query_cols)
print(cols_index)
# Output:
# [1 0]
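Unlike get_loc(), get_indexer() does not raise an error for a label that is not present; it reports -1 at that position instead. The sketch below uses a deliberately missing label ("Missing", chosen here only for illustration) and filters it out.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

query_cols = ["Fee", "Missing", "Courses"]
cols_index = df.columns.get_indexer(query_cols)
print(cols_index.tolist())  # [1, -1, 0]

# Keep only the labels that were actually found
found_cols = [col for col, pos in zip(query_cols, cols_index) if pos != -1]
print(found_cols)  # ['Fee', 'Courses']
```

Checking for -1 before using the positions matters, because -1 is also a valid position (the last column) when passed to iloc.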
Complete Examples
# Get Column Index From Column Name in Pandas
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
df = pd.DataFrame(technologies)
print(df)
# Get column index from column name, i.e. the third column (position 2)
idx = df.columns.get_loc("Duration")
print("Column Index : " + str(idx))
# Dictionary of column name with associated index.
idx_dic = {}
for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)
# Get Index for Multiple Column Labels/Names
query_cols = ['Fee', 'Courses']
cols_index = [df.columns.get_loc(col) for col in query_cols]
print(cols_index)
# Column index from column name using get_indexer()
cols_index = df.columns.get_indexer(query_cols)
print(cols_index)
Frequently Asked Questions on Get Column Index For Column Name
You can get the column index for a specific column name in a Pandas DataFrame using the get_loc() method. For example, df.columns.get_loc(column_name) returns the index of the column with the specified name, that is, its position within the DataFrame.
You can get the indices for multiple column names at once. One way to achieve this is by using a list of column names and a list comprehension: the get_loc() method is applied to each column name in the query_cols list, and the resulting indices are stored in the cols_index list.
An alternative method to get indices for multiple columns is to use the get_indexer() method. This method returns an array of indices for a list of column names in a single call.
You can create a dictionary mapping column names to their indices by iterating through the columns of the DataFrame and using the get_loc() method for each column.
Conclusion
In this article, you have learned how to get the column index from a column name by using get_loc() and get_indexer(). To get the index for multiple column names, either call get_loc() for each name in a list comprehension or pass the names as a list to get_indexer().