How do you check if a single column or multiple columns exist in a pandas DataFrame? You can use the DataFrame.columns attribute, which returns the column labels of the DataFrame as an Index object, together with a Python if condition to perform the check. In this article, I will explain several ways to check if a column exists in a pandas DataFrame, with examples.
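As a quick illustration of the attribute mentioned above, the sketch below uses a small hypothetical two-column DataFrame to show that DataFrame.columns is a pandas Index rather than a plain list, and that the in operator works on it directly.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
df = pd.DataFrame({'Courses': ['Spark', 'PySpark'], 'Fee': [20000, 25000]})

cols = df.columns
print(type(cols).__name__)   # Index
print(list(cols))            # ['Courses', 'Fee']

# Membership tests work on the Index just like on a list
print('Courses' in cols)     # True
print('Duration' in cols)    # False
```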
1. Quick Examples of Checking If a Column Exists in Pandas DataFrame
If you are in a hurry, below are some quick examples of how to check if a column exists in Pandas DataFrame.
# Quick examples of checking if a column exists
# Example 1: Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
# Example 2: Check if column Courses is in DataFrame
if 'Courses' in df:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
# Example 3: Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
    print("Courses column is present : No")
else:
    print("Courses column is present : Yes")
# Example 4: Check if multiple columns all exist using set.issubset()
if set(['Courses','Duration']).issubset(df.columns):
    print("Courses and Duration columns are present : Yes")
else:
    print("Courses and Duration columns are present : No")
# Example 5: Use a set literal (curly braces) with issubset() on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
    print("Courses and Duration columns are present : Yes")
else:
    print("Courses and Duration columns are present : No")
# Example 6: Check if one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
    print("Fee and Discount columns are present : Yes")
else:
    print("Fee and Discount columns are present : No")
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
Yields the output below.
# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000
2. Check If a Single Column Exists in DataFrame
Use DataFrame.columns with an if condition to check whether a column exists. Let’s check if the "Courses" column exists in the pandas DataFrame. DataFrame.columns returns an Index containing all column labels.
This program checks whether the column named ‘Courses’ is present among the DataFrame’s columns. If it is present, it prints “Courses column is present : Yes”; otherwise, it prints “Courses column is present : No”.
# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
Yields the output below.
# Output:
Courses column is present : Yes
Alternatively, you can apply the in operator to the DataFrame itself, because a membership test on a DataFrame checks its column labels:
# Check if column Courses is in DataFrame
if 'Courses' in df:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
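To make the behavior of `in df` concrete, here is a small sketch built from part of the article's data. It shows that the test only matches column labels (not index labels or cell values) and that the match is exact, including case. The case-insensitive variant at the end assumes every column label is a string.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4'])

print('Courses' in df)   # True  -> a column label
print('r1' in df)        # False -> an index (row) label, not a column
print('Spark' in df)     # False -> a cell value, not a column
print('courses' in df)   # False -> labels are matched exactly, case included

# Case-insensitive variant (assumes all column labels are strings)
print('courses' in df.columns.str.lower())   # True
```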
3. Check If a Column Does Not Exist in DataFrame
To check that a column such as "XYZ" does not exist in the DataFrame, use the not in operator, for example, if 'XYZ' not in df.columns:.
# Check if column XYZ is not in DataFrame.columns
if 'XYZ' not in df.columns:
    print("XYZ column is present : No")
else:
    print("XYZ column is present : Yes")
Yields the output below.
# Output:
XYZ column is present : No
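When you expect several columns, it is often more useful to know which ones are missing than to get a single yes or no. The sketch below applies the same not in test to a hypothetical list of required labels and keeps them in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

required = ['Courses', 'XYZ', 'Discount']   # hypothetical list of expected columns

# Collect every required label that is not a column, preserving order
missing = [col for col in required if col not in df.columns]
print(missing)   # ['XYZ', 'Discount']

if missing:
    print(f"Missing columns: {missing}")
else:
    print("All required columns are present")
```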
4. Check If Multiple Columns Exist in Pandas DataFrame
To check whether every column in a list exists in a pandas DataFrame, use set.issubset(), for example, if set(['Courses','Duration']).issubset(df.columns):.
# Check if multiple columns all exist using set.issubset()
if set(['Courses','Duration']).issubset(df.columns):
    print("Columns are present : Yes")
else:
    print("Columns are present : No")
Yields the output below.
# Output:
Columns are present : Yes
The set can alternatively be constructed with curly braces (a set literal) instead of set([...]).
# Use a set literal with issubset() on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
    print("Columns are present : Yes")
else:
    print("Columns are present : No")
Yields the same result as above: both columns are present.
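For a quick look at how the subset test behaves, the sketch below (with a small hypothetical DataFrame) shows the equivalent superset form, set(df.columns) >= wanted, and how a single missing label makes the whole check return False.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

wanted = {'Courses', 'Duration'}
print(wanted.issubset(df.columns))   # True
print(set(df.columns) >= wanted)     # True, the same test written as a superset check

# One missing label is enough to make the whole check False
print({'Courses', 'XYZ'}.issubset(df.columns))   # False
```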
5. Check If One or More Columns All Exist in DataFrame
To check whether one or more columns all exist in a pandas DataFrame, combine a list comprehension with all(), for example, if all([item in df.columns for item in ['Fee','Discount']]):.
# Check if one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
    print("Columns are present : Yes")
else:
    print("Columns are present : No")
Yields the same result as above: both columns are present.
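As a sketch of a closely related variant, all() also accepts a generator expression, so the square brackets are optional, and swapping in any() checks whether at least one of the labels is a column. The DataFrame and the candidate labels here are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Fee': [20000, 25000], 'Discount': [1000, 2300]})

candidates = ['Discount', 'XYZ']   # hypothetical labels to look for

# all(): every label must be a column
print(all(col in df.columns for col in candidates))   # False

# any(): at least one label must be a column
print(any(col in df.columns for col in candidates))   # True
```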
6. Complete Example of Checking If a Column Exists in DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
# Check if column Courses is in DataFrame
if 'Courses' in df:
    print("Courses column is present : Yes")
else:
    print("Courses column is present : No")
# Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
    print("Courses column is present : No")
else:
    print("Courses column is present : Yes")
# Check if multiple columns all exist using set.issubset()
if set(['Courses','Duration']).issubset(df.columns):
    print("Courses and Duration columns are present : Yes")
else:
    print("Courses and Duration columns are present : No")
# Use a set literal with issubset() on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
    print("Courses and Duration columns are present : Yes")
else:
    print("Courses and Duration columns are present : No")
# Check if one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
    print("Fee and Discount columns are present : Yes")
else:
    print("Fee and Discount columns are present : No")
Frequently Asked Questions on Checking If a Column Exists in DataFrame
To check if a specific column exists in a pandas DataFrame, you can use the in operator, either on the DataFrame itself or on its columns attribute.
You can also use the get method to check if a column exists in a pandas DataFrame. The get method accesses a column by name and returns None if the column doesn’t exist.
If you check for a column that doesn’t exist using the get method, it returns None. For example, since ‘NonExistentColumn’ does not exist in the DataFrame, df.get(‘NonExistentColumn’) returns None, so the condition result is not None evaluates to False. Therefore, the program reports that the column does not exist in the DataFrame.
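The get-based check described above can be sketched as follows, using a small hypothetical DataFrame. DataFrame.get() also takes an optional default that is returned instead of None when the label is not found.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})

result = df.get('NonExistentColumn')   # no such column, so get() returns None
if result is not None:
    print("Column exists in the DataFrame")
else:
    print("Column does not exist in the DataFrame")

# get() also accepts a default to return instead of None
print(df.get('NonExistentColumn', 'missing'))   # missing
```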
There is a difference. Using in checks whether the column name is among the column labels, while get actually retrieves the column, and you then check whether the result is None. Using in is more common for existence checks.
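One practical consequence of get returning the column itself is sketched below with a hypothetical DataFrame: the result is a Series, so it has to be compared with None explicitly. Using the Series directly in an if statement raises a ValueError, because the truth value of a Series is ambiguous.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})

col = df.get('Fee')
print(type(col).__name__)   # Series: get() hands back the column data itself

# Using the Series itself as a condition is ambiguous and raises ValueError
raised = False
try:
    if df.get('Fee'):
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(df.get('Fee') is not None)   # True, the explicit comparison works
print('Fee' in df)                 # True, checks the label without retrieving data
```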
While in and get are common methods, you can also use a try-except block that catches the KeyError raised when you select a column that doesn’t exist. However, using in is generally more readable and idiomatic.
Conclusion
In this article, you have learned how to check whether a column exists, or does not exist, in a DataFrame by using the in and not in operators, set.issubset(), and all() inside if conditions. You can get all DataFrame column labels by using DataFrame.columns.
Happy Learning !!