  • Post last modified:March 27, 2024
Pandas – Check If a Column Exists in DataFrame

How do you check whether a single column or multiple columns exist in a Pandas DataFrame? You can use the DataFrame.columns attribute, which returns the column labels as an Index object, together with the Python in operator inside an if condition. In this article, I will explain several ways to check if a column exists in a Pandas DataFrame, with examples.

1. Quick Examples of Checking If a Column Exists in Pandas DataFrame

If you are in a hurry, below are some quick examples of how to check if a column exists in Pandas DataFrame.


# Quick examples of check if a column exists

# Example 1: Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Example 2: Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Example 3: Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
   print("Courses column is present : No")
else:
   print("Courses column is present : Yes")

# Example 4: Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Example 5: Use a set literal (curly braces) with issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Example 6: Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

This yields the output below.


# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Check If a Single Column Exists in DataFrame

Use DataFrame.columns with an if condition to check if a column exists. Let’s see if a "Courses" column exists in our Pandas DataFrame. DataFrame.columns returns an Index object holding all column labels, and the in operator tests membership against it.

This program checks whether the column with the name ‘Courses’ is present in the DataFrame’s columns. If it is present, it prints “Courses column is present: Yes”; otherwise, it prints “Courses column is present: No”.


# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

This yields the output below.


# Output:
Courses column is present : Yes

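Alternatively, you can apply the in operator to the DataFrame itself, because membership on a DataFrame is tested against its column labels.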


# Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")
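To make clear what the in operator tests on a DataFrame, here is a minimal sketch. It assumes a trimmed two-column copy of the article's DataFrame (the trimming is only for brevity) and compares membership checks against column labels, cell values, and row labels.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (two columns, two rows, for brevity)
df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]},
    index=['r1', 'r2'],
)

in_frame = 'Courses' in df            # tested against the column labels
in_columns = 'Courses' in df.columns  # the same check, spelled out
value_in_frame = 'Spark' in df        # cell values are not searched
row_in_frame = 'r1' in df             # row labels are not searched either
row_in_index = 'r1' in df.index       # row labels live in df.index

print(in_frame, in_columns, value_in_frame, row_in_frame, row_in_index)
# Prints: True True False False True
```

In other words, 'Courses' in df and 'Courses' in df.columns should behave the same; the longer form simply states the intent more explicitly.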

3. Check If a Column Does Not Exist in DataFrame

To check that a column such as "XYZ" does not exist in the DataFrame, use the not in operator. For example, if 'XYZ' not in df.columns: runs its body only when the column is missing.


# Check if column XYZ is not in DataFrame.columns
if 'XYZ' not in df.columns:
   print("XYZ column is present : NO")
else:
   print("XYZ column is present : Yes")

This yields the output below.


# Output:
XYZ column is present : NO
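A common use of the not in check is to create a column only when it is missing. The sketch below assumes a trimmed copy of the article's DataFrame without the Discount column, and the default value 0 is a hypothetical choice for illustration.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame, deliberately without 'Discount'
df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})

# Add the column only if it is not there yet
if 'Discount' not in df.columns:
    df['Discount'] = 0  # hypothetical default value

print(df.columns.tolist())
# Prints: ['Courses', 'Fee', 'Discount']
```

Running the same check a second time leaves the DataFrame unchanged, since the column now exists.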

4. Check If Multiple Columns Exist in Pandas DataFrame

To check whether several columns all exist in a pandas DataFrame, use set.issubset(). For example, set(['Courses','Duration']).issubset(df.columns) returns True only when every listed column is present.


# Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

This yields the output below.


# Output:
Columns are present : Yes

The set can alternatively be written as a set literal with curly braces.


# Use a set literal (curly braces) with issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

This yields the same output as above.
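issubset only answers yes or no. If you also want to know which columns are missing, a set difference is one option. This is a minimal sketch on a trimmed copy of the article's DataFrame; the column name 'Tutor' is made up for illustration.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

required = ['Courses', 'Duration', 'Tutor']  # 'Tutor' is a made-up name

# Labels that are required but absent from the DataFrame
missing = sorted(set(required) - set(df.columns))

print(missing)
# Prints: ['Tutor']
```

An empty list here means the same thing as issubset returning True.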

5. Check If One or More Columns All Exist in DataFrame

To check that every column in a list exists in a pandas DataFrame, combine all() with a list comprehension. For instance, all([item in df.columns for item in ['Fee','Discount']]) is True only when both Fee and Discount are present.


# Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

Both columns exist in our DataFrame, so this prints "Columns are present : Yes".
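The same pattern also works with a generator expression instead of a list, and swapping all() for any() checks whether at least one of the columns exists. A small sketch, assuming a two-column frame and a made-up column name 'Tutor':

```python
import pandas as pd

# Small frame with two of the article's columns
df = pd.DataFrame({'Fee': [20000, 25000], 'Discount': [1000, 2300]})

wanted = ['Fee', 'Discount', 'Tutor']  # 'Tutor' is a made-up name

# True only if every wanted column exists
every_present = all(col in df.columns for col in wanted)

# True if at least one wanted column exists
some_present = any(col in df.columns for col in wanted)

print(every_present, some_present)
# Prints: False True
```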

6. Complete Example of Checking If a Column Exists in DataFrame


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
   print("Courses column is present : No")
else:
   print("Courses column is present : Yes")

# Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Use a set literal (curly braces) with issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

Frequently Asked Questions on Checking If a Column Exists in DataFrame

How can I check if a specific column exists in a Pandas DataFrame?

To check if a specific column exists in a Pandas DataFrame, use the in operator, either on the DataFrame itself ('Courses' in df) or on its columns attribute ('Courses' in df.columns).

Can I use the get method to check if a column exists?

Yes, you can use the get method to check if a column exists in a Pandas DataFrame. The get method accesses a column by name and, by default, returns None if the column doesn’t exist.

What happens if I check for a column that doesn’t exist?

If you check for a column that doesn’t exist using the get method, it returns None. For example, result = df.get('NonExistentColumn') sets result to None when the DataFrame has no such column, so the condition result is not None evaluates to False and the code reports that the column does not exist.
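The behaviour described above can be sketched as follows. The DataFrame is a trimmed copy of the article's data, and the variable names result and fee are chosen only for illustration.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame
df = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [20000, 25000]})

# get() returns None for a label that is not a column
result = df.get('NonExistentColumn')

if result is not None:
    print("Column exists in the DataFrame")
else:
    print("Column does not exist in the DataFrame")

# For an existing column, get() returns the column as a Series
fee = df.get('Fee')
print(fee.tolist())
# Prints: [20000, 25000]
```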

Is there a difference between using in and get to check for a column?

There is a difference. The in operator only tests whether the name is among the column labels and returns True or False. The get method retrieves the column itself (a Series) or returns None, so you then have to compare the result with None. Using in is more common for existence checks.
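One practical consequence of this difference: the value returned by get should be compared with None explicitly rather than used directly as a condition, because a multi-element Series has no single truth value. A minimal sketch on a one-column frame (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Fee': [20000, 25000]})

exists = 'Fee' in df      # a plain bool
column = df.get('Fee')    # the column itself, as a Series

try:
    if column:            # truth value of a two-element Series is ambiguous
        pass
    outcome = "no error"
except ValueError:
    outcome = "ValueError"

print(type(exists).__name__, type(column).__name__, outcome)
# Prints: bool Series ValueError
```

Writing if column is not None: avoids the error.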

Are there other ways to check for column existence?

While in and get are the common methods, you can also access the column with df['name'] inside a try-except block and catch the KeyError raised when the column doesn’t exist. However, using in is generally more readable and idiomatic.
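A minimal sketch of the try-except approach, assuming a one-column frame and the column name 'XYZ' used earlier in the article:

```python
import pandas as pd

df = pd.DataFrame({'Courses': ["Spark", "PySpark"]})

try:
    values = df['XYZ']   # raises KeyError because the column is missing
    message = "XYZ column is present"
except KeyError:
    message = "XYZ column is not present"

print(message)
# Prints: XYZ column is not present
```

This style fits best when you need the column's data anyway and the missing case is the exception rather than the rule.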

Conclusion

In this article, you have learned how to check if a column exists, or does not exist, in a DataFrame by using the in and not in operators, set.issubset(), and all() with a list comprehension inside if conditions. You can get all DataFrame column labels by using DataFrame.columns.

Happy Learning !!


Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles on Pandas, NumPy, and Python, and takes pride in his ability to bridge the gap between technical experts and end-users.
