Pandas – Check If a Column Exists in DataFrame

  • Post category:Pandas / Python
  • Post last modified:January 19, 2023

How do you check whether a single column or multiple columns exist in a pandas DataFrame? You can use the DataFrame.columns attribute, which returns the column labels as an Index object, together with Python's in operator inside an if condition. In this article, I will explain several ways to check if a column exists in a pandas DataFrame, with examples.

1. Quick Examples of Check If a Column Exists in Pandas DataFrame

If you are in a hurry, below are some quick examples of how to check if a column exists in a pandas DataFrame.


# Below are quick examples
# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
   print("Courses column is present : No")
else:
   print("Courses column is present : Yes")

# Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Same check with a set literal (curly braces) and issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df = pd.DataFrame(technologies, index=index_labels)
print(df)

This yields the output below.


    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Check If Single Column Exists in DataFrame

Use DataFrame.columns with an if condition to check if a column exists. Let’s see if a "Courses" column exists in the pandas DataFrame. DataFrame.columns returns an Index object holding all column labels, and the in operator tests membership against it.


# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

This yields the output below.


Courses column is present : Yes

Alternatively, you can test membership on the DataFrame itself, which also checks the column labels:


# Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")
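
The two forms above can be compared side by side. The following is a minimal sketch on a cut-down DataFrame that reuses the article's column names; the variable names are only illustrative. It shows that df.columns is an Index object, that both membership forms agree, and that the test looks at column labels rather than cell values.

```python
import pandas as pd

# Small DataFrame reusing two of the article's columns
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
})

# df.columns is a pandas Index object, not a plain Python list
columns_type = type(df.columns).__name__
print(columns_type)

# Membership on the DataFrame and on df.columns agree for column labels
in_frame = 'Courses' in df
in_columns = 'Courses' in df.columns
print(in_frame, in_columns)

# The test only looks at column labels, so a cell value is not found
value_in_frame = 'Spark' in df
print(value_in_frame)

# Labels are matched exactly, including case
lowercase_in_frame = 'courses' in df
print(lowercase_in_frame)
```

If you need a plain Python list of labels, df.columns.tolist() converts the Index.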

3. Check If a Column Does Not Exist in DataFrame

To check whether the "XYZ" column is missing from the DataFrame, use the not in operator, for example if 'XYZ' not in df.columns:.


# Check if column XYZ is not in DataFrame.columns
if 'XYZ' not in df.columns:
   print("XYZ column is present : NO")
else:
   print("XYZ column is present : Yes")

This yields the output below.


XYZ column is present : NO
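
A common reason to run the not in check is to create a column only when it is missing. The sketch below assumes a hypothetical 'Tutor' column and a placeholder default value, neither of which comes from the article's data; it is one possible use of the check, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
})

# 'Tutor' and its default value are hypothetical, chosen for illustration
if 'Tutor' not in df.columns:
    df['Tutor'] = 'Unknown'   # column is missing, so it is created

# 'Fee' already exists, so this branch is skipped and the data is untouched
if 'Fee' not in df.columns:
    df['Fee'] = 0

print(df)
```

Guarding the assignment this way avoids overwriting a column that already holds data.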

4. Check If Multiple Columns Exist in Pandas DataFrame

To check whether all columns in a list exist in a pandas DataFrame, use set.issubset, for example if set(['Courses','Duration']).issubset(df.columns):.


# Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

This yields the output below.


Columns are present : Yes

The set can alternatively be written as a set literal with curly braces.


# Same check with a set literal (curly braces) and issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

This also reports that the columns are present.
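
The issubset check only answers yes or no. When it fails, it is often useful to know which columns are missing. The sketch below uses a plain set difference for that; the required list and the 'Tutor' label are hypothetical and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

# Hypothetical list of required columns; 'Tutor' is not in the DataFrame
required = ['Courses', 'Duration', 'Tutor']

# Set difference keeps the required labels that df.columns does not contain
missing = sorted(set(required) - set(df.columns))

# A list comprehension does the same while preserving the order of `required`
missing_in_order = [col for col in required if col not in df.columns]

if missing:
    print("Missing columns:", missing)
else:
    print("All required columns are present")
```

The set version is compact, while the list comprehension keeps the labels in the order you listed them.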

5. Check If One or More Columns All Exist in DataFrame

To check if one or more columns all exist in a pandas DataFrame, you can also combine all() with a list comprehension, for example if all([item in df.columns for item in ['Fee','Discount']]):.


# Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

This also reports that the columns are present.
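
The all() pattern requires every listed column to exist. If you only need at least one of them, any() is the counterpart. The sketch below contrasts the two and also shows Index.isin, which gives a per-column result; the candidates list and the 'Tutor' label are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Discount': [1000, 2300],
})

# Hypothetical candidate labels; 'Tutor' does not exist in the DataFrame
candidates = ['Fee', 'Tutor']

# all() is True only when every candidate is a column label
all_exist = all(col in df.columns for col in candidates)

# any() is True when at least one candidate is a column label
any_exist = any(col in df.columns for col in candidates)

# Index.isin returns one boolean per candidate, in the same order
mask = pd.Index(candidates).isin(df.columns)

print(all_exist, any_exist, mask.tolist())
```

Passing a generator expression to all() or any() lets the check stop at the first decisive element, whereas the list comprehension form evaluates every element first.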

6. Complete Example to Check If a Column Exists in DataFrame


import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df = pd.DataFrame(technologies, index=index_labels)
print(df)

# Check if column Courses is in DataFrame.columns
if 'Courses' in df.columns:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is in DataFrame
if 'Courses' in df:
   print("Courses column is present : Yes")
else:
   print("Courses column is present : No")

# Check if column Courses is not in DataFrame.columns
if 'Courses' not in df.columns:
   print("Courses column is present : No")
else:
   print("Courses column is present : Yes")

# Check that multiple columns all exist using set.issubset
if set(['Courses','Duration']).issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Same check with a set literal (curly braces) and issubset on DataFrame.columns
if {'Courses','Duration'}.issubset(df.columns):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

# Check that one or more columns all exist in DataFrame
if all([item in df.columns for item in ['Fee','Discount']]):
   print("Columns are present : Yes")
else:
   print("Columns are present : No")

Conclusion

In this article, you have learned how to check whether a column exists, or does not exist, in a DataFrame by using the in and not in operators, set.issubset, and all() with a list comprehension. You can get all DataFrame column labels by using DataFrame.columns.

Happy Learning !!
