Pandas Check Column Contains a Value in DataFrame

You can check whether a pandas DataFrame column contains a particular value (string or int), or any of a list of values, by using the in operator (on Series.unique(), a set, or Series.values), pandas.Series.isin(), Series.str.contains(), and other methods. In this article, I will explain how to check if a column contains a particular value with examples. The in operator returns True when the value is found in the column and False when it is not. isin() and str.contains() instead return a boolean Series with one True/False result per row.

1. Quick Examples of Checking Whether a Pandas Column Contains a Value

If you are in a hurry, below are some quick examples of how to check whether a pandas DataFrame column contains a particular string value or any of a list of values.


# Below are some quick examples.
# Check whether the column's unique values include 'Spark'
print('Spark' in df['Courses'].unique())

# Check membership in a set built from the column
print('Spark' in set(df['Courses']))

# Check membership in the column's underlying array (Series.values)
print('Spark' in df['Courses'].values)

# Check each value against a list of values with pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))

# Select rows whose Courses value contains the substring 'ark'
print(df[df['Courses'].str.contains('ark')])

Now, let's create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


# Create a DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

This yields the output below.


    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000

2. Check Column Contains a Value in DataFrame

Use the in operator on a column's values to check whether the column contains a string value. df['Courses'] returns a Series with all values from the Courses column, and pandas.Series.unique() returns the unique values of that Series as an array, in order of appearance (the computation is hash-table based). The in operator returns True when the value is found in that array. Note that applying in directly to a Series, as in 'Spark' in df['Courses'], checks the index labels rather than the values, which is why these examples go through unique(), set(), or .values.


# Check whether the column's unique values include 'Spark'
print('Spark' in df['Courses'].unique())

# Output:
True
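To make the point about the index concrete, here is a small sketch using the same Courses values and index labels as the sample DataFrame. It contrasts in applied directly to the Series (which checks index labels) with in applied to unique() (which checks values); the variable names are illustrative.

```python
import pandas as pd

# Same Courses values and index labels as the sample DataFrame
df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# `in` on a Series tests the index labels, not the values
spark_in_series = 'Spark' in df['Courses']  # False: 'Spark' is not an index label
r1_in_series = 'r1' in df['Courses']        # True: 'r1' is an index label

# Going through unique() tests the values instead
spark_in_values = 'Spark' in df['Courses'].unique()  # True

print(spark_in_series, r1_in_series, spark_in_values)
# False True True
```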

You can also convert the column to a Python set and use the in and not in operators on it to check whether a given element exists.


# Check membership in a set built from the column values
print('Spark' in set(df['Courses']))

# Output:
True
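When you need several lookups, building the set once is convenient, and Python's set methods let you test multiple values in one call. A minimal sketch, assuming the same Courses column; 'Java' is only an example of a value that is not present.

```python
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Build the set once and reuse it for several lookups
courses = set(df['Courses'])

has_java = 'Java' in courses          # False
missing_java = 'Java' not in courses  # True

# Test several values at once with set methods
all_present = {'Spark', 'Python'}.issubset(courses)      # True: both exist
any_present = not courses.isdisjoint({'Java', 'Spark'})  # True: 'Spark' exists

print(has_java, missing_java, all_present, any_present)
# False True True True
```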

You can also use the in operator with pandas.Series.values, which returns the column's data as an array (a numpy.ndarray for a column like this one).


# Check membership in the column's underlying array (Series.values)
print('Spark' in df['Courses'].values)

# Output:
True
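The same in check also works for an integer column such as Fee. The sketch below uses Series.to_numpy(), which the pandas documentation recommends over .values, and also shows an equivalent vectorized check that compares element-wise and reduces the result with .any(). The fee values tested are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"],
     'Fee': [20000, 25000, 22000, 30000]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Membership test on an integer column via its NumPy array
fee_found = 20000 in df['Fee'].to_numpy()    # True
fee_missing = 21000 in df['Fee'].to_numpy()  # False

# Vectorized alternative: element-wise comparison reduced with .any()
spark_found = bool((df['Courses'] == 'Spark').any())  # True

print(fee_found, fee_missing, spark_found)
# True False True
```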

3. Using pandas.Series.isin() to Check Column Contains Value

pandas.Series.isin() checks each element of a column against a list of values. It returns a boolean Series indicating, for each element, whether it exactly matches any element of the passed sequence, which makes it the natural choice when you want to test a column against several values at once.


# Check whether each Courses value is in a list of values with pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))

# Output:
r1     True
r2    False
r3     True
r4    False
Name: Courses, dtype: bool
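Because isin() returns a boolean Series, you can use it directly as a row filter, negate it with ~ to get a "not in" check, or reduce it to a single True/False. A short sketch on the sample data; names such as mask and wanted are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"],
     'Fee': [20000, 25000, 22000, 30000]},
    index=['r1', 'r2', 'r3', 'r4'],
)

wanted = ['Spark', 'Python']
mask = df['Courses'].isin(wanted)

# Keep only the rows whose Courses value is in the list
matching = df[mask]        # rows r1 and r3

# Negate with ~ for a "not in" filter
not_matching = df[~mask]   # rows r2 and r4

# Single True/False: does the column contain ANY of the values?
contains_any = bool(mask.any())

# Single True/False: does the column contain ALL of the values?
contains_all = bool(pd.Series(wanted).isin(df['Courses']).all())

print(matching.index.tolist(), not_matching.index.tolist())
print(contains_any, contains_all)
```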

4. Using Series.str.contains() to Check Part of a Value in a Column

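Use Series.str.contains() to check whether part of a value, such as a substring, appears in a column. It tests whether a pattern or regular expression is contained within each string of a Series or Index and returns a boolean Series, which you can use to filter the matching rows. By default, the pattern is interpreted as a regular expression and matching is case-sensitive.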


# Select rows whose Courses value contains the substring 'ark'
print(df[df['Courses'].str.contains('ark')])

# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
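str.contains() also accepts parameters that cover common variations. This sketch assumes the sample Courses column and shows case-insensitive matching (case=False), matching any of several substrings by joining them into one regular expression (using re.escape() so special characters are taken literally), plain substring matching with regex=False, and na=False for columns with missing values. The substrings searched for are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {'Courses': ["Spark", "PySpark", "Python", "pandas"]},
    index=['r1', 'r2', 'r3', 'r4'],
)

# Case-insensitive match: 'py' matches 'PySpark' and 'Python'
py_rows = df[df['Courses'].str.contains('py', case=False)]

# Match ANY of several substrings; re.escape() keeps special characters literal
patterns = ['ark', 'das']
any_rows = df[df['Courses'].str.contains('|'.join(map(re.escape, patterns)))]

# Plain substring match without regex interpretation ('.' is not a wildcard here)
has_dot = df['Courses'].str.contains('.', regex=False)

# na=False treats missing values as "does not contain"
with_na = pd.Series(['Spark', None]).str.contains('ark', na=False)

print(py_rows.index.tolist())   # ['r2', 'r3']
print(any_rows.index.tolist())  # ['r1', 'r2', 'r4']
print(has_dot.any())            # False
print(with_na.tolist())         # [True, False]
```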

5. Complete Example of Checking Whether a Column Contains a Particular Value


# Create a DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Check whether the column's unique values include 'Spark'
print('Spark' in df['Courses'].unique())

# Check membership in a set built from the column values
print('Spark' in set(df['Courses']))

# Check membership in the column's underlying array (Series.values)
print('Spark' in df['Courses'].values)

# Check each value against a list of values with pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))

# Select rows whose Courses value contains the substring 'ark'
print(df[df['Courses'].str.contains('ark')])
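The checks above all work on a single column. If you want to know whether a value appears anywhere in the DataFrame, DataFrame.isin() applies the same test to every cell, and passing a dict restricts each list of values to a named column. A sketch on the same sample DataFrame; the searched values are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4'])

# Test every cell against the list, then reduce per column
hits = df.isin(['Spark', 'Python'])
columns_with_hit = hits.any()                      # one True/False per column
matching_columns = columns_with_hit[columns_with_hit].index.tolist()
found_anywhere = bool(columns_with_hit.any())      # single True/False

# With a dict, each list of values is checked only in its named column
per_column = df.isin({'Courses': ['Spark'], 'Fee': [30000]})
rows_with_hit = df[per_column.any(axis=1)]         # rows r1 and r4

print(matching_columns, found_anywhere)
print(rows_with_hit.index.tolist())
```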

Conclusion

In this article, you have learned how to check whether a DataFrame column contains a value, or part of a value, by using the in and not in operators (on Series.unique(), a set, or Series.values), pandas.Series.isin() to test against multiple values at once, and Series.str.contains() to match substrings.
