You can check whether a pandas DataFrame column contains a particular value (string or int), or any of a list of values, by using the in operator on the column's values, pandas.Series.isin(), Series.str.contains(), and other methods. In this article, I will explain how to check if a column contains a particular value with examples. The in operator returns True when the value is found in the specified column and False when it is not, while isin() and str.contains() return a boolean Series with one True or False entry per row.
1. Quick Examples of Checking if a pandas Column Contains a Particular Value
If you are in a hurry, below are some quick examples of how to check if a pandas DataFrame column contains/exists a particular string value or a list of values.
# Below are some quick examples.
# Check whether the column's unique values include a value
print('Spark' in df['Courses'].unique())
# Check using a set of the column's values
print('Spark' in set(df['Courses']))
# Check using Series.values
print('Spark' in df['Courses'].values)
# Check each row against a list of values
# using pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Select rows where the column contains a substring
print(df[df['Courses'].str.contains('ark')])
Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create a DataFrame.
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df = pd.DataFrame(technologies, index=index_labels)
print(df)
This yields the output below.
# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000
2. Check Column Contains a Value in DataFrame
Use the in operator to check whether a column of a pandas DataFrame contains a given string value. df['Courses'] returns a Series with all values from the Courses column. Applying in directly to a Series tests its index labels rather than its values, so apply it to the column's values instead. pandas.Series.unique() returns the unique values of the Series in order of appearance, using a hash-table-based algorithm. The in operator then returns True when the value is found among them.
# Check whether the column's unique values include 'Spark'.
print('Spark' in df['Courses'].unique())
# Output:
# True
You can also convert the column to a Python set and use the in and not in operators to check whether a given element exists.
# Check whether the value is in the set of column values.
print('Spark' in set(df['Courses']))
# Output:
# True
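As a small sketch of the not in operator, and of the same check on an integer column, the snippet below rebuilds two columns of the sample DataFrame. The value 'Hadoop' and the variable names missing and has_fee are chosen only for illustration.

```python
import pandas as pd

# Two columns of the sample DataFrame used in this article.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
})

# not in is True when the value is absent from the column.
missing = 'Hadoop' not in set(df['Courses'])
print(missing)
# True

# The same membership test works for an integer column.
has_fee = 25000 in set(df['Fee'])
print(has_fee)
# True
```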
You can also use the in operator with pandas.Series.values, which returns the column's data as a numpy.ndarray.
# Check whether the value is in the column's underlying NumPy array.
print('Spark' in df['Courses'].values)
# Output:
# True
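One detail worth keeping in mind is why the examples above apply in to unique(), set(), or values rather than to the Series itself. The following minimal sketch, which uses an illustrative Series named courses with the same labels as the sample DataFrame, shows that in on a Series looks at the index labels.

```python
import pandas as pd

# A Series with the same values and index labels as df['Courses'].
courses = pd.Series(["Spark", "PySpark", "Python", "pandas"],
                    index=['r1', 'r2', 'r3', 'r4'])

# in applied directly to a Series tests the index labels.
in_index = 'r1' in courses
print(in_index)
# True

# 'Spark' is a value, not an index label, so this is False.
in_series = 'Spark' in courses
print(in_series)
# False

# Testing against .values checks the data itself.
in_values = 'Spark' in courses.values
print(in_values)
# True
```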
3. Using pandas.Series.isin() to Check Column Contains Value
pandas.Series.isin() checks each element of a column against a list of values. It returns a boolean Series showing whether each element in the Series exactly matches an element in the passed sequence of values.
# Check which rows of the column match any value in the list.
print(df['Courses'].isin(['Spark','Python']))
# Output:
# r1     True
# r2    False
# r3     True
# r4    False
# Name: Courses, dtype: bool
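Because isin() returns one boolean per row, a common follow-up is to reduce the result to a single True/False answer or to use it as a row filter. The sketch below shows one way to do both; the names mask, any_match, and subset are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

# Boolean Series: True where the course is one of the listed values.
mask = df['Courses'].isin(['Spark', 'Python'])

# Collapse to a single answer: does the column contain any listed value?
any_match = bool(mask.any())
print(any_match)
# True

# Use the mask to keep only the matching rows (r1 and r3 here).
subset = df[mask]
print(subset)
```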
4. Series.str.contains() to Check Part of a Value in Column
Series.str.contains() determines whether each value in a column contains a given substring. It tests whether a pattern or regex is contained within each string of a Series or Index, and returns a boolean Series that you can use to filter the DataFrame.
# Select rows where the Courses column contains the substring 'ark'.
print(df[df['Courses'].str.contains('ark')])
# Output:
#     Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
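The case, na, and regex parameters of str.contains() are often useful in practice. The sketch below is an illustration under assumed data: the Series courses includes a missing value that is not in the article's DataFrame, and the second Series ("C++", "C#") is made up to show literal matching.

```python
import pandas as pd

# Sample values plus one missing entry, added here for illustration.
courses = pd.Series(["Spark", "PySpark", "Python", None, "pandas"])

# case=False ignores letter case; na=False treats missing values as non-matches.
mask_ci = courses.str.contains('SPARK', case=False, na=False)
print(mask_ci.tolist())
# [True, True, False, False, False]

# The pattern is a regex by default, so | matches any of several substrings.
mask_multi = courses.str.contains('ark|Py', na=False)
print(mask_multi.tolist())
# [True, True, True, False, False]

# regex=False treats the pattern as literal text, which matters for
# characters such as '+' that have a special meaning in a regex.
mask_literal = pd.Series(["C++", "C#"]).str.contains('+', regex=False)
print(mask_literal.tolist())
# [True, False]
```

Passing na=False keeps the mask strictly boolean, so it can be used directly to filter rows even when the column has missing values.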
5. Complete Example of Checking if a Column Contains a Particular Value
# Create a DataFrame.
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4']
df = pd.DataFrame(technologies, index=index_labels)
print(df)
# Check whether the column's unique values include 'Spark'.
print('Spark' in df['Courses'].unique())
# Check whether the value is in the set of column values.
print('Spark' in set(df['Courses']))
# Check whether the value is in the column's underlying NumPy array.
print('Spark' in df['Courses'].values)
# Check which rows of the column match any value in the list.
print(df['Courses'].isin(['Spark','Python']))
# Select rows where the Courses column contains the substring 'ark'.
print(df[df['Courses'].str.contains('ark')])
Conclusion
In this article, you have learned how to check whether a DataFrame column contains a particular value, or part of a value, by using the in and not in operators, pandas.Series.isin(), and Series.str.contains(), and how to check whether multiple elements exist in a DataFrame column.