You can check whether a pandas DataFrame column contains a particular value (string or int), or any of a list of values, by using the in operator, pandas.Series.isin(), Series.str.contains(), and other methods. In this article, I will explain how to check if a column contains a particular value with examples. The in operator returns True when the value is found in the specified column and False when it is not, while isin() and str.contains() return a boolean Series with one result per row.
1. Quick Examples of Checking if a Pandas Column Contains a Particular Value
If you are in a hurry, below are some quick examples of how to check if a pandas DataFrame column contains/exists a particular string value or a list of values.
# Below are some quick examples.
# Check if the column contains a value, using its unique values
print('Spark' in df['Courses'].unique())
# Check if the column contains a value, using a set
print('Spark' in set(df['Courses']))
# Check if the column contains a value, using Series.values
print('Spark' in df['Courses'].values)
# Check each row against a list of values
# using pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Select rows where the column contains a substring
print(df[df['Courses'].str.contains('ark')])
Now, let's create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create a DataFrame.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
Yields below output.
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
r3   Python  22000   35days      1200
r4   pandas  30000   50days      2000
2. Check Column Contains a Value in DataFrame
Use the in operator to check if a column of a pandas DataFrame contains a given value. df['Courses'] returns a Series object with all values from the column Courses. Applying in directly to a Series checks its index labels rather than its values, so apply it to the column's values instead. pandas.Series.unique() returns the unique values of the Series in order of appearance, using a hash-table-based technique. The in operator then returns True when the value is found among them.
# Check if the column contains the value, using its unique values.
print('Spark' in df['Courses'].unique())
# Output:
True
You can also convert the column to a set and use the in and not in operators to check whether a given element exists.
# Check if the column contains the value, using a set.
print('Spark' in set(df['Courses']))
# Output:
True
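The not in operator mentioned above works the same way in reverse. Here is a minimal sketch, assuming the same Courses column as this article; the value 'Java' is a made-up example that is not in the data.

```python
import pandas as pd

# Rebuild the Courses column so this snippet runs on its own.
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas"]})

# Convert the column to a set once, then test membership against it.
courses = set(df['Courses'])

has_spark = 'Spark' in courses         # the value is present
missing_java = 'Java' not in courses   # 'Java' is a hypothetical value

print(has_spark)
print(missing_java)
```

Building the set once can be convenient when you need to test many values against the same column.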
You can also use the in operator with pandas.Series.values, which returns the column's data as a numpy.ndarray.
# Check if the column contains the value, using Series.values.
print('Spark' in df['Courses'].values)
# Output:
True
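One detail worth seeing in code: applying in directly to a Series tests the index labels, not the column values. The sketch below, which assumes the same data and index labels as this article, contrasts the two and also checks an integer column the same way.

```python
import pandas as pd

# Same sample data and index labels as in this article.
df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

# `in` on a Series looks at the index labels.
label_found = 'r1' in df['Courses']
value_as_label = 'Spark' in df['Courses']

# `in` on .values looks at the column data.
value_found = 'Spark' in df['Courses'].values

# The same check works for an integer column.
fee_found = 22000 in df['Fee'].values

print(label_found, value_as_label, value_found, fee_found)
```

This is why the examples in this section go through unique(), set(), or values rather than testing the Series itself.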
3. Using pandas.Series.isin() to Check Column Contains Value
The pandas.Series.isin() function is used to check whether a column contains any of a list of values. It returns a boolean Series showing whether each element in the Series exactly matches an element in the passed sequence of values.
# Check each row of the column against a list of values with pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Output:
r1     True
r2    False
r3     True
r4    False
Name: Courses, dtype: bool
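Since isin() returns one boolean per row, you may want to reduce it to a single answer or use it as a row filter. The following is a minimal sketch under the same sample data; the variable names (wanted, mask, and so on) are illustrative only.

```python
import pandas as pd

# Same sample data and index labels as in this article.
df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Python", "pandas"],
        'Fee': [20000, 25000, 22000, 30000],
    },
    index=['r1', 'r2', 'r3', 'r4'],
)

wanted = ['Spark', 'Python']

# Boolean Series: one True/False per row.
mask = df['Courses'].isin(wanted)

# Does the column contain at least one of the wanted values?
any_match = bool(mask.any())

# Does the column contain every wanted value?
all_present = set(wanted) <= set(df['Courses'])

# Keep only the rows whose Courses value is in the list.
matching_rows = df[mask]

print(any_match)
print(all_present)
print(matching_rows)
```

Here mask.any() answers "is any listed value present", while the set comparison answers "are all listed values present".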
4. Series.str.contains() to Check Part of a Value in Column
Series.str.contains() tests whether a pattern or regex is contained within each string of a Series or Index, so you can use it to check whether a column contains part of a value. The example below selects the rows whose Courses value contains the substring ark.
# Select rows where the column contains a substring.
print(df[df['Courses'].str.contains('ark')])
# Output:
    Courses    Fee Duration  Discount
r1    Spark  20000   30days      1000
r2  PySpark  25000   40days      2300
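Because str.contains() treats its argument as a regular expression by default and is case-sensitive, a few of its parameters are worth a quick look. The sketch below uses the case, regex, and na parameters of Series.str.contains(); the extra None row is a hypothetical addition to show how missing values behave.

```python
import pandas as pd

# Courses column from this article, plus a hypothetical missing value.
df = pd.DataFrame({'Courses': ["Spark", "PySpark", "Python", "pandas", None]})

# Case-insensitive match; na=False treats missing values as "no match".
ignore_case = df['Courses'].str.contains('SPARK', case=False, na=False)

# By default the pattern is a regex, so '.' matches any character.
dot_as_regex = df['Courses'].str.contains('.', na=False)

# regex=False searches for the literal character '.' instead.
dot_as_literal = df['Courses'].str.contains('.', regex=False, na=False)

print(ignore_case.tolist())
print(dot_as_regex.tolist())
print(dot_as_literal.tolist())
```

Passing regex=False is a reasonable choice when the search text comes from user input and may contain characters such as . or + that have a special meaning in regular expressions.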
5. Complete Example of Checking if a Column Contains a Particular Value
# Create a DataFrame.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Check if the column contains the value, using its unique values.
print('Spark' in df['Courses'].unique())
# Check if the column contains the value, using a set.
print('Spark' in set(df['Courses']))
# Check if the column contains the value, using Series.values.
print('Spark' in df['Courses'].values)
# Check each row of the column against a list of values with pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Select rows where the column contains a substring.
print(df[df['Courses'].str.contains('ark')])
Conclusion
In this article, you have learned how to check whether a DataFrame column contains a particular value, part of a value, or any of multiple values, with examples using the in and not in operators, pandas.Series.isin(), and Series.str.contains().