You can check whether a pandas DataFrame column contains a particular value (string or integer), or any of a list of values, by using the in operator, pandas.Series.isin(), Series.str.contains(), and several other methods.
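In this article, I will explain how to check if a column contains a particular value, with examples. The in checks return True when the value is found in the specified column and False when it is not, while isin() and str.contains() return a boolean Series with one True or False per row.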
Key Points –
- The isin() method is a simple way to check if a column contains any value from a list, returning a boolean Series.
- For partial string matches or substring checks, use the str.contains() method, which can also handle regex for pattern matching.
- The unique() method provides the unique values in a column, which can be useful when checking for the presence of a specific value.
- Accessing a column as a NumPy array using .values allows checking for a value directly with Python's in operator.
- When using str.contains(), you can control case sensitivity by passing the case parameter (e.g., case=False for case-insensitive matching).
- Using methods like isin() and str.contains() returns a boolean Series, which can be used for filtering rows directly.
Quick Examples of Checking if a Pandas Column Contains a Particular Value
If you are in a hurry, below are some quick examples of how to check if a pandas DataFrame column contains/exists a particular string value or a list of values.
# Quick examples of checking if a pandas column contains a value
# Check whether the column contains a value, using unique()
print('Spark' in df['Courses'].unique())
# Check whether the column contains a value, using set()
print('Spark' in set(df['Courses']))
# Using Series.values
print('Spark' in df['Courses'].values)
# Check each row of the column against a list of values
# Using pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Filter rows where the column contains the substring 'ark'
print(df[df['Courses'].str.contains('ark')])
Now, let’s create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
# Create a DataFrame.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)
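Yields below output.
# Output:
# Create DataFrame:
#      Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
# r3   Python  22000   35days      1200
# r4   pandas  30000   50days      2000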
Check Column Contains a Value in DataFrame
Use the in operator to check if a column contains a string value in a pandas DataFrame. df['Courses'] returns a Series object with all values from the Courses column, and pandas.Series.unique() returns the unique values of that Series in order of appearance, using a hash-table-based algorithm. Note that applying in directly to a Series tests its index labels rather than its values, so apply it to the result of unique(), set(), or .values instead. The in operator then returns True when the value is found.
In the example below, the unique() method first retrieves the unique values in the ‘Courses’ column. The code then checks whether the string ‘Spark’ is present in the array of unique values and prints the result.
# Check whether the value is among the column's unique values.
print('Spark' in df['Courses'].unique())
# Output:
# True
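The reason for going through unique() rather than testing the Series itself can be seen in a small sketch. It rebuilds only the Courses column from this article, and the variable names are illustrative rather than part of any API.

```python
import pandas as pd

# The Courses column from the article's DataFrame, with the same index labels.
courses = pd.Series(
    ["Spark", "PySpark", "Python", "pandas"],
    index=["r1", "r2", "r3", "r4"],
    name="Courses",
)

# `in` applied to the Series itself looks at the index labels.
label_found = "r1" in courses        # 'r1' is an index label
value_as_label = "Spark" in courses  # 'Spark' is a value, not a label

# Testing against the unique values gives the intended answer.
value_found = "Spark" in courses.unique()

print(label_found, value_as_label, value_found)
# True False True
```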
We can use the in and not in operators on these values to check whether a given element exists. For instance, the example below first creates a set of the values in the ‘Courses’ column using the set() function. Then, it checks if the string ‘Spark’ is present in that set and prints the result.
# Check whether the value is in the set of column values.
print('Spark' in set(df['Courses']))
# Output:
# True
You can also use the in operator with pandas.Series.values, which returns the column as a numpy.ndarray. For instance, this program directly checks if the string ‘Spark’ is present in the underlying NumPy array (values) of the ‘Courses’ column. If ‘Spark’ is present in the values, the print statement will output True; otherwise, it will output False.
# Check whether the value is in the column's underlying array (Series.values).
print('Spark' in df['Courses'].values)
# Output:
# True
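The same check works for integer columns. The sketch below is a minimal illustration using the Fee column from this article; the value 99999 is an arbitrary number chosen because it is absent, and the comparison with any() is shown only as one possible vectorized alternative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python", "pandas"],
        "Fee": [20000, 25000, 22000, 30000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Membership test against the underlying array of the integer column.
fee_found = 22000 in df["Fee"].values
fee_missing = 99999 in df["Fee"].values

# Alternative: compare element-wise, then reduce to a single boolean.
fee_found_eq = bool((df["Fee"] == 22000).any())

print(fee_found, fee_missing, fee_found_eq)
# True False True
```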
Using pandas.Series.isin() to Check Column Contains Value
The pandas.Series.isin() function checks whether each value in a column matches any value in a given list. It returns a boolean Series showing whether each element in the Series exactly matches an element in the passed sequence of values.
In the below example, contains_spark will be a boolean Series where each element indicates whether the corresponding value in the ‘Courses’ column is equal to ‘Spark’ or ‘Python’.
# Check which rows of the column match any of the listed values, using pandas.Series.isin()
contains_spark = df['Courses'].isin(['Spark','Python'])
print(contains_spark)
# Output:
# r1     True
# r2    False
# r3     True
# r4    False
# Name: Courses, dtype: bool
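Because isin() returns a boolean Series aligned with the DataFrame's index, it can be used directly as a row filter. The following is a minimal sketch of that step using the article's data; the names mask and selected are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Python", "pandas"],
        "Fee": [20000, 25000, 22000, 30000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Boolean mask: True where Courses is exactly 'Spark' or 'Python'.
mask = df["Courses"].isin(["Spark", "Python"])

# Keep only the rows where the mask is True.
selected = df[mask]

print(selected.index.tolist())
# ['r1', 'r3']
```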
Series.str.contains() to Check Part of a Value in Column
To determine whether a pandas column contains part of a value, use Series.str.contains(). This function tests whether a pattern or regex is contained within each string of a Series or Index. The example below uses the resulting boolean Series to select the rows whose ‘Courses’ value contains the substring ‘ark’.
# Filter rows where the column contains the substring 'ark'.
print(df[df['Courses'].str.contains('ark')])
# Output:
#     Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
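Since str.contains() treats its argument as a regular expression by default, the same call can match anchored patterns or several alternatives at once. The patterns below are illustrative choices, not taken from the original examples.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Python", "pandas"]},
    index=["r1", "r2", "r3", "r4"],
)

# '^Py' matches only values that start with 'Py'.
starts_with_py = df["Courses"].str.contains(r"^Py")

# 'Spark|pandas' matches values containing either substring.
spark_or_pandas = df["Courses"].str.contains("Spark|pandas")

print(starts_with_py.tolist())
# [False, True, True, False]
print(spark_or_pandas.tolist())
# [True, True, False, True]
```

If the search text contains regex metacharacters such as "+" or "." that should be matched literally, passing regex=False disables pattern interpretation.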
Complete Example of Checking if a Column Contains a Particular Value
# Create a DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Python","pandas"],
'Fee' :[20000,25000,22000,30000],
'Duration':['30days','40days','35days','50days'],
'Discount':[1000,2300,1200,2000]
}
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Check whether the value is among the column's unique values
print('Spark' in df['Courses'].unique())
# Check whether the value is in the set of column values
print('Spark' in set(df['Courses']))
# Check whether the value is in the column's underlying array (Series.values)
print('Spark' in df['Courses'].values)
# Check which rows match any of the listed values, using pandas.Series.isin()
print(df['Courses'].isin(['Spark','Python']))
# Filter rows where the column contains the substring 'ark'
print(df[df['Courses'].str.contains('ark')])
Frequently Asked Questions on Pandas contains Column Value
To check if a specific value exists in a column of a pandas DataFrame, you can use the isin() method and call any() on the resulting boolean Series to get a single True or False.
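As a rough sketch of this answer, isin() can be reduced to a single boolean with any(). The value ‘Java’ is used here only as an example of something that is not in the column.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python", "pandas"]})

# isin() gives one boolean per row; any() collapses them to a single answer.
has_spark = bool(df["Courses"].isin(["Spark"]).any())
has_java = bool(df["Courses"].isin(["Java"]).any())

print(has_spark, has_java)
# True False
```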
You can check if multiple values exist in a column using the isin() method in pandas. For example, contains_values = df['Courses'].isin(['Python', 'Java']) gives a boolean Series where each element indicates whether the corresponding value in the ‘Courses’ column is equal to either ‘Python’ or ‘Java’.
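One way this might look with the article's data is sketched below. The list comprehension that reports which of the requested values were found is an added illustration, and the names wanted and found are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python", "pandas"]})

wanted = ["Python", "Java"]

# One boolean per row: True where the row's value is in the wanted list.
contains_values = df["Courses"].isin(wanted)

# Which of the wanted values actually occur in the column.
present = set(df["Courses"])
found = [value for value in wanted if value in present]

print(contains_values.tolist())
# [False, False, True, False]
print(found)
# ['Python']
```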
If you want to check if a column contains a substring, you can use the str.contains() method in pandas. For example, contains_substring = df['Courses'].str.contains('Spark') gives a boolean Series where each element indicates whether the substring ‘Spark’ is present in the corresponding value of the ‘Courses’ column.
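A minimal sketch of the substring check described above, using the article's Courses values. Summing the boolean Series to count matching rows is an extra step added for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python", "pandas"]})

# True where 'Spark' occurs anywhere inside the value.
contains_substring = df["Courses"].str.contains("Spark")

# Booleans sum as 0/1, so this counts the matching rows.
match_count = int(contains_substring.sum())

print(contains_substring.tolist())
# [True, True, False, False]
print(match_count)
# 2
```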
You can perform a case-insensitive check using the str.contains() method in pandas by setting the case parameter to False.
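The effect of the case parameter can be seen by running the same lowercase search twice. This is a small sketch on the article's data, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Python", "pandas"]})

# Default (case=True): lowercase 'spark' does not match 'Spark' or 'PySpark'.
case_sensitive = df["Courses"].str.contains("spark")

# case=False ignores letter case when matching.
case_insensitive = df["Courses"].str.contains("spark", case=False)

print(case_sensitive.tolist())
# [False, False, False, False]
print(case_insensitive.tolist())
# [True, True, False, False]
```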
To negate the condition and filter rows where the column does not contain a specific value, you can use the ~ (tilde) operator along with the condition.
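A short sketch of negation with the ~ operator follows, applied to both a str.contains() condition and an isin() condition on the article's data. The result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Python", "pandas"]},
    index=["r1", "r2", "r3", "r4"],
)

# Rows whose Courses value does NOT contain the substring 'Spark'.
without_spark = df[~df["Courses"].str.contains("Spark")]

# Rows whose Courses value is NOT one of the listed values.
not_listed = df[~df["Courses"].isin(["Spark", "Python"])]

print(without_spark["Courses"].tolist())
# ['Python', 'pandas']
print(not_listed["Courses"].tolist())
# ['PySpark', 'pandas']
```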
To check if any value in a column is missing (NaN), you can use the isna() method in pandas. For example, contains_missing = df['Courses'].isna() gives a boolean Series where each element indicates whether the corresponding value in the ‘Courses’ column is missing (NaN), and calling any() on it tells you whether at least one value is missing.
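The article's DataFrame has no missing values, so the sketch below assumes a hypothetical variant of the Courses column in which one entry is missing. It also shows the na=False argument of str.contains(), which makes missing entries count as non-matches so the result can be used as a row filter.

```python
import pandas as pd

# Hypothetical data: the second course name is missing.
df = pd.DataFrame({"Courses": ["Spark", None, "Python", "pandas"]})

# One boolean per row: True where the value is missing.
contains_missing = df["Courses"].isna()

# Single answer: is at least one value missing?
has_missing = bool(contains_missing.any())

# na=False treats missing entries as "does not contain".
contains_spark = df["Courses"].str.contains("Spark", na=False)

print(contains_missing.tolist())
# [False, True, False, False]
print(has_missing)
# True
print(contains_spark.tolist())
# [True, False, False, False]
```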
Conclusion
In this article, you have learned how to check if a DataFrame column contains a value or part of a value, with examples, by using the in and not in operators, pandas.Series.isin(), and Series.str.contains(), and how to check whether multiple values exist in a column.