Pandas Series.str.contains() method is used to check whether each string in a Series contains a specified substring or pattern. It returns a boolean Series indicating whether each element contains the specified substring or not.
In this article, I will explain the Pandas Series.str.contains() function, covering its syntax, parameters, and usage, with examples showing how it returns a boolean Series indicating whether each element contains a pattern.
Key Points –
- Series.str.contains() efficiently checks whether a pattern or substring exists within each element of a Pandas Series, providing a quick way to filter or manipulate data based on textual content.
- By default, Series.str.contains() is case-sensitive, but you can make it case-insensitive by setting the case parameter to False.
- The regex parameter defaults to True, so the pattern is interpreted as a regular expression, which enables more complex pattern matching. Set regex=False to search for a literal string instead.
- The na parameter specifies the value to return for missing data. If it is not set, missing values remain missing (NaN) in the result.
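Because filtering is one of the most common uses of this method, here is a minimal sketch of using the returned boolean Series as a mask to select DataFrame rows. The DataFrame, its Courses and Fee columns, and the fee values are hypothetical sample data made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (column names and fees are illustrative only)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Java", "Pandas", "Python"],
    "Fee": [22000, 25000, 20000, 24000, 26000],
})

# Boolean mask: True where the course name contains 'Spark'
mask = df["Courses"].str.contains("Spark")

# Boolean indexing keeps only the rows where the mask is True
spark_courses = df[mask]
print(spark_courses)
```

The mask shares the DataFrame's index, so df[mask] keeps rows 0 and 1 (Spark and PySpark).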
Syntax of Pandas Series.str.contains() Function
Following is the syntax of the pandas Series.str.contains() function.
# Syntax of Series.str.contains() function
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
Parameters of the Series.str.contains()
Following are the parameters of the Series.str.contains() function.
- pat – The string or regular expression pattern to search for.
- case – (bool, default True) If True, the search is case-sensitive.
- flags – (int, default 0) Flags to pass through to the regex engine (e.g., re.IGNORECASE).
- na – Optional. The fill value to use for missing (NA/null) values. If it is not set, missing values stay missing (NaN) in the result.
- regex – (bool, default True) If True, treats pat as a regular expression. If False, treats it as a literal string.
Return Value
The pandas.Series.str.contains() function returns a Series of booleans indicating whether each string element in the original Series contains the specified pattern or regular expression. If the Series has missing values and na is not set, those positions stay missing (NaN) in the result.
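Because the result is an ordinary boolean Series, you can aggregate it directly. The sketch below, using the same course names as the rest of this article, counts how many elements match and what fraction of the Series that is (True is treated as 1 and False as 0).

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# Boolean Series: True for 'PySpark' and 'Python'
result = series.str.contains('Py')
print(result.dtype)  # bool

# sum() counts the True values, i.e. the number of matching strings
print(result.sum())  # 2

# mean() gives the fraction of strings that match
print(result.mean())
```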
Create Pandas Series
A Pandas Series can be created in several ways, for example from Python lists or dictionaries. The example below creates a Series from a list. To use Pandas, first import it with import pandas as pd.
import pandas as pd
# Create Pandas Series
Courses = ['Spark', 'PySpark', 'Java', 'Pandas', 'Python','MongoDB']
series = pd.Series(Courses)
print("Original Series:\n",series)
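# Output:
# Original Series:
# 0 Spark
# 1 PySpark
# 2 Java
# 3 Pandas
# 4 Python
# 5 MongoDB
# dtype: object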
Here’s an example of performing a basic substring search using Series.str.contains().
# Check if each string contains 'pa'
result = series.str.contains('pa')
print(result)
In this example, series holds six strings, and series.str.contains('pa') checks whether each string contains the substring 'pa'. The result is a boolean Series where each value indicates whether the corresponding string contains 'pa' (True) or not (False). Because the search is case-sensitive, 'Pandas' does not match. This example yields the below output.
# Output:
# 0 True
# 1 True
# 2 False
# 3 False
# 4 False
# 5 False
# dtype: bool
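To find the strings that do not contain a substring, you can invert the mask with the ~ operator. This minimal sketch reuses the same course names and keeps only the elements without 'pa'.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# ~ flips every boolean: True now means 'pa' was NOT found
mask = ~series.str.contains('pa')

# Keep only the strings that do not contain 'pa'
no_pa = series[mask]
print(no_pa.tolist())  # ['Java', 'Pandas', 'Python', 'MongoDB']
```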
Case-Insensitive Search with Pandas Series.str.contains()
To perform a case-insensitive search with Series.str.contains(), set the case parameter to False. The search then ignores the case of the letters when matching the pattern.
# Perform a case-insensitive search for the substring 'pa'
contains_pa = series.str.contains('pa', case=False)
print(contains_pa)
# Output:
# 0 True
# 1 True
# 2 False
# 3 True
# 4 False
# 5 False
# dtype: bool
Here,
- The first two elements (Spark and PySpark) contain the substring pa, so their corresponding values in the contains_pa Series are True.
- The third element (Java) does not contain the substring pa, so its corresponding value in the contains_pa Series is False.
- The fourth element (Pandas) contains the substring Pa, which matches pa because the search ignores case, so its corresponding value in the contains_pa Series is True.
- The fifth and sixth elements (Python and MongoDB) do not contain pa in any case, so their corresponding values in the contains_pa Series are False.
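A common alternative is to lowercase the strings first and then run a case-sensitive search. This sketch checks that both approaches give the same result for this data; note that the lowercase approach only works if the pattern itself is written in lowercase.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# Option 1: let contains() ignore case
with_case_param = series.str.contains('pa', case=False)

# Option 2: lowercase every string, then search for a lowercase pattern
with_lower = series.str.lower().str.contains('pa')

print(with_case_param.equals(with_lower))  # True
```

Using case=False avoids building an intermediate lowercased Series, which keeps the code shorter.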
Using Regular Expressions
Because regex defaults to True, Series.str.contains() interprets the pattern as a regular expression, so you can pass regex=True explicitly or simply rely on the default. This allows for more complex pattern matching within each element of the Series. The following example checks whether each string contains either the letter a, or the letter e followed later by i.
# Check if each string contains 'a', or 'e' followed later by 'i'
result = series.str.contains(r'a|e.*i', regex=True)
print(result)
# Output:
# 0 True
# 1 True
# 2 True
# 3 True
# 4 False
# 5 False
# dtype: bool
Here,
- r'a|e.*i' is a regular expression pattern. The | operator separates two alternatives: a matches the letter 'a', and e.*i matches 'e' followed by any number of characters (.*) and then 'i'.
- The regex=True parameter ensures that the pattern is treated as a regular expression, and Series.str.contains() checks each element of series for a match.
- The result is a boolean Series (result) indicating whether each string contains the pattern. Python and MongoDB contain neither an 'a' nor an 'e', so they are the only False values.
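Since the pattern is a full regular expression, anchors work as well. In this sketch, ^ ties the match to the start of the string and $ ties it to the end, which gives "starts with" and "ends with" checks on the same course names.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# '^P' only matches a 'P' at the very start of the string
starts_with_p = series.str.contains(r'^P')

# 'k$' only matches a 'k' at the very end of the string
ends_with_k = series.str.contains(r'k$')

print(starts_with_p.tolist())  # [False, True, False, True, True, False]
print(ends_with_k.tolist())    # [True, True, False, False, False, False]
```

For plain prefixes like these, Series.str.startswith() and Series.str.endswith() give the same results without a regex.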
Using Flags with Regular Expressions
Using flags with regular expressions in Series.str.contains() lets you modify how the pattern is matched. For example, you can use flags to make the matching case-insensitive (re.IGNORECASE) or to enable multiline matching (re.MULTILINE), where ^ and $ also match at line boundaries.
import pandas as pd
import re
# Create Pandas Series
Courses = ['Spark', 'PySpark', 'Java', 'Pandas', 'Python','MongoDB']
series = pd.Series(Courses)
# Perform a case-insensitive search for the substring 'pa'
contains_pa = series.str.contains('pa', flags=re.IGNORECASE, regex=True)
print(contains_pa)
# Output:
# 0 True
# 1 True
# 2 False
# 3 True
# 4 False
# 5 False
# dtype: bool
Here,
- We use the re.IGNORECASE flag to perform a case-insensitive search.
- The regular expression pa will match both pa and Pa in the strings.
- The flags=re.IGNORECASE argument is passed to Series.str.contains() to enable case-insensitive matching.
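Flags can be combined with the | operator. The sketch below uses hypothetical multi-line strings to show what re.MULTILINE changes: without it, ^ only matches at the very start of each string; with it, ^ also matches right after every newline.

```python
import re
import pandas as pd

# Hypothetical multi-line strings, e.g. short course notes
notes = pd.Series(['Intro\nspark basics', 'Spark tuning', 'Java only'])

# IGNORECASE only: '^' anchors to the start of the whole string
start_only = notes.str.contains(r'^spark', flags=re.IGNORECASE)

# IGNORECASE | MULTILINE: '^' also anchors to the start of each line
any_line = notes.str.contains(r'^spark', flags=re.IGNORECASE | re.MULTILINE)

print(start_only.tolist())  # [False, True, False]
print(any_line.tolist())    # [True, True, False]
```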
Using an Alternative Pattern Matching Mode
You can switch between pattern-matching modes with the regex parameter of Series.str.contains(). With regex=True (the default), the pattern is treated as a regular expression, so special characters such as . have meaning. In the example below, the pattern Py. matches the letters Py followed by any single character. As a result, the boolean Series indicates whether each string in the Series satisfies this pattern.
# Check if each string contains 'Py' followed by any character
result = series.str.contains('Py.', regex=True)
print(result)
# Output:
# 0 False
# 1 True
# 2 False
# 3 False
# 4 True
# 5 False
# dtype: bool
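The other mode is regex=False, where the pattern is matched as literal text. This sketch runs the same pattern in both modes: in regex mode the . is a wildcard, while in literal mode it is an ordinary period, and none of the course names contains the text 'Py.'.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# Regex mode: 'Py.' means 'Py' followed by any one character
regex_mode = series.str.contains('Py.', regex=True)

# Literal mode: 'Py.' means the three characters P, y and a period
literal_mode = series.str.contains('Py.', regex=False)

print(regex_mode.tolist())    # [False, True, False, False, True, False]
print(literal_mode.tolist())  # [False, False, False, False, False, False]
```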
Handling Missing Values
To handle missing values when using Series.str.contains(), use the na parameter to define the value returned for missing elements in the Series.
import pandas as pd
import numpy as np
# Create a Pandas Series with missing values
courses = ['Spark', np.nan, 'Java', 'Pandas', 'Python', 'MongoDB']
series = pd.Series(courses)
# Check if each string contains 'a'
result = series.str.contains('a', na=False)
print(result)
# Output:
# 0 True
# 1 False
# 2 True
# 3 True
# 4 False
# 5 False
# dtype: bool
Here,
- We’ve used np.nan to represent missing values in the Series.
- The na=False parameter tells Pandas to treat missing values as False when performing the operation, so any element with a missing value has a corresponding False value in the result.
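For comparison, the sketch below shows what happens without na and with na=True. It passes dtype=object explicitly to pin the classic object-dtype behavior assumed in this article; with pandas' dedicated string dtypes, missing values may be reported differently.

```python
import numpy as np
import pandas as pd

# dtype=object is an assumption to keep the classic NaN behavior
series = pd.Series(['Spark', np.nan, 'Java', 'Pandas', 'Python', 'MongoDB'],
                   dtype=object)

# Without na, the missing element stays missing in the result
default_result = series.str.contains('a')
print(default_result.isna().sum())  # 1

# na=True treats the missing element as a match instead
as_match = series.str.contains('a', na=True)
print(as_match.tolist())  # [True, True, True, True, False, False]
```

A result that still holds NaN cannot be used directly as a row filter, which is why na=False is the usual choice when filtering.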
FAQ on Pandas Series.str.contains()
What does pandas.Series.str.contains() do?
pandas.Series.str.contains() checks each string in a Series for a specified substring or pattern (using a regular expression by default) and returns a Series of booleans indicating whether the pattern was found.
What is Series.str.contains() used for?
Series.str.contains() allows you to check if each element of a Pandas Series contains a specified pattern or substring. It’s commonly used for filtering data, data cleaning, and text analysis tasks.
Does str.contains() treat the pattern as a regular expression?
Yes, by default str.contains() treats the pattern as a regular expression. If you want to search for a literal string, set the regex parameter to False.
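A typical case for literal matching is a search term containing regex special characters, such as the + signs in C++. This sketch, using a hypothetical Series of language names, shows two ways to match it literally: turning regex mode off, or keeping it on and escaping the term with Python's re.escape().

```python
import re
import pandas as pd

# Hypothetical sample data
langs = pd.Series(['C', 'C++', 'C#', 'Java'])

# Option 1: treat the pattern as plain text
literal = langs.str.contains('C++', regex=False)

# Option 2: stay in regex mode but escape the special characters
escaped = langs.str.contains(re.escape('C++'))

print(literal.tolist())         # [False, True, False, False]
print(escaped.equals(literal))  # True
```

re.escape() is useful when part of a larger pattern must still be a regular expression while the search term itself is literal.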
Can Series.str.contains() perform case-insensitive searches?
Yes. You can perform case-insensitive searches by setting the case parameter to False. This instructs Pandas to ignore the case of the letters when matching the pattern.
How do I handle missing values with Series.str.contains()?
Specify the na parameter. It sets the fill value returned for missing data, for example na=False to treat missing values as non-matches.
Conclusion
In this article, you have learned that the Pandas Series.str.contains() function is a powerful way to search for patterns or substrings within the strings of a Series, and that it returns a boolean Series indicating whether each element contains the specified pattern or substring.
Happy Learning!!