  • Post category:Pandas
  • Post last modified:June 11, 2024
  • Reading time:17 mins read

Pandas Series.str.contains() method is used to check whether each string in a Series contains a specified substring or pattern. It returns a boolean Series indicating whether each element contains the specified substring or not.


In this article, I will explain the Pandas Series.str.contains() function, covering its syntax, parameters, and usage. The examples show how it returns a boolean Series indicating whether each element contains a pattern.

Key Points –

  • Series.str.contains() efficiently checks if a pattern or substring exists within each element of a Pandas Series, providing a quick way to filter or manipulate data based on textual content.
  • By default, Series.str.contains() is case-sensitive, but you can make it case-insensitive by setting the case parameter to False.
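  • Regular expressions are enabled by default (regex=True), which allows complex pattern matching. Set regex=False to search for a literal substring instead.
  • The na parameter specifies the value to return for missing data. By default, missing values stay missing in the result (NaN for an object-dtype Series).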

Syntax of Pandas Series.str.contains() Function

Following is the syntax of the pandas Series.str.contains() function.


# Syntax of Series.str.contains() function
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

Parameters of the Series.str.contains()

Following are the parameters of the Series.str.contains() function.

  • pat – The string or regular expression pattern to search for.
  • case – (bool, default True) If True, the search is case-sensitive.
  • flags – (int, default 0) Flags to pass through to the regex engine (e.g., re.IGNORECASE).
  • na – (scalar, optional) Fill value to return for missing values. The default depends on the dtype: for object dtype, missing values come back as NaN; for StringDtype, pandas.NA is used.
  • regex – (bool, default True) If True, treats the pattern as a regular expression. If False, treats it as a literal string.

Return Value

The pandas.Series.str.contains() function returns a Series of booleans with the same index as the original Series. Each value indicates whether the corresponding string element contains the specified pattern or regular expression. Missing values are filled according to the na parameter.
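As a quick illustration of the return value, the sketch below builds a small Series from a dictionary (the labels c1, c2, and c3 are hypothetical) and checks each value for 'Spark' using the default case-sensitive regex search. It shows that the result keeps the original index labels, and that summing the boolean values counts the matches.

```python
import pandas as pd

# Hypothetical labels, used only for illustration
courses = pd.Series({
    'c1': 'Spark',
    'c2': 'PySpark',
    'c3': 'Java',
})

# The result is aligned with the original index labels ('c1', 'c2', 'c3')
result = courses.str.contains('Spark')
print(result)

# True counts as 1 and False as 0, so summing the mask counts the matches
print("Matches:", int(result.sum()))
```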

Create Pandas Series

Pandas Series can be created in several ways, for example from Python lists or dictionaries. The example below creates a Series from a list. To use pandas, first import it with import pandas as pd.


import pandas as pd

# Create Pandas Series
Courses = ['Spark', 'PySpark', 'Java', 'Pandas', 'Python','MongoDB']
series = pd.Series(Courses)
print("Original Series:\n",series)

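# Output:
# Original Series:
#  0      Spark
# 1    PySpark
# 2       Java
# 3     Pandas
# 4     Python
# 5    MongoDB
# dtype: object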

Here’s an example of performing a basic substring search using Series.str.contains().


# Check if each string contains 'pa' (case-sensitive by default)
result = series.str.contains('pa')
print(result)

# Output:
# 0     True
# 1     True
# 2    False
# 3    False
# 4    False
# 5    False
# dtype: bool

In this example, series holds six strings, and series.str.contains('pa') checks whether each one contains the substring 'pa'. The result is a boolean Series in which True means the corresponding string contains 'pa' and False means it does not. Note that Pandas returns False: the search is case-sensitive, and Pandas starts with an uppercase 'Pa'.
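Because the result is a boolean Series, it can be used directly as a mask to select elements. This minimal sketch reuses the same six course names; the ~ operator negates the mask to keep the non-matching elements.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# Build the boolean mask, then use it to select only the matching elements
mask = series.str.contains('pa')
matches = series[mask]
print(matches)

# Negate the mask with ~ to keep the elements that do NOT contain 'pa'
non_matches = series[~mask]
print(non_matches)
```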

Case-Insensitive Search with Pandas Series.str.contains()

To perform a case-insensitive search with Series.str.contains(), set the case parameter to False. The search then ignores the case of the letters when matching the pattern.


# Perform a case-insensitive search for the substring 'pa'
contains_pa = series.str.contains('pa', case=False)
print(contains_pa)

# Output:
# 0     True
# 1     True
# 2    False
# 3     True
# 4    False
# 5    False
# dtype: bool

Here,

  • The first two elements (Spark and PySpark) contain the substring pa (regardless of case), so their corresponding values in the contains_pa Series are True.
  • The third element (Java) does not contain the substring pa, so its corresponding value in the contains_pa Series is False.
  • The fourth element (Pandas) contains Pa, which matches pa because the search ignores case, so its corresponding value in the contains_pa Series is True.
  • The fifth and sixth elements (Python and MongoDB) do not contain the substring pa, so their corresponding values in the contains_pa Series are False.
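A roughly equivalent approach, shown here only for comparison, is to lowercase the strings with Series.str.lower() and then run a case-sensitive search. This sketch assumes the pattern itself is already lowercase; a pattern containing uppercase letters would need lowercasing as well.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# Option 1: let str.contains() ignore case
ignore_case = series.str.contains('pa', case=False)

# Option 2: lowercase every element first, then do a case-sensitive search
lowered = series.str.lower().str.contains('pa')

# Both approaches flag the same elements
print(ignore_case.tolist() == lowered.tolist())  # True
```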

Using Regular Expressions

Series.str.contains() interprets the pattern as a regular expression by default (regex=True), which allows more complex pattern matching within each element of the Series. The example below uses a regular expression to check whether each string contains either an 'a', or an 'e' that is later followed by an 'i'.


# Check if each string contains 'a', or an 'e' that is later followed by an 'i'
result = series.str.contains(r'a|e.*i', regex=True)
print(result)

# Output:
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False
# 5    False
# dtype: bool

Here,

  • The r'a|e.*i' is a regular expression pattern.
  • a|e.*i matches either ‘a’ or ‘e’ followed by any number of characters (.*) and then ‘i’.
  • The regex=True argument makes it explicit that the pattern is treated as a regular expression; because True is already the default, it can be omitted.
  • Series.str.contains() checks each element of the Series (series) for a match with the regular expression pattern.
  • The result is a boolean Series (result) indicating whether each string contains the specified pattern.
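Since str.contains() looks for a match anywhere in each string, regex anchors help when the position of the match matters. In this sketch, ^ requires the match at the start of the string and $ requires it at the end.

```python
import pandas as pd

series = pd.Series(['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'])

# '^P' only matches when 'P' is the first character of the string
starts_with_p = series.str.contains(r'^P')
print(starts_with_p.tolist())  # [False, True, False, True, True, False]

# 'a$' only matches when 'a' is the last character of the string
ends_with_a = series.str.contains(r'a$')
print(ends_with_a.tolist())  # [False, False, True, False, False, False]
```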

Using Flags with Regular Expressions

Using flags with regular expressions in Series.str.contains() allows you to modify the behavior of the regular expression matching. For example, you can use flags to make the matching case insensitive or to enable multiline matching.


import pandas as pd
import re

# Create Pandas Series
Courses = ['Spark', 'PySpark', 'Java', 'Pandas', 'Python','MongoDB']
series = pd.Series(Courses)

# Perform a case-insensitive search for the substring 'pa'
contains_pa = series.str.contains('pa', flags=re.IGNORECASE, regex=True)
print(contains_pa)

# Output:
# 0     True
# 1     True
# 2    False
# 3     True
# 4    False
# 5    False
# dtype: bool

Here,

  • We use the re.IGNORECASE flag to perform a case-insensitive search.
  • The regular expression pa matches the letters p and a in any combination of upper and lower case, such as pa, Pa, or PA.
  • The flags=re.IGNORECASE argument is passed to Series.str.contains() to enable case-insensitive matching.
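Flags from Python's re module can be combined with the | operator. The sketch below uses hypothetical multi-line strings to illustrate multiline matching: without re.MULTILINE, ^ only matches at the very start of each string, while with it ^ also matches right after a newline.

```python
import re

import pandas as pd

# Hypothetical multi-line descriptions, used only for illustration
notes = pd.Series(['Intro\nPython basics', 'Python for data', 'Java only'])

# Without re.MULTILINE, '^' only anchors at the very start of each string
single_line = notes.str.contains(r'^python', flags=re.IGNORECASE)
print(single_line.tolist())  # [False, True, False]

# Combining flags with | lets '^' also match right after each newline
multi_line = notes.str.contains(r'^python', flags=re.IGNORECASE | re.MULTILINE)
print(multi_line.tolist())  # [True, True, False]
```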

Using an Alternative Pattern Matching Mode

The regex parameter controls how the pattern is interpreted. With regex=True (the default), the pattern is treated as a regular expression. The pattern Py contains no special regex characters, so it simply matches any string that contains the substring Py (case-sensitive). The resulting boolean Series indicates which strings contain it.


# Check if each string contains the substring 'Py'
result = series.str.contains('Py', regex=True)
print(result)

# Output:
# 0    False
# 1     True
# 2    False
# 3    False
# 4     True
# 5    False
# dtype: bool
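To match the pattern as plain text instead, pass regex=False. This matters when the pattern contains regex metacharacters such as the dot, which otherwise matches any character. The version strings below are hypothetical and used only to illustrate the difference.

```python
import pandas as pd

# Hypothetical version strings
versions = pd.Series(['v1.0', 'v100', 'v2.0'])

# As a regex, '.' matches ANY character, so '1.0' also matches the '100' in 'v100'
as_regex = versions.str.contains('1.0')
print(as_regex.tolist())  # [True, True, False]

# With regex=False, '1.0' is matched literally, dot included
as_literal = versions.str.contains('1.0', regex=False)
print(as_literal.tolist())  # [True, False, False]
```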

Handling Missing Values

When a Series contains missing values, you can use the na parameter to control what Series.str.contains() returns for them.


import pandas as pd
import numpy as np

# Create a Pandas Series with missing values
courses = ['Spark', np.nan, 'Java', 'Pandas', 'Python', 'MongoDB']
series = pd.Series(courses)

# Check if each string contains 'a'
result = series.str.contains('a', na=False)
print(result)

# Output:
# 0     True
# 1    False
# 2     True
# 3     True
# 4    False
# 5    False
# dtype: bool

Here,

  • We’ve used np.nan to represent missing values in the Series.
  • The na=False parameter tells Pandas to treat missing values as False when performing the operation. Any element with a missing value will have a corresponding False value in the result.
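For comparison, this sketch omits the na argument. It sets dtype=object explicitly (an assumption, so that the NaN default described for object dtype applies) and shows that the missing element stays missing in the result, whereas na=False yields a mask that can be used for filtering.

```python
import numpy as np
import pandas as pd

# dtype=object is set explicitly so the object-dtype NaN default applies
courses = pd.Series(['Spark', np.nan, 'Java', 'Pandas', 'Python', 'MongoDB'], dtype=object)

# Without na=..., the missing element stays missing (NaN) in the result
default_result = courses.str.contains('a')
print(default_result.tolist())

# na=False turns the missing entry into False, so the mask can filter the Series
filtered = courses[courses.str.contains('a', na=False)]
print(filtered.tolist())  # ['Spark', 'Java', 'Pandas']
```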

FAQ on Pandas Series.str.contains()

What does pandas.Series.str.contains() do?

pandas.Series.str.contains() checks each string in a Series for a specified substring or pattern (using a regular expression by default) and returns a Series of booleans indicating whether the pattern was found.

What is the purpose of Series.str.contains() in Pandas?

Series.str.contains() allows you to check if each element of a Pandas Series contains a specified pattern or substring. It’s commonly used for filtering data, data cleaning, and text analysis tasks.
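A typical filtering use case is selecting DataFrame rows by the text in one column. This minimal sketch uses a hypothetical DataFrame with a Courses column and a made-up Fee column.

```python
import pandas as pd

# Hypothetical DataFrame; the Fee values are made up for illustration
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Java', 'Pandas', 'Python', 'MongoDB'],
    'Fee': [22000, 25000, 23000, 24000, 26000, 20000],
})

# Keep only the rows whose 'Courses' value contains 'Py'
py_courses = df[df['Courses'].str.contains('Py')]
print(py_courses)
```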

Can I use a regular expression with str.contains()?

Yes. By default, str.contains() treats the pattern as a regular expression. If you want to search for a literal string, set the regex parameter to False.
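If you need part of the pattern to be literal but still want regex features such as anchors, one option is to escape the literal part with Python's re.escape() and keep the default regex=True. The version strings here are hypothetical.

```python
import re

import pandas as pd

# Hypothetical version strings
versions = pd.Series(['v1.0', 'v100', 'v2.0'])

# re.escape() backslash-escapes regex metacharacters, so '1.0' becomes '1\.0'
pattern = re.escape('1.0')
print(pattern)  # 1\.0

# The escaped text can be combined with real regex syntax, e.g. anchored to the end
result = versions.str.contains(pattern + '$')
print(result.tolist())  # [True, False, False]
```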

Does Series.str.contains() support case-insensitive searches?

Series.str.contains() in Pandas supports case-insensitive searches. You can perform case-insensitive searches by setting the case parameter to False. This instructs Pandas to ignore the case of the letters when matching the pattern.

How do I handle missing values in the Series?

Handling missing values in a Series when using the pandas.Series.str.contains() function can be done by specifying the na parameter. This parameter allows you to set a fill value for missing data.

Conclusion

In this article, you have learned how the Pandas Series.str.contains() function searches for patterns or substrings within the strings of a Series, and how the case, flags, na, and regex parameters control that search. Through the examples, you have seen that it returns a boolean Series indicating whether each element contains the specified pattern or substring.

Happy Learning!!