
In PySpark, understanding the concept of like() vs rlike() vs ilike() is essential, especially when working with text data. You can use these functions to filter rows based on specific patterns, such as checking if a name contains both uppercase and lowercase letters or ends with a certain keyword. PySpark makes it easy to handle such cases with its powerful set of string functions.


In this article, I will explain the like(), rlike(), and ilike() functions in PySpark, how they are used for processing text data, and the key differences between them.

  • like() – Performs SQL-style pattern matching
  • rlike() – Uses regular expressions for pattern matching
  • ilike() – Performs case-insensitive SQL-style pattern matching

By the end of this article, you will understand:

  • How each function works
  • The main differences between them
  • Practical examples for real-world use cases
  • Best practices for effective usage
  • Answers to frequently asked questions to clear up common doubts

Key points:

  • Use like() for SQL-style wildcard matching with % and _.
  • like() is case-sensitive, so mind the letter casing.
  • Use rlike() for powerful regex-based pattern matching.
  • rlike() supports expressions like (?i) to ignore case.
  • Use ilike() for case-insensitive SQL-style matching (Spark 3.0+).
  • ilike() simplifies matching without regex when case doesn’t matter.
  • Use filter() or where() with these functions to apply conditions (see the sketch after this list).
  • Cast numeric columns to strings before pattern matching.
  • rlike() is best for complex or conditional text patterns.
  • Always choose the function that best fits your case sensitivity and pattern complexity.
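
As referenced in the list above, filter() and where() are interchangeable for applying these conditions. Here is a minimal sketch, assuming a DataFrame df with a name column (like the one created later in this article):


# filter() and where() are aliases and produce the same result
from pyspark.sql.functions import col

df.filter(col("name").like("%Rose%")).show()
df.where(col("name").like("%Rose%")).show()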

like() – SQL-style Pattern Matching

The like() function is used to perform SQL-style pattern matching in PySpark. It checks if a string column matches a specified pattern using wildcard characters.

Syntax

Following is the syntax of the like() function.


# Syntax of the like() function
col("column_name").like("pattern")

Parameters

pattern (str): SQL-style string pattern using % (any number of characters) and _ (single character) wildcards.

Return Value

It returns a column of boolean values. Each value is True if the corresponding row’s value matches based on the pattern, otherwise False.
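
To make the return value concrete, here is a minimal, self-contained sketch that selects the boolean column directly instead of filtering with it (the two-row DataFrame is made up for illustration):


# like() produces a boolean Column that you can select directly
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("like-demo").getOrCreate()
df = spark.createDataFrame([("Rose",), ("rose",)], ["name"])

# 'Rose' -> true, 'rose' -> false, since like() is case-sensitive
df.select("name", col("name").like("%Rose%").alias("matches")).show()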

Usage of like()

You can use the like() function in PySpark to filter rows in a DataFrame by applying SQL-style pattern matching to specific columns. Let’s create a DataFrame and apply the like() function to the target column (where you want to match string patterns using SQL expressions) to retrieve the filtered rows.


# Use like() function to filter rows of DataFrame
# Create SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

# Create DataFrame
data = [(1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"),(5,"Rames rose")]
df = spark.createDataFrame(data=data,schema=["id","name"])
df.show()

# Filter where name contains Rose string
# Import col from sql functions in pyspark
from pyspark.sql.functions import col
df.filter(col("name").like("%Rose%")).show()

Yields the output below.

# Output:
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
+---+------------+

As the output shows, like() performs a case-sensitive match: %Rose% matches “Michael Rose” and “Rames Rose” but not “Rames rose”.
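
The example above uses only the % wildcard. For completeness, here is a short sketch of the _ wildcard (matching exactly one character) against the same DataFrame:


# '_ames%' matches 'James Smith', 'Rames Rose', and 'Rames rose',
# since _ matches exactly one leading character
df.filter(col("name").like("_ames%")).show()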

rlike() – Regex Matching

In PySpark, the rlike() function is used to apply regular expressions to string columns for advanced pattern matching. It accepts a regex string as a parameter and returns a Boolean column indicating whether each row matches the expression. By default, rlike() is case-sensitive. To make it case-insensitive, you must modify the regex pattern (e.g., using (?i) inline flag). Compared to like(), rlike() offers greater power and flexibility.

rlike() Syntax

Following is the syntax of the rlike() function.


# Syntax of the rlike() function
col("column_name").rlike("regex_pattern")

Parameters

regex_pattern (str): A valid regular expression pattern.

Return Value

Returns a boolean column indicating whether each value matches the given regular expression.

Usage of rlike()

Let’s see an example of using rlike() to evaluate a regular expression. In the examples below, I use the rlike() function to filter PySpark DataFrame rows by matching regex patterns, both case-insensitively and case-sensitively.


# Match names containing 'rose', case-insensitive

from pyspark.sql.functions import col

# The (?i) inline flag makes the regex case-insensitive
df.filter(col("name").rlike("(?i)rose")).show()

Yields the output below.


# Output:
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
|  5|  Rames rose|
+---+------------+

If you don’t use a case-insensitive flag, rlike() matches patterns in a case-sensitive manner by default.


# Match names containing 'rose', case-sensitive
df.filter(col("name").rlike("rose")).show()

Yields the output below.


# Output:

+---+----------+
| id|      name|
+---+----------+
|  5|Rames rose|
+---+----------+
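
Beyond plain substrings, rlike() accepts full regex syntax such as anchors and character classes. The patterns below are illustrative sketches against the same DataFrame:


# ^ and $ anchor the match: names starting with 'R' and ending with 'e'
df.filter(col("name").rlike("^R.*e$")).show()

# A character class matches 'Rose' or 'rose'
df.filter(col("name").rlike("[Rr]ose")).show()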

ilike() – Case-insensitive SQL-style Pattern Matching

The ilike() function is similar to like() but performs case-insensitive matching. This is useful when you want simple wildcard matching without worrying about letter casing.

ilike() Syntax

Following is the syntax of ilike().


# Syntax of the ilike() function
col("column_name").ilike("pattern")

Parameters

pattern (str): SQL-style string pattern using % and _, matched case-insensitively.

Return Value

Returns a column of boolean values, True where the value matches the pattern regardless of case.

Usage of ilike()

Let’s apply the ilike() function on PySpark DataFrame columns to perform case-insensitive SQL-style pattern matching. This function filters rows based on case-insensitive string patterns using SQL expressions.


# Case-insensitive match using SQL-style wildcard
from pyspark.sql.functions import col

df.filter(col("name").ilike("%rose%"))

Yields the output below.


# Output:

+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
|  5|  Rames rose|
+---+------------+

Differences Between PySpark like(), rlike(), and ilike()

The table below highlights the key differences among these functions to help you understand when to use each one effectively.

Function | Pattern Type         | Case Sensitivity                    | Availability
like()   | SQL wildcards (%, _) | Case-sensitive                      | All Spark versions
rlike()  | Regular expressions  | Case-sensitive (use (?i) to ignore) | All Spark versions
ilike()  | SQL wildcards (%, _) | Case-insensitive                    | Spark 3.0+

Best Practices

  • Always cast numeric columns to string before using these functions.
  • Use like() for simple wildcard matching when case matters.
  • Use ilike() for case-insensitive SQL-style matching (Spark 3.0+).
  • Use rlike() for advanced or flexible pattern matching with regex.

Example:

You can convert the numeric column to a string before filtering rows using pattern-matching functions such as like(), rlike(), or ilike().


# Filter where id ends with 5
from pyspark.sql.functions import col

df.filter(col("id").cast("string").like("%5")).show()

Yields the output below.


# Output:
+---+----------+
| id|      name|
+---+----------+
|  5|Rames rose|
+---+----------+

Frequently Asked Questions of PySpark like() vs rlike() vs ilike()

What is the difference between like() and rlike() in PySpark?

like() uses SQL-style pattern matching with % and _.
rlike() uses regular expressions, which are more powerful and flexible.

Is like() case-sensitive in PySpark?

Yes, like() is case-sensitive. For example, %Rose% will not match rose.

How can I perform a case-insensitive pattern match in PySpark?

Use either:
ilike() for SQL-style patterns (Spark 3.0+), or
rlike("(?i)pattern") for regex-based matching.

What does ilike() do in PySpark?

ilike() is the case-insensitive version of like(), introduced in Spark 3.0+. It uses SQL-style wildcards.

What wildcard characters does like() or ilike() support?

% → matches any sequence of characters
_ → matches a single character

How can I use regex in like() or ilike()?

like() and ilike() only support SQL wildcards. Use rlike() for regex.

How do I check if a string ends with “rose”, regardless of case?

Use: col("name").rlike("(?i)rose$")
This matches strings ending with “rose”, case-insensitively.

Which function should I use for the most flexible pattern matching?

Use rlike() – it supports full regular expressions and can handle complex patterns.

Is ilike() available in all versions of PySpark?

No. ilike() is available only in Spark 3.0 and above.

Conclusion

In this article, I have explained the concept of PySpark like() vs rlike() vs ilike() for filtering DataFrame rows based on string patterns. Choosing the appropriate function is crucial when working with textual data, as each one handles pattern matching differently. While all three are used to match string patterns, they vary in syntax, pattern type (SQL-style vs. regex), and case sensitivity, making it important to select the one that best fits your specific use case.

Happy Learning!!
