Spark Filter Using contains() Examples

In Spark and PySpark, the contains() function checks whether a column value contains a literal string (a match on part of the string). It is mostly used to filter rows of a DataFrame.

  • contains() – Checks whether the DataFrame column value contains the string passed as an argument; it returns true if it does and false otherwise.
  • This function is available in the Column class.

You can also match on wildcard characters using like() and on regular expressions using rlike().

To explain contains() with examples, let's first create a DataFrame with some test data.

//Make sure you have created a SparkSession object named spark.
import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"),(5,"Rames rose"))
val df = data.toDF("id","name")

1. Filter DataFrame Column contains() in a String

The contains() method checks whether the string in a DataFrame column contains the string passed as an argument (a match on part of the string). It returns true if the substring is present and false otherwise. The example below returns all rows of the DataFrame whose name column contains the string mes.

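//Filter all rows that contain the string 'mes' in the 'name' column
import org.apache.spark.sql.functions.col
df.filter(col("name").contains("mes")).show()
+---+-----------+
| id|       name|
+---+-----------+
|  1|James Smith|
|  4| Rames Rose|
|  5| Rames rose|
+---+-----------+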

//You can also get the same result with like
df.filter(col("name").like("%mes%")).show()

contains() is case-sensitive. To filter case-insensitively, refer to the Spark rlike() function, which filters by regular expression.
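As a minimal sketch of case-insensitive matching, the snippet below shows two common options on the same test data: lower-casing the column before calling contains(), and using rlike() with the (?i) regular-expression flag. The session settings (app name, local master) and the variable names byLower and byRegex are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Local session for illustration; app name and master are assumptions.
val spark = SparkSession.builder()
  .appName("contains-case-insensitive")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Same test data as in the article
val df = Seq((1, "James Smith"), (2, "Michael Rose"),
  (3, "Robert Williams"), (4, "Rames Rose"), (5, "Rames rose"))
  .toDF("id", "name")

// Option 1: lower-case the column, then match a lower-case literal
val byLower = df.filter(lower(col("name")).contains("rose"))

// Option 2: regular expression with the (?i) case-insensitive flag
val byRegex = df.filter(col("name").rlike("(?i)rose"))

// Both keep the rows with id 2, 4 and 5
byLower.show()
byRegex.show()
```

For comparison, a plain col("name").contains("Rose") would keep only ids 2 and 4, because "Rames rose" uses a lower-case r.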

2. Spark SQL contains() Example

//Register the DataFrame as a temporary view, then filter rows with SQL
df.createOrReplaceTempView("TAB")
spark.sql("select * from TAB where name like '%mes%'").show()

3. PySpark contains() Example

from pyspark.sql.functions import col
df.filter(col("name").contains("mes")).show()  # df: the same test data as a PySpark DataFrame

In this Spark and PySpark article, I have covered examples of how to filter DataFrame rows whose column value contains a given string.

Happy Learning !!

Naveen (NNK) is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.
