Spark Filter startsWith(), endsWith() Examples

In Spark, the startsWith() and endsWith() methods filter DataFrame rows by checking whether a column value starts or ends with a given string. When negated, the same methods filter rows whose value does not start or end with that string. Both methods belong to the Column class.

  • startsWith() – Returns a Boolean Column that is true when the DataFrame column value starts with the string passed as the argument, and false otherwise.
  • endsWith() – Returns a Boolean Column that is true when the DataFrame column value ends with the string passed as the argument, and false otherwise.
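Because both methods return a Boolean Column instead of filtering on their own, you can select them to see the per-row result before using them in filter(). The sketch below assumes the spark-sql library is on the classpath and builds a local SparkSession (master local[1], with the hypothetical app name startsWithEndsWithDemo); inside spark-shell, getOrCreate() simply returns the existing session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed local session; getOrCreate() returns the existing one inside spark-shell
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("startsWithEndsWithDemo") // hypothetical app name
  .getOrCreate()
import spark.implicits._

val people = Seq((1, "James Smith"), (2, "Michael Rose"), (5, "Rames rose")).toDF("id", "name")

// startsWith()/endsWith() build Boolean Column expressions that are evaluated per row
val flags = people.select(
  col("id"),
  col("name").startsWith("James").as("starts_james"),
  col("name").endsWith("Rose").as("ends_rose")
)
flags.show()

// Tuple encoders map columns by position: (id, starts_james, ends_rose)
val flagRows = flags.as[(Int, Boolean, Boolean)].collect().toSeq
```

Rames rose gets false in ends_rose because the comparison is case-sensitive.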

To explain these methods with examples, let’s first create a DataFrame with some test data.

import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"), (5,"Rames rose"))
val df = data.toDF("id","name")

1. Spark Filter startsWith()

The startsWith() method lets you check whether a Spark DataFrame column’s string value starts with the string passed as its argument. This method is case-sensitive. The example below returns all rows whose name column starts with the string James.

// Spark Filter startsWith()
import org.apache.spark.sql.functions.col
df.filter(col("name").startsWith("James")).show()
| id|       name|
|  1|James Smith|

Use either of the examples below to filter rows that do NOT start with a string.

// Negate the condition with the ! operator...
df.filter(! col("name").startsWith("James")).show()
// ...or compare the Boolean result with false
df.filter(col("name").startsWith("James") === false).show()
| id|           name|
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
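Since startsWith() is case-sensitive, a lower-case prefix such as james matches nothing in this data. One common workaround, sketched here under the assumption of a local SparkSession (hypothetical app name startsWithCaseDemo) and spark-sql on the classpath, is to lower-case the column with the lower() function before comparing it against a lower-case prefix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumed local session; getOrCreate() returns the existing one inside spark-shell
val spark = SparkSession.builder().master("local[1]").appName("startsWithCaseDemo").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Case-sensitive: "james" does not match "James Smith"
val exactIds = df.filter(col("name").startsWith("james")).select("id").as[Int].collect().toSeq

// Case-insensitive: lower-case the column, then compare with a lower-case prefix
val ignoreCase = df.filter(lower(col("name")).startsWith("james"))
ignoreCase.show()
val ignoreCaseIds = ignoreCase.select("id").as[Int].collect().toSeq
```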

2. Spark Filter endsWith()

The endsWith() method lets you check whether a Spark DataFrame column’s string value ends with the string passed as its argument. This method is case-sensitive. The example below returns all rows whose name column ends with the string Rose; Rames rose is not returned because its r is lower-case.

// Spark Filter endsWith()
df.filter(col("name").endsWith("Rose")).show()
| id|        name|
|  2|Michael Rose|
|  4|  Rames Rose|

Similarly, negate endsWith() to filter rows that do NOT end with a string.

// NOT ends with a string
df.filter(! col("name").endsWith("Rose")).show()
| id|           name|
|  1|    James Smith|
|  3|Robert Williams|
|  5|     Rames rose|
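The two methods can also be combined in a single filter with the Column && operator. The sketch below, assuming a local SparkSession (hypothetical app name combinedFilterDemo) and spark-sql on the classpath, keeps names that start with Rames and end with Rose. Because both checks are case-sensitive, only id 4 qualifies and Rames rose (id 5) is dropped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed local session; getOrCreate() returns the existing one inside spark-shell
val spark = SparkSession.builder().master("local[1]").appName("combinedFilterDemo").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// && combines the two Boolean Columns; a row is kept only when both are true
val startsAndEnds = df.filter(col("name").startsWith("Rames") && col("name").endsWith("Rose"))
startsAndEnds.show()
val startsAndEndsIds = startsAndEnds.select("id").as[Int].collect().toSeq
```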

3. Using Spark SQL Expression

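In Spark SQL, the LIKE operator with the % wildcard gives the same results. Register the DataFrame as a temporary view named DATA before querying it.

// Register the DataFrame as a temporary view
df.createOrReplaceTempView("DATA")
// Starts with a String
spark.sql("select * from DATA where name like 'James%'").show()
// NOT starts with a String
spark.sql("select * from DATA where name not like 'James%'").show()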
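The same LIKE approach covers the ends-with case: put the % wildcard before the string instead of after it. This is a minimal sketch assuming a local SparkSession (hypothetical app name likeEndsWithDemo), spark-sql on the classpath, and the sample DataFrame registered as the temporary view DATA. Like endsWith(), LIKE is case-sensitive, so Rames rose does not match '%Rose'.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session; getOrCreate() returns the existing one inside spark-shell
val spark = SparkSession.builder().master("local[1]").appName("likeEndsWithDemo").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Make the DataFrame queryable from SQL under the name DATA
df.createOrReplaceTempView("DATA")

// Ends with a String: a leading % matches any prefix
val endsWithRose = spark.sql("select * from DATA where name like '%Rose'")
// NOT ends with a String
val notEndsWithRose = spark.sql("select * from DATA where name not like '%Rose'")
endsWithRose.show()
notEndsWithRose.show()

val endsIds = endsWithRose.select("id").as[Int].collect().toSeq
val notEndsIds = notEndsWithRose.select("id").as[Int].collect().toSeq
```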

Happy Learning !!