Spark Filter startsWith(), endsWith() Examples

Spark filter startsWith() and endsWith() are used to search DataFrame rows by checking whether a column value starts with or ends with a string. These methods can also be used to filter rows that do not start with or do not end with a string. Both methods are defined on the Column class.

  • startsWith() – Returns Boolean true when the DataFrame column value starts with the string specified as an argument to this method, and false when it does not match.
  • endsWith() – Returns Boolean true when the DataFrame column value ends with the string specified as an argument to this method, and false when it does not match.

To explain these with examples, let's first create a DataFrame with some test data.


// Create a DataFrame with test data
import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"), (5,"Rames rose")
)
val df = data.toDF("id","name")
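
For reference, calling df.show() on this DataFrame prints the test data that the filter examples below operate on:


df.show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+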

1. Spark Filter startsWith()

The startsWith() method lets you check whether a Spark DataFrame column string value starts with the string specified as an argument to this method. This method is case-sensitive. The below example returns all rows from the DataFrame where the name column starts with the string James.


// Spark Filter startsWith()
import org.apache.spark.sql.functions.col
df.filter(col("name").startsWith("James")).show()
+---+-----------+
| id|       name|
+---+-----------+
|  1|James Smith|
+---+-----------+
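
Note that because startsWith() is case-sensitive, filtering on "james" (lowercase) would return no rows here. If you need a case-insensitive match, one option (my own addition, not covered in the examples above) is to lower-case the column with the lower() function before applying startsWith():


// Case-insensitive variant: lower-case the column value before matching
import org.apache.spark.sql.functions.lower
df.filter(lower(col("name")).startsWith("james")).show()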

Use the below examples if you want to filter rows that do NOT start with a string (negating startsWith()).


// NOT starts with a string
df.filter(! col("name").startsWith("James")).show()
df.filter(col("name").startsWith("James") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+
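
As an alternative to the ! operator, Spark also provides a not() function in org.apache.spark.sql.functions that negates a Boolean Column; the sketch below is my own equivalent of the examples above:


// NOT starts with a string, using the not() function instead of !
import org.apache.spark.sql.functions.not
df.filter(not(col("name").startsWith("James"))).show()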

2. Spark Filter endsWith()

The endsWith() method lets you check whether a Spark DataFrame column string value ends with the string specified as an argument to this method. This method is case-sensitive. The below example returns all rows from the DataFrame where the name column ends with the string Rose.


// Spark Filter endsWith()
df.filter(col("name").endsWith("Rose")).show()
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
+---+------------+

Similarly, use the below examples to filter rows that do NOT end with a string (negating endsWith()).


// NOT ends with a string
df.filter(! col("name").endsWith("Rose")).show()
df.filter(col("name").endsWith("Rose") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  3|Robert Williams|
|  5|     Rames rose|
+---+---------------+
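
Since both methods return Boolean Column expressions, you can also combine them with the Column operators && and ||. The combination below is my own illustration and is not part of the examples above:


// Names that start with "Rames" AND end with "Rose" (case-sensitive)
df.filter(col("name").startsWith("Rames") && col("name").endsWith("Rose")).show()
+---+----------+
| id|      name|
+---+----------+
|  4|Rames Rose|
+---+----------+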

3. Using Spark SQL Expression

Alternatively, you can register the DataFrame as a temporary view and achieve the same starts-with filtering with the SQL LIKE operator.

df.createOrReplaceTempView("DATA")
// Starts with a String
spark.sql("select * from DATA where name like 'James%'").show()
// NOT starts with a String
spark.sql("select * from DATA where name not like 'James%'").show()

Happy Learning !!
