Spark Filter – startsWith(), endsWith() Examples

Spark's startsWith() and endsWith() methods filter DataFrame rows by checking whether a column value starts or ends with a given string. Negating them selects the rows that do not start or end with that string. Both methods belong to the Column class.

  • startsWith() – Returns true when the DataFrame column value starts with the string passed as the argument, and false otherwise.
  • endsWith() – Returns true when the DataFrame column value ends with the string passed as the argument, and false otherwise.

To illustrate these methods, let’s first create a DataFrame with some test data.


import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"),(5,"Rames rose")
)
val df = data.toDF("id","name")
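
Because both methods return a Boolean Column, the result can be selected as a regular column instead of being used only inside filter(). The following is a minimal sketch of that idea. The local SparkSession setup and the column aliases starts_with_james and ends_with_rose are assumptions made for illustration; in spark-shell the spark value already exists and the builder line can be dropped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session so the snippet runs on its own.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("starts-ends-with-flags")
  .getOrCreate()
import spark.implicits._

// Same test data as above.
val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Each method yields a Boolean Column, so it can be projected next to the data.
// The alias names are hypothetical and chosen only for readability.
val flags = df.select(
  col("name"),
  col("name").startsWith("James").as("starts_with_james"),
  col("name").endsWith("Rose").as("ends_with_rose")
)
flags.show()
```

Inspecting the flags this way can be handy for checking a condition before using it to drop rows.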

Spark Filter startsWith()

The startsWith() method checks whether a DataFrame string column value starts with the string passed as its argument. The match is case-sensitive. The example below returns all rows whose name column starts with the string James.


import org.apache.spark.sql.functions.col
df.filter(col("name").startsWith("James")).show()
+---+-----------+
| id|       name|
+---+-----------+
|  1|James Smith|
+---+-----------+
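
Since the match is case-sensitive, a lowercase prefix such as james matches nothing in the test data. One common workaround, shown here as a sketch rather than the only option, is to normalize the column with the lower() function before comparing. The local SparkSession setup is an assumption for running the snippet on its own.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

// Assumption: a throwaway local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("starts-with-ignore-case")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Case-sensitive: "james" is not a prefix of "James Smith", so no row matches.
val exact = df.filter(col("name").startsWith("james"))

// Lower-case the column first, then compare against a lower-case prefix.
val ignoreCase = df.filter(lower(col("name")).startsWith("james"))

exact.show()
ignoreCase.show()
```

The same normalization works with endsWith(), for example to match both Rames Rose and Rames rose.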

Use the examples below to select rows that do NOT start with a string. Both statements return the same rows.


df.filter(! col("name").startsWith("James")).show()
df.filter( col("name").startsWith("James") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+

Spark Filter endsWith()

The endsWith() method checks whether a DataFrame string column value ends with the string passed as its argument. The match is case-sensitive, which is why Rames rose is not returned. The example below returns all rows whose name column ends with the string Rose.


df.filter(col("name").endsWith("Rose")).show()
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
+---+------------+

Similarly, negate endsWith() to select rows that do NOT end with a string.


//NOT ends with a string
df.filter(! col("name").endsWith("Rose")).show()
df.filter(col("name").endsWith("Rose") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  3|Robert Williams|
|  5|     Rames rose|
+---+---------------+
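
The two checks can also be combined, because each one is an ordinary Boolean Column that supports the && and || operators. The sketch below is one way to do this on the same test data. The local SparkSession setup and the value names ramesRose and jamesOrRose are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a throwaway local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("starts-and-ends-with")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Rows that start with "Rames" AND end with "Rose" (case-sensitive on both ends).
val ramesRose = df.filter(
  col("name").startsWith("Rames") && col("name").endsWith("Rose"))

// Rows that start with "James" OR end with "Rose".
val jamesOrRose = df.filter(
  col("name").startsWith("James") || col("name").endsWith("Rose"))

ramesRose.show()
jamesOrRose.show()
```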

Using Spark SQL Expression


df.createOrReplaceTempView("DATA")
//Starts with a String
spark.sql("select * from DATA where name like 'James%'").show()
//NOT starts with a String
spark.sql("select * from DATA where name not like 'James%'").show()
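
The SQL examples above cover only the starts-with case. The ends-with equivalent moves the % wildcard to the front of the LIKE pattern. The sketch below assumes the same test data and view name; the local SparkSession setup is an assumption for running it standalone. It also shows that the same predicate can be passed to filter() as a SQL string.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("sql-ends-with")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"), (3, "Robert Williams"),
  (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")
df.createOrReplaceTempView("DATA")

// Ends with a string: the wildcard goes before the literal.
val endsWithRose = spark.sql("select * from DATA where name like '%Rose'")

// NOT ends with a string.
val notEndsWithRose = spark.sql("select * from DATA where name not like '%Rose'")

// The same predicate as a SQL string passed directly to filter().
val viaExpr = df.filter("name like '%Rose'")

endsWithRose.show()
notEndsWithRose.show()
viaExpr.show()
```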

Happy Learning !!

NNK
