Spark By {Examples}

Spark Filter startsWith(), endsWith() Examples

Spark filter startsWith() and endsWith() are used to search DataFrame rows by checking whether a column value starts with or ends with a given string. When negated, they also filter rows that do not start with or do not end with that string. Both methods belong to the Column class.

To explain these with examples, let’s first create a DataFrame with some test data.


// spark is the SparkSession that spark-shell creates for you
import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"),(5,"Rames rose")
)
val df = data.toDF("id","name")

1. Spark Filter startsWith()

The startsWith() method checks whether a string column value in a Spark DataFrame starts with the string passed as its argument. This method is case-sensitive. The example below returns all rows whose name column starts with the string James.


// Spark Filter startsWith()
import org.apache.spark.sql.functions.col
df.filter(col("name").startsWith("James")).show()
+---+-----------+
| id|       name|
+---+-----------+
|  1|James Smith|
+---+-----------+
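Because startsWith() is case-sensitive, a lowercase prefix such as "james" matches nothing in this data. One common workaround, sketched below as a spark-shell style script and assuming simple lower-casing is acceptable for your data, is to normalize the column with lower() from org.apache.spark.sql.functions before the check.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"),
  (3, "Robert Williams"), (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Case-sensitive: "james" does not match "James Smith"
val exactIds = df.filter(col("name").startsWith("james"))
  .select("id").as[Int].collect().sorted.toList

// Case-insensitive: lower-case the column value before comparing
val ignoreCaseIds = df.filter(lower(col("name")).startsWith("james"))
  .select("id").as[Int].collect().sorted.toList

println(exactIds)      // List()
println(ignoreCaseIds) // List(1)
```

Note that the prefix passed to startsWith() must itself be lowercase for this to work.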

Use either of the examples below to filter rows whose value does NOT start with a string.


df.filter(! col("name").startsWith("James")).show()
df.filter( col("name").startsWith("James") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+
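The sample data has no nulls, but if name contained one, startsWith() would return null for that row and so would its negation, so filter() drops the row from both the positive and the NOT results. A sketch, assuming you want null names kept in the NOT case, adds an explicit isNull check; the withNulls DataFrame is hypothetical data for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data with a null name (Option[String] maps to a nullable column)
val withNulls = Seq[(Int, Option[String])](
  (1, Some("James Smith")), (2, None), (3, Some("Robert Williams"))
).toDF("id", "name")

// NOT startsWith: the null row evaluates to null and is filtered out
val notJames = withNulls.filter(!col("name").startsWith("James"))
  .select("id").as[Int].collect().sorted.toList

// Keep null names explicitly: null OR true evaluates to true
val notJamesOrNull = withNulls
  .filter(!col("name").startsWith("James") || col("name").isNull)
  .select("id").as[Int].collect().sorted.toList

println(notJames)       // List(3)
println(notJamesOrNull) // List(2, 3)
```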

2. Spark Filter endsWith()

The endsWith() method checks whether a string column value in a Spark DataFrame ends with the string passed as its argument. This method is also case-sensitive. The example below returns all rows whose name column ends with the string Rose; "Rames rose" is not returned because its lowercase "rose" does not match.


// Spark Filter endsWith()
df.filter(col("name").endsWith("Rose")).show()
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
+---+------------+
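Since startsWith() and endsWith() both return a boolean Column, they can be combined with && or || inside a single filter(). A short sketch, assuming you want names that start with "Rames" and end with "Rose" (case-sensitively), written as a spark-shell style script:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"),
  (3, "Robert Williams"), (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")

// Both conditions must hold: prefix "Rames" AND suffix "Rose"
val ramesRose = df
  .filter(col("name").startsWith("Rames") && col("name").endsWith("Rose"))
  .select("id").as[Int].collect().sorted.toList

println(ramesRose) // List(4): "Rames rose" fails the case-sensitive suffix check
```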

Similarly, use either of the following to filter rows that do NOT end with a string.


// NOT ends with a string
df.filter(! col("name").endsWith("Rose")).show()
df.filter(col("name").endsWith("Rose") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  3|Robert Williams|
|  5|     Rames rose|
+---+---------------+
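Besides a string literal, both startsWith() and endsWith() also accept a Column, which compares each row against a value from the same row. In the sketch below, the people DataFrame and its surname column are hypothetical data used only to illustrate that overload.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data: each row carries the suffix to test against
val people = Seq(
  (1, "James Smith", "Smith"),
  (2, "Michael Rose", "Williams"),
  (3, "Robert Williams", "Williams")
).toDF("id", "name", "surname")

// endsWith(Column) checks each row's name against that same row's surname
val matching = people.filter(col("name").endsWith(col("surname")))
  .select("id").as[Int].collect().sorted.toList

println(matching) // List(1, 3)
```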

3. Using Spark SQL Expression
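
To do the same in SQL, register the DataFrame as a temporary view and use the LIKE operator with the % wildcard: 'James%' matches values that start with James. Like startsWith(), LIKE is case-sensitive.
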
df.createOrReplaceTempView("DATA")
// Starts with a String
spark.sql("select * from DATA where name like 'James%'").show()
// NOT starts with a String
spark.sql("select * from DATA where name not like 'James%'").show()
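The SQL example above covers only the starts-with case. The ends-with counterparts follow the same pattern with the wildcard moved in front of the suffix; this sketch reuses the article's data and the DATA view name, and also shows Column.like() as a DataFrame-API equivalent. Keep in mind that % and _ are wildcards in LIKE, so a literal that contains them would need escaping.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "James Smith"), (2, "Michael Rose"),
  (3, "Robert Williams"), (4, "Rames Rose"), (5, "Rames rose")).toDF("id", "name")
df.createOrReplaceTempView("DATA")

// Ends with a String: '%' before the suffix matches any leading characters
val endsRose = spark.sql("select id from DATA where name like '%Rose'")
  .as[Int].collect().sorted.toList

// NOT ends with a String
val notEndsRose = spark.sql("select id from DATA where name not like '%Rose'")
  .as[Int].collect().sorted.toList

// The same pattern through the DataFrame API's Column.like()
val endsRoseApi = df.filter(col("name").like("%Rose"))
  .select("id").as[Int].collect().sorted.toList

println(endsRose)    // List(2, 4)
println(notEndsRose) // List(1, 3, 5)
```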

Happy Learning !!