Spark filter startsWith() and endsWith() are used to filter DataFrame rows by checking whether a column value starts with or ends with a given string. Both can also be negated to filter rows that do not start with or do not end with a string. Both methods belong to the Column class.
- startsWith() – Returns true when the DataFrame column value starts with the string passed as an argument; returns false when it does not.
- endsWith() – Returns true when the DataFrame column value ends with the string passed as an argument; returns false when it does not.
To explain these with examples, let's first create a DataFrame with some test data.
import org.apache.spark.sql.SparkSession

// Assumes a local SparkSession; adjust master/appName for your environment
val spark = SparkSession.builder().master("local[1]").appName("SparkByExamples").getOrCreate()
import spark.implicits._

val data = Seq((1,"James Smith"), (2,"Michael Rose"),
  (3,"Robert Williams"), (4,"Rames Rose"), (5,"Rames rose")
)
val df = data.toDF("id","name")
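Calling df.show() on this DataFrame displays the test data:

df.show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+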
Spark Filter startsWith()
The startsWith() method lets you check whether a Spark DataFrame column's string value starts with the string specified as an argument. This method is case-sensitive. The example below returns all rows whose name column starts with the string James.
import org.apache.spark.sql.functions.col
df.filter(col("name").startsWith("James")).show()
+---+-----------+
| id|       name|
+---+-----------+
|  1|James Smith|
+---+-----------+
Use the examples below if you want to filter rows that do NOT start with a string (negated startsWith()).
df.filter(! col("name").startsWith("James")).show()
df.filter( col("name").startsWith("James") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  2|   Michael Rose|
|  3|Robert Williams|
|  4|     Rames Rose|
|  5|     Rames rose|
+---+---------------+
Spark Filter endsWith()
The endsWith() method lets you check whether a Spark DataFrame column's string value ends with the string specified as an argument. This method is case-sensitive. The example below returns all rows whose name column ends with the string Rose.
df.filter(col("name").endsWith("Rose")).show()
+---+------------+
| id|        name|
+---+------------+
|  2|Michael Rose|
|  4|  Rames Rose|
+---+------------+
Similarly, use the examples below to filter rows that do NOT end with a string (negated endsWith()).
//NOT ends with a string
df.filter(! col("name").endsWith("Rose")).show()
df.filter(col("name").endsWith("Rose") === false).show()
+---+---------------+
| id|           name|
+---+---------------+
|  1|    James Smith|
|  3|Robert Williams|
|  5|     Rames rose|
+---+---------------+
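Because both methods are case-sensitive, the lowercase row Rames rose is not matched by endsWith("Rose") above. If you need a case-insensitive match, one option is to lowercase the column first with the lower() function; a minimal sketch:

import org.apache.spark.sql.functions.lower
//Case-insensitive ends with: lowercase the column, then compare
df.filter(lower(col("name")).endsWith("rose")).show()

This returns Michael Rose, Rames Rose, and Rames rose.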
Using Spark SQL Expression
df.createOrReplaceTempView("DATA")
//Starts with a String
spark.sql("select * from DATA where name like 'James%'").show()
//NOT starts with a String
spark.sql("select * from DATA where name not like 'James%'").show()
Happy Learning !!
Related Articles
- How to Filter Rows with NULL/NONE (IS NULL & IS NOT NULL) in Spark
- Spark Filter using Multiple Conditions
- Spark DataFrame Where Filter | Multiple Conditions
- Spark Data Frame Where () To Filter Rows
- Spark Filter Using contains() Examples
- Spark Get Current Number of Partitions of DataFrame
- Spark Partitioning & Partition Understanding
- Spark map() vs mapPartitions() with Examples