Spark sortByKey() transformation is an RDD operation used to sort an RDD by its keys in ascending or descending order. The sortByKey() function operates on a pair RDD (key/value pairs) and is available in org.apache.spark.rdd.OrderedRDDFunctions (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala).
First, let’s create an RDD from a list of tuples.
val spark: SparkSession = SparkSession.builder()
.master("local[1]")
.appName("SparkByExamples.com")
.getOrCreate()
val data = Seq(("Project","A", 1),
("Gutenberg’s", "X",3),
("Alice’s", "C",5),
("Adventures","B", 1)
)
val rdd=spark.sparkContext.parallelize(data)
As you can see, each record here is a tuple of three fields: the word name, a letter key, and a count. Note that this is not yet a key/value pair; we will convert it into a pair RDD in the example below.
Spark RDD sortByKey() Syntax
Below is the syntax of the Spark RDD sortByKey() transformation; it returns an RDD of Tuple2 (key/value pairs) after sorting the data.
sortByKey(ascending: Boolean, numPartitions: Int): org.apache.spark.rdd.RDD[scala.Tuple2[K, V]]
This function takes two optional arguments: ascending as a Boolean and numPartitions as an Int.
- ascending specifies the order of the sort; by default it is true, meaning ascending order. Use false for descending order.
- numPartitions specifies the number of partitions to create for the result of the sortByKey() function.
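As a quick sketch of passing both arguments at once (the pairRdd and sortedDesc names are only for this illustration, and it assumes the spark session created above):
// Illustrative pair RDD; names here are only for this sketch
val pairRdd = spark.sparkContext.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
// Sort keys in descending order and place the result into 2 partitions
val sortedDesc = pairRdd.sortByKey(ascending = false, numPartitions = 2)
println(sortedDesc.getNumPartitions) // 2
sortedDesc.foreach(println)          // keys come out in descending order: c, b, a
Because sortByKey() uses a range partitioner, the result stays globally ordered even when it is spread across multiple partitions.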
RDD sortByKey() Example
Our input RDD is not a pair RDD (key/value pairs), hence we cannot apply the sortByKey() transformation to it directly; first you need to convert it into a pair RDD. You can do this with the Spark RDD map() transformation. I would like to sort on the second field of the tuple, so I use it as the key, as shown below.
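The map() below (the same conversion used in the complete example at the end of this article) keys each record by its second field and keeps the whole tuple as the value.
// Create a pair RDD: key = second field, value = the original tuple
val rdd2 = rdd.map(f => (f._2, (f._1, f._2, f._3)))
rdd2.foreach(println)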
Now let’s use the sortByKey() transformation to sort rdd2 by its key.
val rdd3= rdd2.sortByKey()
rdd3.foreach(println)
Since I have not passed any arguments, sortByKey() sorts in ascending order by default. This yields the below output in the console.

// Prints to console
(A,(Project,A,1))
(B,(Adventures,B,1))
(C,(Alice’s,C,5))
(X,(Gutenberg’s,X,3))
The below example sorts in descending order by passing false.
val rdd4= rdd2.sortByKey(false)
rdd4.foreach(println)
// Prints to console
(X,(Gutenberg’s,X,3))
(C,(Alice’s,C,5))
(B,(Adventures,B,1))
(A,(Project,A,1))
Complete sortByKey() Scala Example
Below is a complete Scala example of the RDD sortByKey() transformation.
import org.apache.spark.sql.SparkSession
object SortByKeyExample extends App{
val spark: SparkSession = SparkSession.builder()
.master("local[1]")
.appName("SparkByExamples.com")
.getOrCreate()
val data = Seq(("Project","A", 1),
("Gutenberg’s", "X",3),
("Alice’s", "C",5),
("Adventures","B", 1)
)
val rdd=spark.sparkContext.parallelize(data)
rdd.foreach(println)
// Convert to a pair RDD: key = second field, value = the original tuple
val rdd2 = rdd.map(f => (f._2, (f._1, f._2, f._3)))
rdd2.foreach(println)

// Sort by key in ascending order (the default) and print the sorted result
val rdd3 = rdd2.sortByKey()
rdd3.foreach(println)
}
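Note that sortByKey() returns key/value pairs. If you only want the original 3-tuples back after sorting, a small addition like the sketch below (not part of the program above, just a convenience to place inside the object) maps the sorted pair RDD back to its values.
// Optional: drop the temporary key and keep only the original tuples
val sortedTuples = rdd3.map(_._2)
sortedTuples.foreach(println)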
Conclusion
In this article, you have learned how to use the Spark RDD sortByKey() transformation to sort an RDD in ascending or descending order. If the RDD is not a pair RDD, you need to convert it using the map() transformation before calling the sortByKey() function.
Happy Learning !!
Related Articles
- Spark – Sort multiple DataFrame columns
- Spark – Sort by column in descending order?
- Spark – How to Sort DataFrame column explained
- Spark SQL Sort functions – complete list
- Spark Word Count Explained with Example
- Spark RDD fold() function example
- Spark RDD reduce() function example
- Spark RDD aggregate() operation example