
In this tutorial, we will show you a Spark SQL Dataframe example of how to calculate a difference between two dates in days, Months and year using Scala language and functions datediff
, months_between
.
First Let’s see getting the difference between two dates using datediff
Spark function.
Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),("2018-07-23"))
.toDF("date").select(
col("date"),
current_date().as("current_date"),
datediff(current_date(),col("date")).as("datediff")
).show()
Output:
+----------+------------+--------+
| date|current_date|datediff|
+----------+------------+--------+
|2019-07-01| 2019-07-24| 23|
|2019-06-24| 2019-07-24| 30|
|2019-08-24| 2019-07-24| -31|
|2018-07-23| 2019-07-24| 366|
+----------+------------+--------+
Now, Let’s see how to get month and year differences between two dates using months_between
function.
val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
.toDF("startDate").select(
col("startDate"),current_date().as("endDate")
)
df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
.withColumn("montsDiff", months_between(
col("endDate"),col("startDate")))
.withColumn("montsDiff_round",round(months_between(
col("endDate"),col("startDate")),2))
.withColumn("yearsDiff",months_between(
col("endDate"),col("startDate"),true).divide(12))
.withColumn("yearsDiff_round",round(months_between(
col("endDate"),col("startDate"),true).divide(12),2))
.show()
Output:
+----------+----------+---------+-----------+---------------+--------------------+---------------+
| startDate| endDate|datesDiff| montsDiff|montsDiff_round| yearsDiff|yearsDiff_round|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
|2019-07-01|2019-07-24| 23| 0.74193548| 0.74| 0.06182795666666666| 0.06|
|2019-06-24|2019-07-24| 30| 1.0| 1.0| 0.08333333333333333| 0.08|
|2019-08-24|2019-07-24| -31| -1.0| -1.0|-0.08333333333333333| -0.08|
|2018-12-23|2019-07-24| 213| 7.03225806| 7.03| 0.586021505| 0.59|
|2018-07-20|2019-07-24| 369|12.12903226| 12.13| 1.0107526883333333| 1.01|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
Let’s see another example of the difference between two dates when dates are not in Spark DateType format ‘yyyy-MM-dd’. when dates are not in Spark DateType format, all Spark functions return null. Hence, first convert the input dates to Spark DateType using to_date
function.
val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
.toDF("startDate").select(
to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
current_date().as("endDate")
Complete Code:
package com.sparkbyexamples.spark.dataframe.functions
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
object DateDiff extends App {
val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.sqlContext.implicits._
//Difference between two dates in days
Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-07-23")).toDF("date")
.select(
col("date"),
current_date().as("current_date"),
datediff(current_date(),col("date")).as("datediff")
).show()
// Difference between two dates in Months and Years
val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
.toDF("startDate").select(
col("startDate"),current_date().as("endDate")
)
calculateDiff(df)
//Difference between two dates when dates are not in Spark DateType format 'yyyy-MM-dd'.
//Note that when dates are not in Spark DateType format, all Spark functions returns null
//Hence, first convert the input dates to Spark DateType using to_date function
val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
.toDF("startDate").select(
to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
current_date().as("endDate")
)
calculateDiff(dfDate)
def calculateDiff(df:DataFrame): Unit ={
df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
.withColumn("montsDiff", months_between(
col("endDate"),col("startDate")))
.withColumn("montsDiff_round",round(months_between(
col("endDate"),col("startDate")),2))
.withColumn("yearsDiff",months_between(
col("endDate"),col("startDate"),true).divide(12))
.withColumn("yearsDiff_round",round(months_between(
col("endDate"),col("startDate"),true).divide(12),2))
.show()
}
}
Conclusion:
In this tutorial, we have learned how to calculate days, months and years in between two dates using Scala language and Spark SQL Date and Time functions datediff, months_between. You can find more information about these functions at the following blog
If you want to know the complete list of Spark functions by group, please read below article.
Happy Learning !!