# Calculate difference between two dates in days, months and years

In this tutorial, we will show you a Spark SQL Dataframe example of how to calculate a difference between two dates in days, Months and year using Scala language and functions `datediff`,` months_between`.

First Let’s see getting the difference between two dates using `datediff` Spark function.

``````
Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),("2018-07-23"))
.toDF("date").select(
col("date"),
current_date().as("current_date"),
datediff(current_date(),col("date")).as("datediff")
).show()
``````

Output:

``````
+----------+------------+--------+
|      date|current_date|datediff|
+----------+------------+--------+
|2019-07-01|  2019-07-24|      23|
|2019-06-24|  2019-07-24|      30|
|2019-08-24|  2019-07-24|     -31|
|2018-07-23|  2019-07-24|     366|
+----------+------------+--------+
``````

Now, Let’s see how to get month and year differences between two dates using `months_between` function.

``````
val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
.toDF("startDate").select(
col("startDate"),current_date().as("endDate")
)

df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
.withColumn("montsDiff", months_between(
col("endDate"),col("startDate")))
.withColumn("montsDiff_round",round(months_between(
col("endDate"),col("startDate")),2))
.withColumn("yearsDiff",months_between(
col("endDate"),col("startDate"),true).divide(12))
.withColumn("yearsDiff_round",round(months_between(
col("endDate"),col("startDate"),true).divide(12),2))
.show()
``````

Output:

``````
+----------+----------+---------+-----------+---------------+--------------------+---------------+
| startDate|   endDate|datesDiff|  montsDiff|montsDiff_round|           yearsDiff|yearsDiff_round|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
|2019-07-01|2019-07-24|       23| 0.74193548|           0.74| 0.06182795666666666|           0.06|
|2019-06-24|2019-07-24|       30|        1.0|            1.0| 0.08333333333333333|           0.08|
|2019-08-24|2019-07-24|      -31|       -1.0|           -1.0|-0.08333333333333333|          -0.08|
|2018-12-23|2019-07-24|      213| 7.03225806|           7.03|         0.586021505|           0.59|
|2018-07-20|2019-07-24|      369|12.12903226|          12.13|  1.0107526883333333|           1.01|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
``````

Let’s see another example of the difference between two dates when dates are not in Spark DateType format ‘yyyy-MM-dd’. when dates are not in Spark DateType format, all Spark functions return null. Hence, first convert the input dates to Spark DateType using `to_date` function.

``````
val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
.toDF("startDate").select(
to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
current_date().as("endDate")
``````

#### Complete Code:

``````
package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DateDiff extends App {

val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

import spark.sqlContext.implicits._

//Difference between two dates in days
Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-07-23")).toDF("date")
.select(
col("date"),
current_date().as("current_date"),
datediff(current_date(),col("date")).as("datediff")
).show()

// Difference between two dates in Months and Years
val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
.toDF("startDate").select(
col("startDate"),current_date().as("endDate")
)

calculateDiff(df)

//Difference between two dates when dates are not in Spark DateType format 'yyyy-MM-dd'.
//Note that when dates are not in Spark DateType format, all Spark functions returns null
//Hence, first convert the input dates to Spark DateType using to_date function
val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
.toDF("startDate").select(
to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
current_date().as("endDate")
)

calculateDiff(dfDate)

def calculateDiff(df:DataFrame): Unit ={
df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
.withColumn("montsDiff", months_between(
col("endDate"),col("startDate")))
.withColumn("montsDiff_round",round(months_between(
col("endDate"),col("startDate")),2))
.withColumn("yearsDiff",months_between(
col("endDate"),col("startDate"),true).divide(12))
.withColumn("yearsDiff_round",round(months_between(
col("endDate"),col("startDate"),true).divide(12),2))
.show()
}
}
``````

#### Conclusion:

In this tutorial, we have learned how to calculate days, months and years in between two dates using Scala language and Spark SQL Date and Time functions datediff, months_between. You can find more information about these functions at the following blog

If you want to know the complete list of Spark functions by group, please read below article.

Happy Learning !!

### NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..