Calculate difference between two dates in days, months and years

Spark Difference between two dates

In this tutorial, we will show you a Spark SQL Dataframe example of how to calculate a difference between two dates in days, Months and year using Scala language and functions datediff, months_between.

First Let’s see getting the difference between two dates using datediff Spark function.


Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),("2018-07-23"))
.toDF("date").select(
      col("date"),
      current_date().as("current_date"),
      datediff(current_date(),col("date")).as("datediff")
    ).show()

Output:


+----------+------------+--------+
|      date|current_date|datediff|
+----------+------------+--------+
|2019-07-01|  2019-07-24|      23|
|2019-06-24|  2019-07-24|      30|
|2019-08-24|  2019-07-24|     -31|
|2018-07-23|  2019-07-24|     366|
+----------+------------+--------+

Now, Let’s see how to get month and year differences between two dates using months_between function.


val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
    .toDF("startDate").select(
    col("startDate"),current_date().as("endDate")
  )

df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
      .withColumn("montsDiff", months_between(
        col("endDate"),col("startDate")))
      .withColumn("montsDiff_round",round(months_between(
        col("endDate"),col("startDate")),2))
      .withColumn("yearsDiff",months_between(
        col("endDate"),col("startDate"),true).divide(12))
      .withColumn("yearsDiff_round",round(months_between(
        col("endDate"),col("startDate"),true).divide(12),2))
      .show()

Output:


+----------+----------+---------+-----------+---------------+--------------------+---------------+
| startDate|   endDate|datesDiff|  montsDiff|montsDiff_round|           yearsDiff|yearsDiff_round|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
|2019-07-01|2019-07-24|       23| 0.74193548|           0.74| 0.06182795666666666|           0.06|
|2019-06-24|2019-07-24|       30|        1.0|            1.0| 0.08333333333333333|           0.08|
|2019-08-24|2019-07-24|      -31|       -1.0|           -1.0|-0.08333333333333333|          -0.08|
|2018-12-23|2019-07-24|      213| 7.03225806|           7.03|         0.586021505|           0.59|
|2018-07-20|2019-07-24|      369|12.12903226|          12.13|  1.0107526883333333|           1.01|
+----------+----------+---------+-----------+---------------+--------------------+---------------+

Let’s see another example of the difference between two dates when dates are not in Spark DateType format ‘yyyy-MM-dd’. when dates are not in Spark DateType format, all Spark functions return null. Hence, first convert the input dates to Spark DateType using to_date function.


val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
    .toDF("startDate").select(
    to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
    current_date().as("endDate")

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DateDiff extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  //Difference between two dates in days
  Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
       ("2018-07-23")).toDF("date")
    .select(
      col("date"),
      current_date().as("current_date"),
      datediff(current_date(),col("date")).as("datediff")
    ).show()

  // Difference between two dates in Months and Years
  val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
       ("2018-12-23"),("2018-07-20"))
    .toDF("startDate").select(
    col("startDate"),current_date().as("endDate")
  )

  calculateDiff(df)

  //Difference between two dates when dates are not in Spark DateType format 'yyyy-MM-dd'.
  //Note that when dates are not in Spark DateType format, all Spark functions returns null
  //Hence, first convert the input dates to Spark DateType using to_date function
  val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
      ("12-23-2018"),("07-20-2018"))
    .toDF("startDate").select(
    to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
    current_date().as("endDate")
  )

  calculateDiff(dfDate)

  def calculateDiff(df:DataFrame): Unit ={
    df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
      .withColumn("montsDiff", months_between(
        col("endDate"),col("startDate")))
      .withColumn("montsDiff_round",round(months_between(
        col("endDate"),col("startDate")),2))
      .withColumn("yearsDiff",months_between(
        col("endDate"),col("startDate"),true).divide(12))
      .withColumn("yearsDiff_round",round(months_between(
        col("endDate"),col("startDate"),true).divide(12),2))
      .show()
  }
}

Conclusion:

In this tutorial, we have learned how to calculate days, months and years in between two dates using Scala language and Spark SQL Date and Time functions datediff, months_between. You can find more information about these functions at the following blog

If you want to know the complete list of Spark functions by group, please read below article.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply