Calculate difference between two dates in days, months and years

Spark Difference between two dates

In this tutorial, we will show you a Spark SQL Dataframe example of how to calculate a difference between two dates in days, Months and year using Scala language and functions datediff, months_between.

First Let’s see getting the difference between two dates using datediff Spark function.


Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),("2018-07-23"))
.toDF("date").select(
      col("date"),
      current_date().as("current_date"),
      datediff(current_date(),col("date")).as("datediff")
    ).show()

Output:


+----------+------------+--------+
|      date|current_date|datediff|
+----------+------------+--------+
|2019-07-01|  2019-07-24|      23|
|2019-06-24|  2019-07-24|      30|
|2019-08-24|  2019-07-24|     -31|
|2018-07-23|  2019-07-24|     366|
+----------+------------+--------+

Now, Let’s see how to get month and year differences between two dates using months_between function.


val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
("2018-12-23"),("2018-07-20"))
    .toDF("startDate").select(
    col("startDate"),current_date().as("endDate")
  )

df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
      .withColumn("montsDiff", months_between(
        col("endDate"),col("startDate")))
      .withColumn("montsDiff_round",round(months_between(
        col("endDate"),col("startDate")),2))
      .withColumn("yearsDiff",months_between(
        col("endDate"),col("startDate"),true).divide(12))
      .withColumn("yearsDiff_round",round(months_between(
        col("endDate"),col("startDate"),true).divide(12),2))
      .show()

Output:


+----------+----------+---------+-----------+---------------+--------------------+---------------+
| startDate|   endDate|datesDiff|  montsDiff|montsDiff_round|           yearsDiff|yearsDiff_round|
+----------+----------+---------+-----------+---------------+--------------------+---------------+
|2019-07-01|2019-07-24|       23| 0.74193548|           0.74| 0.06182795666666666|           0.06|
|2019-06-24|2019-07-24|       30|        1.0|            1.0| 0.08333333333333333|           0.08|
|2019-08-24|2019-07-24|      -31|       -1.0|           -1.0|-0.08333333333333333|          -0.08|
|2018-12-23|2019-07-24|      213| 7.03225806|           7.03|         0.586021505|           0.59|
|2018-07-20|2019-07-24|      369|12.12903226|          12.13|  1.0107526883333333|           1.01|
+----------+----------+---------+-----------+---------------+--------------------+---------------+

Let’s see another example of the difference between two dates when dates are not in Spark DateType format ‘yyyy-MM-dd’. when dates are not in Spark DateType format, all Spark functions return null. Hence, first convert the input dates to Spark DateType using to_date function.


val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
("12-23-2018"),("07-20-2018"))
    .toDF("startDate").select(
    to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
    current_date().as("endDate")

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DateDiff extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  //Difference between two dates in days
  Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
       ("2018-07-23")).toDF("date")
    .select(
      col("date"),
      current_date().as("current_date"),
      datediff(current_date(),col("date")).as("datediff")
    ).show()

  // Difference between two dates in Months and Years
  val df = Seq(("2019-07-01"),("2019-06-24"),("2019-08-24"),
       ("2018-12-23"),("2018-07-20"))
    .toDF("startDate").select(
    col("startDate"),current_date().as("endDate")
  )

  calculateDiff(df)

  //Difference between two dates when dates are not in Spark DateType format 'yyyy-MM-dd'.
  //Note that when dates are not in Spark DateType format, all Spark functions returns null
  //Hence, first convert the input dates to Spark DateType using to_date function
  val dfDate = Seq(("07-01-2019"),("06-24-2019"),("08-24-2019"),
      ("12-23-2018"),("07-20-2018"))
    .toDF("startDate").select(
    to_date(col("startDate"),"MM-dd-yyyy").as("startDate"),
    current_date().as("endDate")
  )

  calculateDiff(dfDate)

  def calculateDiff(df:DataFrame): Unit ={
    df.withColumn("datesDiff", datediff(col("endDate"),col("startDate")))
      .withColumn("montsDiff", months_between(
        col("endDate"),col("startDate")))
      .withColumn("montsDiff_round",round(months_between(
        col("endDate"),col("startDate")),2))
      .withColumn("yearsDiff",months_between(
        col("endDate"),col("startDate"),true).divide(12))
      .withColumn("yearsDiff_round",round(months_between(
        col("endDate"),col("startDate"),true).divide(12),2))
      .show()
  }
}

Conclusion:

In this tutorial, we have learned how to calculate days, months and years in between two dates using Scala language and Spark SQL Date and Time functions datediff, months_between. You can find more information about these functions at the following blog

If you want to know the complete list of Spark functions by group, please read below article.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ @ LinkedIn

Leave a Reply

You are currently viewing Calculate difference between two dates in days, months and years