Spark SQL – Add Day, Month, and Year to Date

In this tutorial, we will show you a Spark SQL DataFrame example of how to add days, months, and years to a DataFrame date column. We will also look at examples where the DataFrame column has a different date format, and where the day, month, or year to add comes from another column.

Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

Spark SQL provides the DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.

The code below adds days and months to a DataFrame column when the input date is in the “yyyy-MM-dd” format, which Spark implicitly casts to DateType.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df=Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date")
df.select(col("date"),
    add_months(col("date"),3).as("add_months"), // provide +ve value to add months
    add_months(col("date"),-3).as("sub_months"), //provide -ve value to subtract months
    date_add(col("date"),4).as("date_add"), // to add days
    date_sub(col("date"),4).as("date_sub") // to subtract days
  ).show()

Outputs:


+----------+----------+----------+----------+----------+
|      date|add_months|sub_months|  date_add|  date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+
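As a quick sanity check outside Spark, java.time.LocalDate performs the same arithmetic for mid-month dates like the ones above (note a caveat: Spark's add_months() snaps month-end input dates to the end of the result month, while LocalDate.plusMonths() only clamps overflow, so they agree here only because these dates fall mid-month). This sketch verifies the first output row locally:

```scala
import java.time.LocalDate

// Cross-check the first output row above without a Spark session.
val d = LocalDate.parse("2019-01-23")
println(d.plusMonths(3))  // 2019-04-23, matches add_months +3
println(d.plusMonths(-3)) // 2018-10-23, matches add_months -3
println(d.plusDays(4))    // 2019-01-27, matches date_add 4
println(d.minusDays(4))   // 2019-01-19, matches date_sub 4
```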

If your input date is not in the “yyyy-MM-dd” format Spark expects, you first need to convert it to DateType with to_date() before applying add_months(), date_add(), or date_sub(). Let’s see an example.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("date")
df2.select(col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

Outputs:


+----------+----------+-----------+----------+----------+----------+
|      Date|add_months|add_months2|  date_add| date_add2|  date_sub|
+----------+----------+-----------+----------+----------+----------+
|06-03-2009|2009-09-03| 2009-03-03|2009-06-06|2009-05-31|2009-05-31|
|07-24-2009|2009-10-24| 2009-04-24|2009-07-27|2009-07-21|2009-07-21|
+----------+----------+-----------+----------+----------+----------+
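Notice that the results come back in Spark’s default yyyy-MM-dd form even though the input was MM-dd-yyyy; to render results in the original layout, you can wrap them in Spark’s date_format() function. The parse/add/format round trip itself can be sketched without a cluster using java.time (the "MM-dd-yyyy" pattern here mirrors the one passed to to_date() above):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Parse with the source pattern, add months, then format back.
val fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy")
val parsed  = LocalDate.parse("06-03-2009", fmt) // 2009-06-03
val shifted = parsed.plusMonths(3)               // 2009-09-03, matches the add_months row
println(shifted.format(fmt))                     // 09-03-2009
```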

Add Day, Month and Year with a Value From Another Column

If you want to add a day, month, or year using the value from another column, you need to use the expr() SQL function, because add_months() and date_add() expect an Int literal for the increment rather than a Column. Below is an example with add_months(), and I will leave it to you to explore the day and year variants.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df3=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
df3.select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date")
   ).show()

Outputs:


+----------+---------+----------+
|      date|increment|  inc_date|
+----------+---------+----------+
|2019-01-23|        1|2019-02-23|
|2019-06-24|        2|2019-08-24|
|2019-09-20|        3|2019-12-20|
+----------+---------+----------+

Alternatively, you can write the above statement with selectExpr():


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df4=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment")
df4.selectExpr("date","increment","add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int)) as inc_date")
   .show()
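The same expr() pattern extends to days and years. Spark has no add_years() function, but multiplying the month increment by 12 achieves the same effect. This is a sketch along the lines of the examples above (the column names df5, inc_day, and inc_year are illustrative, not from the original article):

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val df5 = Seq(("2019-01-23",1),("2019-06-24",2)).toDF("date","increment")
df5.selectExpr(
  "date",
  "increment",
  // add 'increment' days
  "date_add(to_date(date,'yyyy-MM-dd'), increment) as inc_day",
  // add 'increment' years by converting to months
  "add_months(to_date(date,'yyyy-MM-dd'), increment * 12) as inc_year"
).show()
```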

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateAddMonths extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
    col("date"),
    add_months(col("date"),3).as("add_months"),
    add_months(col("date"),-3).as("sub_months"),
    date_add(col("date"),4).as("date_add"),
    date_sub(col("date"),4).as("date_sub")
  ).show()

  Seq(("06-03-2009"),("07-24-2009")).toDF("date").select(
    col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

  // Adding a value from another column
  Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
    .select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date"))
    .show()
}

Conclusion:

In this article, you have learned how to use the Spark DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add or subtract days.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn


This Post Has 10 Comments

  1. Igor

    Thank you for this article!
    I’m trying to add a dynamic amount to each column; those amounts are in another column ‘amount’.
    Is it possible to do something like that? I already searched a lot and couldn’t do it.
    Thanks!

    df_new = df.select(add_months(col('date'), col('amount')).alias('date'))

    1. NNK

      yes, certainly you can do it. please provide the details with a data, I will try to provide a solution.

      1. CG

        Hi NNK
        I’m not the OP but I’m having the same issue. Using the data from your example:
        Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
        col("date"),
        add_months(to_date(col("date"),"yyyy-MM-dd"),col("increment")).as("date_inc")
        ).show()

        I get the following error:
        :27: error: type mismatch;
        found : org.apache.spark.sql.Column
        required: Int

        Is there any workaround for this?
        Thank you very much!

        1. NNK

          Hi CG, I see you have added an additional column "increment" and you are trying to add its value to the date. Your solution won’t work because add_months() doesn’t take a Column as the second argument (it expects an Int). Try below

          Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
          .toDF("date","increment")
          .select(col("date"),expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))"))
          .show()

          Let me know if this works.

          Thanks.

          1. CG

            It appears the solution works in PySpark, Unfortunately this isn’t working in scala:

            scala> Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
            | col("date"),
            | add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
            | ).show()
            :27: error: not found: value cast
            add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")

          2. NNK

            You are not using expr() function as explained in solution, please try with expr() or selectExpr() function. I’ve updated the article with these. please check.

          3. CG

            You’re absolutely correct, I missed the expr()
            Thank you very much for taking the time!

          4. NNK

            Glad it helped you and thanks for reading the article.

  2. Anonymous

    We’ve got the days and months, but how do we add years to a timestamp?

    1. Matthias

      Hi Anonymous

      You could just use the add_months functionality and multiply by 12 to get years. Here’s an example defining your own function:

      ```
      import org.apache.spark.sql.functions.{add_months, col}
      import org.apache.spark.sql.Column
      import spark.implicits._

      def add_years(col: Column, years: Int): Column = add_months(col, 12*years)

      val nYears = 3

      Seq(("2019-01-23"),("2019-06-24"),("2019-09-20"))
      .toDF("date")
      .withColumn("threeYearsSubtractedExample", add_years(col("date"), -nYears))
      .show()
      ```
