Spark SQL – Add Day, Month, and Year to Date

In this tutorial, we will show you a Spark SQL DataFrame example of how to add a day, month, and year to a DataFrame date column. We will also see examples where the DataFrame column has a different date format, and where the day, month, or year to add comes from another column.

Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

Spark SQL provides the DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.

The code below adds days and months to a DataFrame column when the input date is in the “yyyy-MM-dd” Spark DateType format.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df = Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date")
df.select(col("date"),
    add_months(col("date"),3).as("add_months"),  // positive value adds months
    add_months(col("date"),-3).as("sub_months"), // negative value subtracts months
    date_add(col("date"),4).as("date_add"),      // add days
    date_sub(col("date"),4).as("date_sub")       // subtract days
  ).show()

Output:


+----------+----------+----------+----------+----------+
|      date|add_months|sub_months|  date_add|  date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+

If your input date is not in the Spark DateType “yyyy-MM-dd” format, you first need to convert it to DateType with to_date() before applying add_months(), date_add(), or date_sub(). Let’s see an example.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df2 = Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
df2.select(col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

Output:


+----------+----------+-----------+----------+----------+----------+
|      Date|add_months|add_months2|  date_add| date_add2|  date_sub|
+----------+----------+-----------+----------+----------+----------+
|06-03-2009|2009-09-03| 2009-03-03|2009-06-06|2009-05-31|2009-05-31|
|07-24-2009|2009-10-24| 2009-04-24|2009-07-27|2009-07-21|2009-07-21|
+----------+----------+-----------+----------+----------+----------+
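Note that after conversion the results come back in Spark’s default “yyyy-MM-dd” format. If you need the original “MM-dd-yyyy” format back, you can wrap the result in date_format(). A minimal sketch (df2b is just an illustrative name):

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

// Convert, add months, then format the result back to "MM-dd-yyyy"
val df2b = Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
df2b.select(col("Date"),
    date_format(add_months(to_date(col("Date"),"MM-dd-yyyy"),3),"MM-dd-yyyy").as("add_months")
  ).show()
// 06-03-2009 -> 09-03-2009, 07-24-2009 -> 10-24-2009
```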

Add Day, Month and Year with a Value From Another Column

If you want to add a day, month, or year using a value from another column, you need to use the expr() SQL function. Below is an example with add_months(), and I will leave it to you to explore the same for day and year.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df3=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
df3.select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date")
   ).show()

Output:


+----------+---------+----------+
|      date|increment|  inc_date|
+----------+---------+----------+
|2019-01-23|        1|2019-02-23|
|2019-06-24|        2|2019-08-24|
|2019-09-20|        3|2019-12-20|
+----------+---------+----------+

Alternatively, you can also write the above statement with selectExpr():


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df4=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment")
df4.selectExpr("date","increment","add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int)) as inc_date")
   .show()
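The same expr()/selectExpr() approach extends to days and years as well. Spark has no built-in add_years(), so a common substitute is adding increment * 12 months. A sketch (df5 and the aliases day_inc and year_inc are just illustrative names):

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val df5 = Seq(("2019-01-23",1),("2019-06-24",2)).toDF("date","increment")
df5.selectExpr("date","increment",
    "date_add(to_date(date,'yyyy-MM-dd'), cast(increment as int)) as day_inc",        // add N days
    "add_months(to_date(date,'yyyy-MM-dd'), cast(increment as int) * 12) as year_inc" // add N years as 12*N months
  ).show()
// 2019-01-23 -> day_inc 2019-01-24, year_inc 2020-01-23
// 2019-06-24 -> day_inc 2019-06-26, year_inc 2021-06-24
```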

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateAddMonths extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
    col("date"),
    add_months(col("date"),3).as("add_months"),
    add_months(col("date"),-3).as("sub_months"),
    date_add(col("date"),4).as("date_add"),
    date_sub(col("date"),4).as("date_sub")
  ).show()

  Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

 //Adding value from another column
  Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
    .select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date"))
    .show()
}

Conclusion:

In this article, you have learned about the Spark DataFrame function add_months() to add or subtract months from a column, and date_add() and date_sub() to add and subtract days. You can find more of these functions in the Spark SQL Date and Timestamp Functions reference linked above.

Happy Learning !!

NNK



This Post Has 10 Comments

  1. Anonymous

    We’ve got the days and months, but how do we add years to a timestamp?

    1. Matthias

      Hi Anonymous

      You could just use the add_months functionality and multiply by 12 to get years. Here’s an example defining your own helper function:

      ```
      import org.apache.spark.sql.functions.{add_months, col}
      import org.apache.spark.sql.Column
      import spark.implicits._

      def add_years(col: Column, years: Int): Column = add_months(col, 12*years)

      val nYears = 3

      Seq(("2019-01-23"),("2019-06-24"),("2019-09-20"))
        .toDF("date")
        .withColumn("threeYearsSubtracted", add_years(col("date"), -nYears))
      ```

  2. Igor

    Thank you for this article!
    I’m trying to add a dynamic amount to each row; those amounts are in another column ‘amount’.
    Is it possible to do something like that? I searched a lot and couldn’t figure it out.
    Thanks!

    df_new = df.select(add_months(col('date'),col(amount))).alias('date'))

    1. NNK

      Yes, certainly you can do it. Please provide the details with some data and I will try to provide a solution.

      1. CG

        Hi NNK
        I’m not the OP but I’m having the same issue. Using the data from your example:
        Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
        col("date"),
        add_months(to_date(col("date"),"yyyy-MM-dd"),col("increment")).as("date_inc")
        ).show()

        I get the following error:
        :27: error: type mismatch;
        found : org.apache.spark.sql.Column
        required: Int

        Is there any workaround for this?
        Thank you very much!

        1. NNK

          Hi CG, I see you have added an additional column "increment" and you are trying to add its value to the date. Your solution won’t work because add_months() doesn’t take a Column as its second argument (it expects an Int). Try the below:

          Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
          .toDF("date","increment")
          .select(col("date"),expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))"))
          .show()

          Let me know if this works.

          Thanks.

          1. CG

            It appears the solution works in PySpark, judging by this comment that popped up recently at https://forums.databricks.com/answers/45143/view.html
            Unfortunately it isn’t working in Scala:

            scala> Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
            | col("date"),
            | add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
            | ).show()
            :27: error: not found: value cast
            add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")

          2. NNK

            You are not using the expr() function as explained in the solution; please try with expr() or selectExpr(). I’ve updated the article with these, please check.

          3. CG

            You’re absolutely correct, I missed the expr().
            Thank you very much for taking the time!

          4. NNK

            Glad it helped you and thanks for reading the article.
