Spark SQL – Add Day, Month, and Year to Date

In this tutorial, we will show you a Spark SQL DataFrame example of how to add day, month, and year to a DataFrame date column. We will also see examples where the DataFrame column holds different date formats, and where the day, month, or year to add comes from another column.

Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

Spark SQL provides the DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.

The code below adds days and months to a DataFrame column whose input dates are in the "yyyy-MM-dd" Spark DateType format.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df=Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date")
df.select(col("date"),
    add_months(col("date"),3).as("add_months"),  // a positive value adds months
    add_months(col("date"),-3).as("sub_months"), // a negative value subtracts months
    date_add(col("date"),4).as("date_add"),      // add days
    date_sub(col("date"),4).as("date_sub")       // subtract days
  ).show()

Outputs:


+----------+----------+----------+----------+----------+
|      date|add_months|sub_months|  date_add|  date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+

If your input date is not in the Spark DateType "yyyy-MM-dd" format, you first need to convert it to DateType with to_date() before applying add_months(), date_add(), or date_sub(). Let's see an example.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
df2.select(col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

Outputs:


+----------+----------+-----------+----------+----------+----------+
|      Date|add_months|add_months2|  date_add| date_add2|  date_sub|
+----------+----------+-----------+----------+----------+----------+
|06-03-2009|2009-09-03| 2009-03-03|2009-06-06|2009-05-31|2009-05-31|
|07-24-2009|2009-10-24| 2009-04-24|2009-07-27|2009-07-21|2009-07-21|
+----------+----------+-----------+----------+----------+----------+
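Spark does not ship a built-in function for adding years, but since add_months() already handles month arithmetic, you can add or subtract years by multiplying by 12. Below is a small sketch; the add_years() helper name is my own and not a Spark API.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{add_months, col}
import spark.sqlContext.implicits._

// add_years() is not built into Spark; it is a thin wrapper over add_months()
def add_years(c: Column, years: Int): Column = add_months(c, 12 * years)

Seq(("2019-01-23"), ("2019-06-24")).toDF("date")
  .select(col("date"),
    add_years(col("date"), 3).as("add_years"),  // add 3 years
    add_years(col("date"), -3).as("sub_years")  // subtract 3 years
  ).show()
```

The same trick works in SQL expressions, e.g. `expr("add_months(date, 12 * 3)")`.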

Add Day, Month and Year with a Value From Another Column

If you want to add day, month, or year values taken from another column, you need to use the expr() SQL function. Below is an example with add_months(), and I will leave it to you to explore the same for day and year.


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df3=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
df3.select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date")
   ).show()

Outputs:


+----------+---------+----------+
|      date|increment|  inc_date|
+----------+---------+----------+
|2019-01-23|        1|2019-02-23|
|2019-06-24|        2|2019-08-24|
|2019-09-20|        3|2019-12-20|
+----------+---------+----------+
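The same expr() approach works for days: date_add() in the Scala API only accepts an Int, so to add a per-row number of days you move the call into a SQL expression. The sketch below follows the add_months() example above; df5 is a hypothetical name.

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val df5 = Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
  .toDF("date","increment")

// date_add() takes an Int in the Scala API, so use expr() to read the value from a column
df5.select(col("date"), col("increment"),
    expr("date_add(to_date(date,'yyyy-MM-dd'), increment)").as("inc_date")
  ).show()
```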

Alternatively, you can write the above statement with selectExpr():


import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df4=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment")
df4.selectExpr("date","increment","add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int)) as inc_date")
   .show()

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, date_add, _}

object DateAddMonths extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
    col("date"),
    add_months(col("date"),3).as("add_months"),
    add_months(col("date"),-3).as("sub_months"),
    date_add(col("date"),4).as("date_add"),
    date_sub(col("date"),4).as("date_sub")
  ).show()

  Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
    add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
    date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
    date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
  ).show()

  // Adding a value from another column
  Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
    .select(col("date"),col("increment"),
      expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date"))
    .show()
}

Conclusion:

In this article, you have learned how to use the Spark DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium

This Post Has 10 Comments

  1. Matthias

    Hi Anonymous

    You could just use the add_months functionality and multiply by 12 to get years. Here's an example defining your own function:

    ```
    import org.apache.spark.sql.functions.{add_months, col}
    import org.apache.spark.sql.Column

    def add_years(col: Column, years: Int): Column = add_months(col, 12*years)

    val nYears = 3

    Seq(("2019-01-23"),("2019-06-24"),("2019-09-20"))
    .toDF("date")
    .withColumn("threeYearsSubstractedExample", add_years(col("date"), -nYears))
    ```

  2. NNK

    Glad it helped you and thanks for reading the article.

  3. CG

    You’re absolutely correct, I missed the expr()
    Thank you very much for taking the time!

  4. NNK

    You are not using expr() function as explained in solution, please try with expr() or selectExpr() function. I’ve updated the article with these. please check.

  5. CG

    It appears the solution works in PySpark; unfortunately this isn't working in Scala:

    scala> Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
    | col("date"),
    | add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
    | ).show()
    :27: error: not found: value cast
    add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")

  6. NNK

    Hi CG, I see you have added additional column “increment” and you are trying to add value from this column to data. Your solution won’t work as to_date() method doesn’t take column as the second argument. try below

    Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
    .toDF("date","increment")
    .select(col("date"),expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))"))
    .show()

    Let me know if this works.

    Thanks.

  7. CG

    Hi NNK
    I’m not the OP but I’m having the same issue. Using the data from your example:
    Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
    col("date"),
    add_months(to_date(col("date"),"yyyy-MM-dd"),col("increment")).as("date_inc")
    ).show()

    I get the following error:
    :27: error: type mismatch;
    found : org.apache.spark.sql.Column
    required: Int

    Is there any workaround for this?
    Thank you very much!

  8. Anonymous

    We’ve got the days and months, but how do we add years to a timestamp?

  9. NNK

    yes, certainly you can do it. please provide the details with a data, I will try to provide a solution.

  10. Igor

    Thank you for this article!
    I'm trying to add a dynamic amount to each column; those amounts are in another column 'amount'.
    Is it possible to do something like that? I already searched a lot and couldn't do it.
    Thanks!

    df_new = df.select(add_months(col('date'),col(amount))).alias('date'))
