In this Spark SQL DataFrame tutorial, you will learn how to add days, months, and years to a DataFrame date column. We will also cover examples where the date column is in a different format, and where the value to add comes from another column.
Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.
Spark SQL provides the DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.
The code below adds days and months to a DataFrame column, when the input date is in Spark's default "yyyy-MM-dd" DateType format.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df=Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date")
df.select(col("date"),
add_months(col("date"),3).as("add_months"), // provide a positive value to add months
add_months(col("date"),-3).as("sub_months"), // provide a negative value to subtract months
date_add(col("date"),4).as("date_add"), // add days
date_sub(col("date"),4).as("date_sub") // subtract days
).show()
Outputs:
+----------+----------+----------+----------+----------+
| date|add_months|sub_months| date_add| date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+
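Note that date_add() also accepts a negative value, in which case it behaves exactly like date_sub(). A minimal sketch showing the two equivalent forms:

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val dfNeg = Seq(("2019-01-23")).toDF("date")

dfNeg.select(col("date"),
  date_add(col("date"), -4).as("date_add_neg"), // subtract 4 days via date_add()
  date_sub(col("date"), 4).as("date_sub")       // same result with date_sub()
).show()
```

Both columns return 2019-01-19, so which one you use is a matter of readability.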
If your input date is not in the Spark DateType "yyyy-MM-dd" format, you first need to convert it to DateType using to_date() before applying add_months(), date_add(), or date_sub(). Let’s see an example.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("date")
df2.select(col("Date"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
).show()
Outputs:
+----------+----------+-----------+----------+----------+----------+
| Date|add_months|add_months2| date_add| date_add2| date_sub|
+----------+----------+-----------+----------+----------+----------+
|06-03-2009|2009-09-03| 2009-03-03|2009-06-06|2009-05-31|2009-05-31|
|07-24-2009|2009-10-24| 2009-04-24|2009-07-27|2009-07-21|2009-07-21|
+----------+----------+-----------+----------+----------+----------+
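Notice that to_date() converts the value to Spark's default "yyyy-MM-dd" representation, so the results above are no longer in the original "MM-dd-yyyy" format. If you need the output in the source format, you can wrap the result in date_format(); a minimal sketch:

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val dfFmt = Seq(("06-03-2009"), ("07-24-2009")).toDF("date")

// date_format() renders the DateType result back as an MM-dd-yyyy string
dfFmt.select(col("date"),
  date_format(add_months(to_date(col("date"), "MM-dd-yyyy"), 3), "MM-dd-yyyy")
    .as("add_months_fmt")
).show()
```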
Add Day, Month and Year with a Value From Another Column
If you want to add days, months, or years with the value from another column, you need to use the expr() SQL function. Below is an example with add_months(); I will leave it to you to explore adding days and years.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df3=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
df3.select(col("date"),col("increment"),
expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date")
).show()
Outputs:
+----------+---------+----------+
| date|increment| inc_date|
+----------+---------+----------+
|2019-01-23| 1|2019-02-23|
|2019-06-24| 2|2019-08-24|
|2019-09-20| 3|2019-12-20|
+----------+---------+----------+
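As an aside, if you are on Spark 3.0 or later, add_months(), date_add(), and date_sub() also have overloads that accept a Column as the second argument, so the expr() workaround is not required there. A sketch, assuming Spark 3.0+:

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

val dfCol = Seq(("2019-01-23", 1), ("2019-06-24", 2)).toDF("date", "increment")

// Spark 3.0+: the second argument can be a Column instead of a literal Int
dfCol.select(col("date"),
  add_months(to_date(col("date"), "yyyy-MM-dd"), col("increment")).as("inc_date")
).show()
```

On older Spark versions this overload does not exist and the expr() approach above is the way to go.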
Alternatively, you can also write the above statement with selectExpr():
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df4=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment")
df4.selectExpr("date","increment","add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int)) as inc_date")
.show()
Complete Code:
package com.sparkbyexamples.spark.dataframe.functions
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, date_add, _}
object DateAddMonths extends App {
val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.sqlContext.implicits._
Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
col("date"),
add_months(col("date"),3).as("add_months"),
add_months(col("date"),-3).as("sub_months"),
date_add(col("date"),4).as("date_add"),
date_sub(col("date"),4).as("date_sub")
).show()
Seq(("06-03-2009"),("07-24-2009")).toDF("date").select(
col("Date"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
).show()
//Adding value from another column
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
.select(col("date"),col("increment"),
expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date"))
.show()
}
Conclusion:
From this article, you have learned the Spark DataFrame function add_months() to add or subtract months from a column, and date_add() and date_sub() to add and subtract days from a column. You can find more information about these functions in the Spark SQL Date and Timestamp Functions reference linked at the beginning of this article.
Happy Learning !!

We’ve got the days and months, but how do we add years to a timestamp?
Hi Anonymous
You could just use the add_months functionality and multiply by 12 to get years. Here’s an example defining your own helper function:
```
import org.apache.spark.sql.functions.{add_months, col}
import org.apache.spark.sql.Column
def add_years(col: Column, years: Int): Column = add_months(col, 12*years)
val nYears = 3
Seq(("2019-01-23"),("2019-06-24"),("2019-09-20"))
.toDF("date")
.withColumn("threeYearsSubtractedExample", add_years(col("date"), -nYears))
```
Thank you for this article!
I’m trying to add a dynamic amount to each date; those amounts are in another column ‘amount’.
Is it possible to do something like that? I have searched a lot and couldn’t get it to work.
Thanks!
df_new = df.select(add_months(col('date'), col('amount')).alias('date'))
yes, certainly you can do it. please provide the details with a data, I will try to provide a solution.
Hi NNK
I’m not the OP but I’m having the same issue. Using the data from your example:
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
col("date"),
add_months(to_date(col("date"),"yyyy-MM-dd"),col("increment")).as("date_inc")
).show()
I get the following error:
:27: error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
Is there any workaround for this?
Thank you very much!
Hi CG, I see you have added an additional column "increment" and you are trying to add its value to the date. Your solution won’t work because, in your Spark version, add_months() doesn’t take a Column as the second argument. Try the below:
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
.select(col("date"),expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))"))
.show()
Let me know if this works.
Thanks.
It appears the solution works in PySpark, looking at this comment that popped up recently in https://forums.databricks.com/answers/45143/view.html
Unfortunately this isn’t working in Scala:
scala> Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
| col("date"),
| add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
| ).show()
:27: error: not found: value cast
add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
You are not using the expr() function as explained in the solution; please try with expr() or selectExpr(). I’ve updated the article with these, please check.
You’re absolutely correct, I missed the expr()
Thank you very much for taking the time!
Glad it helped you and thanks for reading the article.