In this tutorial, we will show you a Spark SQL DataFrame example of how to add days, months, and years to a DataFrame date column. We will also see examples where the DataFrame column has a different date format, and where the value to add comes from another column.
Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.
Spark SQL provides the DataFrame function add_months() to add or subtract months from a date column, and date_add() and date_sub() to add and subtract days.
The code below adds days and months to a DataFrame column, when the input date is in the "yyyy-MM-dd" Spark DateType format.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df=Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date")
df.select(col("date"),
add_months(col("date"),3).as("add_months"), // positive value to add months
add_months(col("date"),-3).as("sub_months"), // negative value to subtract months
date_add(col("date"),4).as("date_add"), // to add days
date_sub(col("date"),4).as("date_sub") // to subtract days
).show()
Outputs:
+----------+----------+----------+----------+----------+
| date|add_months|sub_months| date_add| date_sub|
+----------+----------+----------+----------+----------+
|2019-01-23|2019-04-23|2018-10-23|2019-01-27|2019-01-19|
|2019-06-24|2019-09-24|2019-03-24|2019-06-28|2019-06-20|
|2019-09-20|2019-12-20|2019-06-20|2019-09-24|2019-09-16|
+----------+----------+----------+----------+----------+
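Note that add_months() clamps the result to the last valid day of the target month when the source day does not exist in that month. A minimal sketch (assuming the same imports and SparkSession as above):

```scala
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._

// add_months() clamps to the last valid day of the shorter month:
// 2019-01-31 plus one month becomes 2019-02-28, not an invalid Feb 31
Seq(("2019-01-31")).toDF("date")
  .select(col("date"), add_months(col("date"), 1).as("plus_month"))
  .show()
```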
If your input date is not in the Spark DateType "yyyy-MM-dd" format, you first need to convert it to DateType with to_date() before applying add_months(), date_add(), or date_sub(). Let's see an example.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("date")
df2.select(col("Date"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
).show()
Outputs:
+----------+----------+-----------+----------+----------+----------+
| Date|add_months|add_months2| date_add| date_add2| date_sub|
+----------+----------+-----------+----------+----------+----------+
|06-03-2009|2009-09-03| 2009-03-03|2009-06-06|2009-05-31|2009-05-31|
|07-24-2009|2009-10-24| 2009-04-24|2009-07-27|2009-07-21|2009-07-21|
+----------+----------+-----------+----------+----------+----------+
Add Day, Month and Year with a Value From Another Column
If you want to add days, months, or years using a value from another column, you need to use the expr() SQL function. Below is an example with add_months(); I will leave it to you to explore the same for days and years.
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df3=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
df3.select(col("date"),col("increment"),
expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date")
).show()
Outputs:
+----------+---------+----------+
| date|increment| inc_date|
+----------+---------+----------+
|2019-01-23| 1|2019-02-23|
|2019-06-24| 2|2019-08-24|
|2019-09-20| 3|2019-12-20|
+----------+---------+----------+
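The same expr() pattern works for adding days from a column; a minimal sketch using date_add() with the df3 defined above:

```scala
// add a per-row number of days using date_add() inside expr()
df3.select(col("date"), col("increment"),
  expr("date_add(to_date(date,'yyyy-MM-dd'), cast(increment as int))").as("inc_date_days")
).show()
```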
Alternatively, you can also write the above statement with selectExpr().
import spark.sqlContext.implicits._
import org.apache.spark.sql.functions._
val df4=Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment")
df4.selectExpr("date","increment","add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int)) as inc_date")
.show()
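For years, you can reuse add_months() by multiplying the per-row increment by 12 inside the SQL expression; a sketch assuming the same df4:

```scala
// add a per-row number of years by converting years to months
df4.selectExpr("date", "increment",
  "add_months(to_date(date,'yyyy-MM-dd'), cast(increment as int) * 12) as inc_date_years")
  .show()
```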
Complete Code:
package com.sparkbyexamples.spark.dataframe.functions
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{add_months, date_add, _}
object DateAddMonths extends App {
val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.sqlContext.implicits._
Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
col("date"),
add_months(col("date"),3).as("add_months"),
add_months(col("date"),-3).as("sub_months"),
date_add(col("date"),4).as("date_add"),
date_sub(col("date"),4).as("date_sub")
).show()
Seq(("06-03-2009"),("07-24-2009")).toDF("date").select(
col("Date"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),3).as("add_months"),
add_months(to_date(col("Date"),"MM-dd-yyyy"),-3).as("add_months2"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_add"),
date_add(to_date(col("Date"),"MM-dd-yyyy"),-3).as("date_add2"),
date_sub(to_date(col("Date"),"MM-dd-yyyy"),3).as("date_sub")
).show()
//Adding value from another column
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
.select(col("date"),col("increment"),
expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))").as("inc_date"))
.show()
}
Conclusion:
In this article, you have learned how to use the Spark DataFrame function add_months() to add or subtract months from a column, and date_add() and date_sub() to add and subtract days from a column.
Happy Learning !!
Thank you for this article!
I'm trying to add a dynamic amount to each row; those amounts are in another column 'amount'.
Is it possible to do something like that? I already searched a lot and couldn't figure it out.
Thanks!
df_new = df.select(add_months(col("date"), col("amount")).alias("date"))
Yes, certainly you can do it. Please provide the details with some data and I will try to provide a solution.
Hi NNK
I’m not the OP but I’m having the same issue. Using the data from your example:
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
col("date"),
add_months(to_date(col("date"),"yyyy-MM-dd"),col("increment")).as("date_inc")
).show()
I get the following error:
:27: error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
Is there any workaround for this?
Thank you very much!
Hi CG, I see you have added an additional column "increment" and you are trying to add its value to the date. Your solution won't work because add_months() doesn't take a Column as its second argument. Try the below:
Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3))
.toDF("date","increment")
.select(col("date"),expr("add_months(to_date(date,'yyyy-MM-dd'),cast(increment as int))"))
.show()
Let me know if this works.
Thanks.
It appears the solution works in PySpark; unfortunately this isn't working in Scala:
scala> Seq(("2019-01-23",1),("2019-06-24",2),("2019-09-20",3)).toDF("date","increment").select(
| col("date"),
| add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
| ).show()
:27: error: not found: value cast
add_months(to_date(col("date"),"yyyy-MM-dd"),cast(increment as int)).as("date_inc")
You are not using the expr() function as explained in the solution; please try with expr() or selectExpr(). I've updated the article with these, please check.
You’re absolutely correct, I missed the expr()
Thank you very much for taking the time!
Glad it helped you and thanks for reading the article.
We’ve got the days and months, but how do we add years to a timestamp?
Hi Anonymous
You could just use the add_months function and multiply the years by 12. Here's an example defining a custom function:
```
import org.apache.spark.sql.functions.{add_months, col}
import org.apache.spark.sql.Column

def add_years(col: Column, years: Int): Column = add_months(col, 12 * years)

val nYears = 3
Seq(("2019-01-23"),("2019-06-24"),("2019-09-20"))
  .toDF("date")
  .withColumn("threeYearsSubtracted", add_years(col("date"), -nYears))
  .show()
```