Problem: How do you add or subtract hours, minutes, and seconds to/from Spark DataFrame date or timestamp columns?
Solution: Spark SQL has no built-in functions that add or subtract a given number of hours, minutes, or seconds from a Timestamp column; however, SQL defines the INTERVAL syntax to do it.
Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.
Using Spark SQL Interval
Since Spark doesn’t have functions to add time units to a Timestamp, we use INTERVAL to do the job.
import org.apache.spark.sql.functions._
import spark.implicits._
spark.sql( "select current_timestamp," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds"
).show(false)
Before applying INTERVAL, you first need to convert the timestamp column from string to TimestampType using cast.
Yields below output:
+----------------------+----------------------+----------------------+----------------------+
|current_timestamp() |added_hours |added_minutes |added_seconds |
+----------------------+----------------------+----------------------+----------------------+
|2019-11-17 05:54:22.87|2019-11-17 07:54:22.87|2019-11-17 05:59:22.87|2019-11-17 05:55:17.87|
+----------------------+----------------------+----------------------+----------------------+
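To sanity-check the interval arithmetic outside Spark, the same additions can be reproduced with plain java.time. This is an illustrative sketch, not Spark code; the pattern string matches the two-digit fractional seconds shown in the output above.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object IntervalCheck extends App {
  // Pattern matches the timestamps shown above (two fractional digits)
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SS")
  val ts  = LocalDateTime.parse("2019-11-17 05:54:22.87", fmt)

  // Mirrors INTERVAL 2 hours, 5 minutes, and 55 seconds from the query above
  println(ts.plusHours(2).format(fmt))    // 2019-11-17 07:54:22.87
  println(ts.plusMinutes(5).format(fmt))  // 2019-11-17 05:59:22.87
  println(ts.plusSeconds(55).format(fmt)) // 2019-11-17 05:55:17.87
}
```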
The above example uses spark.sql directly without a DataFrame; the example below demonstrates how to do the same when the timestamp is a DataFrame column.
import org.apache.spark.sql.functions._
val df = Seq(("2019-07-01 12:01:19.101"),
("2019-06-24 12:01:19.222"),
("2019-11-16 16:44:55.406"),
("2019-11-16 16:50:59.406")).toDF("input_timestamp")
df.createOrReplaceTempView("AddTimeExample")
val df2 = spark.sql("select input_timestamp, " +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds from AddTimeExample"
)
df2.show(false)
Here, we first create a temporary view using createOrReplaceTempView() and then reference it in the SQL select.
+-----------------------+-----------------------+-----------------------+-----------------------+
|input_timestamp |added_hours |added_minutes |added_seconds |
+-----------------------+-----------------------+-----------------------+-----------------------+
|2019-07-01 12:01:19.101|2019-07-01 14:01:19.101|2019-07-01 12:06:19.101|2019-07-01 12:02:14.101|
|2019-06-24 12:01:19.222|2019-06-24 14:01:19.222|2019-06-24 12:06:19.222|2019-06-24 12:02:14.222|
|2019-11-16 16:44:55.406|2019-11-16 18:44:55.406|2019-11-16 16:49:55.406|2019-11-16 16:45:50.406|
|2019-11-16 16:50:59.406|2019-11-16 18:50:59.406|2019-11-16 16:55:59.406|2019-11-16 16:51:54.406|
+-----------------------+-----------------------+-----------------------+-----------------------+
Let’s see the same example without creating a temporary view, using the <a href="https://sparkbyexamples.com/spark/spark-dataframe-withcolumn/">withColumn()</a> and expr() functions.
import org.apache.spark.sql.functions._
df.withColumn("added_hours",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 HOURS"))
.withColumn("added_minutes",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 MINUTES"))
.withColumn("added_seconds",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 SECONDS"))
.show(false)
Spark Subtract Hours, Minutes, and Seconds from Timestamp
By just using - instead of +, you can subtract hours, minutes, and seconds from a Timestamp.
import org.apache.spark.sql.functions._
df.withColumn("sub_hours",col("input_timestamp").cast("timestamp") - expr("INTERVAL 2 HOURS"))
.withColumn("sub_minutes",col("input_timestamp").cast("timestamp") - expr("INTERVAL 2 MINUTES"))
.withColumn("sub_seconds",col("input_timestamp").cast("timestamp") - expr("INTERVAL 2 SECONDS"))
.show(false)
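Because the interval literal is just a SQL string, amounts that come from variables rather than hard-coded values can be interpolated before the string is passed to expr(). A minimal sketch, assuming the variable names and multi-unit interval are illustrative rather than from the original example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

object DynamicInterval extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("DynamicInterval")
    .getOrCreate()
  import spark.implicits._

  val df = Seq("2019-07-01 12:01:19.101").toDF("input_timestamp")

  // Hypothetical runtime values, e.g. read from job arguments or a config
  val hours   = 2
  val minutes = 5

  // Build the interval literal at runtime and parse it with expr()
  df.withColumn("shifted",
      col("input_timestamp").cast("timestamp") +
        expr(s"INTERVAL $hours HOURS $minutes MINUTES"))
    .show(false)
}
```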
Complete Example
Below is a complete example of adding and subtracting hours, minutes, and seconds from a DataFrame Timestamp column. This example is also available at the Spark Examples GitHub project.
package com.sparkbyexamples.spark.dataframe.functions.datetime
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
object AddTime extends App {
val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.implicits._
spark.sql( "select current_timestamp," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
"cast(current_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds"
).show(false)
val df = Seq(("2019-07-01 12:01:19.101"),
("2019-06-24 12:01:19.222"),
("2019-11-16 16:44:55.406"),
("2019-11-16 16:50:59.406")).toDF("input_timestamp")
df.createOrReplaceTempView("AddTimeExample")
val df2 = spark.sql("select input_timestamp, " +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
"cast(input_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds from AddTimeExample"
)
df2.show(false)
// alternate way
df.withColumn("added_hours",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 HOURS"))
.withColumn("added_minutes",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 MINUTES"))
.withColumn("added_seconds",col("input_timestamp").cast("timestamp") + expr("INTERVAL 2 SECONDS"))
.show(false)
}
Happy Learning!!