Spark – Add Hours, Minutes, and Seconds to Timestamp

Problem: How do you add or subtract hours, minutes, and seconds to/from a Spark DataFrame date or timestamp column?

Solution: Spark SQL has no dedicated functions to add or subtract hours, minutes, or seconds to or from a Timestamp column; however, SQL defines the INTERVAL expression for exactly this purpose.

Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

1. Using Spark SQL Interval

Since Spark doesn’t provide functions to add time units to a Timestamp, we use INTERVAL to do the job.


// Using Spark SQL Interval
import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._

spark.sql( "select current_timestamp," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds"
  ).show(false)

Before applying INTERVAL, you first need to convert the timestamp column from string to TimestampType using cast.

Yields below output:


// Output:
+----------------------+----------------------+----------------------+----------------------+
|current_timestamp()   |added_hours           |added_minutes         |added_seconds         |
+----------------------+----------------------+----------------------+----------------------+
|2019-11-17 05:54:22.87|2019-11-17 07:54:22.87|2019-11-17 05:59:22.87|2019-11-17 05:55:17.87|
+----------------------+----------------------+----------------------+----------------------+

The above example uses spark.sql() directly without a DataFrame; the example below demonstrates how to do the same when the timestamp is a DataFrame column.


import org.apache.spark.sql.functions._
val df = Seq(("2019-07-01 12:01:19.101"),
    ("2019-06-24 12:01:19.222"),
    ("2019-11-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")

df.createOrReplaceTempView("AddTimeExample")

val df2 = spark.sql("select input_timestamp, " +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds from AddTimeExample"
    )
df2.show(false)

Here, we first create a temporary view using createOrReplaceTempView() and then reference it in the SQL SELECT.


+-----------------------+-----------------------+-----------------------+-----------------------+
|input_timestamp        |added_hours            |added_minutes          |added_seconds          |
+-----------------------+-----------------------+-----------------------+-----------------------+
|2019-07-01 12:01:19.101|2019-07-01 14:01:19.101|2019-07-01 12:06:19.101|2019-07-01 12:02:14.101|
|2019-06-24 12:01:19.222|2019-06-24 14:01:19.222|2019-06-24 12:06:19.222|2019-06-24 12:02:14.222|
|2019-11-16 16:44:55.406|2019-11-16 18:44:55.406|2019-11-16 16:49:55.406|2019-11-16 16:45:50.406|
|2019-11-16 16:50:59.406|2019-11-16 18:50:59.406|2019-11-16 16:55:59.406|2019-11-16 16:51:54.406|
+-----------------------+-----------------------+-----------------------+-----------------------+
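As a sanity check, the same arithmetic can be reproduced with plain java.time, with no Spark required; the values below match the first row of the output above.

```scala
// Sanity check: reproduce the INTERVAL arithmetic with plain java.time
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
val ts  = LocalDateTime.parse("2019-07-01 12:01:19.101", fmt)

val addedHours   = ts.plusHours(2)    // INTERVAL 2 hours
val addedMinutes = ts.plusMinutes(5)  // INTERVAL 5 minutes
val addedSeconds = ts.plusSeconds(55) // INTERVAL 55 seconds

println(addedHours.format(fmt))   // 2019-07-01 14:01:19.101
println(addedMinutes.format(fmt)) // 2019-07-01 12:06:19.101
println(addedSeconds.format(fmt)) // 2019-07-01 12:02:14.101
```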

Let’s see the same example without creating a temporary view, using the <a href="https://sparkbyexamples.com/spark/spark-dataframe-withcolumn/">withColumn()</a> and expr() functions.


import org.apache.spark.sql.functions._
// Spark implicitly casts the string column to timestamp before applying the interval
df.withColumn("added_hours",col("input_timestamp") + expr("INTERVAL 2 HOURS"))
  .withColumn("added_minutes",col("input_timestamp") + expr("INTERVAL 2 minutes"))
  .withColumn("added_seconds",col("input_timestamp") + expr("INTERVAL 2 seconds"))
  .show(false)
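If the number of hours or minutes is not fixed but comes from a column, one option is make_interval(). The sketch below assumes Spark 3.0+, where make_interval() is available; the columns hours_to_add and mins_to_add are invented for this example.

```scala
// Sketch: per-row dynamic offsets using make_interval() (assumes Spark 3.0+)
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local").appName("DynamicInterval").getOrCreate()
import spark.implicits._

// hours_to_add / mins_to_add are hypothetical columns holding the per-row offsets
val dfDyn = Seq(
    ("2019-07-01 12:01:19.101", 2, 5),
    ("2019-06-24 12:01:19.222", 1, 30)
  ).toDF("input_timestamp", "hours_to_add", "mins_to_add")

val dfAdded = dfDyn.withColumn("added_time",
  col("input_timestamp").cast("timestamp") +
  expr("make_interval(0, 0, 0, 0, hours_to_add, mins_to_add, 0)"))

dfAdded.show(false)
```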

2. Spark Subtract Hours, Minutes, and Seconds from Timestamp

By simply using - instead of +, you can subtract hours, minutes, and seconds from a Timestamp.


// Spark Subtract Hours, Minutes, and Seconds from Timestamp 
import org.apache.spark.sql.functions._
df.withColumn("sub_hours",col("input_timestamp") - expr("INTERVAL 2 HOURS"))
  .withColumn("sub_minutes",col("input_timestamp") - expr("INTERVAL 2 minutes"))
  .withColumn("sub_seconds",col("input_timestamp") - expr("INTERVAL 2 seconds"))
  .show(false)
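Multiple units can also be combined in a single INTERVAL literal, so one expression covers hours, minutes, and seconds at once. A minimal sketch, assuming the multi-unit INTERVAL literal syntax supported by Spark SQL:

```scala
// Sketch: combining hours, minutes, and seconds in one INTERVAL literal
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local").appName("CombinedInterval").getOrCreate()
import spark.implicits._

val df = Seq(("2019-07-01 12:01:19.101")).toDF("input_timestamp")

val dfSub = df.withColumn("sub_combined",
  col("input_timestamp").cast("timestamp") - expr("INTERVAL 1 hours 30 minutes 15 seconds"))

dfSub.show(false)
```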

3. Complete Example

Below is a complete example of how to add or subtract hours, minutes, and seconds from the DataFrame Timestamp column. This example is also available at the Spark Examples GitHub project.


// Complete Example 
package com.sparkbyexamples.spark.dataframe.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object AddTime extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  spark.sql( "select current_timestamp," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
    "cast(current_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds"
  ).show(false)


  val df = Seq(("2019-07-01 12:01:19.101"),
    ("2019-06-24 12:01:19.222"),
    ("2019-11-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")


  df.createOrReplaceTempView("AddTimeExample")

  val df2 = spark.sql("select input_timestamp, " +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 2 hours as added_hours," +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 5 minutes as added_minutes," +
    "cast(input_timestamp as TIMESTAMP) + INTERVAL 55 seconds as added_seconds from AddTimeExample"
    )
  df2.show(false)

  // Alternate way
  df.withColumn("added_hours",col("input_timestamp") + expr("INTERVAL 2 HOURS"))
    .withColumn("added_minutes",col("input_timestamp") + expr("INTERVAL 2 minutes"))
    .withColumn("added_seconds",col("input_timestamp") + expr("INTERVAL 2 seconds"))
    .show(false)
}

Happy Learning!!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn
