Spark SQL – Working with Unix Timestamp

In this Spark article, you will learn how to convert a DataFrame column from a Unix timestamp in seconds (Long) to Date and Timestamp, and vice versa, using the SQL functions unix_timestamp() and from_unixtime(), with Scala examples.

What is Unix Time (Epoch Time)

Unix time, also known as Epoch time, is the number of seconds elapsed since 1970-01-01 00:00:00 UTC (the Unix epoch). It is widely used in Unix-like operating systems.
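
For intuition, here is a minimal plain-Scala sketch (using only java.time from the JDK, outside of Spark) that prints the current epoch seconds, the same kind of value unix_timestamp() returns:


import java.time.Instant

// Current epoch time in seconds; Instant is UTC-based by definition
val epochSeconds: Long = Instant.now.getEpochSecond
println(epochSeconds)  // e.g. 1577166257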

Before we start, let's first create a Spark DataFrame with different date and timestamp strings.


import org.apache.spark.sql.functions._
import spark.implicits._

// Create a DataFrame with date and timestamp strings
val inputDF = Seq(("2019-07-01 12:01:19","07-01-2019 12:01:19", "07-01-2019"))
    .toDF("timestamp_1","timestamp_2","timestamp_3")
inputDF.printSchema()
inputDF.show(false)

To keep the example simple, I've named the columns timestamp_1, timestamp_2, and timestamp_3. Running printSchema() and show(false) on the DataFrame yields the output below.


root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)

+-------------------+-------------------+-----------+
|timestamp_1        |timestamp_2        |timestamp_3|
+-------------------+-------------------+-----------+
|2019-07-01 12:01:19|07-01-2019 12:01:19|07-01-2019 |
+-------------------+-------------------+-----------+

unix_timestamp – Converts Date and Timestamp columns to Unix Timestamp

The Spark SQL function unix_timestamp() returns the current Unix timestamp in seconds (Long) when called without arguments; when a date or timestamp column is supplied, it returns the Unix timestamp of that column.

Syntax:


def unix_timestamp(): Column
def unix_timestamp(s: Column): Column
def unix_timestamp(s: Column, p: String): Column

This function has three signatures. The first, with no arguments, returns the current epoch time as a Long; the other two take the date or timestamp column to convert and, optionally, the format of that column as the second argument (when the format is omitted, yyyy-MM-dd HH:mm:ss is assumed). Let's see some examples.


import org.apache.spark.sql.functions._
//Convert timestamp to unix timestamp
val df = inputDF.select(
      unix_timestamp(col("timestamp_1")).as("timestamp_1"),
      unix_timestamp(col("timestamp_2"),"MM-dd-yyyy HH:mm:ss").as("timestamp_2"),
      unix_timestamp(col("timestamp_3"),"MM-dd-yyyy").as("timestamp_3"),
      unix_timestamp().as("timestamp_4")
   )
df.printSchema()
df.show(false)

This yields the output below. Notice that the example above uses all three unix_timestamp() signatures.


root
 |-- timestamp_1: long (nullable = true)
 |-- timestamp_2: long (nullable = true)
 |-- timestamp_3: long (nullable = true)
 |-- timestamp_4: long (nullable = true)

+-----------+-----------+-----------+-----------+
|timestamp_1|timestamp_2|timestamp_3|timestamp_4|
+-----------+-----------+-----------+-----------+
|1562007679 |1562007679 |1561964400 |1577166257 |
+-----------+-----------+-----------+-----------+
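
Note that unix_timestamp() interprets the input strings in the session time zone, which is why timestamp_3 above resolves to local midnight (the value 1561964400 corresponds to a UTC-7 session) rather than midnight UTC. If you need reproducible results across machines, you can pin the session time zone; a minimal sketch, assuming Spark 2.2+ where the spark.sql.session.timeZone configuration is available:


// Pin the session time zone so string parsing is deterministic
spark.conf.set("spark.sql.session.timeZone", "UTC")

inputDF.select(
  unix_timestamp(col("timestamp_3"), "MM-dd-yyyy").as("timestamp_3_utc")
).show(false)
// With UTC, 07-01-2019 yields 1561939200 (midnight UTC) instead of 1561964400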

from_unixtime – Converts Unix Timestamp to Date and Timestamp string

The Spark SQL function from_unixtime() converts a Unix timestamp to a string representing a date and time; in other words, it converts epoch time in seconds into a readable date and timestamp.

Syntax:


def from_unixtime(ut: Column): Column
def from_unixtime(ut: Column, f: String): Column

This function has two signatures: the first returns the timestamp as a string in the default format yyyy-MM-dd HH:mm:ss, and the second returns it in the format specified by the second argument.


import org.apache.spark.sql.functions._
// Convert Unix timestamp to timestamp
val df2 = df.select(
    from_unixtime(col("timestamp_1")).as("timestamp_1"),
    from_unixtime(col("timestamp_2"),"MM-dd-yyyy HH:mm:ss").as("timestamp_2"),
    from_unixtime(col("timestamp_3"),"MM-dd-yyyy").as("timestamp_3"),
    from_unixtime(col("timestamp_4")).as("timestamp_4")
)
df2.printSchema()
df2.show(false)

This yields the output below. The snippet uses both signatures of the from_unixtime() function.


root
 |-- timestamp_1: string (nullable = true)
 |-- timestamp_2: string (nullable = true)
 |-- timestamp_3: string (nullable = true)
 |-- timestamp_4: string (nullable = true)

+-------------------+-------------------+-----------+-------------------+
|timestamp_1        |timestamp_2        |timestamp_3|timestamp_4        |
+-------------------+-------------------+-----------+-------------------+
|2019-07-01 12:01:19|07-01-2019 12:01:19|07-01-2019 |2019-12-23 21:44:17|
+-------------------+-------------------+-----------+-------------------+
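
Keep in mind that from_unixtime() returns a string column, not a TimestampType. If you need a true timestamp column (for range filters, window functions, and so on), cast the result; a short sketch, where df3 and ts are illustrative names:


import org.apache.spark.sql.types.TimestampType

// Cast the formatted string to a proper TimestampType column
val df3 = df.select(
  from_unixtime(col("timestamp_1")).cast(TimestampType).as("ts")
)
df3.printSchema()  // ts: timestamp (nullable = true)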

Source code for reference


package com.sparkbyexamples.spark.dataframe.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object UnixTimestamp extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  // Create a DataFrame with date and timestamp strings
  val inputDF = Seq(("2019-07-01 12:01:19","07-01-2019 12:01:19", "07-01-2019"))
    .toDF("timestamp_1","timestamp_2","timestamp_3")
  inputDF.printSchema()
  inputDF.show(false)

  // Convert timestamp to Unix timestamp
  val df = inputDF.select(
    unix_timestamp(col("timestamp_1")).as("timestamp_1"),
    unix_timestamp(col("timestamp_2"),"MM-dd-yyyy HH:mm:ss").as("timestamp_2"),
    unix_timestamp(col("timestamp_3"),"MM-dd-yyyy").as("timestamp_3"),
    unix_timestamp().as("timestamp_4")
  )
  df.printSchema()
  df.show(false)

  // Convert Unix timestamp to timestamp
  val df2 = df.select(
    from_unixtime(col("timestamp_1")).as("timestamp_1"),
    from_unixtime(col("timestamp_2"),"MM-dd-yyyy HH:mm:ss").as("timestamp_2"),
    from_unixtime(col("timestamp_3"),"MM-dd-yyyy").as("timestamp_3"),
    from_unixtime(col("timestamp_4")).as("timestamp_4")
  )
  df2.printSchema()
  df2.show(false)
}
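
Both functions can also be used in SQL expressions. A minimal sketch (the temporary view name input is illustrative):


// Register the DataFrame as a temporary view and query it with SQL
inputDF.createOrReplaceTempView("input")
spark.sql(
  "select unix_timestamp(timestamp_1) as ts1, unix_timestamp() as now from input"
).show(false)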

The complete code is available in the GitHub project for reference.

Conclusion

In this article, you learned about Unix time, how to use the Spark SQL function unix_timestamp() to convert dates and timestamps to Unix timestamps, and how to use from_unixtime() to convert Unix timestamps back to dates and timestamps, with Scala examples.
