Spark to_timestamp() – Convert String to Timestamp Type

In this tutorial, you will learn how to convert a String column to Timestamp type using the Spark to_timestamp() function, where the converted time is in the default format yyyy-MM-dd HH:mm:ss.SSS. I will explain how to use this function with a few Scala examples.

Syntax – to_timestamp()


Syntax: to_timestamp(timestampString:Column) 
Syntax: to_timestamp(timestampString:Column,format:String) 
 

This function has two signatures; the first signature takes just one argument, and that argument must be in the default Timestamp format yyyy-MM-dd HH:mm:ss.SSS. When the input is not in this format, it returns null.
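
For example, a string that is already in the default pattern parses fine, while any other layout returns null. A minimal sketch; the column name ts is illustrative:


import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._

//The first string matches the default pattern and parses;
//the second does not, so to_timestamp() returns null
Seq(("2019-07-01 12:01:19.000"), ("07-01-2019"))
  .toDF("ts")
  .select(to_timestamp(col("ts")).as("parsed"))
  .show(false)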

Related: Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

The second signature takes an additional String argument to specify the format of the input Timestamp; this supports the formats specified in SimpleDateFormat. Using this additional argument, you can convert a String from any format to Timestamp type.

Convert String to Spark Timestamp type

In the below example, we convert a string that is already in the Spark default format to Timestamp type. Since the input DataFrame column is in the default Timestamp format, we use the first signature for the conversion.


import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._

//String to timestamps
val data = Seq(("2019-07-01 12:01:19.000"),
    ("2019-06-24 12:01:19.000"),
    ("2019-11-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406"))
val df=data.toDF("input_timestamp")
df.withColumn("datetype_timestamp",
        to_timestamp(col("input_timestamp")))
  .printSchema()

Yields the below schema. In this snippet, we just added a new column datetype_timestamp by converting the input column from String to Timestamp type.


root
 |-- input_timestamp: string (nullable = true)
 |-- datetype_timestamp: timestamp (nullable = true)
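
If you prefer SQL syntax, the same conversion works in Spark SQL as well. A quick sketch; the temporary view name strings_tbl is illustrative:


//Register a temporary view and call to_timestamp() from Spark SQL
df.createOrReplaceTempView("strings_tbl")
spark.sql("select input_timestamp, " +
    "to_timestamp(input_timestamp) as datetype_timestamp from strings_tbl")
  .show(false)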

Custom String Format to Timestamp type

This example converts an input timestamp string from a custom format to Spark Timestamp type. To do this, we use the second syntax, which takes an additional argument to specify the user-defined pattern for date-time formatting.


import org.apache.spark.sql.functions._
//When dates are not in the Spark default format 'yyyy-MM-dd HH:mm:ss.SSS',
//Spark's date-time functions return null for them.
//Hence, first convert the input strings to Timestamp type by passing
//the custom pattern to to_timestamp()
val data2 = Seq(("07-01-2019 12 01 19 406"),
    ("06-24-2019 12 01 19 406"),
    ("11-16-2019 16 44 55 406"),
    ("11-16-2019 16 50 59 406"))
val dfDate = data2.toDF("input_timestamp")

dfDate.withColumn("datetype_timestamp",
     to_timestamp(col("input_timestamp"),"MM-dd-yyyy HH mm ss SSS"))
    .show(false)

Yields the below output.


+-----------------------+-------------------+
|input_timestamp        |datetype_timestamp |
+-----------------------+-------------------+
|07-01-2019 12 01 19 406|2019-07-01 12:01:19|
|06-24-2019 12 01 19 406|2019-06-24 12:01:19|
|11-16-2019 16 44 55 406|2019-11-16 16:44:55|
|11-16-2019 16 50 59 406|2019-11-16 16:50:59|
+-----------------------+-------------------+
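
Note: the patterns above follow SimpleDateFormat, which Spark used through version 2.x. From Spark 3.0 onward, patterns are interpreted per java.time's DateTimeFormatter, so some legacy patterns can fail to parse. If that happens and you are fine with the old semantics, you can switch the parser back to legacy mode:


//Fall back to the pre-3.0 (SimpleDateFormat-based) datetime parser
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")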

Time String without Date to Timestamp type

Note that the to_timestamp() function returns TimestampType (date and time); when the date is not present in the input, it assigns 1970-01-01 as the date by default. This is demonstrated in the below example.


import org.apache.spark.sql.functions._
//Convert string to timestamp when input string has just time
val data3= Seq(("12:01:19.345"),
    ("12:01:20.567"),
    ("16:02:44.406"),
    ("16:50:59.406"))
val df1 = data3.toDF("input_timestamp")

df1.withColumn("datetype_timestamp",
    to_timestamp(col("input_timestamp"),"HH:mm:ss.SSS"))
    .show(false)

Yields the below output.


+---------------+-------------------+
|input_timestamp|datetype_timestamp |
+---------------+-------------------+
|12:01:19.345   |1970-01-01 12:01:19|
|12:01:20.567   |1970-01-01 12:01:20|
|16:02:44.406   |1970-01-01 16:02:44|
|16:50:59.406   |1970-01-01 16:50:59|
+---------------+-------------------+
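
If 1970-01-01 is not the date you want, one workaround is to prepend an explicit date to the time string before parsing. A minimal sketch, assuming the current date is acceptable:


//Prepend today's date so the parsed timestamp carries a real date
df1.withColumn("datetype_timestamp",
    to_timestamp(
      concat(date_format(current_date(), "yyyy-MM-dd"), lit(" "), col("input_timestamp")),
      "yyyy-MM-dd HH:mm:ss.SSS"))
  .show(false)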

If you want to convert a String to Date format, use the to_date() function. And here is another example that converts a Timestamp to a custom string pattern format.
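
A brief sketch of both directions, reusing df from the first example:


//String to DateType (the time portion is truncated)
df.withColumn("datetype", to_date(col("input_timestamp")))
  .show(false)

//TimestampType back to a custom-formatted string
df.withColumn("string_timestamp",
    date_format(to_timestamp(col("input_timestamp")), "MM/dd/yyyy HH:mm"))
  .show(false)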

The complete example can be downloaded from GitHub

Complete Example for quick reference


package com.sparkbyexamples.spark.dataframe.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StringToTimestamp extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  //String to timestamps
  val df = Seq(("2019-07-01 12:01:19.000"),
    ("2019-06-24 12:01:19.000"),
    ("2019-11-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")

  df.withColumn("datetype_timestamp",
        to_timestamp(col("input_timestamp")))
    .printSchema()


  //Convert string to timestamp when input string has just time
  val df1 = Seq(("12:01:19.345"),
    ("12:01:20.567"),
    ("16:02:44.406"),
    ("16:50:59.406"))
    .toDF("input_timestamp")

  df1.withColumn("datetype_timestamp",
    to_timestamp(col("input_timestamp"),"HH:mm:ss.SSS"))
    .show(false)

  //When dates are not in the Spark default format 'yyyy-MM-dd HH:mm:ss.SSS',
  //Spark's date-time functions return null for them.
  //Hence, first convert the input strings to Timestamp type by passing
  //the custom pattern to to_timestamp()
  val dfDate = Seq(("07-01-2019 12 01 19 406"),
    ("06-24-2019 12 01 19 406"),
    ("11-16-2019 16 44 55 406"),
    ("11-16-2019 16 50 59 406")).toDF("input_timestamp")

  dfDate.withColumn("datetype_timestamp",
          to_timestamp(col("input_timestamp"),"MM-dd-yyyy HH mm ss SSS"))
    .show(false)
}

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium

This Post Has 4 Comments

  1. Richard Shepherd

    Yes, the disappearance of the milliseconds is a problem. One forum claims this is fixed for Spark 2.4+, but still there I also lose ms. Anyone knows how to fix this?

  2. Anonymous

    how do I just return yyyy-mm-dd hh:mm from a timestamp field? no matter what I do it either returns null or append 00 as seconds to it

  3. Anonymous

    Why are the milliseconds lost in each case?

  4. Anonymous

    In each of these examples the milliseconds have been lost, with no mention of why. the to_timestamp description at the beginning of the article specifies the SSS to represent the milliseconds.
