In this tutorial, you will learn how to convert a String column to Timestamp using the Spark <em>to_timestamp</em>()
function; the converted value will be in Spark's default Timestamp format yyyy-MM-dd HH:mm:ss.SSS.
I will explain how to use this function with a few Scala examples.
Syntax – to_timestamp()
Syntax: to_timestamp(timestampString:Column)
Syntax: to_timestamp(timestampString:Column,format:String)
This function has two signatures. The first signature takes just one argument, and that argument must be a string in the default Timestamp format yyyy-MM-dd HH:mm:ss.SSS; when the input is not in this format, it returns null.
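To make the null behavior concrete, here is a minimal self-contained sketch (not from the original article; the column and DataFrame names are illustrative) that parses one string in the default format and one that is not:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]")
  .appName("ToTimestampNullExample").getOrCreate()
import spark.implicits._

// One value in the default 'yyyy-MM-dd HH:mm:ss.SSS' format, one that is not
val dfNull = Seq("2019-07-01 12:01:19.000", "07/01/2019").toDF("ts")

// The one-argument to_timestamp() parses the first value
// and returns null for the second, which does not match the default format
dfNull.withColumn("parsed", to_timestamp(col("ts"))).show(false)
```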
Related: Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.
The second signature takes an additional String argument that specifies the format of the input Timestamp; this supports the formats specified in SimpleDateFormat. Using this additional argument, you can convert a String in any format to Timestamp type.
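As a quick sketch of the second signature (the sample value and pattern here are illustrative, not from the article), a day-first string that the default parser would reject can be parsed by passing a SimpleDateFormat-style pattern:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]")
  .appName("ToTimestampPatternExample").getOrCreate()
import spark.implicits._

// Day-first input that the default format would not accept
val dfPattern = Seq("24/06/2019 12:01").toDF("ts")

// The second argument describes the input layout: dd/MM/yyyy HH:mm
dfPattern.withColumn("parsed",
  to_timestamp(col("ts"), "dd/MM/yyyy HH:mm")).show(false)
```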
Convert String to Spark Timestamp type
In the example below, we convert a string that is in Spark's default format to Timestamp type. Since the input DataFrame column is already in the default Timestamp format, we use the first signature for the conversion.
import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._
//String to timestamps
val data = Seq(("2019-07-01 12:01:19.000"),
("2019-06-24 12:01:19.000"),
("2019-11-16 16:44:55.406"),
("2019-11-16 16:50:59.406"))
val df=data.toDF("input_timestamp")
df.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp")))
.printSchema()
Yields the schema below. In this snippet, we added a new column datetype_timestamp
by converting the input column from string to Timestamp type.
root
|-- input_timestamp: string (nullable = true)
|-- datetype_timestamp: timestamp (nullable = true)
Custom String Format to Timestamp type
This example converts an input timestamp string from a custom format to Spark Timestamp type. To do this, we use the second signature, which takes an additional argument specifying a user-defined pattern for date-time parsing.
import org.apache.spark.sql.functions._
//When timestamps are not in Spark's default format 'yyyy-MM-dd HH:mm:ss.SSS',
//to_timestamp() without a format argument returns null.
//Hence, pass the input pattern as the second argument of to_timestamp()
val data2 = Seq(("07-01-2019 12 01 19 406"),
("06-24-2019 12 01 19 406"),
("11-16-2019 16 44 55 406"),
("11-16-2019 16 50 59 406"))
val dfDate = data2.toDF("input_timestamp")
dfDate.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp"),"MM-dd-yyyy HH mm ss SSS"))
.show(false)
Yields the output below.
+-----------------------+-------------------+
|input_timestamp |datetype_timestamp |
+-----------------------+-------------------+
|07-01-2019 12 01 19 406|2019-07-01 12:01:19|
|06-24-2019 12 01 19 406|2019-06-24 12:01:19|
|11-16-2019 16 44 55 406|2019-11-16 16:44:55|
|11-16-2019 16 50 59 406|2019-11-16 16:50:59|
+-----------------------+-------------------+
Time String without Date to Timestamp type
Note that the to_timestamp()
function returns TimestampType (date and time); when no date is present in the input, it assigns 1970-01-01
as the date by default. This is demonstrated in the example below.
import org.apache.spark.sql.functions._
//Convert string to timestamp when input string has just time
val data3= Seq(("12:01:19.345"),
("12:01:20.567"),
("16:02:44.406"),
("16:50:59.406"))
val df1 = data3.toDF("input_timestamp")
df1.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp"),"HH:mm:ss.SSS"))
.show(false)
Yields the output below.
+---------------+-------------------+
|input_timestamp|datetype_timestamp |
+---------------+-------------------+
|12:01:19.345 |1970-01-01 12:01:19|
|12:01:20.567 |1970-01-01 12:01:20|
|16:02:44.406 |1970-01-01 16:02:44|
|16:50:59.406 |1970-01-01 16:50:59|
+---------------+-------------------+
In case you want to convert a string to date format, use the to_date()
function. And here is another example of converting a Timestamp to a custom string pattern format.
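As a brief sketch of both follow-ups (the column names here are illustrative): to_date() keeps only the date part and returns DateType, while date_format() goes the other way, rendering a timestamp into a custom string pattern:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[1]")
  .appName("ToDateExample").getOrCreate()
import spark.implicits._

val dfStr = Seq("2019-07-01 12:01:19.000").toDF("input_timestamp")

// to_date() drops the time portion and returns DateType
val withDate = dfStr.withColumn("datetype",
  to_date(col("input_timestamp"), "yyyy-MM-dd HH:mm:ss.SSS"))

// date_format() formats a timestamp (or castable string) into a custom text pattern
val formatted = withDate.withColumn("custom_string",
  date_format(col("input_timestamp"), "MM-dd-yyyy HH:mm"))

formatted.show(false)
```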
The complete example can be downloaded from GitHub
Complete Example for quick reference
package com.sparkbyexamples.spark.dataframe.functions.datetime
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.LongType
object StringToTimestamp extends App {
val spark:SparkSession = SparkSession.builder()
.master("local")
.appName("SparkByExamples.com")
.getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import spark.sqlContext.implicits._
//String to timestamps
val df = Seq(("2019-07-01 12:01:19.000"),
("2019-06-24 12:01:19.000"),
("2019-11-16 16:44:55.406"),
("2019-11-16 16:50:59.406")).toDF("input_timestamp")
df.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp")))
.printSchema()
//Convert string to timestamp when input string has just time
val df1 = Seq(("12:01:19.345"),
("12:01:20.567"),
("16:02:44.406"),
("16:50:59.406"))
.toDF("input_timestamp")
df1.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp"),"HH:mm:ss.SSS"))
.show(false)
//When timestamps are not in Spark's default format 'yyyy-MM-dd HH:mm:ss.SSS',
//to_timestamp() without a format argument returns null.
//Hence, pass the input pattern as the second argument of to_timestamp()
val dfDate = Seq(("07-01-2019 12 01 19 406"),
("06-24-2019 12 01 19 406"),
("11-16-2019 16 44 55 406"),
("11-16-2019 16 50 59 406")).toDF("input_timestamp")
dfDate.withColumn("datetype_timestamp",
to_timestamp(col("input_timestamp"),"MM-dd-yyyy HH mm ss SSS"))
.show(false)
}
Happy Learning !!
In each of these examples the milliseconds have been lost, with no mention of why. The to_timestamp description at the beginning of the article specifies SSS to represent the milliseconds.
Why are the milliseconds lost in each case?
How do I return just yyyy-mm-dd hh:mm from a timestamp field? No matter what I do, it either returns null or appends 00 as the seconds.
Yes, the disappearance of the milliseconds is a problem. One forum claims this is fixed for Spark 2.4+, but there I also lose the ms. Does anyone know how to fix this?