Use to_timestamp()
function to convert String to Timestamp (TimestampType) in PySpark. The converted time would be in a default format of MM-dd-yyyy HH:mm:ss.SSS
, I will explain how to use this function with a few examples.
Syntax – to_timestamp()
Syntax: to_timestamp(timestampString:Column)
Syntax: to_timestamp(timestampString:Column,format:String)
This function has the above two signatures that are defined in PySpark SQL Date & Timestamp Functions, the first syntax takes just one argument and the argument should be in Timestamp format ‘MM-dd-yyyy HH:mm:ss.SSS
‘, when the format is not in this format, it returns null.
The second signature takes an additional String argument to specify the format of the input Timestamp; this supports formats specified in SimeDateFormat. Using this additional argument, you can cast String from any format to Timestamp type in PySpark.
Convert String to PySpark Timestamp type
In the below example, we convert the string pattern which is in PySpark default format to Timestamp type, since the input DataFrame column is in default Timestamp format, we use the first signature for conversion. And the second example uses the cast
function to do the same.
from pyspark.sql.functions import *
df=spark.createDataFrame(
data = [ ("1","2019-06-24 12:01:19.000")],
schema=["id","input_timestamp"])
df.printSchema()
#Timestamp String to DateType
df.withColumn("timestamp",to_timestamp("input_timestamp")) \
.show(truncate=False)
# Using Cast to convert TimestampType to DateType
df.withColumn('timestamp_string', \
to_timestamp('timestamp').cast('string')) \
.show(truncate=False)
In this snippet, we just add a new column timestamp
by converting the input column from string to Timestamp type.
root
|-- id: string (nullable = true)
|-- timestamp: string (nullable = true)
+---+-----------------------+-------------------+
|id |input_timestamp |timestamp |
+---+-----------------------+-------------------+
|1 |2019-06-24 12:01:19.000|2019-06-24 12:01:19|
+---+-----------------------+-------------------+
Custom string format to Timestamp type
This example converts the input timestamp string from custom format to PySpark Timestamp type, to do this, we use the second syntax where it takes an additional argument to specify user-defined patterns for date-time formatting,
#when dates are not in Spark TimestampType format 'yyyy-MM-dd HH:mm:ss.SSS'.
#Note that when dates are not in Spark Tiemstamp format, all Spark functions returns null
#Hence, first convert the input dates to Spark DateType using to_timestamp function
df.select(to_timestamp(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS')) \
.show()
#Displays
+-------------------------------------------------------------------+
|to_timestamp('06-24-2019 12:01:19.000', 'MM-dd-yyyy HH:mm:ss.SSSS')|
+-------------------------------------------------------------------+
| 2019-06-24 12:01:19|
+-------------------------------------------------------------------+
In case you want to convert string to date format use to_date()
function. And here is another example to convert Timestamp to custom string pattern format.
SQL Example
#SQL string to TimestampType
spark.sql("select to_timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL CAST timestamp string to TimestampType
spark.sql("select timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL Custom string to TimestampType
spark.sql("select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as timestamp")
Complete Example for quick reference
from pyspark.sql import SparkSession
# Create SparkSession
spark = SparkSession.builder \
.appName('SparkByExamples.com') \
.getOrCreate()
from pyspark.sql.functions import *
df=spark.createDataFrame(
data = [ ("1","2019-06-24 12:01:19.000")],
schema=["id","input_timestamp"])
df.printSchema()
#Timestamp String to DateType
df.withColumn("timestamp",to_timestamp("input_timestamp")) \
.show(truncate=False)
# Using Cast to convert TimestampType to DateType
df.withColumn('timestamp', \
to_timestamp('input_timestamp').cast('string')) \
.show(truncate=False)
df.select(to_timestamp(lit('06-24-2019 12:01:19.000'),'MM-dd-yyyy HH:mm:ss.SSSS')) \
.show(truncate=False)
#SQL string to TimestampType
spark.sql("select to_timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL CAST timestamp string to TimestampType
spark.sql("select timestamp('2019-06-24 12:01:19.000') as timestamp")
#SQL Custom string to TimestampType
spark.sql("select to_timestamp('06-24-2019 12:01:19.000','MM-dd-yyyy HH:mm:ss.SSSS') as timestamp")
Happy Learning !!
Related Articles:
- PySpark SQL – How to Get Current Date & Timestamp
- PySpark SQL – Working with Unix Time
- PySpark SQL – Date and Timestamp Functions
- PySpark SQL – Convert Date to String Format
- PySpark SQL – Convert String to Date Format
- PySpark SQL – Convert Timestamp to Date
- PySpark SQL – Working with Unix Time | Timestamp
- PySpark SQL expr() (Expression ) Function
- PySpark SQL Date and Timestamp Functions