  • Post category:PySpark
  • Post last modified:March 27, 2024

PySpark SQL provides the to_date() function to convert a String to a Date format on a DataFrame column. Note that Spark date functions support all Java date formats specified in DateTimeFormatter.

to_date() – this function converts a string column (StringType) to a date column (DateType).


Syntax: to_date(column,format)
Example: to_date(col("string_column"),"MM-dd-yyyy") 

This function takes the date string as the first argument and the pattern that the date string is in as the second argument.

The code snippet below takes a String column and converts it to Date format.


from pyspark.sql.functions import col, to_date

# Parse "MM-dd-yyyy" strings into a DateType column
df = spark.createDataFrame([["02-03-2013"], ["05-06-2023"]], ["input"])
df.select(col("input"), to_date(col("input"), "MM-dd-yyyy").alias("date")) \
  .show()

Output:


+----------+----------+
|     input|      date|
+----------+----------+
|02-03-2013|2013-02-03|
|05-06-2023|2023-05-06|
+----------+----------+

Alternatively, you can convert a String to Date with SQL by using the same function.


spark.sql("select to_date('02-03-2013','MM-dd-yyyy') date") \
     .show()

Complete Example


from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .getOrCreate()

# Convert String to Date using the DataFrame API
df = spark.createDataFrame([["02-03-2013"], ["05-06-2023"]], ["input"])
df.select(col("input"), to_date(col("input"), "MM-dd-yyyy").alias("date")) \
  .show()

# Convert String to Date using SQL
spark.sql("select to_date('02-03-2013','MM-dd-yyyy') date").show()

Conclusion:

In this article, you have learned how to convert a String to Date format using the to_date() function.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium