You are currently viewing How to parse string and format dates on DataFrame
spark parsing and formatting Dates

In this tutorial, we will show you a Spark SQL DataFrame example of how to get the current system date-time, formatting Spark Date to a String date pattern and parsing String pattern to Spark DateType using Scala language and Spark SQL Date and Time functions.

1. Current Date and Timestamp

Spark provides current_date() function to get the current system date in DateType ‘yyyy-MM-dd’ format and current_timestamp() to get current timestamp in `yyyy-MM-dd HH:mm:ss.SSSS` format.


import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"), //returns date in `yyyy-MM-dd'
    current_timestamp().as("current_time") //returns date and time in `yyyy-MM-dd HH:mm:ss.SSSS`
  ).show(false)

Output:


// Output:
+------------+-----------------------+
|current_date|current_time           |
+------------+-----------------------+
|2019-07-24  |2019-07-24 20:50:08.667|
+------------+-----------------------+

2. Format Spark DateType to String formatting pattern

Let’s see how to format the Spark DateType column to String formatting pattern using the Spark date function date_format(). The function support all Date patterns described in Java DateTimeFormatter.


// 
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"),
    date_format(current_timestamp(),"yyyy MM dd").as("yyyy MM dd"),
    date_format(current_timestamp(),"MM/dd/yyyy hh:mm").as("MM/dd/yyyy"),
    date_format(current_timestamp(),"yyyy MMM dd").as("yyyy MMMM dd"),
    date_format(current_timestamp(),"yyyy MMMM dd E").as("yyyy MMMM dd E")
  ).show(false)

Output:


// Output:
+------------+----------+----------------+------------+----------------+
|current_date|yyyy MM dd|MM/dd/yyyy      |yyyy MMMM dd|yyyy MMMM dd E  |
+------------+----------+----------------+------------+----------------+
|2019-07-24  |2019 07 24|07/24/2019 08:50|2019 Jul 24 |2019 July 24 Wed|
+------------+----------+----------------+------------+----------------+

3. Parsing Date from String object to Spark DateType

Spark Dataframe API also provides date function to_date() which parses Date from String object and converts to Spark DateType format. when dates are in ‘yyyy-MM-dd’ format, spark function auto-cast to DateType by casting rules. When dates are not in specified format this function returns null.


// Parsing Date from String object to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date")).as("to_date")
  ).show()

Output:


// Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|2019-07-24|2019-07-24|
|07-24-2009|      null|
+----------+----------+

However, Spark defines overloaded function of to_date() which takes the patterns of the Date string as an additional parameter. Using this, we should able to convert any date string to Spark DateType.


import org.apache.spark.sql.functions._
Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date"),"MM-dd-yyyy").as("to_date")
  ).show()

Output:


// Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|06-03-2009|2009-06-03|
|07-24-2009|2009-07-24|
+----------+----------+

4. Format Different Date Strings to Spark DateType

Below examples shows how to convert different dates from a single column to Spark DateType.


// Format Different Date Strings to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      to_date(col("Date"),"yyyy MMMM dd"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


// Output:
+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    2019-07-24|
| 07/24/2019|    2019-07-24|
|2019 Jul 24|    2019-07-24|
|   07-27-19|Unknown Format|
+-----------+--------------+

5. Parsing Different Date Strings to Specific Date Formatting pattern

Below examples shows how to convert different dates strings from a single column to standard date string pattern using to_date() and date_format() function.


// Parsing Different Date Strings to Specific Date Formatting pattern
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


// Output:
+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    07/24/2019|
| 07/24/2019|    07/24/2019|
|2019 Jul 24|    07/24/2019|
|   07-27-19|Unknown Format|
+-----------+--------------+

Conclusion:

In this article, you have learned how to use to_date() function which parses Date from String object to Spark DateType and date_format() function which formats Spark DateType to String formatting pattern. You can find more information about these functions at the following blog

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

This Post Has 4 Comments

  1. NNK

    Thanks for the feedback. will update the articles.

  2. NNK

    Thanks for the feedback. will update the articles.

  3. Anonymous

    Your walkthroughs would be way easier if you mentioned the imported libraries

  4. Anonymous

    Your walkthroughs would be way easier if you mentioned the imported libraries

Comments are closed.