How to parse string and format dates on DataFrame

spark parsing and formatting Dates

In this tutorial, we will show you a Spark SQL DataFrame example of how to get the current system date-time, formatting Spark Date to a String date pattern and parsing String pattern to Spark DateType using Scala language and Spark SQL Date and Time functions.

Current Date and Timestamp

Spark provides current_date() function to get the current system date in DateType ‘yyyy-MM-dd’ format and current_timestamp() to get current timestamp in `yyyy-MM-dd HH:mm:ss.SSSS` format.


import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"), //returns date in `yyyy-MM-dd'
    current_timestamp().as("current_time") //returns date and time in `yyyy-MM-dd HH:mm:ss.SSSS`
  ).show(false)

Output:


+------------+-----------------------+
|current_date|current_time           |
+------------+-----------------------+
|2019-07-24  |2019-07-24 20:50:08.667|
+------------+-----------------------+

Format Spark DateType to String formatting pattern

Let’s see how to format the Spark DateType column to String formatting pattern using the Spark date function date_format(). The function support all Date patterns described in Java DateTimeFormatter.


import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"),
    date_format(current_timestamp(),"yyyy MM dd").as("yyyy MM dd"),
    date_format(current_timestamp(),"MM/dd/yyyy hh:mm").as("MM/dd/yyyy"),
    date_format(current_timestamp(),"yyyy MMM dd").as("yyyy MMMM dd"),
    date_format(current_timestamp(),"yyyy MMMM dd E").as("yyyy MMMM dd E")
  ).show(false)

Output:


+------------+----------+----------------+------------+----------------+
|current_date|yyyy MM dd|MM/dd/yyyy      |yyyy MMMM dd|yyyy MMMM dd E  |
+------------+----------+----------------+------------+----------------+
|2019-07-24  |2019 07 24|07/24/2019 08:50|2019 Jul 24 |2019 July 24 Wed|
+------------+----------+----------------+------------+----------------+

Parsing Date from String object to Spark DateType

Spark Dataframe API also provides date function to_date() which parses Date from String object and converts to Spark DateType format. when dates are in ‘yyyy-MM-dd’ format, spark function auto-cast to DateType by casting rules. When dates are not in specified format this function returns null.


import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date")).as("to_date")
  ).show()

Output:


+----------+----------+
|      Date|   to_date|
+----------+----------+
|2019-07-24|2019-07-24|
|07-24-2009|      null|
+----------+----------+

However, Spark defines overloaded function of to_date() which takes the patterns of the Date string as an additional parameter. Using this, we should able to convert any date string to Spark DateType.


import org.apache.spark.sql.functions._
Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date"),"MM-dd-yyyy").as("to_date")
  ).show()

Output:


+----------+----------+
|      Date|   to_date|
+----------+----------+
|06-03-2009|2009-06-03|
|07-24-2009|2009-07-24|
+----------+----------+

Format Different Date Strings to Spark DateType

Below examples shows how to convert different dates from a single column to Spark DateType.


import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      to_date(col("Date"),"yyyy MMMM dd"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    2019-07-24|
| 07/24/2019|    2019-07-24|
|2019 Jul 24|    2019-07-24|
|   07-27-19|Unknown Format|
+-----------+--------------+

Parsing Different Date Strings to Specific Date Formatting pattern

Below examples shows how to convert different dates strings from a single column to standard date string pattern using to_date() and date_format() function.


import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    07/24/2019|
| 07/24/2019|    07/24/2019|
|2019 Jul 24|    07/24/2019|
|   07-27-19|Unknown Format|
+-----------+--------------+

Conclusion:

In this article, you have learned how to use to_date() function which parses Date from String object to Spark DateType and date_format() function which formats Spark DateType to String formatting pattern. You can find more information about these functions at the following blog

Happy Learning !!

spark parsing and formatting Dates

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

This Post Has 2 Comments

  1. Anonymous

    Your walkthroughs would be way easier if you mentioned the imported libraries

    1. NNK

      Thanks for the feedback. will update the articles.

You are currently viewing How to parse string and format dates on DataFrame