Site icon Spark By {Examples}

How to parse string and format dates on DataFrame

spark parsing and formatting Dates

In this tutorial, we will show you a Spark SQL DataFrame example of how to get the current system date-time, formatting Spark Date to a String date pattern and parsing String pattern to Spark DateType using Scala language and Spark SQL Date and Time functions.

1. Current Date and Timestamp

Spark provides current_date() function to get the current system date in DateType ‘yyyy-MM-dd’ format and current_timestamp() to get current timestamp in `yyyy-MM-dd HH:mm:ss.SSSS` format.


import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"), //returns date in `yyyy-MM-dd'
    current_timestamp().as("current_time") //returns date and time in `yyyy-MM-dd HH:mm:ss.SSSS`
  ).show(false)

Output:


// Output:
+------------+-----------------------+
|current_date|current_time           |
+------------+-----------------------+
|2019-07-24  |2019-07-24 20:50:08.667|
+------------+-----------------------+

2. Format Spark DateType to String formatting pattern

Let’s see how to format the Spark DateType column to String formatting pattern using the Spark date function date_format(). The function support all Date patterns described in Java DateTimeFormatter.


// 
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
    current_date().as("current_date"),
    date_format(current_timestamp(),"yyyy MM dd").as("yyyy MM dd"),
    date_format(current_timestamp(),"MM/dd/yyyy hh:mm").as("MM/dd/yyyy"),
    date_format(current_timestamp(),"yyyy MMM dd").as("yyyy MMMM dd"),
    date_format(current_timestamp(),"yyyy MMMM dd E").as("yyyy MMMM dd E")
  ).show(false)

Output:


// Output:
+------------+----------+----------------+------------+----------------+
|current_date|yyyy MM dd|MM/dd/yyyy      |yyyy MMMM dd|yyyy MMMM dd E  |
+------------+----------+----------------+------------+----------------+
|2019-07-24  |2019 07 24|07/24/2019 08:50|2019 Jul 24 |2019 July 24 Wed|
+------------+----------+----------------+------------+----------------+

3. Parsing Date from String object to Spark DateType

Spark Dataframe API also provides date function to_date() which parses Date from String object and converts to Spark DateType format. when dates are in ‘yyyy-MM-dd’ format, spark function auto-cast to DateType by casting rules. When dates are not in specified format this function returns null.


// Parsing Date from String object to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date")).as("to_date")
  ).show()

Output:


// Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|2019-07-24|2019-07-24|
|07-24-2009|      null|
+----------+----------+

However, Spark defines overloaded function of to_date() which takes the patterns of the Date string as an additional parameter. Using this, we should able to convert any date string to Spark DateType.


import org.apache.spark.sql.functions._
Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
    col("Date"),
    to_date(col("Date"),"MM-dd-yyyy").as("to_date")
  ).show()

Output:


// Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|06-03-2009|2009-06-03|
|07-24-2009|2009-07-24|
+----------+----------+

4. Format Different Date Strings to Spark DateType

Below examples shows how to convert different dates from a single column to Spark DateType.


// Format Different Date Strings to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      to_date(col("Date"),"yyyy MMMM dd"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


// Output:
+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    2019-07-24|
| 07/24/2019|    2019-07-24|
|2019 Jul 24|    2019-07-24|
|   07-27-19|Unknown Format|
+-----------+--------------+

5. Parsing Different Date Strings to Specific Date Formatting pattern

Below examples shows how to convert different dates strings from a single column to standard date string pattern using to_date() and date_format() function.


// Parsing Different Date Strings to Specific Date Formatting pattern
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
    .toDF("Date").select(
    col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
      date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
      date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formated Date")
  ).show()

Output:


// Output:
+-----------+--------------+
|       Date| Formated Date|
+-----------+--------------+
| 2019-07-24|    07/24/2019|
| 07/24/2019|    07/24/2019|
|2019 Jul 24|    07/24/2019|
|   07-27-19|Unknown Format|
+-----------+--------------+

Conclusion:

In this article, you have learned how to use to_date() function which parses Date from String object to Spark DateType and date_format() function which formats Spark DateType to String formatting pattern. You can find more information about these functions at the following blog

Happy Learning !!

Exit mobile version