Parse different date formats from a column

In this tutorial, we show a Spark SQL example of how to parse several different date formats from a single column into a standard date format, using Scala and the Spark SQL Date and Time functions. To use the Spark date functions, a date string should comply with the Spark DateType format, which is “yyyy-MM-dd”.
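As a side note, since Spark 3.x the pattern letters accepted by functions like to_date() follow java.time.format.DateTimeFormatter semantics, so the default DateType form can be sketched with plain Scala, independent of Spark (a minimal illustration, not Spark code):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object DefaultDatePattern extends App {
  // "yyyy-MM-dd" is the canonical string form Spark's DateType expects.
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.US)
  val d: LocalDate = LocalDate.parse("2019-07-24", fmt)
  println(d) // 2019-07-24
}
```

A string that does not match the pattern (for example "07/24/2019" against "yyyy-MM-dd") throws a DateTimeParseException here; Spark's to_date() instead returns null in that case, which is what the examples below rely on.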

Converting different date formats to “yyyy-MM-dd”

Since the dates below come in several different formats, let’s convert them to DateType using the to_date() function. Each when() clause tries one pattern; the first pattern that parses successfully (to_date() returns null on failure) determines the result.


Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
    ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
         to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
         to_date(col("Date"),"yyyy MM dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
         to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
         to_date(col("Date"),"yyyy MMMM dd"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
         to_date(col("Date"),"yyyy MMMM dd E"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    2019-07-24|
|      2019 07 24|    2019-07-24|
|      07/24/2019|    2019-07-24|
|     2019 Jul 24|    2019-07-24|
|2019 July 24 Wed|    2019-07-24|
|        07-27-19|Unknown Format|
+----------------+--------------+
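The last row stays “Unknown Format” because none of the when() branches tries a two-digit-year pattern. If that input should be supported as well, one more branch using the pattern "MM-dd-yy" could be added (an untested suggestion, not part of the original example). The pattern itself can be sketched with plain java.time, which uses the same pattern letters as Spark 3.x; by default "yy" resolves a two-digit year relative to the year 2000:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object TwoDigitYear extends App {
  // "yy" parses a two-digit year with a base of 2000,
  // so "19" resolves to 2019.
  val fmt = DateTimeFormatter.ofPattern("MM-dd-yy", Locale.US)
  println(LocalDate.parse("07-27-19", fmt)) // 2019-07-27
}
```

In the Spark version, the hypothetical extra branch would read `.when(to_date(col("Date"),"MM-dd-yy").isNotNull, to_date(col("Date"),"MM-dd-yy"))`, placed before the otherwise() clause.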

Converting dates to other formats

If you want to convert the dates to another format instead, use the date_format() function. The example below converts all dates to the “MM/dd/yyyy” format.


Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
    ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
         date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    07/24/2019|
|      2019 07 24|    07/24/2019|
|      07/24/2019|    07/24/2019|
|     2019 Jul 24|    07/24/2019|
|2019 July 24 Wed|    07/24/2019|
|        07-27-19|Unknown Format|
+----------------+--------------+

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateFormat extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExample")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  // Parse several input formats into the default DateType format (yyyy-MM-dd).
  Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
      ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
    .toDF("Date").select(col("Date"),
      when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
           to_date(col("Date"),"yyyy-MM-dd"))
      .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
           to_date(col("Date"),"yyyy MM dd"))
      .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
           to_date(col("Date"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
           to_date(col("Date"),"yyyy MMMM dd"))
      .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
           to_date(col("Date"),"yyyy MMMM dd E"))
      .otherwise("Unknown Format").as("Formatted Date")
    ).show()

  // Parse the same inputs, then reformat them to MM/dd/yyyy with date_format().
  Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
      ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
    .toDF("Date").select(col("Date"),
      when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
           date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
           date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
           date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
           date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
           date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
      .otherwise("Unknown Format").as("Formatted Date")
    ).show()

}

Conclusion:

In this article, you have learned how to parse strings in several different date formats from a single column into Spark's DateType using to_date(), and how to format the resulting dates into other string representations with date_format(), using Scala and the Spark SQL Date and Time functions.

Happy Learning !!


NNK

