
Parse different date formats from a column

Spark SQL DataFrame | convert string to date | Format Date to String


In this tutorial, we will show you a Spark SQL example of how to convert different date formats from a single column into a standard date format, using the Scala language and Spark SQL Date and Time functions. In order to use the Spark date functions, the date string should comply with the Spark DateType format, which is ‘yyyy-MM-dd’.
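As a quick plain-Scala illustration (using java.time rather than Spark itself), ‘yyyy-MM-dd’ is the ISO-8601 local-date form, which is why strings in this shape need no explicit pattern:

```scala
import java.time.LocalDate

// 'yyyy-MM-dd' is the default ISO-8601 form, so no custom formatter is needed
val d = LocalDate.parse("2019-07-24")
println(d)          // 2019-07-24
println(d.getMonth) // JULY
```

Spark's `DateType` follows the same convention, so a literal like `"2019-07-24"` can be cast to a date directly.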

1. Using Date in “yyyy-MM-dd” format

Since the dates below are in different formats, let's convert them to DateType using the to_date() function.


// Using Date in "yyyy-MM-dd" format
Seq(("2019-07-24"), ("2019 07 24"), ("07/24/2019"),
    ("2019 Jul 24"), ("2019 July 24 Wed"), ("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
           to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
           to_date(col("Date"),"yyyy MM dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
           to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
           to_date(col("Date"),"yyyy MMMM dd"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
           to_date(col("Date"),"yyyy MMMM dd E"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

This yields the below output.


// Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    2019-07-24|
|      2019 07 24|    2019-07-24|
|      07/24/2019|    2019-07-24|
|     2019 Jul 24|    2019-07-24|
|2019 July 24 Wed|    2019-07-24|
|        07-27-19|Unknown Format|
+----------------+--------------+
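A caveat for readers on newer Spark versions: since Spark 3.0 these patterns follow java.time, and the Spark datetime-pattern documentation lists week-day symbols such as E as valid for formatting only, not for parsing, so the "yyyy MMMM dd E" branch can fail with a SparkUpgradeException. One workaround (assuming spark is your active SparkSession) is to restore the pre-3.0 parser behaviour:

```scala
// Fall back to the Spark 2.x (SimpleDateFormat-based) datetime parser
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
```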

2. Converting Date to other formats

If you want to convert the date to another format, use the date_format() function. The example below converts all dates into the “MM/dd/yyyy” format.


// Converting Date to other formats
Seq(("2019-07-24"), ("2019 07 24"), ("07/24/2019"),
    ("2019 Jul 24"), ("2019 July 24 Wed"), ("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
         date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

This yields the below output.


// Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    07/24/2019|
|      2019 07 24|    07/24/2019|
|      07/24/2019|    07/24/2019|
|     2019 Jul 24|    07/24/2019|
|2019 July 24 Wed|    07/24/2019|
|        07-27-19|Unknown Format|
+----------------+--------------+
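The when()/otherwise() chain above boils down to "try each pattern in order and keep the first that parses". As a plain-Scala sketch of the same idea (using java.time instead of Spark, so pattern behaviour differs slightly — e.g. java.time's strict MMMM will not accept the short name "Jul", so a separate "yyyy MMM dd" pattern is added to cover it):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

// Candidate patterns, tried in order (mirrors the when() chain above);
// "yyyy MMM dd" handles short month names, which java.time treats
// separately from the full-name MMMM pattern.
val patterns = Seq("yyyy-MM-dd", "yyyy MM dd", "MM/dd/yyyy",
                   "yyyy MMM dd", "yyyy MMMM dd", "yyyy MMMM dd E")

val outFmt = DateTimeFormatter.ofPattern("MM/dd/yyyy")

def reformatDate(s: String): String =
  patterns.view
    .flatMap(p => Try(LocalDate.parse(s,
        DateTimeFormatter.ofPattern(p, Locale.ENGLISH))).toOption)
    .headOption
    .map(_.format(outFmt))
    .getOrElse("Unknown Format")

println(reformatDate("2019 July 24 Wed")) // 07/24/2019
println(reformatDate("07-27-19"))         // Unknown Format
```

This is an illustration only; in a DataFrame pipeline the Spark code above is the right tool, since it runs column-wise inside the SQL engine rather than row by row on the driver.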

2.1 Complete Code:


// Complete Code:
package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateFormat extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExample")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  Seq(("2019-07-24"), ("2019 07 24"), ("07/24/2019"),
      ("2019 Jul 24"), ("2019 July 24 Wed"), ("07-27-19"))
    .toDF("Date").select(col("Date"),
      when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
            to_date(col("Date"),"yyyy-MM-dd"))
      .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
            to_date(col("Date"),"yyyy MM dd"))
      .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
            to_date(col("Date"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
            to_date(col("Date"),"yyyy MMMM dd"))
      .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
            to_date(col("Date"),"yyyy MMMM dd E"))
      .otherwise("Unknown Format").as("Formatted Date")
    ).show()

  Seq(("2019-07-24"), ("2019 07 24"), ("07/24/2019"),
      ("2019 Jul 24"), ("2019 July 24 Wed"), ("07-27-19"))
    .toDF("Date").select(col("Date"),
      when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
            date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
            date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
            date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
            date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
      .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
            date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
      .otherwise("Unknown Format").as("Formatted Date")
    ).show()

}

Conclusion:

In this article, we have learned how to parse dates from strings, convert several different date formats in a single column into the standard Spark DateType, and format dates into other string representations, using the Scala language and Spark SQL Date and Time functions.

Happy Learning !!
