Spark SQL DataFrame | Convert String to Date | Format Date to String

In this tutorial, we will show you a Spark SQL example of how to convert different date formats in a single column to a standard date format, using the Scala language and Spark SQL Date and Time functions. In order to use Spark date functions, the date string should comply with the Spark DateType format, which is "yyyy-MM-dd".
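Spark's date patterns follow the same conventions as Java's date formatters (java.time.format.DateTimeFormatter in Spark 3.x, SimpleDateFormat in Spark 2.x). As a quick sketch outside Spark (object and method names here are illustrative, not Spark APIs), the default "yyyy-MM-dd" form parses with no explicit pattern, while any other format needs one, just as to_date() needs a format argument:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale

object DefaultFormatDemo {
  // "yyyy-MM-dd" is the DateType default: it parses with no explicit pattern
  def parseDefault(s: String): LocalDate = LocalDate.parse(s)

  // Any other format needs an explicit pattern, like to_date(col, fmt) in Spark
  def parseWith(s: String, pattern: String): LocalDate =
    LocalDate.parse(s, DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH))
}
```

For example, `DefaultFormatDemo.parseWith("07/24/2019", "MM/dd/yyyy")` yields the date 2019-07-24.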


1. Using Date in “yyyy-MM-dd” format

Since the dates below are in different formats, let's convert them to DateType using the to_date() function.


// Using Date in "yyyy-MM-dd" format
Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
    ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
         to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
         to_date(col("Date"),"yyyy MM dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
         to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
         to_date(col("Date"),"yyyy MMMM dd"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
         to_date(col("Date"),"yyyy MMMM dd E"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    2019-07-24|
|      2019 07 24|    2019-07-24|
|      07/24/2019|    2019-07-24|
|     2019 Jul 24|    2019-07-24|
|2019 July 24 Wed|    2019-07-24|
|        07-27-19|Unknown Format|
+----------------+--------------+
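The when() chain above simply tries each pattern in order and keeps the first one that parses. The same cascade can be sketched in plain Scala with java.time (object and method names here are illustrative, not Spark APIs). One caveat: java.time distinguishes MMM (abbreviated month, "Jul") from MMMM (full month, "July"), whereas the legacy SimpleDateFormat parser used by Spark 2.x accepts both for MMMM, so this sketch lists both patterns:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.util.Locale
import scala.util.Try

object MultiFormatParser {
  // Patterns tried in order, mirroring the chained when() calls
  private val patterns = Seq("yyyy-MM-dd", "yyyy MM dd", "MM/dd/yyyy",
                             "yyyy MMM dd", "yyyy MMMM dd E")
    .map(p => DateTimeFormatter.ofPattern(p, Locale.ENGLISH))

  // The first pattern that parses wins; otherwise fall through, like otherwise()
  def parse(s: String): String =
    patterns.view
      .flatMap(fmt => Try(LocalDate.parse(s, fmt)).toOption)
      .headOption
      .map(_.toString)
      .getOrElse("Unknown Format")
}
```

The lazy `.view` mirrors the short-circuit behavior of the when() chain: later patterns are not tried once one succeeds.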

2. Converting Date to another format

If you want to convert the date to another format, use the date_format() function. The example below converts all dates to the "MM/dd/yyyy" format.


// Converting Date to another format
Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
    ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
  .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
         date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
         date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

Output:

+----------------+--------------+
|            Date|Formatted Date|
+----------------+--------------+
|      2019-07-24|    07/24/2019|
|      2019 07 24|    07/24/2019|
|      07/24/2019|    07/24/2019|
|     2019 Jul 24|    07/24/2019|
|2019 July 24 Wed|    07/24/2019|
|        07-27-19|Unknown Format|
+----------------+--------------+ 
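date_format() is the inverse of to_date(): it renders a DateType value back into a string. For reference, the same parse-then-format round trip in plain java.time (object and method names here are illustrative):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object RoundTripDemo {
  private val outFmt = DateTimeFormatter.ofPattern("MM/dd/yyyy")

  // Parse the default "yyyy-MM-dd" form, then render as "MM/dd/yyyy",
  // like date_format(to_date(col("Date")), "MM/dd/yyyy") in Spark
  def reformat(s: String): String = LocalDate.parse(s).format(outFmt)
}
```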

2.1 Complete Code:


// Complete Code:
package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DateFormat extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExample")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),
      ("2019 Jul 24"),("2019 July 24 Wed"),("07-27-19"))
    .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
          to_date(col("Date"),"yyyy-MM-dd"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
          to_date(col("Date"),"yyyy MM dd"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
          to_date(col("Date"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
          to_date(col("Date"),"yyyy MMMM dd"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
          to_date(col("Date"),"yyyy MMMM dd E"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

  Seq(("2019-07-24"),("2019 07 24"),("07/24/2019"),("2019 Jul 24"),
            ("2019 July 24 Wed"),("07-27-19"))
    .toDF("Date").select(col("Date"),
    when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
          date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MM dd").isNotNull,
          date_format(to_date(col("Date"),"yyyy MM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
          date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
          date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
    .when(to_date(col("Date"),"yyyy MMMM dd E").isNotNull,
          date_format(to_date(col("Date"),"yyyy MMMM dd E"),"MM/dd/yyyy"))
    .otherwise("Unknown Format").as("Formatted Date")
  ).show()

}

Conclusion:

In this article, you have learned how to parse multiple date formats from a single string column into Spark DateType, and how to format dates into different string formats, using the Scala language and Spark SQL Date and Time functions.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
