Site icon Spark By {Examples}

Spark SQL – Truncate Date Time by unit specified

spark truncate date time

In this tutorial, we will show you a Spark SQL DataFrame example of how to truncate Date and Time of a column using trunc() and date_trunc() functions and Scala language. We will also see how to use truncate when the DataFrame column has different date formats.

In this tutorial, we will show you a Spark SQL DataFrame example of how to truncate Date and Time of a DataFrame column using trunc() and date_trunc() functions with Scala language.

Truncating Date using trunc() Spark SQL function

Spark SQL DateFrame functions provide trunc() function to truncate Date at Year and Month units and returns Date in Spark DateType format “yyyy-MM-dd”. Note that Day doesn’t support by trunc() function and it returns null when used.


Seq("1").toDF("date").select(
    current_timestamp(),
    trunc(current_timestamp(),"Year").as("Year"),
    trunc(current_timestamp(),"Month").as("Month"),
    trunc(current_timestamp(),"Day").as("Day") // Not available so returns null
  ).show(false)

Output:


+-----------------------+----------+----------+----+
|current_timestamp()    |Year      |Month     |Day |
+-----------------------+----------+----------+----+
|2019-07-27 16:15:09.435|2019-01-01|2019-07-01|null|
+-----------------------+----------+----------+----+

Truncating Date and Time using date_trunc() Spark SQL function

Spark Dateframe SQL functions provide another truncate function date_trunc() to truncate at Year, Month, Day, Hour, Minute and Seconds units and returns Date in Spark DateType format “yyyy-MM-dd HH:mm:ss.SSSS”.


Seq("1").toDF("date").select(
    current_timestamp(),
    date_trunc("Year",current_timestamp()).as("Year"),
    date_trunc("Month",current_timestamp()).as("Month"),
    date_trunc("Day",current_timestamp()).as("Day"),
    date_trunc("Hour",current_timestamp()).as("Hour"),
    date_trunc("Minute",current_timestamp()).as("Minute")
  ).show(false)

Output:


+-----------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
|current_timestamp()    |Year               |Month              |Day                |Hour               |Minute             |
+-----------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
|2019-07-27 16:15:09.759|2019-01-01 00:00:00|2019-07-01 00:00:00|2019-07-27 00:00:00|2019-07-27 16:00:00|2019-07-27 16:15:00|
+-----------------------+-------------------+-------------------+-------------------+-------------------+-------------------+

Truncate when DataFrame column is in DateType format

Below is another example of trunc() function when Spark Date column has just Date without Time.


 Seq(("2019-01-23"),("2019-06-24"),("2019-09-20")).toDF("date").select(
    col("date"),
    trunc(col("date"),"Month").as("Month"),
    trunc(col("date"),"Year").as("Year")
  ).show()

Output:


+----------+----------+----------+
|      date|     Month|      Year|
+----------+----------+----------+
|2019-01-23|2019-01-01|2019-01-01|
|2019-06-24|2019-06-01|2019-01-01|
|2019-09-20|2019-09-01|2019-01-01|
+----------+----------+----------+

Truncate when Dataframe column, not in DateType format

In case, if your input Date is not in Spark DateType “yyyy-MM-dd” format, then first we should convert the Date String to DateType before we apply trunc() and date_trunc() function.


Seq(("01-23-2019"),("06-24-2019"),("09-20-2019")).toDF("date").select(
    col("date"),
    trunc(to_date(col("date"),"MM-dd-yyyy"),"Month").as("Month"),
    trunc(to_date(col("date"),"MM-dd-yyyy"),"Year").as("Year")
  ).show()

Output:


+----------+----------+----------+
|      date|     Month|      Year|
+----------+----------+----------+
|01-23-2019|2019-01-01|2019-01-01|
|06-24-2019|2019-06-01|2019-01-01|
|09-20-2019|2019-09-01|2019-01-01|
+----------+----------+----------+

Conclusion:

In this article, you have learned Spark SQL DataFrame example of how to truncate Date and Time of a column using Scala language and Date and Time functions. Also, you have learned the difference between trunc() and date_trunc() functions. You can find more information about these functions at the following blog

Happy Learning !!

Exit mobile version