In this Spark SQL DataFrame tutorial, you will learn how to get the current system date and timestamp, format a Spark DateType column to a String date pattern, and parse a String date pattern into Spark DateType, using the Scala language and the Spark SQL Date and Time functions. We cover the following topics:
- Current Date and Timestamp
- Format Spark DateType to String formatting pattern
- Parsing Date from String object to Spark DateType
- Format Different Date Strings to Spark DateType
- Parsing Different Date Strings to Specific Date Formatting pattern
1. Current Date and Timestamp
Spark provides the current_date() function to get the current system date as a DateType in the `yyyy-MM-dd` format, and the current_timestamp() function to get the current timestamp in the `yyyy-MM-dd HH:mm:ss.SSSS` format.
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
  current_date().as("current_date"),       // returns date in `yyyy-MM-dd`
  current_timestamp().as("current_time")   // returns date and time in `yyyy-MM-dd HH:mm:ss.SSSS`
).show(false)
Output:
+------------+-----------------------+
|current_date|current_time           |
+------------+-----------------------+
|2019-07-24  |2019-07-24 20:50:08.667|
+------------+-----------------------+
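The same functions are available from Spark SQL as well. Below is a minimal sketch, assuming an active SparkSession named `spark` (as in spark-shell):
// Same functions through Spark SQL; `spark` is assumed to be in scope
spark.sql("select current_date() as current_date, current_timestamp() as current_time")
  .show(false)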
2. Format Spark DateType to String formatting pattern
Let’s see how to format a Spark DateType column to a String formatting pattern using the Spark date function date_format(). The function supports all the date patterns described in Java DateTimeFormatter.
// Format Spark DateType to String formatting pattern
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
  current_date().as("current_date"),
  date_format(current_timestamp(),"yyyy MM dd").as("yyyy MM dd"),
  date_format(current_timestamp(),"MM/dd/yyyy hh:mm").as("MM/dd/yyyy"),
  date_format(current_timestamp(),"yyyy MMM dd").as("yyyy MMMM dd"),
  date_format(current_timestamp(),"yyyy MMMM dd E").as("yyyy MMMM dd E")
).show(false)
Output:
+------------+----------+----------------+------------+----------------+
|current_date|yyyy MM dd|MM/dd/yyyy      |yyyy MMMM dd|yyyy MMMM dd E  |
+------------+----------+----------------+------------+----------------+
|2019-07-24  |2019 07 24|07/24/2019 08:50|2019 Jul 24 |2019 July 24 Wed|
+------------+----------+----------------+------------+----------------+
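date_format() works the same way on an existing DateType column, not only on current_timestamp(). Below is a small sketch; the column name `date_col` and the sample value are only for illustration:
// date_format() on an existing DateType column (column name and value are illustrative)
import org.apache.spark.sql.functions._
Seq(java.sql.Date.valueOf("2019-07-24")).toDF("date_col").select(
  col("date_col"),
  date_format(col("date_col"),"MM/dd/yyyy").as("MM/dd/yyyy"),
  date_format(col("date_col"),"yyyy MMM dd").as("yyyy MMM dd")
).show(false)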
3. Parsing Date from String object to Spark DateType
The Spark DataFrame API also provides the date function to_date(), which parses a date from a String object and converts it to Spark DateType. When dates are in the `yyyy-MM-dd` format, Spark automatically casts the String to DateType using its casting rules (a plain cast works as well; see the sketch after the output below). When dates are not in this format, the function returns null.
// Parsing Date from String object to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07-24-2009")).toDF("Date").select(
col("Date"),
to_date(col("Date")).as("to_date")
).show()
Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|2019-07-24|2019-07-24|
|07-24-2009|      null|
+----------+----------+
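As mentioned above, strings that are already in the default `yyyy-MM-dd` form can also be converted with a plain cast to DateType. A minimal sketch:
// Casting a `yyyy-MM-dd` string column directly to DateType
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType
Seq(("2019-07-24")).toDF("Date").select(
  col("Date"),
  col("Date").cast(DateType).as("cast_to_date")
).show()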
However, Spark also defines an overloaded version of to_date() that takes the pattern of the date string as an additional parameter. Using this, we should be able to convert any date string to Spark DateType.
import org.apache.spark.sql.functions._
Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
col("Date"),
to_date(col("Date"),"MM-dd-yyyy").as("to_date")
).show()
Output:
+----------+----------+
|      Date|   to_date|
+----------+----------+
|06-03-2009|2009-06-03|
|07-24-2009|2009-07-24|
+----------+----------+
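The pattern-based to_date() can also be called through Spark SQL. A quick sketch, again assuming a SparkSession named `spark` is in scope:
// to_date() with a pattern through Spark SQL; `spark` is assumed to be in scope
spark.sql("select to_date('07-24-2009','MM-dd-yyyy') as to_date").show()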
4. Format Different Date Strings to Spark DateType
The example below shows how to convert dates in different formats from a single column to Spark DateType.
// Format Different Date Strings to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
.toDF("Date").select(
col("Date"),
when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
to_date(col("Date"),"yyyy-MM-dd"))
.when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
to_date(col("Date"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
to_date(col("Date"),"yyyy MMMM dd"))
.otherwise("Unknown Format").as("Formated Date")
).show()
Output:
+-----------+--------------+
|       Date|Formatted Date|
+-----------+--------------+
| 2019-07-24|    2019-07-24|
| 07/24/2019|    2019-07-24|
|2019 Jul 24|    2019-07-24|
|   07-27-19|Unknown Format|
+-----------+--------------+
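As an alternative to chaining when() clauses, the same fallback logic can be written with coalesce(), which returns the first pattern that parses successfully. This is only a sketch of the idea; unlike the example above, rows that match none of the patterns come back as null instead of "Unknown Format":
// Alternative sketch: coalesce() picks the first non-null parse; unmatched rows stay null
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
  .toDF("Date").select(
    col("Date"),
    coalesce(
      to_date(col("Date"),"yyyy-MM-dd"),
      to_date(col("Date"),"MM/dd/yyyy"),
      to_date(col("Date"),"yyyy MMMM dd")
    ).as("Formatted Date")
  ).show()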
5. Parsing Different Date Strings to Specific Date Formatting pattern
The example below shows how to convert date strings in different formats from a single column to a standard date string pattern using the to_date() and date_format() functions.
// Parsing Different Date Strings to Specific Date Formatting pattern
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
.toDF("Date").select(
col("Date"),
when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
.otherwise("Unknown Format").as("Formated Date")
).show()
Output:
+-----------+--------------+
|       Date|Formatted Date|
+-----------+--------------+
| 2019-07-24|    07/24/2019|
| 07/24/2019|    07/24/2019|
|2019 Jul 24|    07/24/2019|
|   07-27-19|Unknown Format|
+-----------+--------------+
Conclusion:
In this article, you have learned how to use the to_date() function, which parses a date from a String object into Spark DateType, and the date_format() function, which formats a Spark DateType column to a String pattern. You can find more information about these functions at the following blog.
Happy Learning !!