In this tutorial, we will show you a Spark SQL DataFrame example of how to get the current system date-time, formatting Spark Date to a String date pattern and parsing String pattern to Spark DateType using Scala language and Spark SQL Date and Time functions.
- Current Date and Timestamp
- Format Spark DateType to String formatting pattern
- Parsing Date from String object to Spark DateType
- Format Different Date Strings to Spark DateType
- Parsing Different Date Strings to Specific Date Formatting pattern
1. Current Date and Timestamp
Spark provides current_date()
function to get the current system date in DateType
‘yyyy-MM-dd’ format and current_timestamp()
to get current timestamp in `yyyy-MM-dd HH:mm:ss.SSSS` format.
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
current_date().as("current_date"), //returns date in `yyyy-MM-dd'
current_timestamp().as("current_time") //returns date and time in `yyyy-MM-dd HH:mm:ss.SSSS`
).show(false)
Output:
// Output:
+------------+-----------------------+
|current_date|current_time |
+------------+-----------------------+
|2019-07-24 |2019-07-24 20:50:08.667|
+------------+-----------------------+
2. Format Spark DateType to String formatting pattern
Let’s see how to format the Spark DateType column to String formatting pattern using the Spark date function date_format()
. The function support all Date patterns described in Java DateTimeFormatter.
//
import org.apache.spark.sql.functions._
Seq(1).toDF("seq").select(
current_date().as("current_date"),
date_format(current_timestamp(),"yyyy MM dd").as("yyyy MM dd"),
date_format(current_timestamp(),"MM/dd/yyyy hh:mm").as("MM/dd/yyyy"),
date_format(current_timestamp(),"yyyy MMM dd").as("yyyy MMMM dd"),
date_format(current_timestamp(),"yyyy MMMM dd E").as("yyyy MMMM dd E")
).show(false)
Output:
// Output:
+------------+----------+----------------+------------+----------------+
|current_date|yyyy MM dd|MM/dd/yyyy |yyyy MMMM dd|yyyy MMMM dd E |
+------------+----------+----------------+------------+----------------+
|2019-07-24 |2019 07 24|07/24/2019 08:50|2019 Jul 24 |2019 July 24 Wed|
+------------+----------+----------------+------------+----------------+
3. Parsing Date from String object to Spark DateType
Spark Dataframe API also provides date function to_date()
which parses Date from String object and converts to Spark DateType format. when dates are in ‘yyyy-MM-dd’ format, spark function auto-cast to DateType by casting rules. When dates are not in specified format this function returns null.
// Parsing Date from String object to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07-24-2009")).toDF("Date").select(
col("Date"),
to_date(col("Date")).as("to_date")
).show()
Output:
// Output:
+----------+----------+
| Date| to_date|
+----------+----------+
|2019-07-24|2019-07-24|
|07-24-2009| null|
+----------+----------+
However, Spark defines overloaded function of to_date()
which takes the patterns of the Date string as an additional parameter. Using this, we should able to convert any date string to Spark DateType.
import org.apache.spark.sql.functions._
Seq(("06-03-2009"),("07-24-2009")).toDF("Date").select(
col("Date"),
to_date(col("Date"),"MM-dd-yyyy").as("to_date")
).show()
Output:
// Output:
+----------+----------+
| Date| to_date|
+----------+----------+
|06-03-2009|2009-06-03|
|07-24-2009|2009-07-24|
+----------+----------+
4. Format Different Date Strings to Spark DateType
Below examples shows how to convert different dates from a single column to Spark DateType.
// Format Different Date Strings to Spark DateType
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
.toDF("Date").select(
col("Date"),
when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
to_date(col("Date"),"yyyy-MM-dd"))
.when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
to_date(col("Date"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
to_date(col("Date"),"yyyy MMMM dd"))
.otherwise("Unknown Format").as("Formated Date")
).show()
Output:
// Output:
+-----------+--------------+
| Date| Formated Date|
+-----------+--------------+
| 2019-07-24| 2019-07-24|
| 07/24/2019| 2019-07-24|
|2019 Jul 24| 2019-07-24|
| 07-27-19|Unknown Format|
+-----------+--------------+
5. Parsing Different Date Strings to Specific Date Formatting pattern
Below examples shows how to convert different dates strings from a single column to standard date string pattern using to_date()
and date_format()
function.
// Parsing Different Date Strings to Specific Date Formatting pattern
import org.apache.spark.sql.functions._
Seq(("2019-07-24"),("07/24/2019"),("2019 Jul 24"),("07-27-19"))
.toDF("Date").select(
col("Date"),
when(to_date(col("Date"),"yyyy-MM-dd").isNotNull,
date_format(to_date(col("Date"),"yyyy-MM-dd"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"MM/dd/yyyy").isNotNull,
date_format(to_date(col("Date"),"MM/dd/yyyy"),"MM/dd/yyyy"))
.when(to_date(col("Date"),"yyyy MMMM dd").isNotNull,
date_format(to_date(col("Date"),"yyyy MMMM dd"),"MM/dd/yyyy"))
.otherwise("Unknown Format").as("Formated Date")
).show()
Output:
// Output:
+-----------+--------------+
| Date| Formated Date|
+-----------+--------------+
| 2019-07-24| 07/24/2019|
| 07/24/2019| 07/24/2019|
|2019 Jul 24| 07/24/2019|
| 07-27-19|Unknown Format|
+-----------+--------------+
Conclusion:
In this article, you have learned how to use to_date() function which parses Date from String object to Spark DateType and date_format() function which formats Spark DateType to String formatting pattern. You can find more information about these functions at the following blog
Happy Learning !!