Problem: How do you get the day of the week and the week of the month from a Spark DataFrame Date or Timestamp column?
Solution: Using the Spark SQL date_format() function, we can extract any date and time unit from Date and Timestamp columns. Let's see how with some examples in Scala.
Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.
We can extract the desired unit of a date by passing a string formatting pattern to the Spark date function date_format(). This function supports the date patterns described in Java's DateTimeFormatter.
Before we start with the examples, let's first create a DataFrame to work with.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark: SparkSession = SparkSession.builder()
  .master("local")
  .appName("SparkByExamples.com")
  .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

import spark.implicits._

val data = Seq(("2019-07-01 12:01:19.000"),
  ("2019-06-24 12:01:19.000"),
  ("2019-11-16 16:44:55.406"),
  ("2019-11-16 16:50:59.406"))
val df = data.toDF("input_timestamp")
Get Day of the week from Date and Timestamp columns
This example uses date formatting patterns to extract the day of the week from Spark Date and Timestamp DataFrame columns. The value is between 1 and 7, where 1 is Monday and 7 is Sunday.
u – returns the day of the week as a number, for example 1 for Monday and 7 for Sunday.
E – returns the day of the week abbreviated to three characters, for example 'Mon' for Monday.
import org.apache.spark.sql.functions._

df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
  .withColumn("week_day_number", date_format(col("input_timestamp"), "u"))
  .withColumn("week_day_abb", date_format(col("input_timestamp"), "E"))
  .show(false)
This yields the following output:
+-----------------------+---------------+------------+
|input_timestamp |week_day_number|week_day_abb|
+-----------------------+---------------+------------+
|2019-07-01 12:01:19 |1 |Mon |
|2019-06-24 12:01:19 |1 |Mon |
|2019-11-16 16:44:55.406|6 |Sat |
|2019-11-16 16:50:59.406|6 |Sat |
+-----------------------+---------------+------------+
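If you only need the numeric day of the week, Spark also provides a built-in dayofweek() function that avoids formatting patterns altogether. Note that it uses a different numbering than the "u" pattern: 1 for Sunday through 7 for Saturday. A minimal sketch, reusing the df created above:

```scala
import org.apache.spark.sql.functions.{col, to_timestamp, dayofweek}

// dayofweek() returns an integer where 1 = Sunday and 7 = Saturday,
// unlike the "u" pattern where 1 = Monday and 7 = Sunday.
df.withColumn("input_timestamp", to_timestamp(col("input_timestamp")))
  .withColumn("day_of_week", dayofweek(col("input_timestamp")))
  .show(false)
```

Because dayofweek() is a regular function rather than a formatting pattern, it behaves the same across Spark 2.3+ versions regardless of the time parser policy.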
Get Week of the Month from Date and Timestamp columns
This example extracts the week of the month from Spark Date and Timestamp DataFrame columns. Return values are typically 1 to 5, though depending on the locale's week rules, 0 and 6 can also occur at month boundaries.
W – returns the week number within the month.
EEEE – returns the full day name, for example 'Monday'.
import org.apache.spark.sql.functions._

df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
  .withColumn("week_day_full", date_format(col("input_timestamp"), "EEEE"))
  .withColumn("week_of_month", date_format(col("input_timestamp"), "W"))
  .show(false)
This yields the following output:
+-----------------------+-------------+-------------+
|input_timestamp |week_day_full|week_of_month|
+-----------------------+-------------+-------------+
|2019-07-01 12:01:19 |Monday |1 |
|2019-06-24 12:01:19 |Monday |5 |
|2019-11-16 16:44:55.406|Saturday |3 |
|2019-11-16 16:50:59.406|Saturday |3 |
+-----------------------+-------------+-------------+
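A caveat for newer Spark versions: Spark 3.0's default datetime formatter rejects week-based patterns such as "W" and throws an exception instead of returning the week of the month. One workaround, sketched here under the assumption that the legacy SimpleDateFormat behavior is acceptable for your data, is to switch the time parser policy back to LEGACY before calling date_format():

```scala
// Spark 3.x's default (Proleptic Gregorian) formatter rejects week-based
// patterns such as "W"; the LEGACY policy restores the Spark 2.x behavior.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df.withColumn("input_timestamp", to_timestamp(col("input_timestamp")))
  .withColumn("week_of_month", date_format(col("input_timestamp"), "W"))
  .show(false)
```

The LEGACY policy applies session-wide to all datetime parsing and formatting, so set it with care if other queries rely on the newer calendar semantics.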
The complete example is available for download from GitHub.
Complete Example of Day of Week & Week of the Month
package com.sparkbyexamples.spark.dataframe.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_timestamp, date_format}

object DayWeekAndWeekMonth extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  val df = Seq(("2019-07-01 12:01:19.000"),
    ("2019-06-24 12:01:19.000"),
    ("2019-11-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")

  // Day of the week as a number (1 = Monday .. 7 = Sunday) and abbreviated name
  df.withColumn("input_timestamp",
      to_timestamp(col("input_timestamp")))
    .withColumn("week_day_number", date_format(col("input_timestamp"), "u"))
    .withColumn("week_day_abb", date_format(col("input_timestamp"), "E"))
    .show(false)

  // Full day name and week of the month
  df.withColumn("input_timestamp",
      to_timestamp(col("input_timestamp")))
    .withColumn("week_day_full", date_format(col("input_timestamp"), "EEEE"))
    .withColumn("week_of_month", date_format(col("input_timestamp"), "W"))
    .show(false)
}
Happy Learning !!
The "W" format doesn't seem to work after the Spark 3.0 upgrade, at least not in Databricks.
How would you find the week number from a date column if the week should start on Sunday instead of Monday (the default)?
The weekofyear function calculates the week number starting from Monday.
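On the question above: weekofyear() follows the ISO-8601 convention, where weeks start on Monday, and Spark has no built-in Sunday-start variant. One rough workaround, sketched here as an approximation rather than a standard API, is to shift each date forward one day before calling weekofyear(), so Sundays roll into the following ISO week:

```scala
import org.apache.spark.sql.functions.{col, to_timestamp, weekofyear, date_add}

// weekofyear() is ISO-8601 (weeks start Monday). Adding one day before the
// call moves each Sunday into the next ISO week, approximating Sunday-start
// week numbers. Year-boundary (week 1) semantics still follow ISO rules, so
// verify this matches your calendar convention before relying on it.
df.withColumn("ts", to_timestamp(col("input_timestamp")))
  .withColumn("iso_week", weekofyear(col("ts")))
  .withColumn("sunday_start_week", weekofyear(date_add(col("ts"), 1)))
  .show(false)
```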