Spark – Get a Day of Year and Week of the Year

Problem: How do you get the day of the year and the week of the year as numbers from a Spark DataFrame date or timestamp column?

Solution: Using the Spark SQL date_format() function along with date formatting patterns, we can extract the day of the year and the week of the year from Date and Timestamp columns. Let's see this with some examples in Scala.

Related: Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

We can extract a unit of a date by passing a string formatting pattern to the Spark date function date_format(). The function supports all date patterns described in Java DateTimeFormatter.

Before we start with examples, let's first create a Spark DataFrame with a timestamp column.


val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
import org.apache.spark.sql.functions._
import spark.sqlContext.implicits._

val data = Seq(("2019-01-03 12:01:19.000"),
    ("2019-02-01 12:01:19.000"),
    ("2019-07-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406"))
val df = data.toDF("input_timestamp")

1. Spark Get Day of the Year from Date column

As mentioned earlier, we use date formatting patterns to get the day of the year from Date or Timestamp type columns.

D – this date formatting pattern extracts the day of the year. For example, the date '2019-02-01' returns 32.


import org.apache.spark.sql.functions._
val df = Seq(("2019-01-03 12:01:19.000"),
    ("2019-02-01 12:01:19.000"),
    ("2019-07-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")

df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
  .withColumn("day_of_year", date_format(col("input_timestamp"), "D"))
  .show(false)

This snippet yields the output below.


// Output:
+-----------------------+-----------+
|input_timestamp        |day_of_year|
+-----------------------+-----------+
|2019-01-03 12:01:19    |3          |
|2019-02-01 12:01:19    |32         |
|2019-07-16 16:44:55.406|197        |
|2019-11-16 16:50:59.406|320        |
+-----------------------+-----------+
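As a quick sanity check outside Spark, the same day-of-year values can be reproduced with Python's standard datetime module (this is plain Python, not Spark code, and only illustrates what the "D" pattern computes):

```python
from datetime import datetime

# Day of the year for the sample timestamps, computed with the
# standard library; tm_yday counts days from Jan 1 (= 1).
timestamps = [
    "2019-01-03 12:01:19",
    "2019-02-01 12:01:19",
    "2019-07-16 16:44:55",
    "2019-11-16 16:50:59",
]
for ts in timestamps:
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    print(ts, "->", dt.timetuple().tm_yday)  # 3, 32, 197, 320
```

The printed values match the day_of_year column above.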

2. Spark Get Week of the Year from Date column

Here we will see an example of a date formatting pattern used to extract the week of the year as a number from the date and timestamp column. For example, it returns 1 for the first week of the year and 52 for the last.

w – this date formatting pattern returns the week of the year. For example, the date '2019-01-03' returns 1 as it falls in the first week of 2019. (Note: Spark 3.x's default datetime formatter no longer allows week-based patterns such as 'w'; if you hit an error there, consider the built-in weekofyear() function instead.)


import org.apache.spark.sql.functions._
df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
  .withColumn("week_of_year", date_format(col("input_timestamp"), "w"))
  .show(false)

This snippet yields the output below.


// Output:
+-----------------------+------------+
|input_timestamp        |week_of_year|
+-----------------------+------------+
|2019-01-03 12:01:19    |1           |
|2019-02-01 12:01:19    |5           |
|2019-07-16 16:44:55.406|29          |
|2019-11-16 16:50:59.406|46          |
+-----------------------+------------+
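For these particular dates the 'w' values happen to coincide with ISO-8601 week numbers, so they can be cross-checked with Python's standard isocalendar(). Keep in mind this is only a sanity check: 'w' is locale-dependent in Spark, while isocalendar() always uses ISO weeks, and the two can disagree near year boundaries.

```python
from datetime import date

# ISO week numbers for the sample dates; for these dates they match
# the "w" pattern output above (ISO and locale-based weeks can differ
# around the start/end of a year).
for d in [date(2019, 1, 3), date(2019, 2, 1),
          date(2019, 7, 16), date(2019, 11, 16)]:
    print(d, "->", d.isocalendar()[1])  # 1, 5, 29, 46
```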

In the above examples, withColumn() is used to add a new column to the DataFrame. Note that date_format() returns its result as a string column.

3. Complete Example

Below is a complete example of how to get the day of the year and the week of the year. To run it, use your favorite IDE with Scala and Spark set up.


package com.sparkbyexamples.spark.dataframe.functions.datetime

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, to_timestamp}

object DayAndWeekOfYear extends App {

  val spark: SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  val df = Seq(("2019-01-03 12:01:19.000"),
    ("2019-02-01 12:01:19.000"),
    ("2019-07-16 16:44:55.406"),
    ("2019-11-16 16:50:59.406")).toDF("input_timestamp")

  // Get Day of the Year example
  df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
    .withColumn("day_of_year", date_format(col("input_timestamp"), "D"))
    .show(false)

  // Get Week of the Year example
  df.withColumn("input_timestamp",
    to_timestamp(col("input_timestamp")))
    .withColumn("week_of_year", date_format(col("input_timestamp"), "w"))
    .show(false)
}

The complete code can be downloaded from the GitHub project.

Happy Learning

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium