Spark SQL – Get Last Day of a Month

In this tutorial, I will show you, with a Spark SQL DataFrame example in Scala, how to get the last day (end date) of a month by using the last_day() function. We will also see how to get the last day of a month when the DataFrame column holds dates in a format other than yyyy-MM-dd.

Refer to Spark SQL Date and Timestamp Functions for the full list of date and time functions.

Get Last Day From “yyyy-MM-dd” Format

Spark SQL provides the last_day() function, which returns the last day of the month for a given date. When the input is a string column in yyyy-MM-dd format, Spark casts it to DateType automatically. For example, last_day() returns 2019-01-31 for the input date 2019-01-25, since 31 is the last day of January.


// Assumes an existing SparkSession named spark (for example, in spark-shell)
import spark.implicits._
import org.apache.spark.sql.functions._
val data=Seq(("2019-01-01"),("2020-02-24"),("2019-02-24"),
      ("2019-05-01"),("2018-03-24"),("2007-12-19"))
val df=data.toDF("Date")
df.select(col("Date"),last_day(col("Date")).as("last_day"))
  .show()

Output:


+----------+----------+
|      Date|  last_day|
+----------+----------+
|2019-01-01|2019-01-31|
|2020-02-24|2020-02-29|
|2019-02-24|2019-02-28|
|2019-05-01|2019-05-31|
|2018-03-24|2018-03-31|
|2007-12-19|2007-12-31|
+----------+----------+
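To see why 2020-02-24 maps to 2020-02-29 while 2019-02-24 maps to 2019-02-28, here is a minimal plain-Scala sketch, outside Spark, of what last_day() computes for a single yyyy-MM-dd string. It uses java.time.YearMonth, which accounts for leap years. The helper name lastDay is hypothetical, and this only illustrates the semantics; it is not how Spark implements the function.

```scala
import java.time.{LocalDate, YearMonth}

// Hypothetical helper (not part of Spark): mirrors what last_day() returns
// for a date string that is already in yyyy-MM-dd (ISO) format.
def lastDay(isoDate: String): String = {
  val date = LocalDate.parse(isoDate)          // parses yyyy-MM-dd
  YearMonth.from(date).atEndOfMonth().toString // last day of that month, as yyyy-MM-dd
}

// A few of the inputs used in the DataFrame example above
Seq("2019-01-01", "2020-02-24", "2019-02-24").foreach { d =>
  println(s"$d -> ${lastDay(d)}")
}
```

YearMonth already knows the length of each month, including February in leap years, so no manual day arithmetic is needed.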

Get Last Day From Other Date Formats

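If your input date string is not in the yyyy-MM-dd format that Spark can cast to DateType, you first need to convert it to DateType with to_date() and a matching pattern before applying the last_day() function. Otherwise, Spark cannot parse the string and you typically get null instead of a date.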


import org.apache.spark.sql.functions._
val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
df2.select(col("Date"),
        last_day(to_date(col("Date"),"MM-dd-yyyy")).as("last_day")
   ).show()

Output:


+----------+----------+
|      Date|  last_day|
+----------+----------+
|06-03-2009|2009-06-30|
|07-24-2009|2009-07-31|
+----------+----------+
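The same two steps (parse with an explicit pattern, then take the end of the month) can be sketched in plain Scala for the MM-dd-yyyy inputs. The helper lastDayOf below is hypothetical and uses java.time.format.DateTimeFormatter, not Spark's own parser. It returns None when the string does not match the pattern, as a rough stand-in for the null that to_date() typically produces for unparsable input.

```scala
import java.time.{LocalDate, YearMonth}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper (not part of Spark): parse a date string with an explicit
// pattern, then return the last day of its month in yyyy-MM-dd format.
// Returns None if the string does not match the pattern.
def lastDayOf(dateStr: String, pattern: String): Option[String] = {
  val formatter = DateTimeFormatter.ofPattern(pattern)
  Try(LocalDate.parse(dateStr, formatter)).toOption
    .map(date => YearMonth.from(date).atEndOfMonth().toString)
}

println(lastDayOf("06-03-2009", "MM-dd-yyyy")) // Some(2009-06-30)
println(lastDayOf("07-24-2009", "MM-dd-yyyy")) // Some(2009-07-31)
println(lastDayOf("2009-07-24", "MM-dd-yyyy")) // None: input does not match the pattern
```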

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
object DateLastDay extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.sqlContext.implicits._

  val df=Seq(("2019-01-01"),("2020-02-24"),("2019-02-24"),
      ("2019-05-01"),("2018-03-24"),("2007-12-19"))
    .toDF("Date")
  df.select(col("Date"),last_day(col("Date")).as("last_day"))
    .show()


  val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
  df2.select(col("Date"),
           last_day(to_date(col("Date"),"MM-dd-yyyy")).as("last_day")
     ).show()

}

Conclusion:

In this article, you have learned how to get the last day (end date) of the month from a Spark SQL DataFrame column, including columns that contain date strings in formats other than yyyy-MM-dd. I hope you liked this article. For more information about these functions, refer to Spark SQL Date and Timestamp Functions.

Happy Learning!!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium