
Spark SQL – Get Last Day of a Month


In this tutorial, I will show you a Spark SQL DataFrame example of how to retrieve the last day (end date) of a month by using the last_day() function in Scala. We will also see how to get the last day of a month when the DataFrame column holds dates in a different format.

Refer to Spark SQL Date and Timestamp Functions for all Date & Time functions.

Get Last Day From “yyyy-MM-dd” Format

Spark SQL provides the last_day() function, which returns the last day of the month for a given date. The input can be a DateType column or a string in yyyy-MM-dd format. For example, for the input date 2019-01-25 it returns 2019-01-31, since 31 is the last day of January.


import spark.implicits._
import org.apache.spark.sql.functions._

// Sample dates as yyyy-MM-dd strings
val data = Seq("2019-01-01", "2020-02-24", "2019-02-24",
      "2019-05-01", "2018-03-24", "2007-12-19")
val df = data.toDF("Date")

// last_day() returns the month-end date for each input date
df.select(col("Date"), last_day(col("Date")).as("last_day"))
  .show()

Output:


+----------+----------+
|      Date|  last_day|
+----------+----------+
|2019-01-01|2019-01-31|
|2020-02-24|2020-02-29|
|2019-02-24|2019-02-28|
|2019-05-01|2019-05-31|
|2018-03-24|2018-03-31|
|2007-12-19|2007-12-31|
+----------+----------+
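
The same lookup can also be written as a SQL query, since last_day() is available as a Spark SQL built-in function as well as a DataFrame function. The following is a minimal sketch rather than part of the original example: the view name `dates`, the app name, and the `local[1]` master are assumptions chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("LastDaySqlSketch")
  .getOrCreate()
import spark.implicits._

// Two sample dates from this article, as yyyy-MM-dd strings.
val dates = Seq("2019-01-25", "2020-02-24").toDF("Date")

// "dates" is a hypothetical view name used only in this sketch.
dates.createOrReplaceTempView("dates")

// last_day() is called inside the SQL text instead of through functions.last_day.
val lastDays = spark.sql("SELECT Date, last_day(Date) AS last_day FROM dates")
lastDays.show()
```

Both forms should give the same result, so the choice mostly depends on whether the rest of your job is written with the DataFrame API or with SQL strings.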

Get Last Day From Other Date Formats

If your input date is not in the default yyyy-MM-dd format that Spark SQL's DateType expects, first convert the date string to DateType with to_date() before applying last_day().


import org.apache.spark.sql.functions._

// Parse the MM-dd-yyyy strings into DateType first, then take the month end
val df2 = Seq("06-03-2009", "07-24-2009").toDF("Date")
df2.select(col("Date"),
    last_day(to_date(col("Date"), "MM-dd-yyyy")).as("last_day"))
  .show()

Output:


+----------+----------+
|      Date|  last_day|
+----------+----------+
|06-03-2009|2009-06-30|
|07-24-2009|2009-07-31|
+----------+----------+
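
When the input uses a custom pattern, it can be convenient to keep the parsed date in its own column and to print the month end in the same pattern as the input. The sketch below assumes the MM-dd-yyyy strings from the example above. The column names `parsed` and `last_day_fmt` are illustrative, and date_format() is used here only to turn the DateType result back into a string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, last_day, to_date}

// Assumed local session for illustration; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("LastDayFormatSketch")
  .getOrCreate()
import spark.implicits._

// Pattern of the incoming date strings.
val pattern = "MM-dd-yyyy"
val df2 = Seq("06-03-2009", "07-24-2009").toDF("Date")

val result = df2
  // Step 1: convert the string to DateType using the input pattern.
  .withColumn("parsed", to_date(col("Date"), pattern))
  // Step 2: compute the month end from the parsed date (still DateType).
  .withColumn("last_day", last_day(col("parsed")))
  // Step 3: optionally format the month end back into the input pattern (a string).
  .withColumn("last_day_fmt", date_format(col("last_day"), pattern))

result.printSchema()
result.show()
```

Keeping `parsed` as a separate column makes it easier to spot rows where the conversion did not produce a date before last_day() is applied.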

Complete Code:


package com.sparkbyexamples.spark.dataframe.functions

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
object DateLastDay extends App {

  val spark:SparkSession = SparkSession.builder()
    .master("local")
    .appName("SparkByExamples.com")
    .getOrCreate()
  spark.sparkContext.setLogLevel("ERROR")

  import spark.implicits._

  val df=Seq(("2019-01-01"),("2020-02-24"),("2019-02-24"),
      ("2019-05-01"),("2018-03-24"),("2007-12-19"))
    .toDF("Date")
  df.select(col("Date"),last_day(col("Date")).as("last_day"))
    .show()


  val df2=Seq(("06-03-2009"),("07-24-2009")).toDF("Date")
  df2.select(col("Date"),
           last_day(to_date(col("Date"),"MM-dd-yyyy")).as("last_day")
     ).show()

}

Conclusion:

In this article, you have learned how to get the last day (end date) of a month from a Spark SQL DataFrame column that holds date strings in different formats. I hope you like this article. You can find more information about these functions in Spark SQL Date and Timestamp Functions.

Happy Learning !!
