PySpark SQL provides the current_date() and current_timestamp() functions, which return the current system date (without time) and the current timestamp, respectively. Let's see how to get these with examples.
current_date() – returns the current system date without time as a PySpark DateType, in the format yyyy-MM-dd.
current_timestamp() – returns the current system date and time as a PySpark TimestampType, in the format yyyy-MM-dd HH:mm:ss.SSS.
Note that I've used PySpark withColumn() to add new columns to the DataFrame.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .getOrCreate()

data = [["1"]]
df = spark.createDataFrame(data, ["id"])

# current_date() & current_timestamp()
df.withColumn("current_date", current_date()) \
  .withColumn("current_timestamp", current_timestamp()) \
  .show(truncate=False)
This yields the output below.
+---+------------+-----------------------+
|id |current_date|current_timestamp |
+---+------------+-----------------------+
|1 |2021-02-23 |2021-02-23 08:19:18.036|
+---+------------+-----------------------+
PySpark SQL – Get Current Date & Timestamp
If you are using SQL, you can also get the current date and timestamp as follows.
spark.sql("select current_date(), current_timestamp()") \
     .show(truncate=False)
Now let's see how to format the current date and timestamp into a custom format using date patterns. PySpark supports all the date patterns of Java DateTimeFormatter. This example converts the date to MM-dd-yyyy using the date_format() function and the timestamp to the MM-dd-yyyy HH mm ss SSS pattern using to_timestamp().
df.withColumn("date_format",date_format(current_date(),"MM-dd-yyyy")) \
.withColumn("to_timestamp",to_timestamp(current_timestamp(),"MM-dd-yyyy HH mm ss SSS")) \
.show(truncate=False)
#SQL
spark.sql("select date_format(current_date(),'MM-dd-yyyy') as date_format, " + \
    "to_timestamp(current_timestamp(),'MM-dd-yyyy HH mm ss SSS') as to_timestamp") \
.show(truncate=False)
The DataFrame example yields the output below.
+---+-----------+-----------------------+
|id |date_format|to_timestamp |
+---+-----------+-----------------------+
|1 |02-23-2021 |2021-02-23 08:26:22.794|
+---+-----------+-----------------------+
Complete example for reference:
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp, date_format, to_timestamp

# Create SparkSession
spark = SparkSession.builder \
    .appName('SparkByExamples.com') \
    .getOrCreate()

data = [["1"]]
df = spark.createDataFrame(data, ["id"])

# current_date() & current_timestamp()
df.withColumn("current_date", current_date()) \
  .withColumn("current_timestamp", current_timestamp()) \
  .show(truncate=False)

# SQL
spark.sql("select current_date(), current_timestamp()") \
     .show(truncate=False)

# Date & timestamp in a custom format
df.withColumn("date_format", date_format(current_date(), "MM-dd-yyyy")) \
  .withColumn("to_timestamp", to_timestamp(current_timestamp(), "MM-dd-yyyy HH mm ss SSS")) \
  .show(truncate=False)

# SQL
spark.sql("select date_format(current_date(),'MM-dd-yyyy') as date_format, " + \
    "to_timestamp(current_timestamp(),'MM-dd-yyyy HH mm ss SSS') as to_timestamp") \
    .show(truncate=False)
The complete code can be downloaded from the GitHub project.
Happy Learning !!
Related Articles
- PySpark SQL – Working with Unix Time
- PySpark SQL – Date and Timestamp Functions
- PySpark SQL – Convert Date to String Format
- PySpark SQL – Convert String to Date Format
- PySpark SQL – Convert Timestamp to Date
- PySpark Get Number of Rows and Columns
- PySpark Groupby on Multiple Columns
- PySpark count() – Different Methods Explained
- PySpark Count of Non null, nan Values in DataFrame