How to Create a PySpark DataFrame with a Timestamp Column for a Date Range? You can use several built-in PySpark SQL functions like sequence(), explode(), and to_date() to create a PySpark DataFrame with a timestamp column.
PySpark provides a rich set of Date and Timestamp functions that work seamlessly on DataFrames and in SQL queries, similar to traditional SQL syntax. Date and time operations are essential in PySpark, especially if you’re using it for ETL processes where handling time-series data is common.
Most of these functions accept input as DateType, TimestampType, or StringType. When using strings, they must be in a format that can be cast to a valid date or timestamp.
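For instance, a string like "01/07/2025" won't cast with the default yyyy-MM-dd pattern, but you can supply an explicit format. Here is a minimal sketch (the column name and pattern are illustrative, not from the examples below):
# Minimal sketch: casting a non-default string format to DateType
# with an explicit pattern (column name and pattern are illustrative)
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date
spark = SparkSession.builder.appName("DateFormatSketch").getOrCreate()
df = spark.createDataFrame([("01/07/2025",)], ["date_str"])
df.select(to_date("date_str", "dd/MM/yyyy").alias("date")).show()
# +----------+
# |      date|
# +----------+
# |2025-07-01|
# +----------+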
This capability is valuable for generating dummy time-series data, building time-based reports, or testing ETL pipelines, all of which require creating continuous ranges of dates or timestamps in a DataFrame.
In this article, you’ll learn how to create a PySpark DataFrame with a timestamp column containing a continuous range of dates.
Key Points
- PySpark supports two methods: SQL functions (sequence(), explode()) and pandas.date_range().
- The PySpark-native method keeps the workflow distributed and scalable.
- pandas.date_range() is perfect for quick prototyping but processes dates locally.
- sequence() creates a continuous list of dates.
- explode() transforms date arrays into individual rows.
- to_date() ensures string dates are converted to DateType for use with sequence().
- Casting dates to timestamps enables compatibility with time-based operations.
- pandas.date_range() returns Python datetime objects for easy conversion.
- Use spark.createDataFrame() with TimestampType() to convert pandas dates to PySpark.
- Choose PySpark-native for production ETL, and pandas for small-scale testing or data generation.
Initialize SparkSession
First, start your Spark session:
# Initialize the SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("CreateDateRangeDF") \
.getOrCreate()
Create a DataFrame with Start and End Date Columns
Next, build a PySpark DataFrame that includes the given start and end dates as columns, then display the result. The DataFrame contains a single row holding both dates.
# Create DataFrame with start and end date
from pyspark.sql.functions import expr, to_date
from pyspark.sql.types import TimestampType
spark = SparkSession.builder \
.appName("CreateDateRangeDF") \
.getOrCreate()
start_date = '2025-7-1'
end_date = '2025-7-10'
df = spark.createDataFrame([(start_date, end_date)], ["start_date", "end_date"])
df.show()
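With the example dates, df.show() should display something like this:
# Output:
# +----------+---------+
# |start_date| end_date|
# +----------+---------+
# |  2025-7-1|2025-7-10|
# +----------+---------+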
Generate a Single Column with a Range of Dates
You can use the sequence() function inside selectExpr() together with explode() to create one row for each date in a range. By passing the start and end dates to sequence(), you generate a continuous sequence of dates, which you can then expand into individual rows using explode().
# Generate a single column with a range of dates
df = df.selectExpr("explode(sequence(to_date(start_date), to_date(end_date))) as date")
df.show()
With the example dates, this yields the output below.
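# Output:
# +----------+
# |      date|
# +----------+
# |2025-07-01|
# |2025-07-02|
# |2025-07-03|
# |2025-07-04|
# |2025-07-05|
# |2025-07-06|
# |2025-07-07|
# |2025-07-08|
# |2025-07-09|
# |2025-07-10|
# +----------+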
Convert PySpark Date Column to Timestamp
You can use the cast() function inside select() or selectExpr() to convert a date column to a timestamp column. Casting the date to a timestamp produces full date-time values; since the source dates carry no time component, the hours, minutes, and seconds default to midnight (00:00:00).
# Convert PySpark Date Column to Timestamp
df = df.select(expr("cast(date as timestamp)").alias("timestamp"))
df.show()
This yields the output below.
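# Output:
# +-------------------+
# |          timestamp|
# +-------------------+
# |2025-07-01 00:00:00|
# |2025-07-02 00:00:00|
# |2025-07-03 00:00:00|
# |2025-07-04 00:00:00|
# |2025-07-05 00:00:00|
# |2025-07-06 00:00:00|
# |2025-07-07 00:00:00|
# |2025-07-08 00:00:00|
# |2025-07-09 00:00:00|
# |2025-07-10 00:00:00|
# +-------------------+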

Create Date Range with pandas and Convert to PySpark DataFrame
Alternatively, you can use pandas to generate a continuous range of dates between the specified start and end dates. You can implement this approach by following these steps.
Generate a Range of Dates using pandas
You can define the start and end dates as strings, then use pandas’ date_range() to generate a continuous sequence of dates between them. This creates a DatetimeIndex containing each date in the specified range.
# Create Date Range with pandas
# Imports
from pyspark.sql import SparkSession
from pyspark.sql.types import TimestampType
import pandas as pd
spark = SparkSession.builder.appName("CreateDFWithTimestamp").getOrCreate()
# Define the start and end dates for the time period
start_date = '2025-7-1'
end_date = '2025-7-10'
# Generate a range of dates using pandas
dates = pd.date_range(start=start_date, end=end_date)
print(dates)
# Output:
# DatetimeIndex(['2025-07-01', '2025-07-02', '2025-07-03', '2025-07-04',
# '2025-07-05', '2025-07-06', '2025-07-07', '2025-07-08',
# '2025-07-09', '2025-07-10'],
# dtype='datetime64[ns]', freq='D')
Convert the Dates to Python Datetime objects
You can convert each date in the DatetimeIndex to a native Python datetime object by using a list comprehension along with to_pydatetime(). This creates a list of datetime objects, each representing one date at midnight. If you print this list, you'll see the complete set of datetime values covering the entire date range.
# Convert the dates to Python datetime objects
datetimes = [date.to_pydatetime() for date in dates]
print(datetimes)
# Output:
# [datetime.datetime(2025, 7, 1, 0, 0), datetime.datetime(2025, 7, 2, 0, 0),
#  datetime.datetime(2025, 7, 3, 0, 0), datetime.datetime(2025, 7, 4, 0, 0),
#  datetime.datetime(2025, 7, 5, 0, 0), datetime.datetime(2025, 7, 6, 0, 0),
#  datetime.datetime(2025, 7, 7, 0, 0), datetime.datetime(2025, 7, 8, 0, 0),
#  datetime.datetime(2025, 7, 9, 0, 0), datetime.datetime(2025, 7, 10, 0, 0)]
Create a PySpark DataFrame with a Timestamp Column
Finally, you can create a PySpark DataFrame from the list of Python datetime objects by passing it to createDataFrame() along with TimestampType() to specify the timestamp data type. Since you didn’t specify a column name, PySpark creates a single-column DataFrame named value by default, containing the timestamps.
# Create a PySpark DataFrame from the list of datetime objects
df = spark.createDataFrame(datetimes, TimestampType())
# Show the DataFrame
df.show()
# Output:
# +-------------------+
# | value|
# +-------------------+
# |2025-07-01 00:00:00|
# |2025-07-02 00:00:00|
# |2025-07-03 00:00:00|
# |2025-07-04 00:00:00|
# |2025-07-05 00:00:00|
# |2025-07-06 00:00:00|
# |2025-07-07 00:00:00|
# |2025-07-08 00:00:00|
# |2025-07-09 00:00:00|
# |2025-07-10 00:00:00|
# +-------------------+
Frequently Asked Questions about Creating a PySpark DataFrame with a Timestamp
How do you generate a continuous range of dates in PySpark?
You can use the sequence() function along with explode() in a SQL expression to generate a continuous series of dates between a start and end date.
Why is to_date() needed with sequence()?
The to_date() function converts string dates to DateType so that sequence() can correctly generate a list of dates. Without it, the function wouldn't work with plain string values.
What does explode() do in this context?
explode() takes the array of dates produced by sequence() and transforms it into multiple rows, with one row for each date in the range.
Why cast the date column to TimestampType?
Casting to TimestampType allows you to perform time-based calculations or join with other timestamp columns. This is especially useful if your downstream tasks require time precision beyond just dates.
Can you use the DataFrame API instead of selectExpr()?
You can use PySpark's DataFrame API functions like explode() and sequence() directly with select(), but using selectExpr() with SQL expressions often makes the code more concise and readable.
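For reference, here is a minimal sketch of the same date-range expansion using the DataFrame API directly, assuming a DataFrame with the start_date and end_date string columns created earlier:
# DataFrame API equivalent of the selectExpr() approach shown above
from pyspark.sql.functions import explode, sequence, to_date
df = df.select(
    explode(sequence(to_date("start_date"), to_date("end_date"))).alias("date")
)
df.show()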
Conclusion
In this article, you explored two effective methods for creating a PySpark DataFrame with a timestamp column covering a continuous range of dates. The PySpark-native approach using sequence() and explode() is ideal for scalable, production-grade ETL pipelines, while the pandas-based method is great for quickly generating small datasets for testing or prototyping.
By using these techniques, you can easily prepare time-series data for analysis, validation, or reporting in your PySpark workflows, and handle both quick experiments and large-scale data processing confidently.
Happy Learning!!
Related Articles
- PySpark SQL – Working with Unix Time | Timestamp
- PySpark SQL Types (DataType) with Examples
- PySpark SQL – How to Get Current Date & Timestamp
- PySpark SQL – Date and Timestamp Functions
- PySpark SQL – Convert Date to String Format
- PySpark SQL – Convert String to Date Format
- PySpark – Difference between two dates (days, months, years)
- PySpark Timestamp Difference (seconds, minutes, hours)
- PySpark – How to Get Current Date & Timestamp
- PySpark – Convert String to Timestamp type