Let’s see how to select/filter rows between two dates in Pandas DataFrame, in real-time applications you would often be required to select rows between two dates (similar to a greater than a start date and less than an end date), In pandas, you can do this in several ways, for example, using between()
, between_time()
, date_range()
e.t.c.
In this article, I will explain how to select Pandas DataFrame rows between two dates by using the boolean mask with the loc[]
attribute and DataFrame indexing. You can also use DataFrame.query()
, DataFrame.isin()
, and pandas.Series.between()
methods. These methods are used to select rows based on the date in Pandas.
Key Points –
- Convert the date column to datetime format using
pd.to_datetime()
if necessary. - Use boolean conditions to filter rows between two dates, e.g.,
(df['date'] >= start_date) & (df['date'] <= end_date)
. - Apply conditions within
df.loc[]
for flexible row and column filtering. - The
.query()
method can make date filtering syntax more concise. - If the date is the index, use
df.loc[start_date:end_date]
for direct slicing. df['date'].between(start_date, end_date)
provides a concise alternative for date filtering.
Quick Examples of Select Rows Between Two Dates
If you are in a hurry, below are some quick examples of how to select rows between two dates from DataFrame.
# Quick examples of select rows between two dates
# Example 1: Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
# Example 2: Using pandas.DataFrame.loc[]
# To Filter Rows by Dates
df2 = df.loc[between_two_dates]
# Example 3: Using pandas.DataFrame.query()
# To select dataframe rows
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
# Example 4: Select rows between two dates
# Using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
# Example 5: Pandas.Series.between() function
# Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
# Example 6: Select DataFrame rows between two dates
# Using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains column names Courses
, Fee
, Duration
,Discount
and Inserted Dates
.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
})
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
# Output:
Courses Fee Duration Discount InsertedDates
0 Spark 22000 30days 1000 2021-11-14
1 PySpark 25000 50days 2300 2021-11-15
2 Hadoop 23000 55days 1000 2021-11-16
3 Python 24000 40days 1200 2021-11-17
4 Pandas 26000 60days 2500 2021-11-18
5 Hadoop 25000 35days 1300 2021-11-19
6 Spark 25000 55days 1400 2021-11-20
Select DataFrame Rows Between Two Dates
In order to select rows between two dates in Pandas DataFrame, first, create a boolean mask using mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
to represent the start and end of the date range. Then you select the DataFrame that lies within the range using the DataFrame.loc[]
method.
# Select DataFrame rows between two dates
df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-18'
# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print("After selecting rows between the two dates:\n", df2)
Yields below output. This returns the rows greater and less than the date selected
# Output:
Courses Fee Duration Discount InsertedDates
2 Hadoop 23000 55days 1000 2021-11-16
3 Python 24000 40days 1200 2021-11-17
4 Pandas 26000 60days 2500 2021-11-18
Filter Rows by Dates Using DataFrame.loc[] Method
Use DataFrame.loc[]
method to return a boolean array with start_date
ane end_date
(date is in between 15th and 19th September) . Then the loc[]
function filters the rows only having between two dates.
# Filter rows by dates
# Using DataFrame.loc[] method
df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date
# Using pandas.DataFrame.loc
# To filter rows by dates
df2 = df.loc[between_two_dates]
print("After selecting rows between the two dates:\n", df2)
Yields below output.
Select Rows Between Two Dates Using DataFrame.query()
You can also select DataFrame rows between two dates in Pandas using the pandas.DataFrame.query()
method. This method returns a DataFrame result from the provided query expression. For instance df.query('Dates >= @start_date and Dates <= @end_date')
.
# Using pandas.DataFrame.query()
# To select DataFrame rows
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print("After selecting the rows between the two dates:\n", df2)
Yields below output.
If you don’t want to include boundaries, just change the condition like below the following syntax: df.query('Dates > @start_date and Dates < @end_date')
.
# Select rows between two dates
# Using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print("After selecting the rows between the two dates:\n", df2)
Yields below output.
# Output:
# After selecting the rows between the two dates:
Courses Fee Duration Discount InsertedDates
2 Hadoop 23000 55days 1000 2021-11-16
3 Python 24000 40days 1200 2021-11-17
Use pandas.Series.between() to Two Dates
You can use pandas.Series.between()
method to select DataFrame rows between two dates. This method returns a boolean vector representing whether a series element lies in the specified range or not. We have to pass the obtained boolean vector to the DataFrame.loc() method for extracting DataFrame.
# Pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print("After selecting the rows between the two dates:\n", df2)
Yields below output.
# Output:
# After selecting the rows between the two dates:
Courses Fees Duration Discount InsertedDates
2 Hadoop 23000 55days 1000 2021-11-16
3 Python 24000 40days 1200 2021-11-17
4 Pandas 26000 60days 2500 2021-11-18
Select DataFrame Rows Between Two Dates Using DataFrame.isin()
Use pandas.DataFrame.isin()
to filter DataFrame rows based on the date in Pandas. pandas.date_range()
returns a fixed DateTimeIndex. Its first and second parameters are starting and ending dates.
import pandas as pd
import numpy as np
import datetime
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"]
Fee =[22000,25000,23000,24000,26000,25000,25000]
Duration =['30days','50days','55days','40days','60days','35days','55days']
Discount =[1000,2300,1000,1200,2500,1300,1400]
InsertedDates =["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
df = pd.DataFrame({"Courses":Courses,"Fees":Fee,"Duration":Duration,"Discount":Discount,'InsertedDates': pd.to_datetime(InsertedDates)})
# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print("After selecting the rows between the two dates:\n", df2)
Yields below output.
# Output:
# After selecting the rows between the two dates:
Courses Fees Duration Discount InsertedDates
1 PySpark 25000 50days 2300 2021-11-15
2 Hadoop 23000 55days 1000 2021-11-16
3 Python 24000 40days 1200 2021-11-17
Complete Example For Select Rows Between Two Dates
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
})
df = pd.DataFrame(technologies)
print(df)
# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print(df2)
start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date
# Using pandas.DataFrame.loc
# To filter rows by dates
df2 = df.loc[between_two_dates]
print(df2)
# Using pandas.DataFrame.query()
# To select DataFrame Rows
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print(df2)
# Select rows between two dates
# Using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print(df2)
# Pandas.Series.between() function
# Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print(df2)
# Select DataFrame rows between two dates
# Using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print(df2)
FAQ on Pandas Select DataFrame Rows Between Two Dates
To select rows between two dates, you can use boolean indexing with a condition that compares the datetime
column against the start and end dates.
It’s a good practice to ensure the column is in datetime
format before filtering. You can use pd.to_datetime()
to convert the column if it’s not already in datetime
format.
To select rows where the date is after a specific date in a Pandas DataFrame, you can use boolean indexing with the >
operator. This filters the DataFrame by checking if the date is greater than the specified date.
To select rows where the date is before a specific date in a Pandas DataFrame, you can use boolean indexing with the <
operator. This allows you to filter rows where the date is less than the specified date.
You can extract the year, month, or other components from a datetime
column using the .dt
accessor, and then filter based on that.
Conclusion
In this article, you have learned how to select DataFrame rows between two dates in pandas using DataFrame.loc[]
, DataFrame.query()
, DataFrame.isin()
and pandas.Series.between()
with more examples.
Happy Learning !!
Related Articles
- Pandas Convert String Column To DateTime
- Convert String to Float in Pandas DataFrame
- How to Format Pandas Datetime?
- Pandas Convert Column To DateTime
- Pandas DatetimeIndex Usage Explained
- Convert Pandas DatetimeIndex to String
- pandas Convert Datetime to Seconds
- Pandas Extract Year from Datetime
- Pandas Get Day, Month and Year from DateTime
- Pandas Extract Month and Year from Datetime
- Pandas Convert Integer to Datetime Type
- Pandas Convert Datetime to Date Column
- Pandas Convert Date (datetime) to String Format
- Pandas Convert Multiple Columns To DateTime Type