Pandas Filter DataFrame Rows by matching datetime (date) – To filter/select DataFrame rows by conditionally checking date use DataFrame.loc[]
and DataFrame.query().
In order to use these methods, the dates on DataFrame should be in Datetime format (datetime64
type), you can do this using pandas.to_datetime()
.
In this article, I will explain how to filter pandas DataFrame rows on dates by using the above methods also explain how to convert to date time in order to use these methods.
Key Points –
- Ensure that the column containing date data is in a consistent and recognizable format (e.g.,
datetime64[ns]
). - Use
pd.to_datetime()
to convert string representations of dates intodatetime
objects for easier filtering. - Apply logical operators (
>
,<
,>=
,<=
,==
,!=
) to filter date ranges. - Utilize
.loc[]
or boolean indexing to filter rows based on specific start and end dates. - Use functions like
pd.DateOffset
orpd.Timedelta
to dynamically adjust dates for filtering.
1. Quick Examples of Filter DataFrame Rows on Dates
If you are in a hurry, below are some quick examples of how to filter pandas DataFrame rows on dates.
# Quick examples of filter dataframe rows on dates
# Example 1: Filter rows by dates in pandas dataframe
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
# Example 2: Filter by a single date
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
# Example 3: Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
# Example 4: Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
# Example 5: Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
# Example 6: Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
# Example 7: Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses
, Fee
, Duration
, Discount
, and Date
.
# Create a DataFrame
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
})
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print("Create DataFrame:\n",df)
Yields below output. When you create a DataFrame by default all non-numeric values are represented as objects. For Dates, you need to use pandas.to_datetime() to convert from String to Datetime.
2. Filter Rows by Dates in Pandas DataFrame
If you have already converted the string to a datetime format using pandas.to_datetime()
you can just use df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
. If dates are not in datetime64
type, then this approach doesn’t work. Use df.dtypes
to get the data type of all columns.
# Filter rows by dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date']< "2021-11-17")]
print("Filter rows by dates:\n",df2)
Yields below output.
3. Use pandas.Series.dt.strftime() to Filter DataFrame Rows on Dates
You can also use pandas.Series.dt.strftime()
to filder DataFrame rows by dates. Use df[df['Date'].dt.strftime('%Y-%m-%d')=="2021-10-08"]
method to filter rows by matching single date value. This returns all rows that match date column value with 2021-10-08
# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 4 Pandas 26000 60days 2500 2021-10-08
Alternatively, you can use df[df['Date'].dt.strftime('%Y-%m')=='2021-11']
method to filter by month. This method is a valid and effective way to filter a Pandas DataFrame by a specific month. This approach works by using the strftime
method to format the datetime values in the 'Date' column to the year and month ("%Y-%m") and then comparing them to the specified month string.
# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 5 Hadoop 25000 35days 1300 2021-11-17
# 6 Spark 25000 55days 1400 2021-11-29
Similarly, you can also use df[df['Date'].dt.strftime('%Y')=='2021']
method to filter by year. This function is a useful technique. This approach formats the datetime values in the 'Date' column to the year ("%Y") and then compares them to the specified year string.
# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 3 Python 24000 40days 1200 2021-09-26
# 4 Pandas 26000 60days 2500 2021-10-08
# 5 Hadoop 25000 35days 1300 2021-11-17
# 6 Spark 25000 55days 1400 2021-11-29
4. Use DataFrame.loc[] Function to Filter DataFrame Rows on Dates
DataFrame.loc[]
is used to access a group of rows and columns of a DataFrame through labels or a boolean array. By using this also you can filter rows. For instance, df.loc[(df['Date']>='2020-09-20') & (df['Date']<'2021-11-08')]
returns only rows having between two dates.
# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount Date
1 PySpark 25000 50days 2300 2020-09-20
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
5. Use DataFrame.loc[] & DateTimeIndex(dt)
Use the DataFrame.loc[]
& DateTimeIndex(dt)
to access separate date time attributes such as year, month, day, weekday, hours, minutes, seconds, microseconds etc.
This program is used df['Date'].dt.weekday
to extract the weekday information for each date in the 'Date' column. The condition == 2
filters the DataFrame for rows where the weekday is Tuesday. If you want to filter for a different weekday, you can change the 2
to the desired weekday (0 for Monday, 1 for Tuesday, and so on).
# Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount Date
5 Hadoop 25000 35days 1300 2021-11-17
6. Use DataFrame.query() to Filter DataFrame Rows on Dates
DataFrame.query() method is used to filter rows from pandas DataFrame, you can use this even to filter rows by dates. This method returns a DataFrame result from the provided query expression. As shown below, the condition inside DataFrame.query()
is to select the data with dates in greater than equal '2020-08-14'
and less than equal '2021-11-17'
(range of dates is specified).
# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date <'2021-11-17'")
print(df2)
Yields below output.
# Output:
Courses Fee Duration Discount Date
0 Spark 22000 30days 1000 2020-08-14
1 PySpark 25000 50days 2300 2020-09-20
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
7. Complete Example For Filter DataFrame Rows on Dates
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
})
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)
# Use pandas.to_datetime() to filter dates
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
print(df2)
# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)
# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)
# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)
# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
# Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)
# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
print(df2)
Frequently Asked Questions on Pandas Filter DataFrame Rows on Dates
To use boolean indexing to filter rows based on a specific date. For example, if your DataFrame is named df
and you want to filter rows for a date '2023-01-01.
To filter rows for a specific range of dates in a DataFrame, you can use boolean indexing with the &
(and) operator.
If your date column is not in datetime format, you'll need to convert it before filtering based on dates. You can use the pd.to_datetime
function to achieve this. for example, the pd.to_datetime
function is used to convert the 'Date' column to datetime format. After this conversion, you can proceed to filter the DataFrame based on the date range using boolean indexing, as shown in the code.
To filter rows for a specific month or year, you can use the dt
accessor in Pandas, which is available for datetime columns.
You can filter rows based on weekdays or weekends using the dt.weekday
property of the datetime column. For example, filters the DataFrame for rows where the weekday component of the 'Date' column is less than 5, representing Monday to Friday.
To filter rows for the latest N dates in a DataFrame, you can use the nlargest
method. For example, the code uses the nlargest
method to retrieve the top N rows based on the 'Date' column in descending order. Adjust the column names and the value of n_latest
according to your specific DataFrame.
Conclusion
In this article, you have learned how to filter DataFrame rows on dates using pandas.to_datetime()
pandas.Series.dt.strftime()
, DataFrame.loc[]
and DataFrame.query()
function with more examples.
Happy Learning !!
Related Articles
- Pandas Series filter() Function
- Pandas filter by column value
- How to Use NOT IN Filter in Pandas
- Pandas Filter Rows Using IN Like SQL
- Pandas Convert Datetime to Seconds
- Pandas Find Row Values for Column Maximal
- Pandas Filter DataFrame by Multiple Conditions
- pandas DataFrame filter() – Usage & Examples
- Pandas Filter DataFrame by Substring criteria
- Apply Multiple Filters to Pandas DataFrame or Series
- Pandas Operator Chaining to Filter DataFrame Rows
- Pandas Filter Rows with NAN Value from DataFrame Column