Pandas Filter DataFrame Rows by matching datetime (date) – To filter/select DataFrame rows by conditionally checking date use DataFrame.loc[]
and DataFrame.query().
In order to use these methods, the dates on DataFrame should be in Datetime format (datetime64
type), you can do this using pandas.to_datetime()
.
In this article, I will explain how to filter pandas DataFrame rows on dates by using the above methods also explain how to convert to date time in order to use these methods.
1. Quick Examples of Filter DataFrame Rows on Dates
If you are in a hurry, below are some quick examples of how to filter pandas DataFrame rows on dates.
# Below are quick example
# Filter Rows by Dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
# Filter by a single date
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
# Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses
, Fee
, Duration
, Discount
, and Date
.
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
})
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)
Yields below output. When you create a DataFrame by default all non-numeric values are represented as objects. For Dates, you need to use pandas.to_datetime() to convert from String to Datetime.
Courses Fee Duration Discount Date
0 Spark 22000 30days 1000 2020-08-14
1 PySpark 25000 50days 2300 2020-09-20
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
5 Hadoop 25000 35days 1300 2021-11-17
6 Spark 25000 55days 1400 2021-11-29
2. Filter Rows by Dates in pandas DataFrame
If you have already converted the string to a datetime format using pandas.to_datetime()
you can just use df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
. If dates are not in datetime64
type, then this approach doesn’t work. Use df.dtypes
to get the data type of all columns.
# Filter Rows by Dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date']< "2021-11-17")]
print(df2)
Yields below output.
Courses Fee Duration Discount Date
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
3. Use pandas.Series.dt.strftime() to Filter DataFrame Rows on Dates
You can also use pandas.Series.dt.strftime() to filder dataframe rows by dates. Use df[df['Date'].dt.strftime('%Y-%m-%d')=="2021-10-08"]
method to filter rows by matching single date value. This returns all rows that match date column value with 2021-10-08
# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 4 Pandas 26000 60days 2500 2021-10-08
You can use df[df['Date'].dt.strftime('%Y-%m')=='2021-11']
method to filter by month.
# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 5 Hadoop 25000 35days 1300 2021-11-17
# 6 Spark 25000 55days 1400 2021-11-29
You can also use df[df['Date'].dt.strftime('%Y')=='2021']
method to filter by year.
# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)
# Output:
# Courses Fee Duration Discount Date
# 3 Python 24000 40days 1200 2021-09-26
# 4 Pandas 26000 60days 2500 2021-10-08
# 5 Hadoop 25000 35days 1300 2021-11-17
# 6 Spark 25000 55days 1400 2021-11-29
4. Use DataFrame.loc[] Function to Filter DataFrame Rows on Dates
DataFrame.loc[]
is used to access a group of rows and columns of a DataFrame through labels or a boolean array. By using this also you can filter rows. For instance, df.loc[(df['Date']>='2020-09-20') & (df['Date']<'2021-11-08')]
returns only rows having between two dates.
# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
Yields below output.
Courses Fee Duration Discount Date
1 PySpark 25000 50days 2300 2020-09-20
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
5. Use DataFrame.loc[] & DateTimeIndex(dt)
Use the DataFrame.loc[]
& DateTimeIndex(dt)
to access separate date time attributes such as year, month, day, weekday, hours, minutes, seconds, microseconds etc.
# Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)
Yields below output.
Courses Fee Duration Discount Date
5 Hadoop 25000 35days 1300 2021-11-17
6. Use DataFrame.query() to Filter DataFrame Rows on Dates
DataFrame.query() method is used to filter rows from pandas DataFrame, you can use this even to filter rows by dates. This method returns a DataFrame result from the provided query expression. As shown below, the condition inside DataFrame.query()
is to select the data with dates in greater than equal '2020-08-14'
and less than equal '2021-11-17'
(range of dates is specified).
# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date <'2021-11-17'")
print(df2)
Yields below output.
Courses Fee Duration Discount Date
0 Spark 22000 30days 1000 2020-08-14
1 PySpark 25000 50days 2300 2020-09-20
2 Hadoop 23000 55days 1000 2020-10-16
3 Python 24000 40days 1200 2021-09-26
4 Pandas 26000 60days 2500 2021-10-08
7. Complete Example For Filter DataFrame Rows on Dates
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
'Fee' :[22000,25000,23000,24000,26000,25000,25000],
'Duration':['30days','50days','55days','40days','60days','35days','55days'],
'Discount':[1000,2300,1000,1200,2500,1300,1400],
'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
})
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)
# Use pandas.to_datetime() to filter dates
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
print(df2)
# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)
# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)
# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)
# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
# Filter data for specific weekday
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)
# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
print(df2)
Conclusion
In this article, you have learned how to filter DataFrame rows on dates using pandas.to_datetime()
pandas.Series.dt.strftime()
, DataFrame.loc[]
and DataFrame.query()
function with more examples.
Happy Learning !!
Related Articles
- Add Constant Column to Pandas DataFrame
- Sum Pandas DataFrame Columns With Examples
- Create Pandas DataFrame With Working Examples
- Select Pandas DataFrame Rows Between Two Dates
- Pandas Convert String Column To DateTime
- Pandas Extract Year from Datetime
- Pandas Extract Column Value Based on Another Column
- Pandas Extract Month and Year from Datetime
- Pandas Filter DataFrame by Multiple Conditions