• Post author:
  • Post category:Pandas
  • Post last modified:October 4, 2024
  • Reading time:20 mins read
You are currently viewing Pandas Filter DataFrame Rows on Dates

Pandas Filter DataFrame Rows by matching datetime (date) – To filter/select DataFrame rows by conditionally checking date use DataFrame.loc[] and DataFrame.query(). In order to use these methods, the dates on DataFrame should be in Datetime format (datetime64 type), you can do this using pandas.to_datetime().

Advertisements

In this article, I will explain how to filter pandas DataFrame rows on dates by using the above methods also explain how to convert to date time in order to use these methods.

Key Points –

  • Ensure that the column containing date data is in a consistent and recognizable format (e.g., datetime64[ns]).
  • Use pd.to_datetime() to convert string representations of dates into datetime objects for easier filtering.
  • Apply logical operators (>, <, >=, <=, ==, !=) to filter date ranges.
  • Utilize .loc[] or boolean indexing to filter rows based on specific start and end dates.
  • Use functions like pd.DateOffset or pd.Timedelta to dynamically adjust dates for filtering.

1. Quick Examples of Filter DataFrame Rows on Dates

If you are in a hurry, below are some quick examples of how to filter pandas DataFrame rows on dates.


# Quick examples of filter dataframe rows on dates

# Example 1: Filter rows by dates in pandas dataframe
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]

# Example 2: Filter by a single date
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]

# Example 3: Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']

# Example 4: Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']

# Example 5: Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
      
# Example 6: Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]

# Example 7: Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, Discount, and Date.


# Create a DataFrame
import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
               })
df = pd.DataFrame(technologies)

# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print("Create DataFrame:\n",df)

Yields below output. When you create a DataFrame by default all non-numeric values are represented as objects. For Dates, you need to use pandas.to_datetime() to convert from String to Datetime.

pandas filter dates

2. Filter Rows by Dates in Pandas DataFrame

If you have already converted the string to a datetime format using pandas.to_datetime() you can just use df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]. If dates are not in datetime64 type, then this approach doesn’t work. Use df.dtypes to get the data type of all columns.


# Filter rows by dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date']< "2021-11-17")]
print("Filter rows by dates:\n",df2)

Yields below output.

pandas filter dates

3. Use pandas.Series.dt.strftime() to Filter DataFrame Rows on Dates

You can also use pandas.Series.dt.strftime() to filder DataFrame rows by dates. Use df[df['Date'].dt.strftime('%Y-%m-%d')=="2021-10-08"] method to filter rows by matching single date value. This returns all rows that match date column value with 2021-10-08


# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)

# Output:
#     Courses    Fee Duration  Discount   Date
#  4  Pandas  26000   60days      2500 2021-10-08

Alternatively, you can use df[df['Date'].dt.strftime('%Y-%m')=='2021-11'] method to filter by month. This method is a valid and effective way to filter a Pandas DataFrame by a specific month. This approach works by using the strftime method to format the datetime values in the 'Date' column to the year and month ("%Y-%m") and then comparing them to the specified month string.


# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)

# Output:
#    Courses    Fee Duration  Discount    Date
# 5  Hadoop  25000   35days      1300 2021-11-17
# 6   Spark  25000   55days      1400 2021-11-29

Similarly, you can also use df[df['Date'].dt.strftime('%Y')=='2021'] method to filter by year. This function is a useful technique. This approach formats the datetime values in the 'Date' column to the year ("%Y") and then compares them to the specified year string.


# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)

# Output:
#     Courses    Fee Duration  Discount    Date
#  3  Python  24000   40days      1200 2021-09-26
#  4  Pandas  26000   60days      2500 2021-10-08
#  5  Hadoop  25000   35days      1300 2021-11-17
#  6   Spark  25000   55days      1400 2021-11-29

4. Use DataFrame.loc[] Function to Filter DataFrame Rows on Dates

DataFrame.loc[] is used to access a group of rows and columns of a DataFrame through labels or a boolean array. By using this also you can filter rows. For instance, df.loc[(df['Date']>='2020-09-20') & (df['Date']<'2021-11-08')] returns only rows having between two dates.


# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)

Yields below output.


# Output:
   Courses    Fee Duration  Discount       Date
1  PySpark  25000   50days      2300 2020-09-20
2   Hadoop  23000   55days      1000 2020-10-16
3   Python  24000   40days      1200 2021-09-26
4   Pandas  26000   60days      2500 2021-10-08

5. Use DataFrame.loc[] & DateTimeIndex(dt)

Use the DataFrame.loc[] & DateTimeIndex(dt) to access separate date time attributes such as year, month, day, weekday, hours, minutes, seconds, microseconds etc.

This program is used df['Date'].dt.weekday to extract the weekday information for each date in the 'Date' column. The condition == 2 filters the DataFrame for rows where the weekday is Tuesday. If you want to filter for a different weekday, you can change the 2 to the desired weekday (0 for Monday, 1 for Tuesday, and so on).


# Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)

Yields below output.


# Output:
  Courses    Fee Duration  Discount       Date
5  Hadoop  25000   35days      1300 2021-11-17

6. Use DataFrame.query() to Filter DataFrame Rows on Dates

DataFrame.query() method is used to filter rows from pandas DataFrame, you can use this even to filter rows by dates. This method returns a DataFrame result from the provided query expression. As shown below, the condition inside DataFrame.query() is to select the data with dates in greater than equal '2020-08-14' and less than equal '2021-11-17'(range of dates is specified).


# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date <'2021-11-17'")
print(df2) 

Yields below output.


# Output:
   Courses    Fee Duration  Discount       Date
0    Spark  22000   30days      1000 2020-08-14
1  PySpark  25000   50days      2300 2020-09-20
2   Hadoop  23000   55days      1000 2020-10-16
3   Python  24000   40days      1200 2021-09-26
4   Pandas  26000   60days      2500 2021-10-08

7. Complete Example For Filter DataFrame Rows on Dates


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
               })
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)

# Use pandas.to_datetime() to filter dates
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
print(df2)

# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)

# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)

# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)

# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
      
# Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)

# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
print(df2) 

Frequently Asked Questions on Pandas Filter DataFrame Rows on Dates

How do I filter a DataFrame based on a specific date?

To use boolean indexing to filter rows based on a specific date. For example, if your DataFrame is named df and you want to filter rows for a date '2023-01-01.

How can I filter rows for a specific range of dates?

To filter rows for a specific range of dates in a DataFrame, you can use boolean indexing with the & (and) operator.

What if my date column is not in the datetime format?

If your date column is not in datetime format, you'll need to convert it before filtering based on dates. You can use the pd.to_datetime function to achieve this. for example, the pd.to_datetime function is used to convert the 'Date' column to datetime format. After this conversion, you can proceed to filter the DataFrame based on the date range using boolean indexing, as shown in the code.

How do I filter rows for a specific month or year?

To filter rows for a specific month or year, you can use the dt accessor in Pandas, which is available for datetime columns.

Can I filter rows for weekdays or weekends?

You can filter rows based on weekdays or weekends using the dt.weekday property of the datetime column. For example, filters the DataFrame for rows where the weekday component of the 'Date' column is less than 5, representing Monday to Friday.

How can I filter rows for the latest N dates?

To filter rows for the latest N dates in a DataFrame, you can use the nlargest method. For example, the code uses the nlargest method to retrieve the top N rows based on the 'Date' column in descending order. Adjust the column names and the value of n_latest according to your specific DataFrame.

Conclusion

In this article, you have learned how to filter DataFrame rows on dates using pandas.to_datetime() pandas.Series.dt.strftime(), DataFrame.loc[] and DataFrame.query() function with more examples.

Happy Learning !!

References