Pandas Filter DataFrame Rows on Dates

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:January 31, 2023
Spread the love

Pandas Filter DataFrame Rows by matching datetime (date) – To filter/select DataFrame rows by conditionally checking date use DataFrame.loc[] and DataFrame.query(). In order to use these methods, the dates on DataFrame should be in Datetime format (datetime64 type), you can do this using pandas.to_datetime().

In this article, I will explain how to filter pandas DataFrame rows on dates by using the above methods also explain how to convert to date time in order to use these methods.

1. Quick Examples of Filter DataFrame Rows on Dates

If you are in a hurry, below are some quick examples of how to filter pandas DataFrame rows on dates.


# Below are quick example
# Filter Rows by Dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]

# Filter by a single date
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]

# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']

# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']

# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
      
# Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]

# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, Discount, and Date.


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
               })
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)

Yields below output. When you create a DataFrame by default all non-numeric values are represented as objects. For Dates, you need to use pandas.to_datetime() to convert from String to Datetime.


   Courses    Fee Duration  Discount       Date
0    Spark  22000   30days     1000   2020-08-14
1  PySpark  25000   50days     2300   2020-09-20
2   Hadoop  23000   55days     1000   2020-10-16
3   Python  24000   40days     1200   2021-09-26
4   Pandas  26000   60days     2500   2021-10-08
5   Hadoop  25000   35days     1300   2021-11-17
6    Spark  25000   55days     1400   2021-11-29

2. Filter Rows by Dates in pandas DataFrame

If you have already converted the string to a datetime format using pandas.to_datetime() you can just use df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]. If dates are not in datetime64 type, then this approach doesn’t work. Use df.dtypes to get the data type of all columns.


# Filter Rows by Dates in pandas DataFrame
df2 = df[(df['Date'] > "2020-09-20") & (df['Date']< "2021-11-17")]
print(df2)

Yields below output.


  Courses    Fee Duration  Discount       Date
2  Hadoop  23000   55days      1000 2020-10-16
3  Python  24000   40days      1200 2021-09-26
4  Pandas  26000   60days      2500 2021-10-08

3. Use pandas.Series.dt.strftime() to Filter DataFrame Rows on Dates

You can also use pandas.Series.dt.strftime() to filder dataframe rows by dates. Use df[df['Date'].dt.strftime('%Y-%m-%d')=="2021-10-08"] method to filter rows by matching single date value. This returns all rows that match date column value with 2021-10-08


# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)

# Output:
#     Courses    Fee Duration  Discount   Date
#  4  Pandas  26000   60days      2500 2021-10-08

You can use df[df['Date'].dt.strftime('%Y-%m')=='2021-11'] method to filter by month.


# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)

# Output:
#    Courses    Fee Duration  Discount    Date
# 5  Hadoop  25000   35days      1300 2021-11-17
# 6   Spark  25000   55days      1400 2021-11-29

You can also use df[df['Date'].dt.strftime('%Y')=='2021'] method to filter by year.


# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)

# Output:
#     Courses    Fee Duration  Discount    Date
#  3  Python  24000   40days      1200 2021-09-26
#  4  Pandas  26000   60days      2500 2021-10-08
#  5  Hadoop  25000   35days      1300 2021-11-17
#  6   Spark  25000   55days      1400 2021-11-29

4. Use DataFrame.loc[] Function to Filter DataFrame Rows on Dates

DataFrame.loc[] is used to access a group of rows and columns of a DataFrame through labels or a boolean array. By using this also you can filter rows. For instance, df.loc[(df['Date']>='2020-09-20') & (df['Date']<'2021-11-08')] returns only rows having between two dates.


# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)

Yields below output.


   Courses    Fee Duration  Discount       Date
1  PySpark  25000   50days      2300 2020-09-20
2   Hadoop  23000   55days      1000 2020-10-16
3   Python  24000   40days      1200 2021-09-26
4   Pandas  26000   60days      2500 2021-10-08

5. Use DataFrame.loc[] & DateTimeIndex(dt)

Use the DataFrame.loc[] & DateTimeIndex(dt) to access separate date time attributes such as year, month, day, weekday, hours, minutes, seconds, microseconds etc.


# Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)

Yields below output.


  Courses    Fee Duration  Discount       Date
5  Hadoop  25000   35days      1300 2021-11-17

6. Use DataFrame.query() to Filter DataFrame Rows on Dates

DataFrame.query() method is used to filter rows from pandas DataFrame, you can use this even to filter rows by dates. This method returns a DataFrame result from the provided query expression. As shown below, the condition inside DataFrame.query() is to select the data with dates in greater than equal '2020-08-14' and less than equal '2021-11-17'(range of dates is specified).


# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date <'2021-11-17'")
print(df2) 

Yields below output.


   Courses    Fee Duration  Discount       Date
0    Spark  22000   30days      1000 2020-08-14
1  PySpark  25000   50days      2300 2020-09-20
2   Hadoop  23000   55days      1000 2020-10-16
3   Python  24000   40days      1200 2021-09-26
4   Pandas  26000   60days      2500 2021-10-08

7. Complete Example For Filter DataFrame Rows on Dates


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'Date':["2020-08-14","2020-09-20","2020-10-16","2021-09-26","2021-10-08","2021-11-17","2021-11-29"]
               })
df = pd.DataFrame(technologies)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
print(df)

# Use pandas.to_datetime() to filter dates
df2 = df[(df['Date'] > "2020-09-20") & (df['Date'] < "2021-11-17")]
print(df2)

# Filter by single day
df2 = df[df['Date'].dt.strftime('%Y-%m-%d') == "2021-10-08"]
print(df2)

# Filter by single month
df2 = df[df['Date'].dt.strftime('%Y-%m') == '2021-11']
print(df2)

# Filter by single year
df2 = df[df['Date'].dt.strftime('%Y') == '2021']
print(df2)

# Filter dates using DataFrame.loc[]
df2 = df.loc[(df['Date'] >= '2020-09-20') & (df['Date'] <'2021-11-08')]
print(df2)
      
# Filter data for specific weekday 
df2 = df.loc[df['Date'].dt.weekday == 2]
print(df2)

# Use DataFrame.query() to filter DataFrame on dates
df2 = df.query("Date >= '2020-08-14' and Date < '2021-11-17'")
print(df2) 

Conclusion

In this article, you have learned how to filter DataFrame rows on dates using pandas.to_datetime() pandas.Series.dt.strftime(), DataFrame.loc[] and DataFrame.query() function with more examples.

Happy Learning !!

References

Leave a Reply

You are currently viewing Pandas Filter DataFrame Rows on Dates