Pandas Select DataFrame Rows Between Two Dates

Let’s see how to select/filter rows between two dates in Pandas DataFrame, in the real-time applications you would often be required to select rows between two dates (similar to great then start date and less than an end date), In pandas, you can do this in several ways, for example, using between(), between_time(), date_range() e.t.c.

In this article, I will explain how to select pandas DataFrame rows between two dates by using the boolean mask with the loc[] method and DataFrame indexing. You can also use DataFrame.query(), DataFrame.isin(), and pandas.Series.between() methods. These methods are used to select rows based on the date in Pandas.

1. Quick Examples of Select Rows Between Two Dates

If you are in a hurry, below are some quick examples of how to select rows between two dates from DataFrame.


# Below are quick example

# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]

# Using pandas.DataFrame.loc[] to Filter Rows by Dates
df2 = df.loc[between_two_dates]

# Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')

# Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')

# pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]

# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration,Discount and Inserted Dates.


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
               })
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration  Discount   InsertedDates
0    Spark  22000   30days      1000     2021-11-14
1  PySpark  25000   50days      2300     2021-11-15
2   Hadoop  23000   55days      1000     2021-11-16
3   Python  24000   40days      1200     2021-11-17
4   Pandas  26000   60days      2500     2021-11-18
5   Hadoop  25000   35days      1300     2021-11-19
6    Spark  25000   55days      1400     2021-11-20

2. Select DataFrame Rows Between Two Dates

In order to select rows between two dates in pandas DataFrame, first, create a boolean mask using mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date) to represent the start and end of the date range. Then you select the DataFrame that lies within the range using the DataFrame.loc[] method.


df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-18'
# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print(df2)

Yields below output. This returns the rows greater and less than the date selected


  Courses    Fee Duration  Discount   InsertedDates
2  Hadoop  23000   55days      1000     2021-11-16
3  Python  24000   40days      1200     2021-11-17
4  Pandas  26000   60days      2500     2021-11-18

3. Filter Rows by Dates Using DataFrame.loc[] Method

Use DataFrame.loc[] method to returns a boolean array with start_date ane end_date (date is in between 15th and 19th September) . Then the loc[] function returns only those rows having between two dates.


df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date
# Using pandas.DataFrame.loc to Filter Rows by Dates
df2 = df.loc[between_two_dates]
print(df2)

Yields below output.


   Courses    Fee  Duration   Discount  InsertedDates
1  PySpark  25000   50days      2300     2021-11-15
2   Hadoop  23000   55days      1000     2021-11-16
3   Python  24000   40days      1200     2021-11-17
4   Pandas  26000   60days      2500     2021-11-18
5   Hadoop  25000   35days      1300     2021-11-19

4. Select Rows Between Two Dates Using DataFrame.query()

You can also select DataFrame rows between two dates in Pandas using the pandas.DataFrame.query() method. This method returns a DataFrame result from the provided query expression. For instance df.query('Dates >= @start_date and Dates <= @end_date').


# Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print(df2)

Yields below output.


   Courses    Fee Duration  Discount InsertedDates
1  PySpark  25000   50days      2300    2021-11-15
2   Hadoop  23000   55days      1000    2021-11-16
3   Python  24000   40days      1200    2021-11-17
4   Pandas  26000   60days      2500    2021-11-18

If you don’t want to include boundaries, just change the condition like below following syntax: df.query('Dates > @start_date and Dates < @end_date').


# Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print(df2)

Yields below output.


  Courses    Fee Duration  Discount InsertedDates
2  Hadoop  23000   55days      1000    2021-11-16
3  Python  24000   40days      1200    2021-11-17

5. Use pandas.Series.between() to Two Dates

You can use pandas.Series.between() method to select DataFrame rows between two dates. This method returns a boolean vector representing whether series element lies in the specified range or not. We have to pass the obtained boolean vector to DataFrame.loc() method for extracting DataFrame.


# pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print(df2)

Yields below output.


  Courses   Fees Duration  Discount InsertedDates
2  Hadoop  23000   55days      1000    2021-11-16
3  Python  24000   40days      1200    2021-11-17
4  Pandas  26000   60days      2500    2021-11-18

6. Select DataFrame Rows Between Two Dates Using DataFrame.isin()

Use pandas.DataFrame.isin() to filter DataFrame rows based on the date in Pandas. pandas.date_range() returns a fixed DateTimeIndex. Its first and second parameters are starting and ending dates.


import pandas as pd
import numpy as np
import datetime
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"]
Fee =[22000,25000,23000,24000,26000,25000,25000]
Duration =['30days','50days','55days','40days','60days','35days','55days']
Discount =[1000,2300,1000,1200,2500,1300,1400]
InsertedDates =["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]

df = pd.DataFrame({"Courses":Courses,"Fees":Fee,"Duration":Duration,"Discount":Discount,'InsertedDates': pd.to_datetime(InsertedDates)})
# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print(df2)

Yields below output.


   Courses   Fees Duration  Discount InsertedDates
1  PySpark  25000   50days      2300    2021-11-15
2   Hadoop  23000   55days      1000    2021-11-16
3   Python  24000   40days      1200    2021-11-17

7. Complete Example For Select Rows Between Two Dates


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
               })
df = pd.DataFrame(technologies)
print(df)

# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print(df2)

start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date
# Using pandas.DataFrame.loc to Filter Rows by Dates
df2 = df.loc[between_two_dates]
print(df2)

# Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print(df2)

# Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print(df2)

# pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print(df2)

# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print(df2)

Conclusion

In this article, you have learned how to select DataFrame rows between two dates in pandas using DataFrame.loc[], DataFrame.query(), DataFrame.isin() and pandas.Series.between() with more examples.

Happy Learning !!

You May Also Like

References

pandas rows two dates

Leave a Reply

You are currently viewing Pandas Select DataFrame Rows Between Two Dates