• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:13 mins read
You are currently viewing Pandas Select DataFrame Rows Between Two Dates

Let’s see how to select/filter rows between two dates in Pandas DataFrame, in real-time applications you would often be required to select rows between two dates (similar to a greater than a start date and less than an end date), In pandas, you can do this in several ways, for example, using between(), between_time(), date_range() e.t.c.

In this article, I will explain how to select Pandas DataFrame rows between two dates by using the boolean mask with the loc[] attribute and DataFrame indexing. You can also use DataFrame.query(), DataFrame.isin(), and pandas.Series.between() methods. These methods are used to select rows based on the date in Pandas.

1. Quick Examples of Select Rows Between Two Dates

If you are in a hurry, below are some quick examples of how to select rows between two dates from DataFrame.


# Below are the quick examples

# Example 1: Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]

# Example 2: Using pandas.DataFrame.loc[] to Filter Rows by Dates
df2 = df.loc[between_two_dates]

# Example 3: Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')

# Example 4: Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')

# Example 5: Pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]

# Example 6: Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]

Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains column names Courses, Fee, Duration,Discount and Inserted Dates.


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
               })
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount   InsertedDates
0    Spark  22000   30days      1000     2021-11-14
1  PySpark  25000   50days      2300     2021-11-15
2   Hadoop  23000   55days      1000     2021-11-16
3   Python  24000   40days      1200     2021-11-17
4   Pandas  26000   60days      2500     2021-11-18
5   Hadoop  25000   35days      1300     2021-11-19
6    Spark  25000   55days      1400     2021-11-20

2. Select DataFrame Rows Between Two Dates

In order to select rows between two dates in Pandas DataFrame, first, create a boolean mask using mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date) to represent the start and end of the date range. Then you select the DataFrame that lies within the range using the DataFrame.loc[] method.


# Select DataFrame Rows Between Two Dates
df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-18'

# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print("After selceting rows between the two dates:\n", df2)

Yields below output. This returns the rows greater and less than the date selected


# Output:
  Courses    Fee Duration  Discount   InsertedDates
2  Hadoop  23000   55days      1000     2021-11-16
3  Python  24000   40days      1200     2021-11-17
4  Pandas  26000   60days      2500     2021-11-18

3. Filter Rows by Dates Using DataFrame.loc[] Method

Use DataFrame.loc[] method to return a boolean array with start_date ane end_date (date is in between 15th and 19th September) . Then the loc[] function filters the rows only having between two dates.


# Filter Rows by Dates Using DataFrame.loc[] Method
df = pd.DataFrame(technologies)
start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date

# Using pandas.DataFrame.loc to Filter Rows by Dates
df2 = df.loc[between_two_dates]
print("After selceting rows between the two dates:\n", df2)

Yields below output.

pandas rows two dates

4. Select Rows Between Two Dates Using DataFrame.query()

You can also select DataFrame rows between two dates in Pandas using the pandas.DataFrame.query() method. This method returns a DataFrame result from the provided query expression. For instance df.query('Dates >= @start_date and Dates <= @end_date').


# Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print("After selecting the rows between the two dates:\n", df2)

Yields below output.

pandas rows two dates

If you don’t want to include boundaries, just change the condition like below the following syntax: df.query('Dates > @start_date and Dates < @end_date').


# Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print("After selecting the rows between the two dates:\n", df2)

Yields below output.


# Output:
# After selecting the rows between the two dates:
  Courses    Fee Duration  Discount InsertedDates
2  Hadoop  23000   55days      1000    2021-11-16
3  Python  24000   40days      1200    2021-11-17

5. Use pandas.Series.between() to Two Dates

You can use pandas.Series.between() method to select DataFrame rows between two dates. This method returns a boolean vector representing whether a series element lies in the specified range or not. We have to pass the obtained boolean vector to the DataFrame.loc() method for extracting DataFrame.


# Pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print("After selecting the rows between the two dates:\n", df2)

Yields below output.


# Output:
# After selecting the rows between the two dates:
  Courses   Fees Duration  Discount InsertedDates
2  Hadoop  23000   55days      1000    2021-11-16
3  Python  24000   40days      1200    2021-11-17
4  Pandas  26000   60days      2500    2021-11-18

6. Select DataFrame Rows Between Two Dates Using DataFrame.isin()

Use pandas.DataFrame.isin() to filter DataFrame rows based on the date in Pandas. pandas.date_range() returns a fixed DateTimeIndex. Its first and second parameters are starting and ending dates.


import pandas as pd
import numpy as np
import datetime
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"]
Fee =[22000,25000,23000,24000,26000,25000,25000]
Duration =['30days','50days','55days','40days','60days','35days','55days']
Discount =[1000,2300,1000,1200,2500,1300,1400]
InsertedDates =["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]

df = pd.DataFrame({"Courses":Courses,"Fees":Fee,"Duration":Duration,"Discount":Discount,'InsertedDates': pd.to_datetime(InsertedDates)})
# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print("After selecting the rows between the two dates:\n", df2)

Yields below output.


# Output:
# After selecting the rows between the two dates:
   Courses   Fees Duration  Discount InsertedDates
1  PySpark  25000   50days      2300    2021-11-15
2   Hadoop  23000   55days      1000    2021-11-16
3   Python  24000   40days      1200    2021-11-17

7. Complete Example For Select Rows Between Two Dates


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Hadoop","Spark"],
    'Fee' :[22000,25000,23000,24000,26000,25000,25000],
    'Duration':['30days','50days','55days','40days','60days','35days','55days'],
    'Discount':[1000,2300,1000,1200,2500,1300,1400],
    'InsertedDates':["2021-11-14","2021-11-15","2021-11-16","2021-11-17","2021-11-18","2021-11-19","2021-11-20"]
               })
df = pd.DataFrame(technologies)
print(df)

# Select DataFrame rows between two dates
mask = (df['InsertedDates'] > start_date) & (df['InsertedDates'] <= end_date)
df2 = df.loc[mask]
print(df2)

start_date = '2021-11-15'
end_date = '2021-11-19'
after_start_date = df["InsertedDates"] >= start_date
before_end_date = df["InsertedDates"] <= end_date
between_two_dates = after_start_date & before_end_date

# Using pandas.DataFrame.loc to Filter Rows by Dates
df2 = df.loc[between_two_dates]
print(df2)

# Using pandas.DataFrame.query() to select DataFrame Rows
start_date = '2021-11-15'
end_date   = '2021-11-18'
df2 = df.query('InsertedDates >= @start_date and InsertedDates <= @end_date')
print(df2)

# Select rows between two dates using DataFrame.query()
start_date = '2021-11-15'
end_date = '2021-11-18'
df2 = df.query('InsertedDates > @start_date and InsertedDates < @end_date')
print(df2)

# Pandas.Series.between() function Using two dates
df2 = df.loc[df["InsertedDates"].between("2021-11-16", "2021-11-18")]
print(df2)

# Select DataFrame rows between two dates using DataFrame.isin()
df2 = df[df["InsertedDates"].isin(pd.date_range("2021-11-15", "2021-11-17"))]
print(df2)

8. Conclusion

In this article, you have learned how to select DataFrame rows between two dates in pandas using DataFrame.loc[], DataFrame.query(), DataFrame.isin() and pandas.Series.between() with more examples.

Happy Learning !!

References

Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles in Pandas, NumPy, Python, and takes pride in ability to bridge the gap between technical experts and end-users.

Leave a Reply