In Pandas, the first() method selects the initial entries of a DataFrame or Series based on a specified time offset. It is intended for time-indexed data (time series) and makes it easy to keep only the rows that fall within a given span of time from the start of the index.
In this article, I will explain the Pandas DataFrame first() method, including its syntax, parameters, and usage. I will demonstrate how to return a subset of the original DataFrame that includes the initial rows within a specified time offset.
Key Points –
- The first() method selects rows from a Pandas DataFrame or Series that has a DatetimeIndex.
- The offset parameter accepts strings representing time periods, such as days (D), weeks (W), months (M), or years (Y).
- It returns a subset of the DataFrame containing the rows from the start of the index up to the end of the defined time range.
- The offset must be a time-offset string; a calendar date in the format YYYY-MM-DD is not a valid offset. To keep rows up to a specific date, slice by label instead, for example df.loc[:'2024-03-15'].
- A DatetimeIndex is required for the method to function; on regular integer or string indexes it raises a TypeError.
- Commonly used for analyzing time series data, making it easier to focus on initial periods like days, months, or years.
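As a small illustration of selecting rows up to a specific calendar date, the sketch below uses label slicing with .loc on a daily DatetimeIndex. The names df and cutoff_rows and the cutoff date are illustrative choices, not part of the examples in this article.

```python
import pandas as pd

# Illustrative data: 100 daily rows starting on 2024-01-01
dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)

# Label slicing on a DatetimeIndex includes the end date itself
cutoff_rows = df.loc[:'2024-03-15']
print(cutoff_rows.tail(3))
```

Because the slice is inclusive, the last row returned here is the one dated 2024-03-15.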
Pandas first() Method
The first() method in Pandas retrieves the first subset of rows from a DataFrame or Series that has a DatetimeIndex. It is useful when you’re working with time-series data and want to keep only the rows that fall within an initial time period. Note that first() has been deprecated since pandas 2.1, and the pandas documentation recommends filtering with .loc instead.
Syntax of Pandas DataFrame first() Method
Let’s look at the syntax of the first() method.
# Syntax of DataFrame.first() method
DataFrame.first(offset)
Parameters of the DataFrame.first()
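Following are the parameters of the first() method.
offset – A string representing the time duration, using Pandas’ offset aliases (e.g., '5D' for 5 days, '2M' for 2 months, '1Y' for 1 year). This defines the time range, measured from the first index value, to include in the output. The M and Y aliases are anchored to month end and year end, so those ranges run to the end of a calendar period rather than covering a fixed number of days.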
Return Value
It returns a subset of the DataFrame that contains all rows from the beginning up to the specified time offset.
Usage of Pandas DataFrame first() Method
The first() method is used to select the first subset of rows from a DataFrame or Series based on a specified time offset. It works with DataFrames that have a DatetimeIndex.
To run some examples of the Pandas DataFrame first() method, let’s create a sample DataFrame with a DatetimeIndex.
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)
print("Create DataFrame with a DatetimeIndex:\n",df)
This DataFrame has 10 rows with a DatetimeIndex starting from 2024-01-01 and increasing by 1 day for each row. The value column contains the numbers 0 to 9, one for each date.
Selecting the First 5 Days of Data
To retrieve the first 5 days of data from the DataFrame, apply the first() method. For instance, df.first('5D') returns all rows that fall within the first 5 calendar days, counted from the first index value.
# Selecting the first 5 days of data
df2 = df.first('5D')
print("First 5 days of data:\n", df2)
This returns the five rows dated 2024-01-01 through 2024-01-05, with values 0 to 4.
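For fixed-length offsets such as days, the same selection can be written as a boolean mask with .loc. The following is a minimal sketch, assuming the window covers rows strictly before the first timestamp plus 5 days; window_end and first_5_days are illustrative names.

```python
import pandas as pd

# Same sample data as above: 10 daily rows starting on 2024-01-01
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)

# End of the window: first timestamp plus 5 days (2024-01-06, exclusive)
window_end = df.index[0] + pd.Timedelta(days=5)

# Keep only the rows strictly before the end of the window
first_5_days = df.loc[df.index < window_end]
print("First 5 days of data:\n", first_5_days)
```

This form does not call first() at all, so it may be a convenient alternative where that method is unavailable or emits a deprecation warning.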
Selecting the First 2 Weeks of Data
Alternatively, to select the first 2 weeks of data from a DataFrame with a DatetimeIndex, you can use the first() method with the 2W time offset.
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=30, freq='D')
df = pd.DataFrame({'value': range(30)}, index=dates)
# Selecting the first 2 weeks of data
df2 = df.first('2W')
print("First 2 weeks of data:\n", df2)
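In the above example, df.first('2W') selects rows from the beginning of the DataFrame up to the end of a 2-week window based on the datetime index. The W alias is a weekly offset anchored to Sunday, and because January 1, 2024 is a Monday, the window ends on Sunday, January 14, 2024. The output below shows the rows through January 13; depending on the pandas version, the row for January 14 itself may also be included.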
# Output:
First 2 weeks of data:
value
2024-01-01 0
2024-01-02 1
2024-01-03 2
2024-01-04 3
2024-01-05 4
2024-01-06 5
2024-01-07 6
2024-01-08 7
2024-01-09 8
2024-01-10 9
2024-01-11 10
2024-01-12 11
2024-01-13 12
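To see where the two-week window ends, the cutoff can be computed directly. This sketch assumes that the W alias corresponds to a weekly offset anchored on Sunday, written here explicitly as pd.offsets.Week with weekday=6; start, two_weeks, and cutoff are illustrative names.

```python
import pandas as pd

start = pd.Timestamp('2024-01-01')            # a Monday
two_weeks = pd.offsets.Week(n=2, weekday=6)   # weekday=6 means Sunday

# Adding the anchored offset rolls forward to the second Sunday
cutoff = start + two_weeks
print(start.day_name(), '->', cutoff.date(), cutoff.day_name())
```

The cutoff lands on Sunday, 2024-01-14, which is 13 days after the start rather than a full 14, because the offset is anchored to a weekday.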
First 10 Hours from an Hourly Time Series
You can use the first() method with the time offset 10H (representing 10 hours) to select the first 10 hours from a DataFrame with an hourly DatetimeIndex.
import pandas as pd
# Create a sample DataFrame with an hourly DatetimeIndex
dates = pd.date_range('2024-01-01', periods=24, freq='H')
df = pd.DataFrame({'value': range(24)}, index=dates)
# Get the first 10 hours of data
df2 = df.first('10H')
print("First 10 hours of data:\n", df2)
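In this example, df.first('10H') retrieves the rows from the start of the DataFrame up to, but not including, 10 hours later, so it returns the hourly data from January 1, 2024, 00:00 through January 1, 2024, 09:00. In pandas 2.2 and later, the uppercase H alias is deprecated in favor of lowercase h.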
# Output:
First 10 hours of data:
value
2024-01-01 00:00:00 0
2024-01-01 01:00:00 1
2024-01-01 02:00:00 2
2024-01-01 03:00:00 3
2024-01-01 04:00:00 4
2024-01-01 05:00:00 5
2024-01-01 06:00:00 6
2024-01-01 07:00:00 7
2024-01-01 08:00:00 8
2024-01-01 09:00:00 9
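The hourly selection can also be sketched without any frequency alias strings, which sidesteps alias differences between pandas versions. The example below assumes the window covers rows strictly before the first timestamp plus 10 hours; window_end and first_10_hours are illustrative names.

```python
import pandas as pd

# Hourly index built from a Timedelta step instead of a string alias
dates = pd.date_range('2024-01-01', periods=24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'value': range(24)}, index=dates)

# End of the window: first timestamp plus 10 hours (10:00, exclusive)
window_end = df.index[0] + pd.Timedelta(hours=10)

# Keep only the rows strictly before the end of the window
first_10_hours = df.loc[df.index < window_end]
print("First 10 hours of data:\n", first_10_hours)
```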
Selecting the First 3 Months of Data
Similarly, you can use the first() method with the 3M offset to select the initial 3 months of data from a DataFrame that has a DatetimeIndex.
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)
# Get the first 3 months of data
df2 = df.first('3M')
print("First 3 months of data:\n", df2)
In this example, df.first('3M') retrieves all rows from the start of the DataFrame through the end of the third calendar month. Because M is a month-end offset, the window closes on March 31, 2024, so the result covers January 1, 2024, through March 31, 2024 (31 + 29 + 31 = 91 rows, since 2024 is a leap year).
# Output:
First 3 months of data:
value
2024-01-01 0
2024-01-02 1
2024-01-03 2
2024-01-04 3
2024-01-05 4
... ...
2024-03-27 86
2024-03-28 87
2024-03-29 88
2024-03-30 89
2024-03-31 90
[91 rows x 1 columns]
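The month-based window can be reproduced with an explicit month-end offset and inclusive label slicing. This is a sketch under the assumption that the 3M alias behaves like pd.offsets.MonthEnd(3); cutoff and first_3_months are illustrative names.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)

# Three month-ends after the start: Jan 31, Feb 29, then Mar 31
cutoff = df.index[0] + pd.offsets.MonthEnd(3)

# Label slicing includes the cutoff date itself
first_3_months = df.loc[:cutoff]
print(cutoff.date(), len(first_3_months))
```

Using the offset object directly avoids relying on the M alias string, which newer pandas versions have renamed.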
Selecting the First Year of Data
Finally, to select the first year of data from a DataFrame with a DatetimeIndex, use the first() method with the time offset 1Y (representing 1 year).
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=365, freq='D')
df = pd.DataFrame({'value': range(365)}, index=dates)
# Selecting the first year of data
df2 = df.first('1Y')
print("First year of data:\n", df2)
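In this example, df.first('1Y') retrieves all rows from the start of the DataFrame through the end of the calendar year, December 31, 2024. Because 2024 is a leap year, the 365 daily rows in this sample end on December 30, 2024, so the result contains the whole DataFrame, from January 1, 2024, to December 30, 2024.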
# Output:
First year of data:
value
2024-01-01 0
2024-01-02 1
2024-01-03 2
2024-01-04 3
2024-01-05 4
... ...
2024-12-26 360
2024-12-27 361
2024-12-28 362
2024-12-29 363
2024-12-30 364
[365 rows x 1 columns]
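The gap between the year-end cutoff and the last available row can be checked directly. This sketch assumes the 1Y alias behaves like pd.offsets.YearEnd(1); cutoff and first_year are illustrative names.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=365, freq='D')
df = pd.DataFrame({'value': range(365)}, index=dates)

# Year-end cutoff for a start date in 2024
cutoff = df.index[0] + pd.offsets.YearEnd(1)

# Inclusive label slice up to the cutoff
first_year = df.loc[:cutoff]

print('cutoff:  ', cutoff.date())
print('last row:', first_year.index[-1].date())
```

The cutoff is 2024-12-31, while the last row is 2024-12-30, because 365 daily periods do not fill a 366-day leap year.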
FAQ on Pandas DataFrame first() Method
The first() method in Pandas is primarily used for retrieving the first few rows of a DataFrame or Series that is indexed with a DatetimeIndex, based on a specified time period.
The time offset is specified as a string in formats such as '5D' (5 days), '2W' (2 weeks), '3M' (3 months), or '1Y' (1 year). A specific date string (e.g., '2024-03-15') is not a valid offset; to keep rows up to a date, use label slicing such as df.loc[:'2024-03-15'] instead.
The first() method requires a DatetimeIndex. It will not work with other types of indexes like integer or string indexes.
The first() method will include all rows that fall within the specified time offset period from the beginning of the DataFrame, even if the period does not end exactly at a data point.
The first() method can be used on a Series with a DatetimeIndex in the same way it is used with DataFrames.
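When the index holds date strings rather than timestamps, it can be converted before any time-based selection. The sketch below does this for a Series and then keeps the first 2 days with a .loc mask; the data and the names s, window_end, and first_2_days are made up for illustration.

```python
import pandas as pd

# A Series whose index holds date strings, not timestamps
s = pd.Series([10, 20, 30, 40],
              index=['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'])

# Convert the string index to a DatetimeIndex
s.index = pd.to_datetime(s.index)

# Keep the rows strictly before the first timestamp plus 2 days
window_end = s.index[0] + pd.Timedelta(days=2)
first_2_days = s.loc[s.index < window_end]
print(first_2_days)
```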
Conclusion
In conclusion, the Pandas first() method is a convenient tool for selecting the initial entries from a DataFrame or Series based on a specified time offset. It is particularly useful when working with time-indexed data, allowing you to easily keep subsets of data over specific time intervals, such as hours, days, weeks, months, or years.
Happy Learning!!