In Pandas, the `first()` method is used to select the initial entries of a DataFrame or Series based on a specified time offset. This method is particularly useful when working with time-indexed data (time series) and allows for easy filtering of data within a specified range of time from the start.

In this article, I will explain the Pandas DataFrame `first()` method, including its syntax, parameters, and usage. I will demonstrate how to return a subset of the original DataFrame that includes the initial rows within a specified time offset.

**Key Points –**

- The `first()` method is used to select rows from a Pandas DataFrame or Series with a `DatetimeIndex`.
- The `offset` parameter accepts strings representing time periods, such as days (`D`), weeks (`W`), months (`M`), or years (`Y`).
- It returns a subset of the DataFrame containing the rows from the start of the index through the defined time range.
- It does not accept a specific date in the format `YYYY-MM-DD`; to filter data up to a given date, slice the index with `.loc` instead (for example, `df.loc[:'2024-03-15']`).
- A `DatetimeIndex` is required for the method to function; with a regular integer or string index it raises a `TypeError`.
- It is commonly used for analyzing time series data, making it easier to focus on initial periods such as days, months, or years.
- The method has been deprecated since pandas 2.1; the pandas documentation recommends creating a boolean mask and filtering with `.loc` instead.

## Pandas first() Method

The `first()` method in Pandas is designed to retrieve the first subset of rows from a DataFrame or Series that has a `DatetimeIndex`. This method is particularly useful when you're working with time-series data and want to filter the data based on a specified time period.

### Syntax of Pandas DataFrame first() Method

The syntax of the `first()` method is as follows.

```
# Syntax of DataFrame.first() method
DataFrame.first(offset)
```

### Parameters of the DataFrame.first()

The `first()` method takes the following parameter.

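`offset` – A string representing the time duration, using Pandas offset aliases (e.g., `'5D'` for 5 days, `'2M'` for 2 months, `'1Y'` for 1 year). This defines the time range from the start of the DataFrame to include in the output. In pandas 2.2 and later, the month-end and year-end aliases `M` and `Y` are deprecated in favor of `ME` and `YE`.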

### Return Value

It returns a subset of the DataFrame that contains all rows from the beginning up to the specified time offset.
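`first()` has been deprecated since pandas 2.1, and the pandas documentation recommends building a boolean mask and filtering with `.loc` instead. For fixed-length offsets such as `'5D'` or `'10h'`, a minimal sketch of that replacement might look like the helper below. `first_window` is a hypothetical name, not part of pandas, and the sketch assumes the index is sorted in ascending order.

```python
import pandas as pd


def first_window(obj, length):
    """Hypothetical helper: keep rows within a fixed-length window from the first timestamp.

    `obj` is a DataFrame or Series; `length` is anything pd.Timedelta accepts,
    such as '5D' or '10h'. The row exactly `length` after the start is excluded.
    """
    if not isinstance(obj.index, pd.DatetimeIndex):
        raise TypeError("first_window only supports a DatetimeIndex")
    # Assumes an ascending index, so index[0] is the earliest timestamp.
    cutoff = obj.index[0] + pd.Timedelta(length)
    return obj.loc[obj.index < cutoff]


dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)
print(first_window(df, '5D'))  # rows for 2024-01-01 through 2024-01-05
```

Calendar-based aliases such as `W`, `M`, and `Y` are anchored to week, month, or year ends, so this helper is only meant for fixed-length windows.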

## Usage of Pandas DataFrame first() Method

The `first()` method is used to select the first subset of rows from a DataFrame or Series based on a specified time offset. It works with DataFrames that have a `DatetimeIndex`.

To run some examples of the Pandas DataFrame `first()` method, let's create a sample DataFrame with a `DatetimeIndex`.

```
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)
print("Create DataFrame with a DatetimeIndex:\n",df)
```

This DataFrame has 10 rows with a `DatetimeIndex` starting from 2024-01-01 and increasing by 1 day for each row. The column `value` contains the numbers 0 to 9, one for each date.

## Selecting the First 5 Days of Data

To retrieve the first 5 days of data from the DataFrame, apply the `first()` method with the `'5D'` offset. `first('5D')` returns all rows whose index falls within 5 days of the first timestamp.

```
# Selecting the first 5 days of data
df2 = df.first('5D')
print("First 5 days of data:\n", df2)
```

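The result contains the rows for 2024-01-01 through 2024-01-05 (values 0 to 4). Because `'5D'` is a fixed-length offset, the row for 2024-01-06, exactly 5 days after the start, is not included.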

## Selecting the First 2 Weeks of Data

To select the first 2 weeks of data from a DataFrame with a `DatetimeIndex`, you can use the `first()` method with the `2W` time offset.

```
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=30, freq='D')
df = pd.DataFrame({'value': range(30)}, index=dates)
# Selecting the first 2 weeks of data
df2 = df.first('2W')
print("First 2 weeks of data:\n", df2)
```

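In the above example, `df.first('2W')` selects all rows from the beginning of the DataFrame through the end of the second weekly period. The `W` alias stands for weeks ending on Sunday (`W-SUN`), and because 2024-01-01 is a Monday, the window ends on Sunday, January 14, 2024. The result therefore includes the data from January 1, 2024, through January 14, 2024.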

```
# Output:
First 2 weeks of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6
2024-01-08      7
2024-01-09      8
2024-01-10      9
2024-01-11     10
2024-01-12     11
2024-01-13     12
2024-01-14     13
```
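The `W` alias is anchored to Sundays, so `'2W'` means "through the second Sunday on or after the start date" rather than a fixed 14 days. The sketch below spells that out with `pd.offsets.Week` and a `.loc` mask instead of the deprecated `first()`. The helper name `end_of_second_week` is hypothetical, and the roll-forward step is an assumption (not pandas' own code) about how to handle a start date that already falls on a Sunday.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=30, freq='D')
df = pd.DataFrame({'value': range(30)}, index=dates)

# 'W' means weekly periods ending on Sunday (weekday=6).
sunday = pd.offsets.Week(weekday=6)


def end_of_second_week(start):
    # Roll forward to the first Sunday on or after start, then add one more week.
    return sunday.rollforward(start) + pd.offsets.Week(1, weekday=6)


end = end_of_second_week(df.index[0])
print(end)  # 2024-01-14 00:00:00

# Calendar-anchored windows include the end date itself.
df2 = df.loc[df.index <= end]
print(len(df2))  # 14

# A Wednesday start also ends on 2024-01-14, so the window covers only 12 days.
print(end_of_second_week(pd.Timestamp('2024-01-03')))  # 2024-01-14 00:00:00
```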

## First 10 Hours from an Hourly Time Series

You can use the `first()` method with the time offset `10h` (representing 10 hours) to select the first 10 hours from a DataFrame with an hourly `DatetimeIndex`. Since pandas 2.2, the uppercase `H` alias is deprecated in favor of lowercase `h`.

```
import pandas as pd
# Create a sample DataFrame with an hourly DatetimeIndex ('h' is the hourly alias)
dates = pd.date_range('2024-01-01', periods=24, freq='h')
df = pd.DataFrame({'value': range(24)}, index=dates)
# Get the first 10 hours of data
df2 = df.first('10h')
print("First 10 hours of data:\n", df2)
```

In this example, `df.first('10h')` retrieves the rows from the start of the DataFrame up to, but not including, the timestamp 10 hours later. This covers all hourly data from January 1, 2024, 00:00 through January 1, 2024, 09:00.

```
First 10 hours of data:
                      value
2024-01-01 00:00:00      0
2024-01-01 01:00:00      1
2024-01-01 02:00:00      2
2024-01-01 03:00:00      3
2024-01-01 04:00:00      4
2024-01-01 05:00:00      5
2024-01-01 06:00:00      6
2024-01-01 07:00:00      7
2024-01-01 08:00:00      8
2024-01-01 09:00:00      9
```

## Selecting the First 3 Months of Data

Similarly, you can use the `first()` method with the `3M` offset to select the initial 3 months of data from a DataFrame that has a `DatetimeIndex`.

```
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)
# Get the first 3 months of data
df2 = df.first('3M')
print("First 3 months of data:\n", df2)
```

In this example, `df.first('3M')` retrieves all rows from the start of the DataFrame through the end of the third calendar month. `M` is a month-end offset, so starting from January 1, 2024, the window ends on March 31, 2024; because 2024 is a leap year, that is 91 days of data.

```
First 3 months of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
...           ...
2024-03-27     86
2024-03-28     87
2024-03-29     88
2024-03-30     89
2024-03-31     90

[91 rows x 1 columns]
```
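Because `M` is the month-end offset (`MonthEnd`), the window of `first('3M')` runs to the end of the third calendar month rather than a fixed number of days. The sketch below builds the same window with `pd.offsets.MonthEnd` and a `.loc` mask, assuming an ascending index. Rolling forward first is an assumption meant to keep the count at three months when the first date is already a month end.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)

start = df.index[0]
# Roll forward to the end of the starting month (unchanged if start is
# already a month end), then add two more month ends.
end = pd.offsets.MonthEnd().rollforward(start) + pd.offsets.MonthEnd(2)
print(end)  # 2024-03-31 00:00:00

# Keep every row up to and including the end of the third month.
df2 = df.loc[df.index <= end]
print(len(df2))  # 91 (31 + 29 + 31 days in 2024)
```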

## Selecting the First Year of Data

Finally, to select the first year of data from a DataFrame with a `DatetimeIndex`, use the `first()` method with the time offset `1Y` (representing 1 year).

```
import pandas as pd
# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=365, freq='D')
df = pd.DataFrame({'value': range(365)}, index=dates)
# Selecting the first year of data
df2 = df.first('1Y')
print("First year of data:\n", df2)
```

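In this example, `df.first('1Y')` retrieves all rows from the start of the DataFrame through the end of the calendar year. `Y` is a year-end offset, so the window ends on December 31, 2024. Because 2024 is a leap year with 366 days, the 365 daily rows only run through December 30, 2024, and all of them are returned.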

```
# Output:
First year of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
...           ...
2024-12-26    360
2024-12-27    361
2024-12-28    362
2024-12-29    363
2024-12-30    364

[365 rows x 1 columns]
```
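Because `Y` is anchored to December 31, `first('1Y')` effectively keeps the rest of the starting calendar year. With more than a year of data the cut-off becomes visible; this sketch uses 400 days of made-up sample data and `pd.offsets.YearEnd` with a `.loc` mask instead of the deprecated method.

```python
import pandas as pd

# 400 days of sample data, so the index runs past the end of 2024.
dates = pd.date_range('2024-01-01', periods=400, freq='D')
df = pd.DataFrame({'value': range(400)}, index=dates)

# Roll forward to December 31 of the starting year
# (unchanged if the first date is already December 31).
end = pd.offsets.YearEnd().rollforward(df.index[0])
print(end)  # 2024-12-31 00:00:00

# Keep everything up to and including the year-end date.
df2 = df.loc[df.index <= end]
print(len(df2))  # 366, because 2024 is a leap year
```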

## FAQ on Pandas DataFrame first() Method

**What does the first() method do in Pandas?**

The `first()` method in Pandas is primarily used for retrieving the first rows of a DataFrame or Series indexed with a `DatetimeIndex`, based on a specified time period.

**How do I specify the time offset for the first() method?**

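The time offset is specified as a string in formats such as `'5D'` (5 days), `'2W'` (2 weeks), `'3M'` (3 months), or `'1Y'` (1 year). A specific date string such as `'2024-03-15'` is not a valid offset and raises a `ValueError`; to select rows up to a specific date, use label slicing such as `df.loc[:'2024-03-15']`.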
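To keep every row up to a specific date, a plain label slice on the `DatetimeIndex` is enough. The sketch below assumes a sorted index; with label slicing, the end date itself is included.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)

# Label slicing on a sorted DatetimeIndex includes the end date.
df2 = df.loc[:'2024-03-15']
print(df2.index[-1])  # 2024-03-15 00:00:00
print(len(df2))       # 75 rows: 31 (Jan) + 29 (Feb) + 15 (Mar)
```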

**Can I use the first() method with a non-time-based index?**

No. The `first()` method requires a `DatetimeIndex` and raises a `TypeError` for other index types, such as integer or string indexes. Convert the dates with `pd.to_datetime()` and set them as the index first.
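If the dates live in a regular column, converting them with `pd.to_datetime()` and moving them into the index gives the `DatetimeIndex` that `first()` (and `.loc`-based alternatives) need. A minimal sketch with made-up sample data:

```python
import pandas as pd

# Dates stored as strings in a regular column; the index is a default RangeIndex.
df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03'],
    'value': [10, 20, 30],
})

# Parse the strings into datetimes and move them into the index.
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
print(type(df.index).__name__)  # DatetimeIndex
```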

**How does the first() method handle incomplete data periods?**

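The window is measured in calendar time from the first index label, not by counting rows. The `first()` method does not fill in missing dates, so if the data has gaps, the result contains only the rows that actually fall inside the window. For example, with readings every two days, `first('3D')` returns just two rows.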
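The sketch below illustrates that point with made-up readings taken every two days. It reproduces the window of `first('3D')` with a `.loc` mask rather than calling the deprecated method.

```python
import pandas as pd

# Readings every 2 days: 2018-04-09, 04-11, 04-13, 04-15.
dates = pd.date_range('2018-04-09', periods=4, freq='2D')
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=dates)

# A 3-day window from the first timestamp ends just before 2018-04-12.
cutoff = df.index[0] + pd.Timedelta(days=3)
df2 = df.loc[df.index < cutoff]
print(df2)  # only 2018-04-09 and 2018-04-11 remain
```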

**Can the first() method be used on a Series?**

Yes. The `first()` method can be used on a Series with a `DatetimeIndex` in the same way it is used with DataFrames.
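For a Series, the same `.loc` mask approach applies. This sketch reproduces the window of `first('10h')` on a made-up hourly Series without calling the deprecated method.

```python
import pandas as pd

# Hourly readings for one day, stored in a Series.
idx = pd.date_range('2024-01-01', periods=24, freq='h')
s = pd.Series(range(24), index=idx)

# Fixed-length window as with first('10h'): the 10:00 reading is excluded.
cutoff = s.index[0] + pd.Timedelta(hours=10)
s2 = s.loc[s.index < cutoff]
print(s2.index[-1])  # 2024-01-01 09:00:00
```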

## Conclusion

In conclusion, the Pandas `first()` method is a convenient way to select the initial entries of a DataFrame or Series based on a specified time offset. It is particularly useful when working with time-indexed data, letting you extract the data for the first hours, days, weeks, months, or years.

Happy Learning!!

