Pandas DataFrame first() Method
Post last modified: September 20, 2024

In Pandas, the first() method is used to select the initial entries of a DataFrame or Series based on a specified time offset. This method is particularly useful when working with time-indexed data (time series) and allows for easy filtering of data within a specified range of time from the start.

In this article, I will explain the Pandas DataFrame first() method, including its syntax, parameters, and usage. I will demonstrate how to return a subset of the original DataFrame that includes the initial rows within a specified time offset.

Key Points –

  • The first() method is used to select rows from a Pandas DataFrame or Series with a DatetimeIndex.
  • The offset parameter accepts offset strings such as days (D), weeks (W), months (M), or years (Y), optionally prefixed with a count, for example 5D or 2W.
  • It returns a subset of the DataFrame containing the rows from the first index value up to the end of the specified time range.
  • It does not accept specific dates such as YYYY-MM-DD; to filter data up to a given date, use label-based slicing, for example df.loc[:'2024-03-15'].
  • A DatetimeIndex is required; calling first() on a DataFrame with a regular integer or string index raises a TypeError.
  • Commonly used for analyzing time series data, making it easier to focus on initial periods like days, months, or years.
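
Because first() takes an offset rather than a calendar date, a date cut-off is usually written with label-based slicing instead. The following is a minimal sketch, assuming a sorted daily DatetimeIndex; the names df and up_to_jan_4 are illustrative and not part of the Pandas API.

```python
import pandas as pd

# Illustrative data: ten consecutive days starting 2024-01-01
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)

# .loc slicing on a DatetimeIndex accepts a date string,
# and the end label is included in the result
up_to_jan_4 = df.loc[:'2024-01-04']
print(up_to_jan_4)
```

Here the slice keeps the four rows from 2024-01-01 through 2024-01-04.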

Pandas first() Method

The first() method in Pandas retrieves the first subset of rows from a DataFrame or Series that has a DatetimeIndex. It is useful when you’re working with time-series data and want to filter the data based on a time period measured from the first row. Note that first() was deprecated in pandas 2.1 and removed in pandas 3.0; on those versions, the same result is obtained by building a mask and filtering with .loc.

Syntax of Pandas DataFrame first() Method

Let’s look at the syntax of the first() method.


# Syntax of DataFrame.first() method
DataFrame.first(offset)

Parameters of the DataFrame.first()

Following are the parameters of the first() method.

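  • offset – A string using Pandas’ offset aliases (e.g., ‘5D’ for 5 days, ‘2M’ for 2 months, ‘1Y’ for 1 year), or a DateOffset object. It defines the time range, measured from the first index value, to include in the output. The W, M, and Y aliases are anchored to week, month, and year ends, so ‘2M’ runs to the second month end rather than exactly two months after the first row. In pandas 2.2 and later, M, Y, and H are deprecated in favor of ME, YE, and h.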

Return Value

It returns a subset of the DataFrame that contains all rows from the beginning up to the specified time offset.

Usage of Pandas DataFrame first() Method

The first() method is used to select the first subset of rows from a DataFrame or Series based on a specified time offset. It works with DataFrames that have a DatetimeIndex.

To run some examples of the Pandas DataFrame first() method, let’s create a sample DataFrame with a DatetimeIndex.


import pandas as pd

# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)
print("Create DataFrame with a DatetimeIndex:\n",df)

This DataFrame has 10 rows with a DatetimeIndex starting from 2024-01-01 and increasing by 1 day for each row. The column value contains numbers from 0 to 9 corresponding to each date. This example yields the below output.

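# Output:
Create DataFrame with a DatetimeIndex:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6
2024-01-08      7
2024-01-09      8
2024-01-10      9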

Selecting the First 5 Days of Data

To retrieve the first 5 days of data from the DataFrame, you can apply the first() method. For instance, df.first('5D') returns all rows whose timestamps fall within the first 5 days, counted from the first index value.


# Selecting the first 5 days of data
df2 = df.first('5D')
print("First 5 days of data:\n", df2)

This yields the output below.

# Output:
First 5 days of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
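
The same selection can be written as a boolean mask with .loc, which does not depend on first() being available in the installed pandas version. This is a minimal sketch, assuming the 10-day sample DataFrame from above (rebuilt here so the block runs on its own); cutoff and first_5_days are illustrative names.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=10, freq='D')
df = pd.DataFrame({'value': range(10)}, index=dates)

# Five days after the first timestamp: 2024-01-06 00:00
cutoff = df.index[0] + pd.Timedelta(days=5)

# Keep rows strictly before the cutoff, i.e. 2024-01-01 through 2024-01-05
first_5_days = df.loc[df.index < cutoff]
print(first_5_days)
```

The strict comparison leaves out the row that sits exactly on the cutoff, so a 5-day window over daily data yields 5 rows.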

Selecting the First 2 Weeks of Data

To select the first 2 weeks of data from a DataFrame with a DatetimeIndex, use the first() method with the '2W' time offset.


import pandas as pd

# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=30, freq='D')
df = pd.DataFrame({'value': range(30)}, index=dates)

# Selecting the first 2 weeks of data
df2 = df.first('2W')
print("First 2 weeks of data:\n", df2)

In the above example, df.first('2W') selects all rows from the beginning of the DataFrame up to the second Sunday, because the W alias is anchored to weeks ending on Sunday. Since January 1, 2024, is a Monday, this covers 14 days, from January 1, 2024, through January 14, 2024.


# Output:
First 2 weeks of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
2024-01-06      5
2024-01-07      6
2024-01-08      7
2024-01-09      8
2024-01-10      9
2024-01-11     10
2024-01-12     11
2024-01-13     12
2024-01-14     13
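
Because the W alias is anchored to Sundays, the cut-off depends on the weekday of the first row and is not always a fixed 14 days. The sketch below only does the date arithmetic, using pd.offsets.Week with weekday=6 (the offset object that the W alias resolves to); the two start dates are illustrative.

```python
import pandas as pd

# Two weeks, anchored to weeks ending on Sunday (weekday=6)
two_weeks = pd.offsets.Week(2, weekday=6)

start_monday = pd.Timestamp('2024-01-01')     # a Monday
start_wednesday = pd.Timestamp('2024-01-03')  # a Wednesday

# For a start that is not a Sunday, adding the offset lands on
# the second Sunday after the start
end_from_monday = start_monday + two_weeks
end_from_wednesday = start_wednesday + two_weeks

print(end_from_monday.date(), end_from_wednesday.date())
```

Both starts give 2024-01-14 as the cut-off, so data beginning on the Wednesday would span 12 days rather than 14.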

First 10 Hours from an Hourly Time Series

You can use the first() method with the time offset 10H (representing 10 hours) to select the first 10 hours from a DataFrame with an hourly DatetimeIndex.


import pandas as pd

# Create a sample DataFrame with an hourly DatetimeIndex
dates = pd.date_range('2024-01-01', periods=24, freq='H')
df = pd.DataFrame({'value': range(24)}, index=dates)

# Get the first 10 hours of data
df2 = df.first('10H')
print("First 10 hours of data:\n", df2)

In this example, df.first('10H') retrieves the rows from the start of the DataFrame up to 10 hours later, including all hourly data from January 1, 2024, 00:00 through January 1, 2024, 09:00.


# Output:
First 10 hours of data:
                      value
2024-01-01 00:00:00      0
2024-01-01 01:00:00      1
2024-01-01 02:00:00      2
2024-01-01 03:00:00      3
2024-01-01 04:00:00      4
2024-01-01 05:00:00      5
2024-01-01 06:00:00      6
2024-01-01 07:00:00      7
2024-01-01 08:00:00      8
2024-01-01 09:00:00      9

Selecting the First 3 Months of Data

Similarly, you can use the first() method with the '3M' offset to select the initial 3 months of data from a DataFrame that has a DatetimeIndex.


import pandas as pd

# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=100, freq='D')
df = pd.DataFrame({'value': range(100)}, index=dates)

# Get the first 3 months of data
df2 = df.first('3M')
print("First 3 months of data:\n", df2)

In this example, df.first('3M') retrieves all rows from the start of the DataFrame up to the third month end after the first row, because M is a month-end alias. Here that covers January 1, 2024, through March 31, 2024, which is 91 rows of the daily index.


# Output:
First 3 months of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
...           ...
2024-03-27     86
2024-03-28     87
2024-03-29     88
2024-03-30     89
2024-03-31     90

[91 rows x 1 columns]
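
The month alias is anchored to month ends, so the cut-off is the third month end after the first row, not the same day three months later. A small sketch of that arithmetic, using pd.offsets.MonthEnd and pd.DateOffset directly; the mid-month start date is an illustrative assumption chosen to make the difference visible.

```python
import pandas as pd

start = pd.Timestamp('2024-01-15')

# Month-end anchored: 2024-01-31 -> 2024-02-29 -> 2024-03-31
month_end_cutoff = start + pd.offsets.MonthEnd(3)

# Calendar shift by three months: same day-of-month in April
calendar_cutoff = start + pd.DateOffset(months=3)

print(month_end_cutoff.date())  # 2024-03-31
print(calendar_cutoff.date())   # 2024-04-15
```

If an exact three-month window is what you need, a mask built from the pd.DateOffset(months=3) cut-off is likely the closer fit.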

Selecting the First Year of Data

Finally, to select the first year of data from a DataFrame with a DatetimeIndex, use the first() method with the time offset '1Y' (representing 1 year).


import pandas as pd

# Create a sample DataFrame with a DatetimeIndex
dates = pd.date_range('2024-01-01', periods=365, freq='D')
df = pd.DataFrame({'value': range(365)}, index=dates)

# Selecting the first year of data
df2 = df.first('1Y')
print("First year of data:\n", df2)

In this example, df.first('1Y') retrieves all rows from the start of the DataFrame up to the year-end cut-off of December 31, 2024. Because 2024 is a leap year, the 365 sample rows run from January 1, 2024, to December 30, 2024, so every row is returned.


# Output:
First year of data:
             value
2024-01-01      0
2024-01-02      1
2024-01-03      2
2024-01-04      3
2024-01-05      4
...           ...
2024-12-26    360
2024-12-27    361
2024-12-28    362
2024-12-29    363
2024-12-30    364

[365 rows x 1 columns]
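
The year example can be checked with plain date arithmetic. This sketch uses pd.offsets.YearEnd (the year-end offset behind the Y alias) rather than first() itself, and shows where the one-year cut-off falls compared with where the 365-row sample ends; cutoff and rows_in_range are illustrative names.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=365, freq='D')

# Year-end anchored cut-off for data starting on 2024-01-01
cutoff = dates[0] + pd.offsets.YearEnd(1)

print(cutoff.date())     # 2024-12-31
print(dates[-1].date())  # 2024-12-30, because 2024 has 366 days

# Every sample row is on or before the cut-off, so all 365 rows qualify
rows_in_range = int((dates <= cutoff).sum())
print(rows_in_range)
```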

FAQ on Pandas DataFrame first() Method

What does the first() method do in Pandas?

The first() method in Pandas retrieves the initial rows of a DataFrame or Series indexed by a DatetimeIndex, keeping those that fall within a specified time offset from the first index value.

How do I specify the time offset for the first() method?

The time offset is specified as a string in formats such as '5D' (5 days), '2W' (2 weeks), '3M' (3 months), or '1Y' (1 year). Specific date strings (e.g., '2024-03-15') are not valid offsets; to select rows up to a date, use df.loc[:'2024-03-15'] instead.

Can I use the first() method with a non-time-based index?

The first() method requires a DatetimeIndex. Calling it on an object with another index type, such as an integer or string index, raises a TypeError.
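
If the dates are stored as plain strings, the index can be converted before any time-based selection. This is a minimal sketch, assuming a hypothetical DataFrame named raw whose string index holds ISO-formatted dates; the selection itself uses a .loc mask so it does not rely on first().

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings in the index
raw = pd.DataFrame(
    {'value': [10, 20, 30, 40]},
    index=['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
)
print(isinstance(raw.index, pd.DatetimeIndex))  # False

# Convert the index to a DatetimeIndex so time-based selection works
ts = raw.copy()
ts.index = pd.to_datetime(ts.index)
print(isinstance(ts.index, pd.DatetimeIndex))  # True

# Rows within the first two days of the converted index
first_2_days = ts.loc[ts.index < ts.index[0] + pd.Timedelta(days=2)]
print(first_2_days)
```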

How does the first() method handle incomplete data periods?

The first() method will include all rows that fall within the specified time offset period from the beginning of the DataFrame, even if the period does not end exactly at a data point.
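
Selection is by elapsed time, not by row count, which matters when timestamps are irregular. The sketch below uses a hypothetical Series with gaps and a .loc mask that applies the same rule for a fixed 10-hour window; readings, cutoff, and first_10_hours are illustrative names.

```python
import pandas as pd

# Hypothetical irregular timestamps: only some hours have readings
index = pd.to_datetime([
    '2024-01-01 00:00',
    '2024-01-01 03:30',
    '2024-01-01 09:59',
    '2024-01-01 10:00',
    '2024-01-01 18:00',
])
readings = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index)

# Window of 10 hours measured from the first timestamp
cutoff = readings.index[0] + pd.Timedelta(hours=10)

# Three rows fall before 10:00; the 10:00 reading itself is excluded
first_10_hours = readings.loc[readings.index < cutoff]
print(first_10_hours)
```

Only three of the five readings are returned, even though the window is 10 hours long.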

Can the first() method be used on a Series?

The first() method can be used on a Series with a DatetimeIndex in the same way it is used with DataFrames.

Conclusion

In conclusion, the Pandas first() method selects the initial entries from a DataFrame or Series based on a specified time offset. It is particularly useful when working with time-indexed data, allowing you to easily extract subsets of data over specific time intervals, such as hours, days, weeks, months, or years.

Happy Learning!!