Spark By {Examples}

Pandas Extract Year from Datetime

To extract the year from a datetime column in a Pandas DataFrame, you can use the dt.year accessor. In this article, I will explain how to extract the year from a datetime column using pandas.Series.dt.year, the pandas.DatetimeIndex year property, and the strftime() method.

If your data is not in DateTime type initially, you should convert it first using pd.to_datetime() before extracting the year.
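As a minimal sketch of that conversion step, the example below starts from dates stored as plain strings. The sample values are made up for illustration; the column name InsertedDate matches the one used throughout this article.

```python
import pandas as pd

# Hypothetical sample data: dates stored as plain strings
df = pd.DataFrame({"InsertedDate": ["2018-08-14", "2019-10-17", "2020-11-14"]})

# Convert the string column to a datetime type before using the .dt accessor
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])

# The year can now be extracted as an integer
df["Year"] = df["InsertedDate"].dt.year
print(df)
```

Calling .dt on the original string column would raise an error, which is why the conversion comes first.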

Quick Examples of Extract Year from Datetime

If you are in a hurry, below are some quick examples of how to extract the year from the pandas DataFrame DateTime column.


# Quick examples of extracting the year from datetime

# Example 1: Use Series.dt.strftime() to extract the year as a string
df['Year'] = df['InsertedDate'].dt.strftime('%Y')

# Example 2: Use the Series.dt.year attribute to extract the year as an integer
df['Year'] = df['InsertedDate'].dt.year

# Example 3: Use pandas.DatetimeIndex() to extract the year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year

# Example 4: Use Series.dt.to_period() to extract the year as a Period
df['Year'] = df['InsertedDate'].dt.to_period('Y')

# Example 5: Use Series.apply() with a lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y'))

# Example 6: Use pandas.to_datetime() and Series.dt.strftime()
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')

Pandas Extract Year using Datetime.strftime()

To run the examples, let's first create a Pandas DataFrame with a datetime column holding dates in year-month-day form. We will then use Pandas attributes and functions to extract the year from that column.


import pandas as pd

# Sample dates as strings, converted to datetime when building the DataFrame
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses = ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)}, index=Courses)
print("Create DataFrame:\n", df)

This creates a DataFrame with a single datetime column, InsertedDate, indexed by course name.

The strftime() method takes a format string and returns each date as a string in that format. You can use the %Y format code to extract the year from the datetime column. In the DataFrame above, pd.to_datetime() was used to convert the date strings to datetime.


# Use Series.dt.strftime() method to extract year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
print("Get the year from the datetime column:\n", df)

In the above example, dt.strftime('%Y') extracts the year component from the InsertedDate column of the DataFrame df. The format code %Y represents the year with century as a decimal number. Note that the new Year column holds strings such as '2018', not integers.
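Because strftime() returns text, the result behaves differently from the integer year returned by dt.year. The short sketch below, using two made-up dates, shows both results side by side and one way to convert the string form back to integers.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))

year_as_text = dates.dt.strftime("%Y")   # strings such as "2018"
year_as_int = dates.dt.year              # integers such as 2018

print(year_as_text.tolist())
print(year_as_int.tolist())

# Convert the string result to integers when arithmetic or numeric sorting is needed
year_converted = year_as_text.astype(int)
print(year_converted.tolist())
```

If you only need the number, dt.year avoids the extra conversion.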

Extract Year Using Series.dt.year

We can use pandas.Series.dt.year to extract the year. Note that dt.year is a property, not a method, so it is accessed without parentheses. It returns a Series of integers; assign it to a column to add the year to the DataFrame.


# Using pandas.Series.dt.year
df['Year'] = df['InsertedDate'].dt.year 
print("Get the year from the datetime column:\n", df)

Yields below output.


# Output:
# Get the year from the datetime column:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
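One case worth checking is a column with missing or unparseable dates. The sketch below uses made-up data and assumes you want a whole-number year column that still allows missing values; in the pandas versions I am aware of, dt.year falls back to a floating-point result when NaT is present, so the example converts it to the nullable Int64 dtype.

```python
import pandas as pd

# Hypothetical data with one entry that cannot be parsed as a date
raw = pd.Series(["2018-08-14", "not a date", "2020-11-14"])

# errors="coerce" turns values that cannot be parsed into NaT
parsed = pd.to_datetime(raw, errors="coerce")

# The year is missing wherever the date is NaT
years = parsed.dt.year

# Nullable integer dtype keeps whole numbers alongside missing values
years_nullable = years.astype("Int64")
print(years_nullable)
```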

Use Pandas DatetimeIndex() to Extract Year

We can also extract the year from the Pandas datetime column using the DatetimeIndex.year attribute. Note that pd.DatetimeIndex() takes an array-like of datetime-like values, such as a datetime column, as its argument.


# Using pandas.DatetimeIndex() to extract year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
print("Get the year from the datetime column:\n", df)

Yields the same year values as above, stored in a column named year.
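To see what pd.DatetimeIndex() returns on its own, here is a small sketch with made-up date strings. It shows that the year attribute gives back an Index of integers rather than a Series.

```python
import pandas as pd

# pd.DatetimeIndex accepts an array-like of datetime-like values, including strings
idx = pd.DatetimeIndex(["2018-08-14", "2019-10-17", "2020-11-14"])

years = idx.year          # an Index of integers, not a Series
print(type(years).__name__)
print(list(years))
```

When such an Index is assigned to a DataFrame column, the values are placed by position, which is why the example above works even though df is indexed by course name.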

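Use Series.dt.to_period() Method to Extract Year

You can also use the dt.to_period('Y') method, as in df['Year'] = df['InsertedDate'].dt.to_period('Y'). The column must already be of datetime type. The result holds Period values (displayed as 2018, 2019, and so on) rather than plain integers.


# Use Series.dt.to_period() method to extract year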
df['Year'] = df['InsertedDate'].dt.to_period('Y')
print("Get the year from the datetime column:\n", df)

Yields below output.


# Output:
# Get the year from the datetime column:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
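Although the output looks like plain years, the column has a period dtype. The sketch below, with two made-up dates, prints the dtype and shows one way to get integer years back from a Period column.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))

# Each value becomes a yearly Period, displayed as 2018, 2019
periods = dates.dt.to_period("Y")
print(periods.dtype)      # a period dtype, not an integer dtype

# A Period column has its own .dt accessor, so the integer year is still available
years = periods.dt.year
print(years.tolist())
```

A Period column can be handy for grouping by year, while dt.year is simpler when you only need the number.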

Use Series.apply() With Lambda Function and strftime()

You can also call apply() on the datetime column with a lambda function and strftime() to extract the year. Because df['InsertedDate'] is a single column, this is Series.apply(): the lambda receives one Timestamp at a time and returns the year as a string.


# Use Series.apply() with lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y')) 
print("Get the year from the datetime column:\n", df)

Yields below output.


# Output:
# Get the year from the datetime column:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
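The apply() approach gives the same strings as the vectorized dt.strftime() shown earlier. The sketch below, using made-up dates, runs both and compares the results.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]))

# Element-wise: the lambda receives one Timestamp at a time
via_apply = dates.apply(lambda x: x.strftime("%Y"))

# Vectorized: the .dt accessor formats the whole column in one call
via_dt = dates.dt.strftime("%Y")

# Both approaches produce the same list of year strings
print(via_apply.tolist() == via_dt.tolist())
```

The .dt form is usually the more idiomatic choice; apply() is mainly useful when the per-row logic goes beyond what the accessor offers.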

Use pandas.to_datetime() and Series.dt.strftime() Method

You can combine pd.to_datetime() with the dt.strftime() method to extract the year in one step. This is useful when the column may still hold strings, because pd.to_datetime() converts it to datetime before the year is formatted.


# Use pandas.to_datetime() and Series.dt.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
print("Get the year from the datetime column:\n", df)

Yields below output.


# Output:
# Get the year from the datetime column:
        InsertedDate  yyyy
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021

Frequently Asked Questions on Extract Year from Datetime

How can I extract the year from a datetime column in a Pandas DataFrame?

You can use the dt attribute in Pandas to extract the year from a datetime column. For example, df['year'] = df['datetime_column'].dt.year

How can I extract the year directly without creating a new column?

Accessing dt.year returns a new Series and does not modify the DataFrame, so you can keep the result in a variable instead of adding a column. For example, years = pd.to_datetime(df['datetime_column']).dt.year

How can I extract the year from a datetime index in a DataFrame?

If your DataFrame has a datetime index, you can use the year attribute directly on the index. For example:
df.set_index('timestamp', inplace=True)
df['year'] = df.index.year
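A runnable version of this answer might look like the sketch below. The timestamp and value columns and their contents are hypothetical, chosen only to make the example self-contained.

```python
import pandas as pd

# Hypothetical DataFrame with a 'timestamp' column
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-08-14", "2019-10-17"]),
    "value": [10, 20],
})

df = df.set_index("timestamp")   # the index is now a DatetimeIndex
df["year"] = df.index.year       # no .dt accessor is needed on an index
print(df)
```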

Conclusion

In this article, you have learned how to extract the year from a Pandas datetime column by using pandas.Series.dt.year, pandas.Series.dt.strftime(), pandas.DatetimeIndex(), Series.dt.to_period() and apply() with a lambda function, with examples.

Happy Learning !!