In Pandas, you can extract the year from a datetime column using the dt.year accessor. If the column is not already of datetime type (for example, dates stored as strings), convert it first with pd.to_datetime(), because the .dt accessor only works on datetime-like values.
In this article, I will explain how to extract the year from a datetime column using pandas.Series.dt.year, pandas.DatetimeIndex properties, and the strftime() function.
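As a quick illustration of why the conversion step matters, here is a minimal sketch using a small hypothetical DataFrame whose dates are stored as plain strings. It shows that .dt fails on the strings and works once pd.to_datetime() has converted them.

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings, not datetime64
df = pd.DataFrame({"InsertedDate": ["2018-08-14", "2019-10-17", "2020-11-14"]})

# The .dt accessor only works on datetime-like values, so this raises AttributeError
raised = False
try:
    df["InsertedDate"].dt.year
except AttributeError as err:
    raised = True
    print("Before conversion:", err)

# Convert the strings to datetime64 first, then extract the year
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
df["Year"] = df["InsertedDate"].dt.year
print(df)
```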
Key Points –
- Use the .dt.year accessor to extract the year from a datetime column in Pandas.
- Make sure the column has a datetime dtype, converting it with pd.to_datetime() if needed.
- The .dt accessor on datetime Series exposes components such as year, month, and day.
- When dates are stored as strings in a consistent format (such as yyyy-mm-dd), string slicing can pull out the year portion.
- Consider using regular expressions to extract the year from date strings with varying formats.
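The last two points apply to dates stored as strings rather than datetime64 values. The sketch below uses hypothetical string data; the slicing part assumes every value starts with a four-digit year, and the regex assumes years between 1900 and 2099.

```python
import pandas as pd

# String slicing: only safe when every value starts with a 4-digit year (ISO style)
iso = pd.Series(["2018-08-14", "2019-10-17 09:30"])
iso_years = iso.str[:4].astype(int)
print(iso_years.tolist())

# Hypothetical strings in mixed formats
s = pd.Series(["2018-08-14", "2019-10-17 09:30", "14/11/2020", "Released May 2021"])

# Regex: take the first standalone 4-digit number starting with 19 or 20
years = s.str.extract(r"\b((?:19|20)\d{2})\b", expand=False).astype(int)
print(years.tolist())
```

If some strings contain no match, str.extract returns NaN for them and the astype(int) step would fail, so you would need to handle missing values first.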
Quick Examples of Extract Year from Datetime
Following are quick examples of extracting the year from a datetime column in a pandas DataFrame.
# Quick examples of extracting the year from datetime
# (assume df has a datetime64 column named 'InsertedDate')
# Example 1: Use Series.dt.strftime() to extract the year as a string
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
# Example 2: Use the pandas.Series.dt.year property (returns integers)
df['Year'] = df['InsertedDate'].dt.year
# Example 3: Use pandas.DatetimeIndex() to extract the year
df['Year'] = pd.DatetimeIndex(df['InsertedDate']).year
# Example 4: Use Series.dt.to_period() to get annual Period values
df['Year'] = df['InsertedDate'].dt.to_period('Y')
# Example 5: Use Series.apply() with a lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y'))
# Example 6: Use pd.to_datetime() and Series.dt.strftime()
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
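The quick examples above assume an existing df. This self-contained sketch, built on a small hypothetical three-row DataFrame, runs four of them side by side and prints the dtype each one returns, since they do not all produce the same kind of value.

```python
import pandas as pd

df = pd.DataFrame({"InsertedDate": pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"])})

results = {
    "strftime": df["InsertedDate"].dt.strftime("%Y"),           # strings such as '2018'
    "dt.year": df["InsertedDate"].dt.year,                      # integers
    "DatetimeIndex": pd.DatetimeIndex(df["InsertedDate"]).year, # Index of integers
    "to_period": df["InsertedDate"].dt.to_period("Y"),          # annual Period values
}

# Show the dtype and the values produced by each approach
for name, values in results.items():
    print(f"{name:14} {values.dtype}  {list(values)}")
```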
Pandas Extract Year using Datetime.strftime()
To run the examples, let's create a Pandas DataFrame with a datetime column, InsertedDate, holding dates in year-month-day form, and then use Pandas attributes and functions to extract the year from it.
import pandas as pd
# Dates as ISO strings; pd.to_datetime() converts them to datetime64
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses = ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
# Course names become the row index
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)}, index=Courses)
print("Create DataFrame:\n", df)
This creates a DataFrame whose InsertedDate column has a datetime64 dtype, indexed by course name.
The Series.dt.strftime() method takes a format string and returns each datetime formatted as a string. Use the %Y format code to get the four-digit year. Because InsertedDate was already converted with pd.to_datetime() when the DataFrame was created, the .dt accessor is available on it.
# Use Datetime.strftime() method to extract year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
print("Get the year from the datetime column:\n", df)
In the above example, dt.strftime('%Y') extracts the year component from the InsertedDate column of the DataFrame df using the format code %Y, which represents the year with century as a decimal number. The resulting Year column holds strings such as '2018', not integers.
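Since strftime() returns strings, you may want integers for comparisons or arithmetic. A minimal sketch on a hypothetical two-row DataFrame, also showing %y, the two-digit year code:

```python
import pandas as pd

df = pd.DataFrame({"InsertedDate": pd.to_datetime(["2018-08-14", "2020-05-17"])})

# dt.strftime('%Y') returns strings, e.g. '2018'
year_str = df["InsertedDate"].dt.strftime("%Y")
print(type(year_str.iloc[0]))

# '%y' gives the year without century, e.g. '18'
short = df["InsertedDate"].dt.strftime("%y")
print(short.tolist())

# Convert to integers if you need numeric comparisons or arithmetic
df["Year"] = year_str.astype(int)
print(df["Year"] + 1)
```

If integers are the goal, dt.year gives them directly without the string round trip.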
Extract Year Using Series.dt.year
We can use the pandas.Series.dt.year property to extract the year. It returns a Series of integers, so assign it to a column to add the year to the DataFrame.
# Using the pandas.Series.dt.year property
df['Year'] = df['InsertedDate'].dt.year
print("Get the year from the datetime column:\n", df)
Yields below output.
# Output:
# Get the year from the datetime column:
InsertedDate Year
Spark 2018-08-14 2018
PySpark 2019-10-17 2019
Hadoop 2020-11-14 2020
Python 2020-05-17 2020
Pandas 2021-09-15 2021
Hadoop 2021-12-14 2021
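Because dt.year returns integers, it can be used directly in boolean filters and aggregations. A short sketch reusing the article's sample data:

```python
import pandas as pd

Dates = ["2018-08-14", "2019-10-17", "2020-11-14", "2020-05-17", "2021-09-15", "2021-12-14"]
Courses = ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Hadoop"]
df = pd.DataFrame({"InsertedDate": pd.to_datetime(Dates)}, index=Courses)

# Keep only the rows whose date falls in 2020
in_2020 = df[df["InsertedDate"].dt.year == 2020]
print(in_2020)

# Count rows per year, sorted by year
per_year = df["InsertedDate"].dt.year.value_counts().sort_index()
print(per_year)
```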
Use Pandas DatetimeIndex() to Extract Year
We can also extract the year by wrapping the datetime column in pd.DatetimeIndex() and reading its year attribute. pd.DatetimeIndex() takes an array-like of datetime values (here, the InsertedDate column), and .year returns an Index of integers.
# Using pandas.DatetimeIndex() to extract year
df['Year'] = pd.DatetimeIndex(df['InsertedDate']).year
print("Get the year from the datetime column:\n", df)
Yields the same output as above.
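One detail worth knowing: .year here returns an Index, not a Series, so it carries no row labels to align on and is assigned to the column by position. A minimal sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame(
    {"InsertedDate": pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"])},
    index=["Spark", "PySpark", "Hadoop"],
)

# pd.DatetimeIndex(...) wraps the column values; .year returns an Index of integers
years_idx = pd.DatetimeIndex(df["InsertedDate"]).year
print(years_idx)

# An Index is assigned by position, row by row
df["Year"] = years_idx
print(df)

# Same values as the .dt.year accessor
print((df["Year"] == df["InsertedDate"].dt.year).all())
```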
Use Series.dt.to_period() Method to Extract Year
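You can also use the Series.dt.to_period('Y') method, which converts each datetime to an annual Period. The column must already be of datetime type. Note that the result has a Period dtype rather than integers, even though it prints as the year.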
# Use Series.dt.to_period() method to extract year
df['Year'] = df['InsertedDate'].dt.to_period('Y')
print("Get the year from the datetime column:\n", df)
Yields below output.
# Output:
# Get the year from the datetime column:
InsertedDate Year
Spark 2018-08-14 2018
PySpark 2019-10-17 2019
Hadoop 2020-11-14 2020
Python 2020-05-17 2020
Pandas 2021-09-15 2021
Hadoop 2021-12-14 2021
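If you need plain integers after to_period('Y'), the Period values still expose the year. A minimal sketch on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"InsertedDate": pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"])})

# to_period('Y') returns annual Period values, not integers
periods = df["InsertedDate"].dt.to_period("Y")
print(periods.dtype)

# The .dt accessor also works on Period Series and exposes .year as integers
df["Year"] = periods.dt.year
print(df)
```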
Use Series.apply() With Lambda Function and strftime()
You can also use Series.apply() with a lambda function that calls strftime() on each Timestamp to extract the year from the datetime column. Because this works element by element, it is typically slower than the vectorized .dt accessors on large data.
# Use Series.apply() with lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y'))
print("Get the year from the datetime column:\n", df)
Yields below output.
# Output:
# Get the year from the datetime column:
InsertedDate Year
Spark 2018-08-14 2018
PySpark 2019-10-17 2019
Hadoop 2020-11-14 2020
Python 2020-05-17 2020
Pandas 2021-09-15 2021
Hadoop 2021-12-14 2021
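With apply(), a missing date reaches the lambda as NaT rather than a Timestamp, so guarding against it is a reasonable precaution. The sketch below uses a hypothetical Series with one missing value and compares the result with the vectorized accessor, which handles NaT on its own.

```python
import pandas as pd

# Hypothetical column with a missing date (None becomes NaT)
s = pd.to_datetime(pd.Series(["2018-08-14", None, "2021-12-14"]))

# Guard against NaT inside the lambda; pd.notna(pd.NaT) is False
years = s.apply(lambda x: x.strftime("%Y") if pd.notna(x) else None)
print(years)

# The vectorized accessor needs no guard; the missing year becomes NaN
print(s.dt.year)
```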
Use pd.to_datetime() and Series.dt.strftime() Method
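You can also combine pd.to_datetime() with the Series.dt.strftime() method. pd.to_datetime() first converts the column to datetime, which matters when dates are stored as strings (here InsertedDate is already datetime, so the conversion leaves it unchanged), and .dt.strftime('%Y') then returns the year as a string.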
# Use pd.to_datetime() and Series.dt.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
print("Get the year from the datetime column:\n", df)
Yields below output.
# Output:
# Get the year from the datetime column:
InsertedDate yyyy
Spark 2018-08-14 2018
PySpark 2019-10-17 2019
Hadoop 2020-11-14 2020
Python 2020-05-17 2020
Pandas 2021-09-15 2021
Hadoop 2021-12-14 2021
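pd.to_datetime() earns its place when the column holds strings in a non-ISO layout. This sketch assumes hypothetical day/month/year strings and passes an explicit format so the day and month are not confused.

```python
import pandas as pd

# Hypothetical day/month/year strings (not ISO format)
df = pd.DataFrame({"InsertedDate": ["14/08/2018", "17/10/2019", "14/11/2020"]})

# An explicit format avoids ambiguity between day-first and month-first parsing
parsed = pd.to_datetime(df["InsertedDate"], format="%d/%m/%Y")
df["yyyy"] = parsed.dt.strftime("%Y")
print(df)
```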
Frequently Asked Questions on Extract Year from Datetime
How do I extract the year from a datetime column in Pandas?
Use the dt accessor on the datetime column. For example, df['year'] = df['datetime_column'].dt.year
Can I extract the year without creating a new column?
Yes. Accessing dt.year returns a Series you can use directly without assigning it back to the DataFrame. For example, pd.to_datetime(df['datetime_column']).dt.year
How do I extract the year from a datetime index?
If your DataFrame has a DatetimeIndex, use the year attribute directly on the index. For example:
df.set_index('timestamp', inplace=True)
df['year'] = df.index.year
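Here is a self-contained version of the datetime-index answer, assuming a hypothetical DataFrame with a timestamp column. It uses set_index() without inplace and reassigns the result, which has the same effect.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]),
    "Course": ["Spark", "PySpark", "Hadoop"],
})

# Move the datetime column into the index; it becomes a DatetimeIndex
df = df.set_index("timestamp")

# DatetimeIndex exposes .year directly, no .dt accessor needed
df["year"] = df.index.year
print(df)
```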
Conclusion
In conclusion, this article has covered several techniques for extracting the year from a Pandas datetime column: pandas.Series.dt.year, Series.dt.strftime(), pandas.DatetimeIndex(), Series.dt.to_period(), and Series.apply() with a lambda function. Keep in mind that dt.year returns integers, strftime('%Y') returns strings, and to_period('Y') returns Period values, so choose the method whose result type fits what you do next.
Happy Learning !!