Pandas Extract Year from Datetime

You can extract the year from a datetime column in pandas in several ways. In this article, I will explain how to get the year from a datetime column using the pandas.Series.dt.year attribute, the pandas.DatetimeIndex properties, and the strftime() function.

If the column is not already of datetime type, you need to convert it to datetime first by using the pd.to_datetime() function.
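As a minimal sketch of that conversion step, assuming a hypothetical DataFrame named raw whose InsertedDate column holds ISO-formatted date strings:

```python
import pandas as pd

# Hypothetical sample data: the dates start out as plain strings
raw = pd.DataFrame({"InsertedDate": ["2018-08-14", "2019-10-17", "2020-11-14"]})

# Convert the string column to datetime so the .dt accessor becomes available
raw["InsertedDate"] = pd.to_datetime(raw["InsertedDate"])

# Confirm the dtype, then extract the year
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["InsertedDate"])
years = raw["InsertedDate"].dt.year.tolist()
print(is_datetime, years)
```

Checking the dtype first is optional, but it can help catch columns that were read in as strings.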

1. Quick Examples of Extract Year from Datetime

If you are in a hurry, below are some quick examples of how to extract the year from the pandas DataFrame DateTime column.


# Use Datetime.strftime() Method to extract year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')

# Using the pandas.Series.dt.year attribute
df['Year'] = df['InsertedDate'].dt.year  

# Using pandas.DatetimeIndex() to extract year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year

# Use Series.dt.to_period() to extract the year as a Period
df['Year'] = df['InsertedDate'].dt.to_period('Y')

# Use Series.apply() with a lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y')) 

# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')

2. Pandas Extract Year using Datetime.strftime()

The strftime() method takes a format string and returns each date as a string in that format. You can use %Y as the format code to extract the four-digit year from a datetime column. Note that the result is a column of strings, not integers. Here, pd.to_datetime() is used to convert the date strings to datetime.


import pandas as pd

# Sample data: insertion dates as strings, indexed by course name
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses = ["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)}, index=Courses)

# Use Datetime.strftime() Method to extract year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
print(df)

Yields the output below. This example extracts the year and adds it as a new column to the DataFrame.

[Figure: Extract year column from Pandas DataFrame]
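Because strftime() produces text, the extracted year may need a cast before it can be used in arithmetic or numeric comparisons. A small sketch of that, using a hypothetical standalone Series named dates rather than the article's DataFrame:

```python
import pandas as pd

# Hypothetical standalone Series of datetimes
dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]))

# strftime('%Y') gives the year as text
year_text = dates.dt.strftime("%Y")

# Cast to integers when numeric years are needed
year_int = year_text.astype(int)

first_is_str = isinstance(year_text.iloc[0], str)
print(year_text.tolist(), year_int.tolist())
```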

3. Extract Year Using Series.dt.year

You can use the pandas.Series.dt.year attribute to extract the year. It is a property rather than a method, so it is accessed without parentheses, and it returns a Series of integers. Assign the result to a column to add the year to the DataFrame.


# Using the pandas.Series.dt.year attribute
df['Year'] = df['InsertedDate'].dt.year 
print(df)

Yields the output below.


# Output:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
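When a column contains missing or unparseable dates, the extracted year for those rows is missing as well. The sketch below shows one way to handle that, assuming a hypothetical Series named raw with one bad entry. It uses errors="coerce" and the nullable Int64 dtype so that the valid years remain whole numbers.

```python
import pandas as pd

# Hypothetical input with one unparseable entry
raw = pd.Series(["2018-08-14", "not a date", "2020-11-14"])

# errors="coerce" turns unparseable values into NaT instead of raising an error
dates = pd.to_datetime(raw, errors="coerce")

# The nullable Int64 dtype keeps whole-number years alongside missing values
years = dates.dt.year.astype("Int64")
print(years)
```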

4. Use Pandas DatetimeIndex() to Extract Year

You can also extract the year from a pandas datetime column using the DatetimeIndex.year attribute. Note that the pd.DatetimeIndex() constructor takes the datetime column (or any array-like of dates) as its argument.


# Using pandas.DatetimeIndex() to extract year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
print(df)

Yields the same year values as above, this time in a column named year.
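The same attribute is available when the dates already form the index of the DataFrame. A short sketch, assuming a hypothetical DataFrame named sales with an Amount column and a datetime index:

```python
import pandas as pd

# Hypothetical frame whose index holds the dates
sales = pd.DataFrame(
    {"Amount": [100, 150, 200]},
    index=pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]),
)

# A DatetimeIndex exposes .year directly, without the .dt accessor
sales["Year"] = sales.index.year
print(sales)
```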

5. Use Datetime.to_period() Method to Extract Year

You can also use the Series.dt.to_period('Y') method, for example df['Year'] = df['InsertedDate'].dt.to_period('Y'). The column has to be of datetime type, and the result holds yearly Period values rather than integers.


# Use Series.dt.to_period() to extract the year
df['Year'] = df['InsertedDate'].dt.to_period('Y')
print(df)

Yields the output below.


# Output:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
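Although the printed values look like plain years, to_period() stores Period objects. If integer years are needed afterwards, they can be read back from the period column, as in this sketch with a hypothetical standalone Series named dates:

```python
import pandas as pd

# Hypothetical standalone Series of datetimes
dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]))

# Yearly periods: each value is a Period, not an int
periods = dates.dt.to_period("Y")
is_period = isinstance(periods.dtype, pd.PeriodDtype)

# The .dt accessor on a period column also exposes the integer year
years = periods.dt.year
print(is_period, years.tolist())
```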

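6. Use Series.apply() With Lambda Function and strftime()

Let’s see how to get the year by using apply() with a lambda function. Because df['InsertedDate'] is a single column, this calls Series.apply() rather than DataFrame.apply(). The lambda runs on each Timestamp and returns the year as a string.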


# Use Series.apply() with a lambda function and strftime()
df['Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%Y')) 
print(df)

Yields the output below.


# Output:
        InsertedDate  Year
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
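The apply() approach calls the lambda once per row, while the .dt accessor works on the whole column at once, so the accessor is typically the better fit for large columns. The sketch below, using a hypothetical standalone Series named dates, checks that both routes give the same strings:

```python
import pandas as pd

# Hypothetical standalone Series of datetimes
dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]))

# Row-by-row: the lambda receives one Timestamp at a time
via_apply = dates.apply(lambda ts: ts.strftime("%Y"))

# Column-wise equivalent through the .dt accessor
via_accessor = dates.dt.strftime("%Y")

same = via_apply.tolist() == via_accessor.tolist()
print(same)
```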

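7. Use pandas.to_datetime() and Series.dt.strftime() Method

Use pandas.to_datetime() together with Series.dt.strftime() to get the year. Wrapping the column in pd.to_datetime() makes this approach work even when the column still holds date strings.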


# Use pandas.to_datetime() and Series.dt.strftime()
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
print(df)

Yields the output below.


# Output:
        InsertedDate  yyyy
Spark     2018-08-14  2018
PySpark   2019-10-17  2019
Hadoop    2020-11-14  2020
Python    2020-05-17  2020
Pandas    2021-09-15  2021
Hadoop    2021-12-14  2021
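When the strings are not in ISO format, passing an explicit format to pd.to_datetime() avoids ambiguity between day and month. A minimal sketch, assuming a hypothetical DataFrame named raw with day/month/year strings:

```python
import pandas as pd

# Hypothetical data: dates stored as day/month/year strings
raw = pd.DataFrame({"InsertedDate": ["14/08/2018", "17/10/2019", "14/11/2020"]})

# Parse with an explicit format, then format the year as a string
raw["yyyy"] = pd.to_datetime(raw["InsertedDate"], format="%d/%m/%Y").dt.strftime("%Y")
print(raw)
```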

8. Conclusion

In this article, you have learned how to extract the year from a pandas datetime column by using Series.dt.year, Series.dt.strftime(), pandas.DatetimeIndex, Series.dt.to_period(), and Series.apply(), with examples.

Happy Learning !!
