You can extract the month and year from a datetime column in a pandas DataFrame using the dt.month and dt.year attributes of that column. If the data is not already in datetime format, convert it with pd.to_datetime() before extracting the month and year. In this article, I will cover extracting the year and month from a DataFrame column using the dt accessor, pandas.DatetimeIndex, to_period(), and the strftime() method.
Key points:
- Use pd.to_datetime() to convert date columns from string format to datetime format.
- Access the month and year of a datetime column using the dt.month and dt.year attributes.
- Use the strftime() method to extract specific date components such as month and year as strings.
- Extracting the month and year into separate columns makes it easier to analyze and manipulate temporal data.
- Datetime operations in pandas support efficient data handling and analysis, particularly for time series datasets.
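As a minimal sketch of the first two points, assume a hypothetical DataFrame whose date column (named raw_date here purely for illustration) arrives as plain strings:

```python
import pandas as pd

# Hypothetical data: dates stored as plain strings
df = pd.DataFrame({"raw_date": ["2018-08-14", "2019-10-17", "2020-11-14"]})

# Convert the string column to datetime so the .dt accessor becomes available
df["raw_date"] = pd.to_datetime(df["raw_date"])

# Extract the components as integers
df["year"] = df["raw_date"].dt.year
df["month"] = df["raw_date"].dt.month
print(df)
```

Without the conversion step, the .dt accessor would raise an error because the column would still hold strings.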
Pandas Extract Month and Year using Datetime.strftime()
To run some examples of extracting the month and year from a datetime, let’s create a DataFrame with a column of date strings converted to datetime values using the pd.to_datetime() function, and then use the strftime() method to extract the month and year from that column.
import pandas as pd
import numpy as np
import datetime
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)
print("DataFrame:\n", df)
This creates a DataFrame indexed by course name with a single datetime column, InsertedDate.
The strftime() method in Python’s datetime module (also available in pandas through the dt accessor) takes a datetime object and returns a string representing that datetime in a given format. You can use %Y and %m as format codes to extract the year and month, respectively, from the pandas DataFrame column.
# Use Datetime.strftime() method
# To extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print("Get month and year from datetime column:\n", df)
The Year and Month columns now hold strings, with the month zero-padded (for example, 2018 and 08).
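Because strftime() returns strings rather than numbers, a cast is needed before doing arithmetic or numeric comparisons. The following is a small sketch on a standalone Series (the variable names are illustrative, not part of the example above):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2020-05-17"]))

# strftime('%m') gives zero-padded strings
month_str = dates.dt.strftime("%m")

# Cast to int when numeric values are needed
month_int = month_str.astype(int)

print(month_str.tolist())  # ['08', '05']
print(month_int.tolist())  # [8, 5]
```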
Using pandas.Series.dt.year & dt.month
Alternatively, the Series.dt.year and Series.dt.month attributes (they are attributes, so no parentheses are needed) return a Series containing the extracted year and month values as integers. By assigning these Series to new columns in the DataFrame, we get separate columns for year and month.
# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year
df['Month'] = df['InsertedDate'].dt.month
print("Get month and year from datetime column:\n", df)
This code assigns the Series returned by dt.year and dt.month to the new columns Year and Month, respectively, creating a DataFrame with separate columns for year and month. This example yields the output below.
# Output:
# Get month and year from datetime column:
InsertedDate Year Month
Spark 2018-08-14 2018 8
PySpark 2019-10-17 2019 10
Hadoop 2020-11-14 2020 11
Python 2020-05-17 2020 5
Pandas 2021-09-15 2021 9
Hadoop 2021-12-14 2021 12
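One use of a numeric Year column is grouping. As a rough sketch that rebuilds the same sample dates (the rows_per_year name is just for illustration), the rows can be counted per year like this:

```python
import pandas as pd

dates = ["2018-08-14", "2019-10-17", "2020-11-14",
         "2020-05-17", "2021-09-15", "2021-12-14"]
df = pd.DataFrame({"InsertedDate": pd.to_datetime(dates)})
df["Year"] = df["InsertedDate"].dt.year

# Count how many rows fall in each year (groups are sorted by year)
rows_per_year = df.groupby("Year").size()
print(rows_per_year)
```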
Use pandas DatetimeIndex()
Also, the DatetimeIndex.month and DatetimeIndex.year attributes can be used to extract the month and year, respectively, from a pandas datetime column. Pass the column to pd.DatetimeIndex() to build the index, then read the month or year attribute from it.
# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print("Get month and year from datetime column:\n", df)
The above example extracts the month and year using DatetimeIndex.month and DatetimeIndex.year and assigns the extracted values to the new columns month and year, respectively. The values are the same as in the output above.
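To see how this relates to the dt accessor, here is a hedged sketch on a small standalone Series. It assumes nothing beyond pandas itself; the variable names are illustrative:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]),
              index=["Spark", "PySpark"])

# DatetimeIndex.year returns an Index of values without the Series labels
idx_years = pd.DatetimeIndex(s).year

# Series.dt.year returns a Series that keeps the original labels
dt_years = s.dt.year

print(list(idx_years))          # [2018, 2019]
print(dt_years.index.tolist())  # ['Spark', 'PySpark']
```

Both routes give the same values; the dt accessor simply preserves the index of the original Series.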
Use Series.dt.to_period() Method
You can also use the to_period() method to extract the month and year together from the datetime column and assign the result to a new column called Month_Year.
# Use Series.dt.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print("Get month and year from datetime column:\n", df)
The above example creates a new column Month_Year containing the monthly period (the year and month) extracted from the InsertedDate column using the to_period() method. The resulting DataFrame has the month and year represented as Period values in the Month_Year column. Yields the output below.
# Output:
# Get month and year from datetime column:
InsertedDate Month_Year
Spark 2018-08-14 2018-08
PySpark 2019-10-17 2019-10
Hadoop 2020-11-14 2020-11
Python 2020-05-17 2020-05
Pandas 2021-09-15 2021-09
Hadoop 2021-12-14 2021-12
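A period column can still be split into its parts or turned into text. The following is a minimal sketch on a standalone Series, with illustrative variable names:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-08-14", "2020-05-17"]))
periods = s.dt.to_period("M")

# Period columns expose year and month through the same .dt accessor
years = periods.dt.year
months = periods.dt.month

# Convert the periods to plain strings
labels = periods.astype(str)

print(years.tolist())   # [2018, 2020]
print(months.tolist())  # [8, 5]
print(labels.tolist())  # ['2018-08', '2020-05']
```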
Use Series.apply() With Lambda Function and strftime()
You can also call apply() on the datetime column with a lambda function and strftime() to format each value and extract the month and year.
# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y'))
print("Get month and year from datetime column:\n", df)
In the above example, you use the apply() function to apply a lambda function to each element of the InsertedDate column. Inside the lambda function, strftime('%B-%Y') formats each datetime value to display the full month name followed by the year (e.g., August-2018).
# Output:
# Get month and year from datetime column:
InsertedDate Month_Year
Spark 2018-08-14 August-2018
PySpark 2019-10-17 October-2019
Hadoop 2020-11-14 November-2020
Python 2020-05-17 May-2020
Pandas 2021-09-15 September-2021
Hadoop 2021-12-14 December-2021
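The same formatting can also be done without a lambda by calling strftime() through the dt accessor. A short sketch comparing the two on a standalone Series (variable names are illustrative; the month names produced by %B depend on the locale):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))

# Element-wise formatting with apply() and a lambda
via_apply = s.apply(lambda x: x.strftime("%B-%Y"))

# Equivalent formatting through the dt accessor
via_dt = s.dt.strftime("%B-%Y")

print(via_apply.tolist() == via_dt.tolist())
```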
Use Pandas.to_datetime() & datetime.strftime() Method
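Similarly, you can use pd.to_datetime() to convert the column to datetime and then apply the strftime() method to format it as desired. The conversion matters when the column still holds strings; InsertedDate is already a datetime column here, so the call leaves the values unchanged.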
# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print("Get month and year from datetime column:\n", df)
The above code extracts the year and month from the InsertedDate column by using pd.to_datetime() to make sure it is a datetime column and then strftime('%Y-%m') to format it as yyyy-mm.
# Output:
# Get month and year from datetime column:
InsertedDate yyyy-mm
Spark 2018-08-14 2018-08
PySpark 2019-10-17 2019-10
Hadoop 2020-11-14 2020-11
Python 2020-05-17 2020-05
Pandas 2021-09-15 2021-09
Hadoop 2021-12-14 2021-12
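This pattern is most useful when the source column is still text. As a sketch, assume hypothetical day-first strings (the sample values and the format string are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical day-first date strings
raw = pd.Series(["14/08/2018", "17/10/2019"])

# Tell pandas how to parse the strings, then format the result
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
year_month = parsed.dt.strftime("%Y-%m")

print(year_month.tolist())  # ['2018-08', '2019-10']
```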
Complete Example to Get Month and Year from Pandas Datetime
import pandas as pd
import numpy as np
import datetime
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)
# Use Datetime.strftime() method to extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print(df)
# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year
df['Month'] = df['InsertedDate'].dt.month
print(df)
# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print(df)
# Use Series.dt.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print(df)
# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y'))
print(df)
# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print(df)
# Use datetime.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
df['mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%m')
print(df)
FAQ on Get Month and Year from Pandas Datetime
To convert a column to a datetime type in pandas, you can use the pd.to_datetime() function. For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'])
You can use the dt.month and dt.year attributes to get the month and year from the datetime column. For example, df['month'] = df['datetime_column'].dt.month and df['year'] = df['datetime_column'].dt.year
You can extract other components such as day, hour, and minute using the dt accessor. For instance, use df['datetime_column'].dt.day for the day and df['datetime_column'].dt.hour for the hour.
You can use the pd.to_datetime() function with the errors='coerce' parameter to handle invalid values by converting them to NaT (Not a Time). For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'], errors='coerce')
You can filter the DataFrame based on a specific month or year. For instance, january_data = df[df['datetime_column'].dt.month == 1] keeps only the rows that fall in January.
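The last two answers can be combined in one small sketch. The sample values below, including the deliberately invalid entry, are made up for illustration:

```python
import pandas as pd

# Hypothetical data with one invalid entry
df = pd.DataFrame({
    "datetime_column": ["2021-01-05", "not a date", "2021-03-09", "2022-01-20"]
})

# Invalid strings become NaT instead of raising an error
df["datetime_column"] = pd.to_datetime(df["datetime_column"], errors="coerce")

# NaT rows never match the comparison, so they drop out of the filter
january_data = df[df["datetime_column"].dt.month == 1]
print(january_data)
```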
Conclusion
In this article, I have explained how to extract the month and year from a pandas datetime column by using Series.dt.strftime(), the Series.dt.year and Series.dt.month attributes, pandas.DatetimeIndex, Series.dt.to_period(), and apply(), with examples.
Happy Learning !!