  • Post category:Pandas
  • Post last modified:May 9, 2024
Pandas Extract Month and Year from Datetime

You can extract the month and year from a datetime column in a pandas DataFrame using the dt.month and dt.year attributes of the column. If the data is not already in datetime format, convert it with pd.to_datetime() first. In this article, I will cover extracting the year and month from a DataFrame column using the dt accessor, pandas.DatetimeIndex, to_period(), and the strftime() method.


Key Points

  • Use pd.to_datetime() to convert datetime columns from string format to datetime format.
  • Access the month and year attributes of a datetime column using dt.month and dt.year.
  • Utilize the strftime() method to extract specific date components like month and year as strings.
  • Extracting the month and year into separate columns makes it easier to group, filter, and aggregate temporal data.
  • DateTime operations in pandas facilitate efficient data handling and analysis, particularly in time series datasets.
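As a rough illustration of the last two points, the sketch below totals a small made-up table by year and month. The DataFrame name sales, the column names order_date and amount, and the values are all assumptions chosen for this example.

```python
import pandas as pd

# Hypothetical data: names and values are made up for illustration.
sales = pd.DataFrame({
    "order_date": ["2021-01-05", "2021-01-20", "2021-02-11", "2022-01-03"],
    "amount": [10, 20, 30, 40],
})

# Convert the string column to datetime before using the .dt accessor.
sales["order_date"] = pd.to_datetime(sales["order_date"])

# Extract year and month, then total the amounts per (year, month) pair.
sales["year"] = sales["order_date"].dt.year
sales["month"] = sales["order_date"].dt.month
monthly_totals = sales.groupby(["year", "month"])["amount"].sum()
print(monthly_totals)
```

Keeping the year alongside the month prevents January 2021 and January 2022 from being merged into one group.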

Pandas Extract Month and Year using Datetime.strftime()

To run the examples, let's create a DataFrame with a column of date strings converted to datetime values using the pd.to_datetime() function. We will then use the strftime() method to extract the month and year from this column.


import pandas as pd
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)
print("DataFrame:\n", df)

This yields a DataFrame indexed by course name with a single datetime column, InsertedDate.

The strftime() method in Python’s datetime module (as well as in pandas’ datetime functionality) takes a datetime object and returns a string representing that datetime according to a specific format. You can use %Y and %m as format codes to extract year and month respectively from the pandas DataFrame.


# Use Datetime.strftime() method 
# To extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print("Get month and year from datetime column:\n", df)

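This adds Year and Month columns to the DataFrame. Because strftime() returns strings, they hold values such as '2018' and '08' rather than integers.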
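Since strftime() produces text, a cast is needed when numeric months are wanted. The following is a minimal sketch; the variable names dates, month_str, and month_int are assumptions for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))

# strftime() always produces strings, so "08" keeps its leading zero.
month_str = dates.dt.strftime("%m")

# Cast to int when numeric months are needed for arithmetic or comparisons.
month_int = month_str.astype(int)

print(month_str.tolist())  # ['08', '10']
print(month_int.tolist())  # [8, 10]
```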

Using pandas.Series.dt.year & dt.month

Alternatively, the Series.dt.year and Series.dt.month attributes return a Series containing the extracted year and month values, respectively. These are attributes, not methods, so they are accessed without parentheses. By assigning these Series objects to new columns in the DataFrame, we can construct a DataFrame with separate columns for year and month.


# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year 
df['Month'] = df['InsertedDate'].dt.month 
print("Get month and year from datetime column:\n", df)

This code assigns the Series objects returned by dt.year and dt.month to new columns Year and Month, respectively, creating a DataFrame with separate integer columns for year and month. This example yields the below output.


# Output:
# Get month and year from datetime column:
        InsertedDate  Year  Month
Spark     2018-08-14  2018      8
PySpark   2019-10-17  2019     10
Hadoop    2020-11-14  2020     11
Python    2020-05-17  2020      5
Pandas    2021-09-15  2021      9
Hadoop    2021-12-14  2021     12

Use pandas DatetimeIndex()

Also, the DatetimeIndex.month and DatetimeIndex.year attributes can be used to extract the month and year, respectively, from a pandas datetime column. Pass the column to pd.DatetimeIndex() and then read the month or year attribute of the resulting index.


# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print("Get month and year from datetime column:\n", df)

The above example extracts the month and year using DatetimeIndex.month and DatetimeIndex.year and assigns them to new columns named month and year. The values are the same as those returned by dt.month and dt.year above.
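When the dates are stored in the index rather than in a column, the index is already a DatetimeIndex, so the same attributes can be read from it directly. A small sketch, assuming a hypothetical frame named ts with a made-up value column:

```python
import pandas as pd

# Hypothetical frame whose index holds the dates.
ts = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2018-08-14", "2019-10-17", "2020-11-14"]),
)

# A DatetimeIndex exposes .year and .month directly, without the .dt accessor.
ts["year"] = ts.index.year
ts["month"] = ts.index.month
print(ts)
```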

Use Datetime.to_period() Method

You can also use the to_period() method to extract the month and year from the DateTime column and assign the result to a new column called Month_Year.


# Use Series.dt.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print("Get month and year from datetime column:\n", df)

The above example creates a new column Month_Year containing the monthly period (year and month) of each value in the InsertedDate column using the to_period() method. Each entry in the Month_Year column is a Period object. This yields the output below.


# Output:
# Get month and year from datetime column:
        InsertedDate   Month_Year
Spark     2018-08-14    2018-08
PySpark   2019-10-17    2019-10
Hadoop    2020-11-14    2020-11
Python    2020-05-17    2020-05
Pandas    2021-09-15    2021-09
Hadoop    2021-12-14    2021-12
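A period column can still be split into its parts or turned into plain text. The sketch below is one way to do that, assuming the variable names dates, periods, and labels for illustration.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))
periods = dates.dt.to_period("M")

# A period column still supports the .dt accessor.
print(periods.dt.year.tolist())   # [2018, 2019]
print(periods.dt.month.tolist())  # [8, 10]

# Cast to str for a plain "YYYY-MM" label.
labels = periods.astype(str)
print(labels.tolist())            # ['2018-08', '2019-10']
```

Keeping the Period type is useful for sorting and grouping by month, while the string form is convenient for display or export.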

Use Series.apply() With Lambda Function and strftime()

You can also call apply() on the datetime column with a lambda function and strftime() to format each value and extract the month and year.


# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y')) 
print("Get month and year from datetime column:\n", df)

In the above example, you use the apply() function to apply a lambda function to each element of the InsertedDate column. Inside the lambda function, strftime('%B-%Y') formats each datetime value to display the full month name followed by the year (e.g., August-2018).


# Output:
# Get month and year from datetime column:
        InsertedDate      Month_Year
Spark     2018-08-14     August-2018
PySpark   2019-10-17    October-2019
Hadoop    2020-11-14   November-2020
Python    2020-05-17        May-2020
Pandas    2021-09-15  September-2021
Hadoop    2021-12-14   December-2021
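The element-wise apply() call and the vectorized dt.strftime() call should give the same strings, so the shorter dt form is usually enough. A quick check, assuming the variable names below and an English locale for the month names produced by %B:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2020-05-17"]))

# Element-wise formatting with a lambda.
via_apply = dates.apply(lambda x: x.strftime("%B-%Y"))

# The same format applied through the .dt accessor.
via_dt = dates.dt.strftime("%B-%Y")

# Both approaches produce the same list of strings.
print(via_apply.tolist() == via_dt.tolist())
```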

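Use pandas.to_datetime() & Series.dt.strftime() Method

Similarly, you can pass the column through pd.to_datetime() and then apply the strftime() method to format it as desired. The conversion is only needed when the column is not already of datetime type; on a datetime column it leaves the values unchanged.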


# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print("Get month and year from datetime column:\n", df) 

The above code converts the InsertedDate column with pd.to_datetime() and then uses strftime('%Y-%m') to format each value as a yyyy-mm string.


# Output:
# Get month and year from datetime column: 
        InsertedDate  yyyy-mm
Spark     2018-08-14  2018-08
PySpark   2019-10-17  2019-10
Hadoop    2020-11-14  2020-11
Python    2020-05-17  2020-05
Pandas    2021-09-15  2021-09
Hadoop    2021-12-14  2021-12

Complete Example to Get Month and Year from Pandas Datetime


import pandas as pd
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)

# Use Datetime.strftime() method to extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print(df)

# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year 
df['Month'] = df['InsertedDate'].dt.month 
print(df)

# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print(df)

# Use Series.dt.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print(df)

# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y')) 
print(df)

# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print(df)

# Use datetime.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
df['mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%m')
print(df)

FAQ on Get Month and Year from Pandas Datetime

How do I convert a column to a datetime type in Pandas?

To convert a column to a datetime type in Pandas, you can use the pd.to_datetime() function. For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'])

How can I extract the month and year from a Pandas datetime column?

You can use the dt.month and dt.year attributes to get the month and year from the datetime column. For example, df['month'] = df['datetime_column'].dt.month and df['year'] = df['datetime_column'].dt.year

How can I extract the day or other components from a datetime column?

You can extract various components like day, hour, minute, etc., using the dt attribute. For instance, df['datetime_column'].dt.day for a day and df['datetime_column'].dt.hour for an hour.

How do I handle missing or null values in the datetime column?

You can use the pd.to_datetime() function with the errors='coerce' parameter to handle missing or invalid values by converting them to NaT (Not a Time). For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'], errors='coerce')
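The following sketch shows what coercion looks like in practice, using a made-up Series named raw. One side effect worth knowing: once NaT values are present, dt.year returns floats with NaN, so the example also casts to the nullable Int64 type.

```python
import pandas as pd

# Hypothetical input with one invalid string and one missing value.
raw = pd.Series(["2021-09-15", "not a date", None])

# Invalid and missing entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors="coerce")
print(parsed.isna().tolist())  # [False, True, True]

# With NaT present, dt.year yields floats and NaN.
years = parsed.dt.year

# Cast to the nullable integer type to keep whole-number years.
years_nullable = years.astype("Int64")
print(years_nullable)
```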

How can I filter the DataFrame based on a specific month or year?

You can filter the DataFrame with a boolean mask built from dt.month or dt.year. For example, to keep only the January rows: january_data = df[df['datetime_column'].dt.month == 1]
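Month and year conditions can be combined with the & operator. A minimal sketch, assuming a made-up DataFrame with a datetime_column and a value column:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "datetime_column": pd.to_datetime(
        ["2020-01-10", "2021-01-05", "2021-09-15"]
    ),
    "value": [1, 2, 3],
})

# Build one boolean mask per condition, then combine them.
is_january = df["datetime_column"].dt.month == 1
is_2021 = df["datetime_column"].dt.year == 2021
january_2021 = df[is_january & is_2021]
print(january_2021)
```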

Conclusion

In this article, I have explained how to extract the month and year from a pandas datetime column by using pandas.Series.dt.strftime(), the dt.year and dt.month attributes, pandas.DatetimeIndex, Series.dt.to_period(), and apply() with examples.

Happy Learning !!
