Pandas Extract Month and Year from Datetime

You can extract the month and year from a datetime column in a pandas DataFrame using the dt.month and dt.year attributes of the column. If the data is not already in datetime format, you need to convert it using pd.to_datetime() before extracting the month and year. In this article, I will cover extracting the year and month from a DataFrame column using the dt accessor, pandas.DatetimeIndex, the to_period() method, and the strftime() method.
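
As a minimal sketch of the conversion step described above, assume a hypothetical DataFrame named sales whose raw_date column holds plain strings (both names are chosen only for illustration):

```python
import pandas as pd

# Hypothetical frame: 'raw_date' holds strings, not datetimes
sales = pd.DataFrame({"raw_date": ["2021-03-05", "2022-11-30"]})

# The .dt accessor needs datetime-like data, so convert first
sales["raw_date"] = pd.to_datetime(sales["raw_date"])

# Now the year and month can be read as integers
sales["year"] = sales["raw_date"].dt.year
sales["month"] = sales["raw_date"].dt.month
print(sales)
```

Skipping the conversion and calling .dt on the string column would raise an error, which is why the conversion comes first.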

Pandas Extract Month and Year using Datetime.strftime()

To run some examples of extracting the month and year from a datetime, let’s create a pandas DataFrame from a list of date strings, converting them into a datetime column with the pd.to_datetime() function. We will then use the strftime() method to extract the month and year from that column.


import pandas as pd
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)
print("DataFrame:\n", df)

This yields a DataFrame indexed by course name with a single datetime column, InsertedDate.

The strftime() method in Python’s datetime module (as well as in pandas’ datetime functionality) takes a datetime object and returns a string representing that datetime according to a specific format. You can use %Y and %m as format codes to extract year and month respectively from the pandas DataFrame.


# Use Datetime.strftime() method 
# To extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print("Get month and year from datetime column:\n", df)

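This adds Year and Month columns to the DataFrame. Because strftime() returns strings, the values are text such as '2018' and '08' (zero-padded), not integers.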
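
Since strftime() always produces strings, a short sketch may help show the difference from numeric values. The variable names below are illustrative and not part of the original example:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2019-10-17"]))

year_str = dates.dt.strftime("%Y")   # strings such as '2018'
month_str = dates.dt.strftime("%m")  # zero-padded strings such as '08'

# Cast to integers when numeric comparison or arithmetic is needed
month_int = month_str.astype(int)

print(month_str.tolist())
print(month_int.tolist())
```

If you only need numbers, the dt.year and dt.month attributes covered in the next section avoid the cast entirely.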

Using pandas.Series.dt.year & month

Alternatively, the Series.dt.year and Series.dt.month attributes return a Series containing the extracted year and month values as integers. Note that these are attributes, not methods, so they are accessed without parentheses. By assigning these Series to new columns in the DataFrame, we get separate columns for year and month.


# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year 
df['Month'] = df['InsertedDate'].dt.month 
print("Get month and year from datetime column:\n", df)

This code assigns the Series returned by dt.year and dt.month to new columns Year and Month, respectively, creating a DataFrame with separate columns for year and month. This example yields the below output.


# Output:
# Get month and year from datetime column:
        InsertedDate  Year  Month
Spark     2018-08-14  2018      8
PySpark   2019-10-17  2019     10
Hadoop    2020-11-14  2020     11
Python    2020-05-17  2020      5
Pandas    2021-09-15  2021      9
Hadoop    2021-12-14  2021     12
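
Because these attributes return integers, the results can be compared numerically without casting. The sketch below uses illustrative variable names and also shows month_name(), a related accessor method that is not used elsewhere in this article:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2020-05-17"]))

years = dates.dt.year    # integer Series
months = dates.dt.month  # integer Series, values 1 through 12

# Integer results support numeric comparison directly
print(months[months > 6].tolist())

# month_name() returns the month's name (English by default)
print(dates.dt.month_name().tolist())
```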

Use pandas DatetimeIndex()

Also, the DatetimeIndex.month and DatetimeIndex.year attributes can be used to extract the month and year, respectively, from a pandas datetime column. Pass the column to pd.DatetimeIndex() to build the index, then read the attribute; each returns an Index holding the month or year of every date.


# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print("Get month and year from datetime column:\n", df)

The above example extracts the month and year using DatetimeIndex.month and DatetimeIndex.year and assigns the values to new columns named month and year. The values match the previous example; only the column names (lowercase here) differ.
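
To compare the two approaches side by side, here is a small sketch on a two-row version of the example DataFrame. The variable names idx_years and acc_years are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"InsertedDate": pd.to_datetime(["2018-08-14", "2019-10-17"])},
    index=["Spark", "PySpark"],
)

# DatetimeIndex attributes return an Index, not a Series
idx_years = pd.DatetimeIndex(df["InsertedDate"]).year

# The .dt accessor returns a Series that keeps df's own index
acc_years = df["InsertedDate"].dt.year

print(list(idx_years) == acc_years.tolist())
```

Both hold the same values; the practical difference is the container type, which matters mainly when the result is used outside the DataFrame.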

Use Datetime.to_period() Method

You can also use the to_period() method to extract the month and year from the DateTime column and assign the result to a new column called Month_Year.


# Use datetime.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print("Get month and year from datetime column:\n", df)

The above example creates a new column Month_Year containing the monthly period (the year and month) of each value in the InsertedDate column using the to_period() method. The values in Month_Year are Period objects, not strings. This example yields the below output.


# Output:
# Get month and year from datetime column:
        InsertedDate   Month_Year
Spark     2018-08-14    2018-08
PySpark   2019-10-17    2019-10
Hadoop    2020-11-14    2020-11
Python    2020-05-17    2020-05
Pandas    2021-09-15    2021-09
Hadoop    2021-12-14    2021-12
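
A period column keeps its date semantics, which the following sketch tries to illustrate with made-up variable names: the year and month remain available through the .dt accessor, and the values sort chronologically.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-11-14", "2020-05-17", "2021-09-15"]))
periods = dates.dt.to_period("M")

# A period column still exposes year and month via .dt
print(periods.dt.year.tolist())
print(periods.dt.month.tolist())

# Periods sort chronologically; astype(str) gives the 'YYYY-MM' text form
print(periods.sort_values().astype(str).tolist())
```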

Use Series.apply() With Lambda Function and strftime()

You can also call apply() on the datetime column with a lambda function and strftime() to format each value and extract the month and year. Since a single column is a Series, this is Series.apply().


# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y'))
print("Get month and year from datetime column:\n", df)

In the above example, you use the apply() function to apply a lambda function to each element of the InsertedDate column. Inside the lambda function, strftime('%B-%Y') formats each datetime value to display the full month name followed by the year (e.g., August-2018).


# Output:
# Get month and year from datetime column:
        InsertedDate      Month_Year
Spark     2018-08-14     August-2018
PySpark   2019-10-17    October-2019
Hadoop    2020-11-14   November-2020
Python    2020-05-17        May-2020
Pandas    2021-09-15  September-2021
Hadoop    2021-12-14   December-2021
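
The apply() call above produces the same strings as the vectorized dt.strftime() used earlier. A minimal sketch of that equivalence, with illustrative variable names:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-08-14", "2020-05-17"]))

# Element by element, using a Python lambda
via_apply = dates.apply(lambda x: x.strftime("%B-%Y"))

# Whole column at once, using the .dt accessor
via_accessor = dates.dt.strftime("%B-%Y")

print(via_apply.tolist() == via_accessor.tolist())
```

The accessor form is usually shorter to write; apply() is more useful when the per-value logic goes beyond a single format string.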

Use Pandas.to_datetime() & datetime.strftime() Method

Similarly, you can use pd.to_datetime() to convert the column to datetime and then apply the strftime() method to format it as desired.


# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print("Get month and year from datetime column:\n", df) 

The above code passes the InsertedDate column through pd.to_datetime(), which leaves an existing datetime column unchanged and parses a string column, and then uses strftime('%Y-%m') to format each value as yyyy-mm.


# Output:
# Get month and year from datetime column: 
        InsertedDate  yyyy-mm
Spark     2018-08-14  2018-08
PySpark   2019-10-17  2019-10
Hadoop    2020-11-14  2020-11
Python    2020-05-17  2020-05
Pandas    2021-09-15  2021-09
Hadoop    2021-12-14  2021-12
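
One practical property of the yyyy-mm format is that its strings sort in chronological order, which is not true of a format like '%B-%Y'. A small sketch, assuming a hypothetical Series of date strings named raw:

```python
import pandas as pd

# Hypothetical input: plain strings in no particular order
raw = pd.Series(["2021-09-15", "2018-08-14", "2020-05-17"])

# Parse, then format as 'yyyy-mm'
ym = pd.to_datetime(raw).dt.strftime("%Y-%m")

# Plain string sorting matches chronological order for this format
print(sorted(ym))
```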

Complete Example to Get Month and Year from Pandas Datetime


import pandas as pd
Dates = ["2018-08-14","2019-10-17","2020-11-14","2020-05-17","2021-09-15","2021-12-14"]
Courses =["Spark","PySpark","Hadoop","Python","Pandas","Hadoop"]
df = pd.DataFrame({'InsertedDate': pd.to_datetime(Dates)},index=Courses)

# Use Datetime.strftime() method to extract month and year
df['Year'] = df['InsertedDate'].dt.strftime('%Y')
df['Month'] = df['InsertedDate'].dt.strftime('%m')
print(df)

# Using the pandas.Series.dt.year & pandas.Series.dt.month attributes
df['Year'] = df['InsertedDate'].dt.year 
df['Month'] = df['InsertedDate'].dt.month 
print(df)

# Using pandas.DatetimeIndex() to extract month and year
df['year'] = pd.DatetimeIndex(df['InsertedDate']).year
df['month'] = pd.DatetimeIndex(df['InsertedDate']).month
print(df)

# Use datetime.to_period() method to extract month and year
df['Month_Year'] = df['InsertedDate'].dt.to_period('M')
print(df)

# Use Series.apply() with lambda function and strftime()
df['Month_Year'] = df['InsertedDate'].apply(lambda x: x.strftime('%B-%Y')) 
print(df)

# Use Pandas.to_datetime() and datetime.strftime() method
df['yyyy-mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y-%m')
print(df)

# Use datetime.strftime() method
df['yyyy'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%Y')
df['mm'] = pd.to_datetime(df['InsertedDate']).dt.strftime('%m')
print(df)

FAQ on Get Month and Year from Pandas Datetime

How do I convert a column to a datetime type in Pandas?

You can use the pd.to_datetime() function to convert a column to a datetime type. For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'])

How can I extract the month and year from a Pandas datetime column?

You can use the dt.month and dt.year attributes to get the month and year from the datetime column. For example, df['month'] = df['datetime_column'].dt.month and df['year'] = df['datetime_column'].dt.year

How can I extract the day or other components from a datetime column?

You can extract various components like day, hour, minute, etc., using the dt attribute. For example, df['datetime_column'].dt.day for a day and df['datetime_column'].dt.hour for an hour.
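
As a brief sketch of these other components, assuming a hypothetical Series of timestamps named stamps:

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime(["2021-09-15 08:30:00", "2021-12-14 17:45:10"]))

# Each attribute returns an integer Series
print(stamps.dt.day.tolist())
print(stamps.dt.hour.tolist())
print(stamps.dt.minute.tolist())
```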

How do I handle missing or null values in the datetime column?

You can use the pd.to_datetime() function with the errors='coerce' parameter to handle missing or invalid values by converting them to NaT (Not a Time). For example, df['datetime_column'] = pd.to_datetime(df['datetime_column'], errors='coerce')
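
The following sketch shows what errors='coerce' does to an invalid and a missing value, using a hypothetical Series named raw:

```python
import pandas as pd

# Hypothetical input with one valid date, one invalid string, one missing value
raw = pd.Series(["2021-09-15", "not a date", None])

parsed = pd.to_datetime(raw, errors="coerce")
print(parsed.isna().tolist())

# Components of NaT come back as NaN, so the month is missing for those rows
months = parsed.dt.month
print(months.tolist())
```

Rows that became NaT can then be dropped with dropna() or filled before the month and year are extracted.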

How can I filter the DataFrame based on a specific month or year?

You can filter the DataFrame with a boolean mask built from the dt.month or dt.year attribute. For example, to filter for January: january_data = df[df['datetime_column'].dt.month == 1]
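
A runnable sketch of this kind of filter, on a small hypothetical DataFrame, including a combined month-and-year condition:

```python
import pandas as pd

df = pd.DataFrame({"datetime_column": pd.to_datetime(
    ["2020-01-05", "2020-11-14", "2021-01-20"])})

# All rows that fall in January of any year
january_data = df[df["datetime_column"].dt.month == 1]

# Combine conditions with & (each wrapped in parentheses) for January 2021 only
jan_2021 = df[(df["datetime_column"].dt.month == 1)
              & (df["datetime_column"].dt.year == 2021)]

print(january_data)
print(jan_2021)
```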

Conclusion

In this article, I have explained how to extract the month and year from a pandas datetime column by using the dt.year and dt.month attributes, pandas.Series.dt.strftime(), pandas.DatetimeIndex, Series.dt.to_period(), and the apply() method, with examples.

Happy Learning !!
