To convert a pandas DataFrame column from string to date type (datetime64[ns])
format, you can use the pandas.to_datetime()
function or the DataFrame.astype()
method. In this article, I will explain change the string column to date format, change multiple string columns to date format, and finally change all string columns that have date string to date time column.
Note that if dates are not in date format, you cannot execute any time-series-based operations on the dates hence, conversion is required.
Key Points –
- The primary function for converting string data to datetime objects in Pandas.
pd.to_datetime()
can often infer the correct date format from a variety of string representations.- You can specify a date format using the
format
parameter inpd.to_datetime()
for consistent conversion. pd.to_datetime()
can handle columns with mixed date formats, but may require careful handling to avoid errors.- You can apply
pd.to_datetime()
to a single column or multiple columns in a DataFrame for date conversion.
Quick Examples of Change String to Date in Pandas DataFrame
Following are quick examples of changing the DataFrame columns type from string to date format (date type datetime64[ns]
).
# Quick examples of change string to date
# Example 1: Use pandas.to_datetime()
# To convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
# Example 2: Convert and store in another column
df["NewColumn"] = pd.to_datetime(df["InsertedDate"])
# Example 3: Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')
# Example 4: Convert the data type of column 'Date' from string (YYYY/MM/DD)
# To datetime64
df["InsertedDates"] = pd.to_datetime(df["InsertedDates"], format="%Y/%m/%d")
# Example 5: Use pandas.to_datetime() to convert string
# To "yyyymmdd" format
df["InsertedDates"] = pd.to_datetime(df["InsertedDates"], format='%y%m%d')
To run some examples of changing string to ate in Pandas DataFrame, let’s create a Pandas DataFrame.
# Create DataFrame
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
"InsertedDate":["2020/11/14","17/11/2020","17-11-2020","2021-11-17","11/14/2021"]
})
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Yields below output.
Use pandas.to_datetime() to Change String to Date
pandas.to_datetime() method is used to change String/Object time to date type (datetime64[ns]). This method is smart enough to change different formats of the String date column to date.
Related: You can also convert Date (datetime) to String Format.
# Use pandas.to_datetime() to convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
print("After converting a string type to datetime"\n:, df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Note that the dtype
of InsertedDate
column changed to datetime64[ns]
from object
type.
In order to create a new column in pandas instead of updating the existing one.
# Convert and store in different column
df["NewColumn"] = pd.to_datetime(df["InsertedDate"])
You can also specify the input format param to to_datetime()
# Specify input format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format="%Y/%m/%d")
Using DataFrame.astype() to Change String to Date
DataFrame.astype()
method is also used to change string to date dtype (datetime64[ns]
). Note that astype() is used to cast any datatype (dtype) in pandas.
# Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')
print("After converting a string type to datetime"\n:, df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Use pandas.to_datetime() to change String to “yyyymmdd” Format
If You have a date in "yymmdd"
format in the DataFrame column, and to change it from a string to a date(‘yyyy-mm-dd’)
format.
import pandas as pd
technologies = [["211114",22000],["211117",25000],["211115",26000],
["211117",30000],['211114',35000]]
df = pd.DataFrame(technologies,columns=["InsertedDate",'Fee'])
print("Create DataFrame:\n", df)
print("---------------------------")
print("Type of the columns:\n", df.dtypes)
Output.
# Output:
# Create DataFrame:
InsertedDate Fee
0 211114 22000
1 211117 25000
2 211115 26000
3 211117 30000
4 211114 35000
--------------------------------
Type of the columns:
InsertedDate object
Fee int64
dtype: object
You can see that the Datatype of the "InsertedDate"
column in the DataFrame is "object"
, that means, it is a string. Now, convert the datatype into datetime(‘yyyy-mm-dd’)
format by using df["InsertedDate"] = pd.to_datetime(df["InsertedDate"],format='%y%m%d')
function.
# Use pandas.to_datetime() to convert string to "yyyymmdd" format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format='%y%m%d')
print("After converting a string type to datetime"\n:, df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Output.
# Output:
# After converting a string type to datetime:
InsertedDate Fee
0 2021-11-14 22000
1 2021-11-17 25000
2 2021-11-15 26000
3 2021-11-17 30000
4 2021-11-14 35000
# Type of the columns:
InsertedDate datetime64[ns]
Fee int64
dtype: object
Change Multiple Columns from string Using pandas.to_datetime()
Alternatively, you can convert multiple columns from string type to date format, which means "YYYYMMDD"
format, by using the "pandas.to_datetime()"
function.
# Create DataFrame
import pandas as pd
technologies = [["20211114","Spark","20211115"],["20211117","PySpark","20211118"],["20211115","Hadoop","20211115"],
["20211117","Python","20211119"],['20211114',"Pandas","20211117"]]
df = pd.DataFrame(technologies,columns=["InsertedDate",'Courses',"UpdatedDate"])
print("Create DataFrame:\n", df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Convert multiple string columns to Date.
# Using pandas.to_datetime()
# To convert multiple columns from string
df["InsertedDate"] = pd.to_datetime(
df["InsertedDate"],
format='%Y%m%d'
)
df["UpdatedDate"] = pd.to_datetime(
df["UpdatedDate"],
format='%Y%m%d'
)
print("After converting a string type to datetime:\n", df)
print("----------------------------------")
print("Type of the columns:\n", df.dtypes)
Output.
# Output:
# After converting a string type to datetime
InsertedDate Courses UpdatedDate
0 2021-11-14 Spark 2021-11-15
1 2021-11-17 PySpark 2021-11-18
2 2021-11-15 Hadoop 2021-11-15
3 2021-11-17 Python 2021-11-19
4 2021-11-14 Pandas 2021-11-17
-----------------------------------------
Type of the columns:
InsertedDate datetime64[ns]
Courses object
UpdatedDate datetime64[ns]
dtype: object
Complete Example for Change String to Date
import pandas as pd
technologies = ({
'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','50days','55days','40days','60days'],
'Discount':[1000,2300,1000,1200,2500],
"InsertedDate":["2020/11/14","2020/11/17","2021/11/15","2021/11/17","2021/11/14"]
})
df = pd.DataFrame(technologies)
# Use pandas.to_datetime()
# To convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
print (df.dtypes)
# Check the format of 'InsertedDate' column
df.info()
# Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')
print (df.dtypes)
# Convert the data type of column 'Date' from string (YYYY/MM/DD)
# To datetime64
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format="%Y/%m/%d")
print(df.dtypes)
# Use pandas.to_datetime()
# To convert string to "yyyymmdd" format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format='%y%m%d')
print(df)
print(df.dtypes)
# Using pandas.to_datetime()
# To convert multiple columns from string
df["InsertedDate"] = pd.to_datetime(
df["InsertedDate"],
format='%Y%m%d')
df["UpdatedDate"] = pd.to_datetime(
df["UpdatedDate"],
format='%Y%m%d')
print(df)
print(df.dtypes)
FAQs on Change String to Date Format
If your date format differs, you can specify the format using the format
parameter. For example, if your date is in the format ‘dd-mm-yyyy’. For example, df['date_format'] = pd.to_datetime(df['string_column'], format='%d-%m-%Y')
You can convert multiple string columns to date columns simultaneously by applying pd.to_datetime()
to each column. For example, df[['date_format1', 'date_format2']] = df[['string_column1', 'string_column2']].apply(pd.to_datetime)
you can convert a date column back to a string using the dt.strftime()
method. For example, df['string_column'] = df['date_format'].dt.strftime('%Y-%m-%d')
Once you have a datetime column, you can extract specific components using the dt
accessor. For example, df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
Conclusion
In conclusion, this article has changed Pandas DataFrame column type from string to Date format by using pandas.to_datetime()
& DataFrame.astype()
functions.
Happy Learning !!
Related Articles
- Pandas DatetimeIndex Usage Explained
- Convert Pandas DatetimeIndex to String
- pandas Convert Datetime to Seconds
- Sort Pandas DataFrame by Date (Datetime)
- Pandas Extract Year from Datetime
- Pandas Convert Integer to Datetime Type
- Pandas Convert Datetime to Date Column
- Pandas Convert Column To DateTime