Pandas Change String Object to Date in DataFrame

Pandas Change DataFrame Column Type From String to Date type datetime64[ns] Format – You can change the pandas DataFrame column type from string to date format by using pandas.to_datetime() and DataFrame.astype() method.

In this article, I will explain how to change the string column to date format, change multiple string columns to date format, and finally change all string columns that have date string to date time column.

Note that if dates are not in date format, you cannot execute any time-series based operations on the dates hence, conversion is required.

1. Quick Examples of Change String to Date in Pandas DataFrame

If you are in a hurry, below are some quick examples of how to change pandas DataFrame columns type from string to date format (date type datetime64[ns].


# Below are quick example

# Use pandas.to_datetime() to convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])

# Convert and store in another column
df["NewColumn"] = pd.to_datetime(df["InsertedDate"])

# Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')

# Convert the data type of column 'Date' from string (YYYY/MM/DD) to datetime64
df["InsertedDates"] =  pd.to_datetime(df["InsertedDates"], format="%Y/%m/%d")

# Use pandas.to_datetime() to convert string to "yyyymmdd" format
df["InsertedDates"] = pd.to_datetime(df["InsertedDates"], format='%y%m%d')

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names CoursesFeeDurationDiscount, and InsertedDate.


import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    "InsertedDate":["2020/11/14","17/11/2020","17-11-2020","2021-11-17","11/14/2021"]
          })
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

Yields below output.



   Courses    Fee InsertedDate
0    Spark  22000    2020/11/14
1  PySpark  25000    17/11/2020
2   Hadoop  23000    17-11-2020
3   Python  24000    2021-11-17
4   Pandas  26000    11/14/2021
Courses          object
Fee               int64
InsertedDate    object
dtype: object

2. Use pandas.to_datetime() to Change String to Date

pandas.to_datetime() method is used to change String/Object time to date type (datetime64[ns]). This method is smart enough to change different formats of the String date column to date.


# Use pandas.to_datetime() to convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
print(df)
print (df.dtypes)

Yields below output. Note that the dtype of InsertedDate column changed to datetime64[ns] from object type.


   Courses    Fee InsertedDate
0    Spark  22000    2020-11-14
1  PySpark  25000    2020-11-17
2   Hadoop  23000    2020-11-17
3   Python  24000    2021-11-17
4   Pandas  26000    2021-11-14
Courses                  object
Fee                       int64
InsertedDate    datetime64[ns]
dtype: object

In order to create a new column in pandas instead of updating the existing one.


# Convert and store in different column
df["NewColumn"] = pd.to_datetime(df["InsertedDate"])

You can also specify the input format param to to_datetime()


# Specify input format
df["InsertedDate"] =  pd.to_datetime(df["InsertedDate"], format="%Y/%m/%d")

3. Using DataFrame.astype() to Change String to Date

DataFrame.astype() method is also used to change string to date dtype (datetime64[ns]). Note that astype() is used to cast any datatype (dtype) in pandas.


# Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')
print (df.dtypes)

Yields same output as above.

5. Use pandas.to_datetime() to change String to “yyyymmdd” Format

If You have a date in "yymmdd" format in the DataFrame column, and to change it from a string to a date(‘yyyy-mm-dd’) format.


import pandas as pd
technologies = [["211114",22000],["211117",25000],["211115",26000],
            ["211117",30000],['211114',35000]]
df = pd.DataFrame(technologies,columns=["InsertedDate",'Fee'])
print(df)
print(df.dtypes)

Yields below output.


  InsertedDate    Fee
0        211114  22000
1        211117  25000
2        211115  26000
3        211117  30000
4        211114  35000
InsertedDate    object
Fee               int64
dtype: object

You see that the Datatype of the "InsertedDate" column in the DataFrame is "object", that means, it is a string. Now, convert the datatype into datetime(‘yyyy-mm-dd’) format by using df["InsertedDate"] = pd.to_datetime(df["InsertedDate"],format='%y%m%d') function.


# Use pandas.to_datetime() to convert string to "yyyymmdd" format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format='%y%m%d')
print(df)
print(df.dtypes)

Yields below output.


  InsertedDate    Fee
0    2021-11-14  22000
1    2021-11-17  25000
2    2021-11-15  26000
3    2021-11-17  30000
4    2021-11-14  35000
InsertedDate    datetime64[ns]
Fee                       int64
dtype: object

6. Change Multiple Columns from string Using pandas.to_datetime()

You can convert multiple columns from "string" to "date" format, which means "YYYYMMDD" format, by using the "pandas.to_datetime()" function.


import pandas as pd
technologies = [["20211114","Spark","20211115"],["20211117","PySpark","20211118"],["20211115","Hadoop","20211115"],
            ["20211117","Python","20211119"],['20211114',"Pandas","20211117"]]
df = pd.DataFrame(technologies,columns=["InsertedDate",'Courses',"UpdatedDate"])
print(df)
print(df.dtypes)

Conver multiple string columns to Date.


#  Using pandas.to_datetime() to convert multiple columns from string
df["InsertedDate"] = pd.to_datetime(
                          df["InsertedDate"],
                          format='%Y%m%d'
)
df["UpdatedDate"] = pd.to_datetime(
                          df["UpdatedDate"],
                          format='%Y%m%d'
)
print(df)
print(df.dtypes)

Yields below output.


  InsertedDate  Courses UpdatedDate
0    2021-11-14    Spark   2021-11-15
1    2021-11-17  PySpark   2021-11-18
2    2021-11-15   Hadoop   2021-11-15
3    2021-11-17   Python   2021-11-19
4    2021-11-14   Pandas   2021-11-17
InsertedDate    datetime64[ns]
Courses                  object
UpdatedDate     datetime64[ns]
dtype: object

7. Complete Example For Change String to Date in DataFrame


import pandas as pd
technologies   = ({
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','55days','40days','60days'],
    'Discount':[1000,2300,1000,1200,2500],
    "InsertedDate":["2020/11/14","2020/11/17","2021/11/15","2021/11/17","2021/11/14"]
          })
df = pd.DataFrame(technologies)


# Use pandas.to_datetime() to convert string to datetime format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"])
print (df.dtypes)

# Check the format of 'InsertedDate' column
df.info()

# Using DataFrame.astype() function
df["InsertedDate"] = df["InsertedDate"].astype('datetime64[ns]')
print (df.dtypes)

# Convert the data type of column 'Date' from string (YYYY/MM/DD) to datetime64
df["InsertedDate"] =  pd.to_datetime(df["InsertedDate"], format="%Y/%m/%d")
print(df.dtypes)

# Use pandas.to_datetime() to convert string to "yyyymmdd" format
df["InsertedDate"] = pd.to_datetime(df["InsertedDate"], format='%y%m%d')
print(df)
print(df.dtypes)

#  Using pandas.to_datetime() to convert multiple columns from string
df["InsertedDate"] = pd.to_datetime(
                          df["InsertedDate"],
                          format='%Y%m%d')
df["UpdatedDate"] = pd.to_datetime(
                          df["UpdatedDate"],
                          format='%Y%m%d')
print(df)
print(df.dtypes)

Conclusion

In this article, you have learned how to change pandas DataFrame column type from string to Date format by using pandas.to_datetime() & DataFrame.astype() function with examples.

Happy Learning !!

You May Also Like

References

Leave a Reply

You are currently viewing Pandas Change String Object to Date in DataFrame