The DataFrame.astype() function casts a pandas object to a specified data type (dtype). It supports string, float, int, datetime, and many other dtypes supported by NumPy. This comes in handy when you want to cast a DataFrame column from one data type to another.
Key Points –
- Use the astype() method to convert strings to integers in pandas.
- Ensure the string values represent valid integers; otherwise, the conversion raises an error.
- Handle missing or non-numeric values before converting to avoid conversion issues.
- Consider using pd.to_numeric() for more flexibility and error handling.
- Validate the resulting dtype to confirm that the conversion succeeded and that it is compatible with downstream analysis.
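As a minimal sketch of the last two points, the example below uses pd.to_numeric() with errors='coerce' and then checks the result. The sample values and variable names are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample data: the last value is not a valid integer
fees = pd.Series(["20000", "25000", "n/a"], name="Fee")

# errors='coerce' replaces unparseable values with NaN instead of raising
converted = pd.to_numeric(fees, errors="coerce")

# Validate the result before using it downstream
n_invalid = int(converted.isna().sum())
is_numeric = pd.api.types.is_numeric_dtype(converted)
print(n_invalid, is_numeric)
```

Counting the NaN values after coercion is one simple way to find out how many entries failed to parse.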
DataFrame.astype() Syntax
Following is the syntax of DataFrame.astype(). This function takes the dtype, copy, and errors params.
# astype() Syntax
DataFrame.astype(dtype, copy=True, errors='raise')
Following are the parameters of the astype() function.
- dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type. Alternatively, use {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type, to cast one or more of the DataFrame’s columns.
- copy – Default True. Returns a copy when copy=True (be very careful setting copy=False, as changes to values may then propagate to other pandas objects).
- errors – Default ‘raise’.
  - Use ‘raise’ to generate an exception when unable to cast due to invalid data for the type.
  - Use ‘ignore’ to suppress exceptions. On error, the original object is returned.
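One consequence of these parameters is that astype() returns a new object and does not modify the original in place. The sketch below illustrates this with a hypothetical two-row frame (df_demo is an illustrative name, not part of the article's examples).

```python
import pandas as pd

# Hypothetical sample frame with numeric values stored as strings
df_demo = pd.DataFrame({"Fee": ["20000", "25000"]})

# astype() returns a new DataFrame; df_demo itself is left untouched
converted = df_demo.astype("int64")

print(type(df_demo["Fee"].iloc[0]))  # values in the original are still str
print(converted.dtypes)
```

This is why the examples in this article assign the result back, for example df = df.astype(...).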
DataFrame.astype() – Cast All Columns Data Type (dtype)
By default, the pandas astype() function tries to cast all DataFrame columns to the specified numpy.dtype or Python type (int, string, float, datetime). If any column cannot be cast because of invalid data or NaN, it raises an error such as ‘ValueError: invalid literal’ and the operation fails.
The below example demonstrates casting the data types of all columns.
# DataFrame.astype() - Cast All Columns Data Type (dtype)
import pandas as pd
import numpy as np
# Create DataFrame from Dictionary
technologies = {
'Fee' :["20000","25000","26000"],
'Discount':["1000","2300","1500"]
}
df = pd.DataFrame(technologies)
print(df.dtypes)
# Output:
# Fee object
# Discount object
# dtype: object
DataFrame.dtypes returns the Column name and dtypes for all DataFrame columns. Note that the above DataFrame has object types for all columns.
Now let’s cast the data type to a 64-bit signed integer. You can use numpy.int64, numpy.int_, int64, or int as the param (numpy.int_ and int map to 64 bits on most platforms). To cast to a 32-bit signed integer, use numpy.int32 or int32.
# Cast all columns to int
df = df.astype(np.int64)
df = df.astype('int64')
df = df.astype('int')
print(df.dtypes)
# output:
# Fee int64
# Discount int64
# dtype: object
Notice that it updated all columns with the new dtype.
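The 32-bit variant mentioned above can be sketched the same way. The frame below is a hypothetical copy of the sample data, and the values must fit in the int32 range for the cast to be meaningful.

```python
import pandas as pd

# Hypothetical sample frame mirroring the Fee/Discount data
df_small = pd.DataFrame({
    "Fee": ["20000", "25000", "26000"],
    "Discount": ["1000", "2300", "1500"],
})

# Cast every column to a 32-bit signed integer
df32 = df_small.astype("int32")
print(df32.dtypes)
```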
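Let’s cast it to string using numpy.str_ or ‘string’. Note that ‘string’ selects the pandas StringDtype extension type, whereas numpy.str_ yields object dtype in pandas 1.x and 2.x.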
# Cast all columns to string
df = df.astype('string')
print(df.dtypes)
# Output:
# Fee string
# Discount string
# dtype: object
Let’s cast it to float type using numpy.float64 or float (the numpy.float_ alias was removed in NumPy 2.0).
# Cast all columns to float
df = df.astype('float')
print(df.dtypes)
# Output:
# Fee float64
# Discount float64
# dtype: object
Change Specific Column Type
Similarly, you can change a specific column’s type by using the Series.astype() function. Since each DataFrame column is a pandas Series, I get the column from the DataFrame as a Series and call astype() on it. In the below example, df.Fee or df['Fee'] returns a Series object.
# Cast specific column type
df.Fee = df.Fee.astype('int')
# (or)
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Output:
# Fee int64
# Discount object
# dtype: object
astype() – Cast Multiple Columns Using Dict
The dtype param of the astype() function also supports a dictionary in the format {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type (int, string, float, datetime), to cast one or multiple DataFrame columns.
# Astype() - Cast Multiple Columns Using Dict
import pandas as pd
import numpy as np
# Create DataFrame from Dictionary
technologies = {
'Courses':["Spark","PySpark","Hadoop"],
'Fee' :["20000","25000","26000"],
'Duration':['30day','40days','35days'],
'Discount':["1000","2300","1500"]
}
df = pd.DataFrame(technologies)
print(df.dtypes)
# Output:
# Courses object
# Fee object
# Duration object
# Discount object
# dtype: object
Now, by using the pandas DataFrame.astype() function, cast the Courses column to string, the Fee column to int, and the Discount column to float.
# Apply cast type for multiple columns
df2 = df.astype({'Courses':'string','Fee':'int','Discount':'float'})
print(df2.dtypes)
# Output:
# Courses string
# Fee int64
# Duration object
# Discount float64
# dtype: object
astype() with raise or ignore Error
Finally, let’s see how you can raise or ignore errors while casting by using the errors param. By default, it uses raise, meaning it generates an exception when unable to cast due to invalid data for the type.
In our DataFrame, the Courses column has string data. Let’s cast it to int and see what happens.
# Raise error when unable to cast
df.Courses = df.Courses.astype('int')
# Output:
# ValueError: invalid literal for int() with base 10: 'Spark'
As you can see, it raised an error when it was unable to cast. Now let’s suppress the exception by passing ignore to the errors param. With this, when the cast fails, astype() ignores the error and returns the original object unchanged.
# Ignore error when unable to cast
df.Courses = df.Courses.astype('int', errors='ignore')
print(df.dtypes)
# Output:
# Courses object
# Fee object
# Duration object
# Discount object
# dtype: object
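Because errors='ignore' returns the original object silently, it can be hard to tell whether a cast actually happened. One possible alternative, sketched below under that assumption, is a small wrapper that reports success explicitly. The helper try_cast is hypothetical and not part of pandas.

```python
import pandas as pd

def try_cast(series, dtype):
    """Hypothetical helper: return (result, succeeded) for a cast attempt."""
    try:
        return series.astype(dtype), True
    except (ValueError, TypeError):
        # On failure, hand back the original series unchanged
        return series, False

# Sample columns built with object dtype, as in the examples above
courses = pd.Series(["Spark", "PySpark"], dtype=object)
fee = pd.Series(["20000", "25000"], dtype=object)

_, courses_ok = try_cast(courses, "int64")
fee_int, fee_ok = try_cast(fee, "int64")
print(courses_ok, fee_ok)
```

The returned flag makes a failed conversion visible to the caller instead of leaving the dtype quietly unchanged.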
Frequently Asked Questions on pandas DataFrame.astype()
The astype() method in pandas DataFrame is used to change the data type of one or more columns to a specified data type. It enables users to convert the data type of columns within a DataFrame, facilitating data manipulation and analysis.
The astype() method can be used to convert columns to various data types such as int, float, string, datetime, category, etc.
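As a brief sketch of two of the less common targets listed here, the example below casts one column to category and another to datetime64[ns]. The frame and column names are hypothetical, and the columns are created with object dtype to match the article's examples.

```python
import pandas as pd

# Hypothetical sample frame; dtype=object keeps the columns as plain strings
df_demo = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Spark"],
        "Start": ["2023-01-15", "2023-02-01", "2023-03-10"],
    },
    dtype=object,
)

# Cast Courses to a categorical and Start to a datetime column
df_cast = df_demo.astype({"Courses": "category", "Start": "datetime64[ns]"})
print(df_cast.dtypes)
```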
If the conversion cannot be performed (e.g., due to invalid values in the column), astype() raises an error by default. With errors='ignore', it returns the original object unchanged; it does not coerce values. To coerce invalid values to NaN, use pd.to_numeric() with errors='coerce'.
The astype() method in pandas DataFrame can be used to convert multiple columns simultaneously. You can achieve this by passing a dictionary where the keys are the column names and the values are the target data types to which you want to convert the corresponding columns. This allows for efficient conversion of multiple columns in a single operation.
Besides astype(), pandas provides other methods like pd.to_numeric(), pd.to_datetime(), and pd.to_timedelta() for more specialized conversions. Additionally, you can use custom functions or apply transformations using lambda functions.
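The following is a small sketch of the specialized converters for dates and durations. The input strings are hypothetical, and the resulting time resolution may differ between pandas versions.

```python
import pandas as pd

# Hypothetical start dates and course durations stored as strings
dates = pd.to_datetime(pd.Series(["2023-01-15", "2023-02-01"]))
durations = pd.to_timedelta(pd.Series(["30 days", "40 days"]))

# Datetime and timedelta columns support arithmetic directly
end_dates = dates + durations
print(end_dates)
```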
Missing values (NaN) are preserved when the target dtype can represent them, such as float or the nullable Int64 extension dtype. Casting a column that contains NaN to a NumPy integer dtype raises an error, so it’s important to handle missing values appropriately before using astype().
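To illustrate the point about missing values, here is a minimal sketch with a hypothetical Fee series containing one NaN. It contrasts the NumPy int64 dtype with the pandas nullable Int64 dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
fees = pd.Series([20000.0, np.nan, 26000.0], name="Fee")

# A NumPy integer dtype cannot hold NaN, so this cast raises
try:
    fees.astype("int64")
    numpy_cast_ok = True
except ValueError:
    numpy_cast_ok = False

# The nullable extension dtype keeps the gap as <NA>
nullable = fees.astype("Int64")
print(numpy_cast_ok, nullable.dtype)
```

Note the capital "I" in Int64: it selects the pandas extension dtype rather than numpy.int64.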
Conclusion
In this article, I have explained the pandas DataFrame.astype() syntax, with examples of casting an entire DataFrame, specific columns, and multiple columns to a numpy.dtype or Python type (int, string, float, datetime).