While working with a pandas DataFrame or any table-like data structure, we often need to change the data type (dtype) of a column, also called type casting: for example, converting int to string or string to int. In pandas, you can do this with several methods such as astype(), to_numeric(), convert_dtypes(), and infer_objects(). In this article, I will explain different examples of how to change or convert the data type in a pandas DataFrame: converting all columns to a specific type, converting single or multiple columns, and converting to numeric types.
Key Points–
- Applying the astype() method to convert data types directly, specifying the desired dtype.
- Utilizing the pd.to_numeric() function to convert object types into numeric types, with options for raising on or coercing values that cannot be parsed.
- Using the infer_objects() method to automatically infer better types for object columns.
- Employing the convert_dtypes() method to convert columns to the best possible types, including nullable integers.
- Utilizing custom functions or mapping techniques for more complex type conversions.
1. Quick Examples of Changing Data Type
Below are some quick examples of converting column data type on Pandas DataFrame.
# Quick examples of converting data types
# Example 1: Convert all types to best possible types
df2 = df.convert_dtypes()
# Example 2: Change all columns to the same type
df = df.astype(str)
# Example 3: Change type for one or multiple columns
df = df.astype({"Fee": int, "Discount": float})
# Example 4: Ignore errors
df = df.astype({"Courses": int}, errors='ignore')
# Example 5: Convert object types to possible types
df = df.infer_objects()
# Example 6: Convert Fee column to numeric type
df['Fee'] = pd.to_numeric(df['Fee'])
# Example 7: Convert Fee and Discount to numeric types
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(pd.to_numeric)
Now let's see this with an example. First, create a pandas DataFrame with the column names Courses, Fee, Duration, and Discount.
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle", "Java"],
    'Fee': [20000, 25000, 26000, 22000, 24000, 21000, 22000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days'],
    'Discount': [11.8, 23.7, 13.4, 15.7, 12.5, 25.4, 18.4]
}
df = pd.DataFrame(technologies)
print(df.dtypes)
This yields the output below.
# Output:
Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
2. DataFrame.convert_dtypes() to Convert Data Type in Pandas
convert_dtypes() has been available on pandas DataFrame since version 1.0.0. It is a convenient starting point because it automatically converts each column to the best possible type that supports pd.NA.
Below is the syntax of pandas.DataFrame.convert_dtypes().
# Syntax of DataFrame.convert_dtypes
DataFrame.convert_dtypes(infer_objects=True, convert_string=True,
convert_integer=True, convert_boolean=True, convert_floating=True)
Now, let’s see a simple example.
# Convert all types to best possible types
df2=df.convert_dtypes()
print(df2.dtypes)
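This yields the output below. Note that it converted the columns with object type to the string type. On recent pandas versions the numeric columns are also converted to the nullable Int64 and Float64 extension types, and the string type may be displayed as string[python].
# Output:
Courses      string
Fee           Int64
Duration     string
Discount    Float64
dtype: object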
This method is handy when you want to leverage Pandas’ built-in type inference capabilities to automatically convert data types, especially when dealing with large datasets or when you’re unsure about the optimal data type for each column.
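As a minimal sketch of why the nullable types matter, the snippet below uses a small hypothetical DataFrame (not the article's data) in which Fee has a missing value. Regular construction stores that column as float64, and convert_dtypes() should turn it into the nullable Int64 type while keeping the missing entry, assuming pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical data: Fee contains a missing value, so pandas stores it as float64
raw = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, None, 26000],
    }
)
print(raw.dtypes)

# convert_dtypes() picks dtypes that support pd.NA; Fee becomes nullable Int64
converted = raw.convert_dtypes()
print(converted.dtypes)
print(converted)
```

The missing fee is preserved as a missing value instead of forcing the whole column to stay floating point.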
3. DataFrame.astype() to Change Data Type in Pandas
In a pandas DataFrame, use the DataFrame.astype() method to convert single or multiple columns from one type to another; you can also use it to change all columns to the same type. When you call astype() on a DataFrame with a single dtype, it changes all columns to that type. To convert specific columns, pass a mapping of column names to types.
Below is the syntax of pandas.DataFrame.astype().
# Syntax of astype()
DataFrame.astype(dtype, copy=True, errors='raise')
3.1 Change All Columns to Same type in Pandas
df.astype(str) converts all columns of a pandas DataFrame to strings. You can confirm the conversion by printing the data types before and after it. Each column will be of type object, which is the dtype pandas uses for storing strings.
# Change All Columns to Same type
df = df.astype(str)
print(df.dtypes)
Yields below output.
# Output:
Courses object
Fee object
Duration object
Discount object
dtype: object
3.2 Change Type For One or Multiple Columns in Pandas
To change one or multiple columns, pass astype() a dictionary with the column name as the key and the target type as the value. The example below casts the DataFrame column Fee to int and Discount to float.
# Change Type For One or Multiple Columns
df = df.astype({"Fee": int, "Discount": float})
print(df.dtypes)
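The dictionary passed to astype() can also name pandas extension types as strings. The sketch below is an illustration on a hypothetical single-column DataFrame, assuming pandas 1.0 or later: a plain int cannot represent a missing value, so the nullable "Int64" dtype is used instead.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing fee; NaN forces the float64 dtype
fees = pd.DataFrame({"Fee": [20000.0, np.nan, 26000.0]})

# "Int64" (capital I) is the nullable integer dtype; the missing entry is kept
nullable = fees.astype({"Fee": "Int64"})
print(nullable.dtypes)
print(nullable)
```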
3.3 Convert Data Type for All Columns in a List
Sometimes you may need to convert a list of DataFrame columns to a specific type, and you can achieve this in several ways. Below are three different ways to convert the columns Fee and Discount to the float type.
# Convert data type for all columns in a list
df = pd.DataFrame(technologies)
cols = ['Fee', 'Discount']
df[cols] = df[cols].astype('float')
# By using a loop
for col in ['Fee', 'Discount']:
    df[col] = df[col].astype('float')
# By using apply() & astype() together (assign the result back to keep it)
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(lambda x: x.astype('float'))
3.4 Raise or Ignore Error when Convert Column type Fails
By default, when you try to change a column to a type that its data does not support, pandas raises an error. To suppress the error, use the errors parameter, which takes either 'raise' (the default) or 'ignore'. In the example below, I convert a column holding string values to int, which is not supported and so raises an error; with errors='ignore' the error is suppressed and the original data is returned unchanged.
# Ignores error
df = df.astype({"Courses": int}, errors='ignore')
# Generates error
df = df.astype({"Courses": int}, errors='raise')
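To make the difference between the two settings concrete, here is a small sketch on a hypothetical two-row DataFrame. It assumes a pandas version in which astype() accepts errors='ignore', and it catches the exception from the default behavior explicitly rather than letting the script stop.

```python
import pandas as pd

# Hypothetical frame; "Courses" holds text that cannot be cast to int
courses = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": ["20000", "25000"]})

# errors="ignore": the failed cast is swallowed and the data comes back unchanged
unchanged = courses.astype({"Courses": int}, errors="ignore")
print(unchanged["Courses"].tolist())

# errors="raise" (the default): handle the failure yourself
try:
    courses.astype({"Courses": int})
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print(f"Cast failed: {exc}")
```

Because errors='ignore' hides the failure silently, checking the resulting dtypes afterwards is a reasonable habit.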
4. DataFrame.infer_objects() to Change Data Type in Pandas
Use the DataFrame.infer_objects() method to automatically convert object columns to the type of the data they hold. It checks the data in each object column and attempts to infer a better dtype. Note that it converts only object columns. For example, if a column with object type holds int or float values, infer_objects() converts it to the respective type.
# Converts object types to possible types
df = pd.DataFrame(technologies)
df = df.infer_objects()
print(df.dtypes)
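The sample data has no object column that holds numbers, so the call above leaves its dtypes unchanged. To see infer_objects() take effect, the sketch below builds a hypothetical DataFrame whose columns are all forced to object and then lets pandas infer the types.

```python
import pandas as pd

# Hypothetical frame: dtype="object" stores every column as generic Python objects
mixed = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark"],
        "Fee": [20000, 25000],
        "Discount": [11.8, 23.7],
    },
    dtype="object",
)
print(mixed.dtypes)

# infer_objects() inspects the values and restores int64 / float64 where possible
inferred = mixed.infer_objects()
print(inferred.dtypes)
```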
5. Using pandas.to_numeric() to Convert Numeric Types
pandas.to_numeric() is a top-level function (there is no DataFrame.to_numeric() method) used to convert values with non-numeric dtypes, such as numeric strings, to a suitable numeric type.
5.1 Convert Numeric Types
Using pd.to_numeric() is another way to convert a specific column to a numeric type in pandas. Here's how you can use it to convert the Fee column to a numeric type.
# Converts Fee column to numeric type
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
If the Fee column holds numeric strings, this code converts them to numeric values, which you can confirm from the printed data types; a column that is already numeric is left as is.
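When a column contains values that cannot be parsed, to_numeric() raises by default. The following sketch uses a hypothetical Series with one bad entry and the errors='coerce' option, which replaces unparseable values with NaN.

```python
import pandas as pd

# Hypothetical fee strings; the last entry is not a number
fees = pd.Series(["20000", "25000", "unknown"])

# errors="coerce" turns values that cannot be parsed into NaN
numeric = pd.to_numeric(fees, errors="coerce")
print(numeric)
print(numeric.dtype)
```

The result is float64 because NaN is a floating-point value; counting the NaN entries afterwards shows how many values failed to convert.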
5.2 Convert Multiple Numeric Types using apply() Method
Use to_numeric() along with the DataFrame.apply() method to convert multiple columns to a numeric type. The example below converts the columns Fee and Discount to numeric types.
# Convert Fee and Discount to numeric types
df = pd.DataFrame(technologies)
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(pd.to_numeric)
print(df.dtypes)
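A column such as Duration, whose values mix digits and text ('30day', '40days'), needs a custom step before to_numeric() can handle it. The sketch below is one possible approach, not the only one: it extracts the leading digits with a regular expression and converts the result. The DurationDays column name is made up for illustration.

```python
import pandas as pd

# A few Duration values in the same shape as the sample data
durations = pd.DataFrame({"Duration": ["30day", "40days", "35days"]})

# Pull out the digits as strings, then convert them to a numeric dtype
days = durations["Duration"].str.extract(r"(\d+)", expand=False)
durations["DurationDays"] = pd.to_numeric(days)  # hypothetical new column
print(durations)
print(durations.dtypes)
```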
Frequently Asked Questions on Different Ways to Change Data Type
There are several methods available in pandas to change data types, including astype(), to_numeric(), convert_dtypes(), and infer_objects(). The result of a function such as pd.to_numeric() is assigned back to the column directly.
You can use the astype() method when you want to convert all columns to a specific data type or convert individual columns to different data types.
The to_numeric() function is particularly useful for converting object types to numeric types, with options for raising on or coercing values that cannot be parsed.
The convert_dtypes() method is appropriate when you want to automatically convert DataFrame columns to the best possible data types based on their content.
Changing data types can have performance implications, especially for large datasets. It’s important to consider memory usage and computational efficiency when choosing the appropriate method for data type conversion.
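As a rough illustration of the memory side, the sketch below compares a hypothetical int64 Series with a downcast copy produced by the downcast option of pd.to_numeric(). The numbers are small here; the point is the ratio, which carries over to larger columns.

```python
import pandas as pd

# Hypothetical fees stored with the default 64-bit integer dtype
fees = pd.Series([20000, 25000, 26000], dtype="int64")

# downcast="integer" picks the smallest signed integer dtype that fits the values
small = pd.to_numeric(fees, downcast="integer")

print(fees.dtype, fees.memory_usage(index=False))
print(small.dtype, small.memory_usage(index=False))
```

All values here fit in 16 bits, so the downcast column uses a quarter of the memory. Downcasting is only safe when future values will also fit in the smaller type.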
Conclusion
In this article, you have learned how to convert all columns of a DataFrame to a specific type, cast one or multiple columns, and convert columns to numeric types using the astype(), to_numeric(), convert_dtypes(), and infer_objects() methods.
Happy Learning !!