Different Ways to Change Data Type in Pandas

  • Post category: Pandas
  • Post last modified: March 27, 2024

When working with a Pandas DataFrame or any table-like data structure, we often need to change the data type (dtype) of a column, also called type casting, for example, converting from int to string or from string to int. In pandas, you can do this using several methods such as astype(), to_numeric(), convert_dtypes(), and infer_objects(). In this article, I will explain different examples of how to change or convert the data type in a Pandas DataFrame: converting all columns to a specific type, converting single or multiple column types, and converting to numeric types.

Key Points–

  • Applying the .astype() method to convert data types directly, specifying the desired dtype.
  • Utilizing the pd.to_numeric() function to convert object types into numeric types, with an errors option for handling values that cannot be parsed.
  • Using the infer_objects() method to automatically infer and convert data types.
  • Employing the convert_dtypes() method to convert columns to dtypes that support missing values, such as nullable integers.
  • Utilizing custom functions or mapping techniques for more complex type conversions.
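
The last key point, custom functions and mapping, is not covered by the examples later in this article, so here is a minimal sketch of what it could look like. The DataFrames `df_dur` and `df_flag` and the `Active` column are made up for illustration; the approach assumes the text values follow a regular pattern.

```python
import pandas as pd

# Hypothetical sample data: durations stored as text with a unit suffix
df_dur = pd.DataFrame({"Duration": ["30days", "40days", "35days"]})

# Pull out the digits with a regular expression, then cast them to int
df_dur["Days"] = df_dur["Duration"].str.extract(r"(\d+)", expand=False).astype(int)

# Hypothetical sample data: text labels that should become booleans
df_flag = pd.DataFrame({"Active": ["yes", "no", "yes"]})

# Map each label to its boolean value with a dictionary
df_flag["Active"] = df_flag["Active"].map({"yes": True, "no": False})

print(df_dur["Days"].tolist())     # [30, 40, 35]
print(df_flag["Active"].tolist())  # [True, False, True]
```

Labels that are missing from the mapping dictionary become NaN, so it is worth checking for missing values after a map() based conversion.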

1. Quick Examples of Changing Data Type

Below are some quick examples of converting column data type on Pandas DataFrame.


# Quick examples of converting data types 

# Example 1: Convert all types to best possible types
df2=df.convert_dtypes()

# Example 2: Change All Columns to Same type
df = df.astype(str)

# Example 3: Change Type For One or Multiple Columns
df = df.astype({"Fee": int, "Discount": float})

# Example 4: Ignore errors
df = df.astype({"Courses": int},errors='ignore')

# Example 5: Converts object types to possible types
df = df.infer_objects()

# Example 6: Converts fee column to numeric type
df['Fee'] = pd.to_numeric(df['Fee'])

# Example 7: Convert Fee and Discount to numeric types
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(pd.to_numeric)

Now let’s see this with an example. First, create a Pandas DataFrame with the column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days','40days','35days', '40days','60days','50days','55days'],
    'Discount':[11.8,23.7,13.4,15.7,12.5,25.4,18.4]
    }
df = pd.DataFrame(technologies)
print(df.dtypes)

Yields below output.


# Output:
Courses       object
Fee            int64
Duration      object
Discount     float64
dtype: object

2. DataFrame.convert_dtypes() to Convert Data Type in Pandas

convert_dtypes() has been available on Pandas DataFrame since version 1.0.0. It automatically converts each column to the best possible dtype that supports pd.NA, which makes it a convenient first choice when you do not want to pick types by hand.

Below is the Syntax of the pandas.DataFrame.convert_dtypes().


# Syntax of DataFrame.convert_dtypes
DataFrame.convert_dtypes(infer_objects=True, convert_string=True,
      convert_integer=True, convert_boolean=True, convert_floating=True)

Now, let’s see a simple example.


# Convert all types to best possible types
df2=df.convert_dtypes()
print(df2.dtypes)

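Yields below output. Note that it converted the object columns to the string dtype, and the int64 and float64 columns to the nullable Int64 and Float64 dtypes, which support pd.NA. The exact labels depend on the pandas version; pandas 2.x prints the string dtype as string[python].


# Output:
Courses     string[python]
Fee                  Int64
Duration    string[python]
Discount           Float64
dtype: object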

This method is handy when you want to leverage Pandas’ built-in type inference capabilities to automatically convert data types, especially when dealing with large datasets or when you’re unsure about the optimal data type for each column.
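
The benefit of the nullable dtypes shows up when a column has missing values. The following is a small sketch with made-up data (`df_na` is not part of the article's example DataFrame): a missing value normally forces an integer column to float64, and convert_dtypes() brings it back to an integer type.

```python
import pandas as pd

# Hypothetical sample data with one missing value per column
df_na = pd.DataFrame({
    "Courses": ["Spark", None, "Hadoop"],
    "Fee": [20000, None, 26000],
})

# The missing value makes pandas store Fee as float64
print(df_na.dtypes)

# convert_dtypes() picks nullable dtypes, so Fee becomes Int64 with <NA>
df_conv = df_na.convert_dtypes()
print(df_conv.dtypes)
print(df_conv)
```

Because the floats in Fee are all whole numbers, the integer dtype is preferred over Float64.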

3. DataFrame.astype() to Change Data Type in Pandas

In a pandas DataFrame, use the DataFrame.astype() function to convert one or multiple columns from one type to another; you can also use it to change all columns to the same type. When you call astype() with a single dtype, it converts every column to that type. To convert specific columns, pass a dictionary that maps column names to types.

Below is the syntax of pandas.DataFrame.astype()


# Below is syntax of astype()
DataFrame.astype(dtype, copy=True, errors='raise')

3.1 Change All Columns to Same type in Pandas

df.astype(str) converts all columns of a Pandas DataFrame to strings, as confirmed by printing the data types after the conversion. Each column will be of type object, which is the dtype Pandas traditionally uses for storing strings.


# Change All Columns to Same type
df = df.astype(str)
print(df.dtypes)

Yields below output.


# Output:
Courses      object
Fee          object
Duration     object
Discount     object
dtype: object

3.2 Change Type For One or Multiple Columns in Pandas

In astype(), pass a dictionary with the column name as the key and the target type as the value to change one or multiple columns. The example below casts the DataFrame column Fee to int type and Discount to float type.


# Change Type For One or Multiple Columns
df = df.astype({"Fee": int, "Discount": float})
print(df.dtypes)
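
The dictionary values can also be dtype names given as strings, which is how you request a nullable integer column. Below is a rough sketch with made-up data (`df_fee` is an assumption, not the article's DataFrame) showing why that matters: a plain int cast fails when the column has a missing value, while "Int64" keeps it.

```python
import pandas as pd

# Hypothetical sample data: Fee has a missing value, Discount holds numeric strings
df_fee = pd.DataFrame({
    "Fee": [20000.0, None, 26000.0],
    "Discount": ["11.8", "23.7", "13.4"],
})

# A plain int cast cannot represent the missing value, so it raises
plain_cast_failed = False
try:
    df_fee.astype({"Fee": int})
except (ValueError, TypeError) as exc:
    plain_cast_failed = True
    print("astype(int) failed:", exc)

# The nullable "Int64" dtype (capital I) keeps the gap as <NA>
converted = df_fee.astype({"Fee": "Int64", "Discount": float})
print(converted.dtypes)
```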

3.3 Convert Data Type for All Columns in a List

Sometimes you may need to convert a list of DataFrame columns to a specific type; you can achieve this in several ways. Below are 3 different ways to convert the columns Fee and Discount to float type.


# Convert data type for all columns in a list
df = pd.DataFrame(technologies)
cols = ['Fee', 'Discount']
df[cols] = df[cols].astype('float')

# By using a loop
for col in ['Fee', 'Discount']:
    df[col] = df[col].astype('float')

# By using apply() & astype() together (assign the result back to keep it)
df[cols] = df[cols].apply(lambda x: x.astype('float'))

3.4 Raise or Ignore Error when Convert Column type Fails

By default, when you try to cast a column to a type its data does not support, Pandas raises an error. To suppress the error, use the errors param, which takes either 'raise' (the default) or 'ignore' as its value. With errors='ignore', the original data is returned unchanged when the conversion fails. In the example below, I convert a column that holds string values to int, which is not supported and therefore raises an error; errors='ignore' suppresses it.


# Ignores error
df = df.astype({"Courses": int},errors='ignore')

# Generates error
df = df.astype({"Courses": int},errors='raise')
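
If you want to know which columns failed instead of silently keeping them, one option is to catch the exception yourself. The sketch below is only an illustration: `safe_astype` is a hypothetical helper written for this example, not a pandas function, and `df_err` is made-up data.

```python
import pandas as pd

# Hypothetical sample data: Courses cannot be cast to int, Fee can
df_err = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["20000", "25000"],
})

def safe_astype(frame, mapping):
    """Hypothetical helper: cast column by column, keep the original on failure."""
    result = frame.copy()
    failed = []
    for col, dtype in mapping.items():
        try:
            result[col] = frame[col].astype(dtype)
        except (ValueError, TypeError):
            # Record the column name and leave its data untouched
            failed.append(col)
    return result, failed

converted, failed_cols = safe_astype(df_err, {"Courses": int, "Fee": int})
print(failed_cols)       # ['Courses']
print(converted.dtypes)
```

Compared with errors='ignore', this keeps the successful conversions and tells you exactly which columns need attention.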

4. DataFrame.infer_objects() to Change Data Type in Pandas

Use the DataFrame.infer_objects() method to automatically convert object columns to the type of data they hold. It checks the data in each object column and converts the column to a better dtype where possible. Note that it converts only object columns, and it does not parse strings: a column of numeric strings such as '1' does not become numeric. For example, if a column with object type holds int or float values, infer_objects() converts it to the respective type.


# Converts object types to possible types
df = pd.DataFrame(technologies)
df = df.infer_objects()
print(df.dtypes)
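
In the example DataFrame no object column holds numbers, so the dtypes above stay the same. The following sketch uses made-up data (`df_obj`) that is deliberately built with the object dtype to show the difference between real numbers and numeric strings.

```python
import pandas as pd

# Hypothetical sample data, forced to object dtype on purpose
df_obj = pd.DataFrame(
    {"Fee": [20000, 25000], "Code": ["1", "2"]},
    dtype="object",
)
print(df_obj.dtypes)  # both columns are object

# Fee holds real integers, so it is converted to an integer dtype;
# Code holds strings, so it is not turned into numbers
df_inferred = df_obj.infer_objects()
print(df_inferred.dtypes)
```

To turn a column like Code into numbers you need pd.to_numeric() or astype(), because infer_objects() only performs soft conversions.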

5. Using pandas.to_numeric() to Convert Numeric Types

pandas.to_numeric() is a top-level function, not a DataFrame method. It converts a Series, list, or other one-dimensional argument with a non-numeric dtype to a suitable numeric type.

5.1 Convert Numeric Types

Using pd.to_numeric() is another way to convert a specific column to a numeric type in Pandas. Here’s how you can use it to convert the Fee column to a numeric type.


# Converts fee column to numeric type
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)

If the Fee column holds numeric strings, this code converts it to a numeric dtype, as confirmed by printing the data types after the conversion. In the example DataFrame, Fee is already int64, so its dtype does not change.
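
To see the options for handling errors in action, here is a small sketch with made-up Series (`fees` and `codes` are assumptions for illustration). It uses the errors and downcast parameters of pd.to_numeric(); the inputs are built with the object dtype to mimic a typical column of strings.

```python
import pandas as pd

# Hypothetical sample data: one value cannot be parsed as a number
fees = pd.Series(["20000", "25000", "N/A"], dtype="object")

# errors="coerce" replaces unparseable values with NaN instead of raising
numeric = pd.to_numeric(fees, errors="coerce")
print(numeric)

# downcast="integer" picks the smallest integer dtype that fits the values
codes = pd.Series(["1", "2", "3"], dtype="object")
small = pd.to_numeric(codes, downcast="integer")
print(small.dtype)  # int8
```

Without errors="coerce", the first call raises a ValueError because "N/A" is not a number.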

5.2 Convert Multiple Numeric Types using apply() Method

Use to_numeric() along with DataFrame.apply() method to convert multiple columns into a numeric type. The below example converts column Fee and Discount to numeric types.


# Convert Fee and Discount to numeric types
df = pd.DataFrame(technologies)
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].apply(pd.to_numeric)
print(df.dtypes)
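
Keyword arguments given to apply() are passed on to the function, so the same pattern works with errors='coerce'. This is a sketch under the assumption that the columns hold text; `df_txt` is made-up data, not the article's DataFrame.

```python
import pandas as pd

# Hypothetical sample data: numeric strings plus one unparseable value
df_txt = pd.DataFrame(
    {
        "Fee": ["20000", "25000", "unknown"],
        "Discount": ["11.8", "23.7", "13.4"],
    },
    dtype="object",
)

cols = ["Fee", "Discount"]

# apply() forwards errors="coerce" to pd.to_numeric for every column
df_txt[cols] = df_txt[cols].apply(pd.to_numeric, errors="coerce")
print(df_txt)
```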

Frequently Asked Questions on Different Ways to Change Data Type

What are the different methods available in pandas to change data types?

There are several methods available in pandas to change data types, including astype(), convert_dtypes(), infer_objects(), and the top-level function pd.to_numeric().

When should I use the astype() method?

You can use the astype() method when you want to convert all columns to a specific data type or convert individual columns to different data types.

What is the purpose of the to_numeric() function?

The to_numeric() function is particularly useful for converting object types to numeric types, with options for handling errors and coercing strings.

When is it appropriate to use the convert_dtypes() method?

The convert_dtypes() method is appropriate when you want to automatically convert DataFrame columns to the best possible data types based on their content, for example when dealing with large datasets or when you are unsure about the optimal data type for each column.

Are there any performance considerations when changing data types in pandas?

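Changing data types can have performance implications, especially for large datasets. A conversion generally returns new data rather than modifying the column in place, and object columns, which store Python objects, usually take more memory than numeric columns. It’s important to consider memory usage and computational efficiency when choosing the appropriate method for data type conversion.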
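
One way to check the memory side of this is Series.memory_usage(deep=True). The sketch below compares a made-up column of numeric strings (`s_obj`) with the same values after conversion; the exact byte counts depend on your platform and pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical sample data: 10,000 numbers stored as strings (object dtype)
s_obj = pd.Series([str(i) for i in range(10_000)], dtype="object")

# The same values stored as integers
s_int = pd.to_numeric(s_obj)

# deep=True also counts the Python string objects behind the object column
mem_obj = s_obj.memory_usage(deep=True)
mem_int = s_int.memory_usage(deep=True)

print("object column bytes: ", mem_obj)
print("integer column bytes:", mem_int)
```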

Conclusion

In this article, you have learned how to convert/change all columns of the DataFrame to a specific type, cast one or multiple columns, and finally convert columns to numeric type using the astype(), to_numeric(), convert_dtypes(), and infer_objects() methods.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
