Pandas Series astype() Function

Post last modified: March 27, 2024

In pandas, the astype() method is used to cast a pandas object (like a Series or DataFrame) to a specified data type. This method is particularly useful when you want to convert the data type of elements within a pandas Series to a different data type.

In this article, I will explain the astype() function, including its syntax, parameters, and usage, and show how to convert a pandas Series from one data type to another with multiple examples.

Key Points –

  • The primary purpose of the astype() function is to adjust the data type of elements within a pandas Series.
  • Users can specify the target data type (e.g., int, float, str, bool) to which they want to convert the elements of the Series.
  • The errors parameter controls what happens when a conversion fails. It accepts 'raise' (default), which raises an exception, and 'ignore', which returns the original Series unchanged. astype() does not support 'coerce'; use pd.to_numeric() with errors='coerce' to replace invalid values with NaN.
  • astype() always returns a Series and never modifies the original in place. With copy=True (default) the result is a new copy; with copy=False pandas may reuse the existing data when no conversion is needed.
  • When converting to numeric types, non-numeric values in the original Series raise a ValueError unless errors='ignore' is set.
  • Conversion of numeric values to boolean is based on truthiness, where non-zero values (including NaN) are True, and zero is False.
  • It’s crucial to ensure data consistency before using astype(), especially with mixed data types, to avoid unexpected results or errors.

Syntax of Pandas Series astype() Function

Following is the syntax of the pandas series astype() function.


# Syntax of series astype() function
Series.astype(dtype, copy=True, errors='raise')

Parameters of the Series astype()

Following are the parameters of the astype() function

  • dtype – The data type to which the elements of the Series should be cast.
  • copy – A boolean flag, True by default, that makes astype() return a copy of the data. With copy=False, pandas may avoid copying when the Series already has the requested dtype, so changes to the result can then propagate to the original. It does not convert the Series in place.
  • errors – This parameter determines how to handle errors during the conversion. It can take two values.
    • raise (default) – Raises an exception if the conversion cannot be performed.
    • ignore – Suppresses the exception and returns the original Series unchanged if the conversion cannot be performed.
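The dtype argument can be written in several equivalent forms. The sketch below uses a small made-up Series (the variable names are illustrative only) and passes a Python built-in type, a dtype name as a string, and a NumPy type.

```python
import numpy as np
import pandas as pd

# Illustrative data: integers stored as strings
prices = pd.Series(['20000', '15000', '10000'])

as_python_int = prices.astype(int)       # Python built-in type
as_string_name = prices.astype('int64')  # dtype name given as a string
as_numpy_type = prices.astype(np.int32)  # NumPy scalar type with an explicit width

for label, converted in [('int', as_python_int),
                         ("'int64'", as_string_name),
                         ('np.int32', as_numpy_type)]:
    print(label, '->', converted.dtype)
```

Spelling out the width ('int64', np.int32) is the safer choice when the exact storage size matters, because the width chosen for plain int can depend on the platform.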

Return Value

It returns a new Series with the specified data type. The original Series is not modified, so assign the result to a variable to keep it.

Create Pandas Series

A pandas Series can be created in several ways, for example from a Python list or dictionary. The example below creates a Series from a dictionary. To use pandas, first import it with import pandas as pd.


import pandas as pd

# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': '10000'}
series = pd.Series(Courses)
print("Original Series:\n",series)

Yields below output.

# Output:
# Original Series:
#  Spark      20000
# PySpark    15000
# Java       10000
# dtype: object

Convert Pandas Series Data Type using astype()

To convert the data type of elements in a pandas Series to an integer, you can use the astype() method with the argument int.


# Convert the Series elements to integer
series_int = series.astype(int)
print("Convert Series elements to integer:\n",series_int)

In the above example, the original Series contains string representations of integers. The astype(int) method is used to convert those strings to actual integer values. The result series_int is a new Series with the updated data type. This example yields the below output.

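# Output (the integer width is int64 on most platforms):
# Convert Series elements to integer:
#  Spark      20000
# PySpark    15000
# Java       10000
# dtype: int64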

Note that astype() never converts a Series in place, even when the copy parameter is set to False. The copy parameter only controls whether pandas may reuse the existing data when no conversion is needed. If you call astype() without assigning the result, the original Series keeps its data type.


# astype() returns a new Series; the original Series is left unchanged
series.astype(int, copy=False)
print("Original Series after astype():\n",series)

# Output:
# Original Series after astype():
#  Spark      20000
# PySpark    15000
# Java       10000
# dtype: object

In this case, the converted Series is returned and immediately discarded, so the original Series still has dtype object. To keep the converted values, assign the result back to the variable, for example series = series.astype(int).
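As a quick check of how astype() treats the Series it is called on, the following sketch compares the dtype before and after an unassigned call, and then reassigns the result. The names dtype_before and unchanged are introduced here for illustration.

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': '10000'})
dtype_before = series.dtype

# The converted Series is returned, not stored, so this call has no lasting effect
series.astype(int)
unchanged = series.dtype == dtype_before
print(unchanged)  # True

# Assigning the result back replaces the original Series
series = series.astype(int)
print(series.dtype.kind)  # i  (signed integer)
```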

Convert the Pandas Series to Float

To convert the data type of elements in a pandas Series to float, you can use the astype() method with the argument float.


# Convert the data type of elements to float
series_float = series.astype(float)
print("Convert Series elements to float:\n",series_float)

# Output:
# Convert Series elements to float:
#  Spark      20000.0
# PySpark    15000.0
# Java       10000.0
# dtype: float64

In the above example, the original Series contains string representations of numbers. The astype(float) method converts those strings to actual float values, which is why each value is printed with a trailing .0. The result series_float is a new Series with the updated data type.
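Strings that contain a decimal point are a case where float is needed as an intermediate step. The sketch below uses made-up fee values: converting such strings directly to int fails, while converting to float first and then to int works and truncates the fractional part.

```python
import pandas as pd

# Illustrative data: some values carry a fractional part
fees = pd.Series({'Spark': '20000.75', 'PySpark': '15000', 'Java': '10000.5'})

# A direct string-to-int conversion fails on '20000.75'
try:
    fees.astype(int)
    direct_failed = False
except ValueError:
    direct_failed = True
print(direct_failed)  # True

# Going through float works; float-to-int conversion truncates toward zero
fees_float = fees.astype(float)
fees_int = fees_float.astype(int)
print(fees_int.tolist())  # [20000, 15000, 10000]
```

If rounding is wanted instead of truncation, call round() on the float Series before converting to int.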

Convert the Pandas Series to String

Similarly, to convert the elements of a pandas Series to strings, use the astype() method with the argument str. In the example below, the original Series contains numeric values, and astype(str) converts them to their string representations. The result series_str is a new Series whose dtype is reported as object, the dtype pandas uses here for Python strings.


import pandas as pd

# Sample Series
Courses = {'Spark': 20000, 'PySpark': 15000, 'Java': 10000}
series = pd.Series(Courses)

# Convert the data type of elements to string
series_str = series.astype(str)
print("Convert Series elements to string:\n", series_str)

# Output:
# Convert Series elements to string:
#  Spark      20000
# PySpark    15000
# Java       10000
# dtype: object
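The printed values look the same before and after the conversion, so it can help to confirm that the elements really are strings. This sketch checks the element type and uses the .str accessor, which is only available for string data.

```python
import pandas as pd

series = pd.Series({'Spark': 20000, 'PySpark': 15000, 'Java': 10000})
series_str = series.astype(str)

# Each element is now a Python str
print(type(series_str['Spark']).__name__)  # str

# String methods are available through the .str accessor
print(series_str.str.len().tolist())       # [5, 5, 5]
print(series_str.str.zfill(8)['Java'])     # 00010000

# "+" now concatenates text instead of adding numbers
print(series_str['Spark'] + series_str['Java'])  # 2000010000
```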

Convert the Pandas Series to Boolean

To convert the data type of elements in a pandas Series to a boolean, you can use the astype() method with the argument bool. For example, the original Series contains numeric values. The astype(bool) method is used to convert those numeric values to boolean. The result series_bool is a new Series with the updated data type.


import pandas as pd

# Sample Series
Courses = {'Spark': 20000, 'PySpark': 0, 'Java': 10000}
series = pd.Series(Courses)

# Convert the data type of elements to boolean
series_bool = series.astype(bool)
print("Convert Series elements to boolean:\n",series_bool)

# Output:
# Convert Series elements to boolean:
#  Spark       True
# PySpark    False
# Java        True
# dtype: bool

Keep in mind that for numeric data the conversion to boolean is based on truthiness, where non-zero values are treated as True and zero is treated as False. NaN is also treated as True, and in an object-dtype Series of strings every non-empty string, including 'False', becomes True.
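Two cases where truthiness can be surprising are missing values and text. The sketch below uses made-up data with an explicit object dtype for the strings, and shows a dictionary-based map() as one possible way to parse the text 'True' and 'False'; the mapping is an illustration, not a pandas default.

```python
import numpy as np
import pandas as pd

# NaN is non-zero, so it becomes True
numbers = pd.Series([1.5, 0.0, np.nan])
print(numbers.astype(bool).tolist())  # [True, False, True]

# For object-dtype strings, only the empty string is False
flags = pd.Series(['True', 'False', ''], dtype=object)
print(flags.astype(bool).tolist())    # [True, True, False]

# Parsing the text explicitly; values missing from the mapping become NaN
parsed = flags.map({'True': True, 'False': False})
print(parsed.tolist())                # [True, False, nan]
```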

Handle Errors – Coerce

The astype() function does not support errors='coerce'. That option belongs to functions such as pd.to_numeric() and pd.to_datetime(), where it replaces values that cannot be converted with NaN (Not a Number) or NaT. Passing it to astype() raises a ValueError, as the example below shows.


import pandas as pd

# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': 'x'}
series = pd.Series(Courses)

# errors='coerce' is not a valid option for astype(), so this line raises a ValueError
series_int_coerce = series.astype(int, errors='coerce')
print("Convert Series elements to integer:\n", series_int_coerce)

# Output:
# ValueError: Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value is 'coerce'

In the above example, the original Series contains a non-numeric value (‘x’), but the call fails because 'coerce' is not an accepted value for the errors parameter of astype(). To replace non-convertible values with NaN, use pd.to_numeric(series, errors='coerce') instead.
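A minimal sketch of the coercing conversion with pd.to_numeric(), using the same data as above. The final step to the nullable 'Int64' dtype is optional and is included on the assumption that integer values with a missing-value marker are wanted.

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': 'x'})

# Values that cannot be parsed as numbers are replaced with NaN
numeric = pd.to_numeric(series, errors='coerce')
print(numeric.isna().tolist())  # [False, False, True]

# Optional: the nullable integer dtype keeps whole numbers and marks the gap as <NA>
nullable_int = numeric.astype('Int64')
print(nullable_int)
```

A plain astype(int) would fail on the coerced result because NaN has no integer representation, which is why the nullable 'Int64' dtype is used here.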

Handle Errors – Ignore

When using the astype() function in pandas, you can suppress conversion errors with the errors='ignore' parameter. This is useful when you want to attempt the conversion but do not want an exception to be raised if it fails.


import pandas as pd

# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': 'x'}
series = pd.Series(Courses)

# Convert the data type of elements to integer with errors='ignore'
series_int_ignore = series.astype(int, errors='ignore')
print("Series with data type converted to integer using 'ignore':\n",series_int_ignore)

# Output:
# Series with data type converted to integer using 'ignore':
#  Spark      20000
# PySpark    15000
# Java           x
# dtype: object 

In the above example, the original Series contains a non-numeric value (‘x’). With errors='ignore', astype(int) attempts the conversion and, because it fails, returns the original Series unchanged. Note that the conversion is all-or-nothing: the valid values '20000' and '15000' are not converted either, and the dtype of series_int_ignore is still object.
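Because errors='ignore' fails silently, it is worth checking whether the conversion actually took place. The sketch below tests the resulting dtype with pd.api.types.is_integer_dtype() and, for comparison, catches the exception raised by the default errors='raise'. The names converted and failed are illustrative.

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': 'x'})

# With errors='ignore' a failed conversion returns the data unchanged
result = series.astype(int, errors='ignore')
converted = pd.api.types.is_integer_dtype(result)
print(converted)  # False

# With the default errors='raise' the failure is reported as an exception
try:
    series.astype(int)
    failed = False
except ValueError as exc:
    failed = True
    print("Conversion failed:", exc)
```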

Frequently Asked Questions on Pandas Series astype() Function

What is the purpose of the astype() function in pandas?

The astype() function in pandas is used to convert the data type of elements in a Series to a specified data type. It is particularly useful when you need to transform the data to a different type, such as converting strings to numbers or changing between numeric types.

Can the astype() function modify the original Series in place?

No. The astype() function always returns a Series and leaves the original untouched. By default, the copy parameter is set to True, so the result is an independent copy. Setting copy=False only lets pandas skip the copy when no conversion is needed; it does not convert the original in place. To update a variable, assign the result back, for example series = series.astype(int).

How can I convert a Series to a specific data type, like datetime?

The astype() function can convert cleanly formatted values to datetime with astype('datetime64[ns]'). For data that may contain invalid or inconsistently formatted values, pd.to_datetime() is usually the better choice, just as pd.to_numeric() is for numeric conversions, because both accept errors='coerce'. The appropriate function depends on the desired conversion.
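A short sketch of both routes, using made-up date strings: astype('datetime64[ns]') for clean ISO-formatted values, and pd.to_datetime() with errors='coerce' when some values may not be dates.

```python
import pandas as pd

# Clean, ISO-formatted strings convert directly with astype()
dates = pd.Series(['2024-01-15', '2024-03-27'])
as_datetime = dates.astype('datetime64[ns]')
print(as_datetime.dt.year.tolist())  # [2024, 2024]

# pd.to_datetime() can turn unparseable values into NaT instead of failing
messy = pd.Series(['2024-01-15', 'not a date'])
parsed = pd.to_datetime(messy, errors='coerce')
print(parsed.isna().tolist())        # [False, True]
```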

Is it possible to convert a Series to a categorical data type using astype()?

Yes. Calling astype('category') converts a Series to the categorical data type; pd.Categorical() is an alternative when you need to specify the categories or their order explicitly. The categorical type provides benefits like efficient memory usage for repeated values and categorical-specific operations.
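For illustration, the sketch below converts a small made-up Series of course names to the categorical type and inspects the categories and the integer codes that pandas stores for each element.

```python
import pandas as pd

courses = pd.Series(['Spark', 'Java', 'Spark', 'PySpark', 'Spark'])
as_category = courses.astype('category')

# The distinct values are stored once, in sorted order
print(as_category.cat.categories.tolist())  # ['Java', 'PySpark', 'Spark']

# Each element is stored as a small integer code pointing at a category
print(as_category.cat.codes.tolist())       # [2, 0, 2, 1, 2]
```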

Can I convert a Series with mixed data types using astype()?

In general, it’s advisable to have consistent data types in a Series. If you have mixed data types, consider cleaning or handling the data appropriately before conversion. Using astype() on a Series with mixed data types may lead to unexpected results or errors.
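One way to inspect and clean a mixed Series before conversion is sketched below. The data is made up, and using pd.to_numeric() with errors='coerce' is just one possible cleaning strategy.

```python
import pandas as pd

# Illustrative Series holding an int, two strings, and a float
mixed = pd.Series([20000, '15000', 10000.5, 'x'])

# List the Python types that are present in the Series
type_names = mixed.map(lambda value: type(value).__name__)
print(type_names.unique().tolist())  # ['int', 'str', 'float']

# Convert what can be converted and mark the rest as NaN
cleaned = pd.to_numeric(mixed, errors='coerce')
print(cleaned.tolist())              # [20000.0, 15000.0, 10000.5, nan]
```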

Conclusion

In this article, I have explained that the astype() function in pandas is a versatile method for converting the data type of elements in a Series. It allows you to transform your data to a specified data type, such as converting strings to numbers or vice versa, as shown in the examples.

Happy Learning!!

Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles on Pandas, NumPy, and Python, and takes pride in his ability to bridge the gap between technical experts and end-users.