In pandas, the astype() method casts a pandas object (such as a Series or DataFrame) to a specified data type. It is particularly useful when you want to convert the elements of a pandas Series from one data type to another.
In this article, I will explain the Series astype() function, covering its syntax, parameters, and usage, and show how to convert a pandas Series from one data type to another with multiple examples.
Key Points –
- The primary purpose of the astype() function is to change the data type of the elements within a pandas Series.
- Users can specify the target data type (e.g., int, float, str, bool) to which they want to convert the elements of the Series.
- The errors parameter controls error handling during conversion. The accepted values are 'raise' (default) and 'ignore' (return the original object unchanged if the conversion fails). astype() does not accept 'coerce'; use pd.to_numeric() with errors='coerce' to replace non-convertible values with NaN.
- astype() returns a Series with the specified data type and never modifies the original Series in place. The copy parameter (True by default) only controls whether the returned data is always a copy; with copy=False, the result may share data with the original when no conversion is needed.
- When converting to numeric types, non-numeric values in the original Series raise an error unless errors='ignore' is set, in which case the Series is returned unconverted.
- Conversion to boolean is based on truthiness, where non-zero values are True, and zero is False.
- It’s crucial to ensure data consistency before using astype(), especially with mixed data types, to avoid unexpected results or errors.
Syntax of Pandas Series astype() Function
Following is the syntax of the pandas series astype() function.
# Syntax of series astype() function
Series.astype(dtype, copy=True, errors='raise')
Parameters of the Series astype()
Following are the parameters of the astype() function.
- dtype – The data type to which the elements of the Series should be cast.
- copy – A boolean flag, True by default, that makes astype() return a copy of the data. With copy=False, pandas may skip the copy when no conversion is needed, so the result can share data with the original. The original Series is not converted in place in either case.
- errors – Determines how errors during the conversion are handled. It accepts two values.
  - raise (default) – Raises an exception if the conversion cannot be performed.
  - ignore – Suppresses the exception and returns the original object unchanged if the conversion cannot be performed.
Note that astype() does not accept errors='coerce'. To replace non-convertible values with NaN, use pd.to_numeric() with errors='coerce'.
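As a small illustration of the dtype parameter, the sketch below passes the target type in three common forms: a Python built-in type, a string alias, and a NumPy dtype. The sample values and variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
values = pd.Series([1, 2, 3])

as_float = values.astype(float)         # Python built-in type
as_int32 = values.astype('int32')       # string alias
as_float32 = values.astype(np.float32)  # NumPy dtype

print(as_float.dtype, as_int32.dtype, as_float32.dtype)
```

All three forms go through the same astype() call; which one to use is mostly a matter of readability.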
Return Value
It returns a Series with the specified data type. The original Series is left unchanged; the copy parameter only controls whether the returned data is always a fresh copy.
Create Pandas Series
A pandas Series can be created in several ways, for example from Python lists or dictionaries. The example below creates a Series from a dictionary. To use pandas, first import it using import pandas as pd.
import pandas as pd
# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': '10000'}
series = pd.Series(Courses)
print("Original Series:\n",series)
Note that the values in this Series are strings, not numbers.
Convert Pandas Series Data Type using astype()
To convert the data type of elements in a pandas Series to an integer, you can use the astype() method with the argument int.
# Convert the Series elements to integer
series_int = series.astype(int)
print("Convert Series elements to integer:\n",series_int)
In the above example, the original Series contains string representations of integers. The astype(int) method converts those strings to actual integer values. The result series_int is a new Series with the updated data type.
Note that astype() does not convert a Series in place, even with the copy parameter set to False. If you call it without assigning the result, the original Series keeps its data type.
# Call astype() with copy=False without assigning the result
series.astype(int, copy=False)
print("Series after calling astype() without assignment:\n",series)
# Output:
# Series after calling astype() without assignment:
# Spark 20000
# PySpark 15000
# Java 10000
# dtype: object
In this case, the original Series is unchanged (its dtype is still object). astype() returns the converted Series rather than None, so the result must be assigned to keep it, for example series = series.astype(int).
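Because astype() hands back the converted Series, a common way to update a variable is to assign the result to the same name. The snippet below is a minimal sketch of that pattern using the same sample data.

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': '10000'})

# Rebind the name to the converted Series returned by astype()
series = series.astype(int)

# The values are now integers, so sum() adds them numerically
print(series.sum())  # 45000
```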
Convert the Pandas Series to Float
To convert the data type of elements in a pandas Series to float, you can use the astype() method with the argument float.
# Convert the data type of elements to float
series_float = series.astype(float)
print("Convert Series elements to float:\n",series_float)
# Output:
# Convert Series elements to float:
# Spark 20000.0
# PySpark 15000.0
# Java 10000.0
# dtype: float64
In the above example, the original Series contains string representations of whole numbers. The astype(float) method converts those strings to float values. The result series_float is a new Series with the float64 data type.
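The same call also handles strings that contain a fractional part. The sketch below uses hypothetical fee values (not from the example above) to show this, and also shows that a later cast from float to int drops the fractional part.

```python
import pandas as pd

# Hypothetical data: one value has a fractional part
fees = pd.Series({'Spark': '20000.50', 'PySpark': '15000', 'Java': '10000'})

# Strings with or without a decimal point convert to float
fees_float = fees.astype(float)
print(fees_float)

# Casting float to int truncates the fractional part
fees_int = fees_float.astype(int)
print(fees_int)
```

If the fractional part matters, keep the float Series or round explicitly before casting to int.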
Convert the Pandas Series to String
Similarly, to convert the data type of elements in a pandas Series to a string, you can use the astype() method with the argument str. In the example below, the original Series contains numeric values, and the astype(str) method converts them to their string representations. The result series_str is a new Series with the updated data type.
import pandas as pd
# Sample Series
Courses = {'Spark': 20000, 'PySpark': 15000, 'Java': 10000}
series = pd.Series(Courses)
# Convert the data type of elements to string
series_str = series.astype(str)
print("Convert Series elements to string:\n", series_str)
# Output:
# Convert Series elements to string:
# Spark 20000
# PySpark 15000
# Java 10000
# dtype: object
Convert the Pandas Series to Boolean
To convert the data type of elements in a pandas Series to a boolean, you can use the astype() method with the argument bool. For example, the original Series contains numeric values. The astype(bool) method is used to convert those numeric values to boolean. The result series_bool is a new Series with the updated data type.
import pandas as pd
# Sample Series
Courses = {'Spark': 20000, 'PySpark': 0, 'Java': 10000}
series = pd.Series(Courses)
# Convert the data type of elements to boolean
series_bool = series.astype(bool)
print("Convert Series elements to boolean:\n",series_bool)
# Output:
# Convert Series elements to boolean:
# Spark True
# PySpark False
# Java True
# dtype: bool
Keep in mind that the conversion to boolean is based on truthiness, where non-zero values are treated as True and zero is treated as False. Other types of data follow their own truthiness rules; for example, any non-empty Python string is truthy, including the text 'False'.
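The truthiness rule can be surprising for text data. The following sketch, using made-up flag values stored as Python strings, shows that astype(bool) treats every non-empty string as True, and shows one possible way to parse the text instead.

```python
import pandas as pd

# Hypothetical flags stored as text; dtype=object keeps them as Python strings
flags = pd.Series(['True', 'False', ''], dtype=object)

# Truthiness: only the empty string is falsy
as_bool = flags.astype(bool)
print(as_bool.tolist())  # [True, True, False]

# Comparing against the expected text is one way to get the intended booleans
parsed = flags == 'True'
print(parsed.tolist())   # [True, False, False]
```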
Handle Errors – Coerce
The astype() function in pandas does not support coercing errors to NaN (Not a Number). It accepts only 'raise' and 'ignore' for the errors parameter, so passing errors='coerce' raises a ValueError, as the example below shows.
import pandas as pd
# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': 'x'}
series = pd.Series(Courses)
# astype() rejects errors='coerce', so catch the resulting ValueError
try:
    series_int_coerce = series.astype(int, errors='coerce')
except ValueError as e:
    print("astype() failed:", e)
# Output:
# astype() failed: Expected value of kwarg 'errors' to be one of ['raise', 'ignore']. Supplied value is 'coerce'
In the above example, the original Series contains a non-numeric value (‘x’), but the call fails because 'coerce' is not a valid option for astype(). To replace non-convertible values with NaN, use pd.to_numeric() with errors='coerce' instead.
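Since astype() rejects errors='coerce' (see the ValueError in the output above), one alternative is pd.to_numeric(), which does accept it. The sketch below applies it to the same sample data; the optional follow-up cast to the nullable 'Int64' dtype is an addition for illustration, not part of the original example.

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': 'x'})

# Non-convertible entries become NaN, so the result is float64
series_num = pd.to_numeric(series, errors='coerce')
print(series_num)

# Optional: the nullable Int64 dtype keeps integers alongside missing values
series_nullable = series_num.astype('Int64')
print(series_nullable)
```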
Handle Errors – Ignore
When using the astype() function in pandas, you can suppress conversion errors with the errors='ignore' parameter. If the conversion fails, astype() returns the original object unchanged instead of raising an exception. This is useful when you want to attempt the conversion but do not want an error to stop your program.
import pandas as pd
# Create Pandas Series
Courses = {'Spark': '20000', 'PySpark': '15000', 'Java': 'x'}
series = pd.Series(Courses)
# Convert the data type of elements to integer with errors='ignore'
series_int_ignore = series.astype(int, errors='ignore')
print("Series with data type converted to integer using 'ignore':\n",series_int_ignore)
# Output:
# Series with data type converted to integer using 'ignore':
# Spark 20000
# PySpark 15000
# Java x
# dtype: object
In the above example, the original Series contains a non-numeric value (‘x’). By using errors='ignore', the astype(int) method attempts the conversion and, because it fails, returns the Series unchanged. The result series_int_ignore still has the object data type, and none of its values are converted, including the ones that look like integers.
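A quick way to confirm what errors='ignore' did is to inspect the result. The check below is a minimal sketch on the same sample data, assuming the installed pandas version accepts errors='ignore' in astype().

```python
import pandas as pd

series = pd.Series({'Spark': '20000', 'PySpark': '15000', 'Java': 'x'})
result = series.astype(int, errors='ignore')

# The dtype is the same as before the call
print(result.dtype == series.dtype)

# Even the convertible entries are still strings
print(type(result['Spark']))
```

Because a failed conversion passes silently, checking the dtype afterwards is a reasonable safeguard.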
Frequently Asked Questions on Pandas Series astype() Function
The astype() function in pandas is used to convert the data type of elements in a Series to a specified data type. It is particularly useful when you need to transform the data to a different type, such as converting strings to numbers or changing between numeric types.
The astype() function does not modify the original Series in place. It returns a Series with the specified data type, and the copy parameter (True by default) only controls whether the returned data is always a copy. To update a variable, assign the result back to it.
While the astype() function is suitable for numeric and string conversions, for datetime conversions you might want to use the pd.to_datetime() function, and pd.to_numeric() for numeric conversions that need options such as errors='coerce'. The appropriate function depends on the desired conversion.
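As a brief illustration of the datetime case, the sketch below parses a few hypothetical ISO-formatted date strings with pd.to_datetime() and reads parts of the dates back through the .dt accessor.

```python
import pandas as pd

# Hypothetical ISO-formatted date strings
dates = pd.Series(['2024-01-15', '2024-02-20'])

parsed = pd.to_datetime(dates)
print(parsed.dt.year.tolist())   # [2024, 2024]
print(parsed.dt.month.tolist())  # [1, 2]
```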
Besides numeric and string conversions, astype() also handles categorical conversions: pass 'category' as the dtype, or build the values with pd.Categorical(). This ensures that the data is treated as a categorical type, providing benefits like efficient memory usage and categorical-specific operations.
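A categorical conversion might look like the sketch below. The sample labels are made up; the .cat accessor exposes the distinct categories and the integer codes that pandas stores for each element.

```python
import pandas as pd

# Hypothetical labels with repeated values
levels = pd.Series(['low', 'high', 'low', 'medium', 'low'])

levels_cat = levels.astype('category')
print(levels_cat.dtype)                    # category
print(levels_cat.cat.categories.tolist())  # ['high', 'low', 'medium']
print(levels_cat.cat.codes.tolist())       # [1, 0, 1, 2, 1]
```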
In general, it’s advisable to have consistent data types in a Series. If you have mixed data types, consider cleaning or handling the data appropriately before conversion. Using astype() on a Series with mixed data types may lead to unexpected results or errors.
Conclusion
In this article, I have explained how the pandas astype() function converts the data type of elements in a Series. It is a versatile method that lets you transform your data to a specified data type, such as converting strings to numbers or vice versa, as shown in the examples.
Happy Learning!!