In pandas, you can convert a string column of a DataFrame or Series to an integer column using Series.astype(int) or pandas.to_numeric(). In this article, I will explain how to convert one or multiple string columns to integer type with examples.
1. Quick Examples of Convert String to Integer
If you are in a hurry, below are some quick examples of how to convert or cast a string column to an integer dtype. They assume df is a DataFrame whose Fee and Discount columns hold numeric strings.
# Quick examples of convert string to integer
# Example 1: Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)
# Example 2: Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Example 3: Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
# Example 4: Convert the strings
# to integers using to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
# Example 5: Convert the strings to integers
# using str.replace() & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
2. Series.astype() Syntax
Following is the syntax of Series.astype(). This function takes the dtype, copy, and errors params.
# astype() Syntax
Series.astype(dtype, copy=True, errors='raise')
2.1 Parameters of astype()
Following are the parameters of the astype() function.
- dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type.
- copy – Default True. Returns a copy when copy=True.
- errors – Default 'raise'.
  - Use 'raise' to generate an exception when unable to cast due to invalid data for the type.
  - Use 'ignore' to not raise exceptions (suppress errors/exceptions). On error, the original object is returned.
2.2 Return value of astype()
It returns a Series with the changed data type.
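As a minimal sketch of the behaviour described above (the Series values here are made up for illustration and are not part of the article's DataFrame), astype() returns a new Series and, with the default errors='raise', raises a ValueError when a value cannot be parsed as an integer:

```python
import pandas as pd

# Hypothetical sample data, used only for this sketch
clean = pd.Series(['1', '2', '3'])
dirty = pd.Series(['1', 'two', '3'])

# astype() returns a new Series; `clean` itself is left untouched
converted = clean.astype('int64')
print(converted.dtype)

# With errors='raise' (the default), an unparsable value raises ValueError
try:
    dirty.astype('int64')
    failed = False
except ValueError as exc:
    failed = True
    print("Conversion failed:", exc)
```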
Now, let's create a pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.
# Create the DataFrame
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies)
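print("Create DataFrame:")
print(df.dtypes)
Since every column holds strings, df.dtypes reports each of them as object, the dtype pandas traditionally uses for strings.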
2. Pandas Convert String to Integer
We can use pandas Series.astype() to convert or cast a string to an integer in a specific DataFrame column or Series. Since each column of a DataFrame is a pandas Series, I will get the column from the DataFrame as a Series and use the astype() function. In the below example, df.Fee or df['Fee'] returns a Series object.
To cast one or more DataFrame columns in a single call, pass DataFrame.astype() a dict of the form {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type.
# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)
# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
Yields below output.
# Output:
Courses object
Fee int32
Duration object
Discount object
dtype: object
3. Convert Multiple String Columns to Integer
We can also convert multiple string columns to integers by selecting them with a list of column names and calling astype() on the result; passing astype() a dict of column names to data types works as well. The below example converts the columns 'Fee' and 'Discount' from string to integer dtype.
# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
Yields below output.
# Output:
Courses object
Fee int32
Duration object
Discount int32
dtype: object
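The dict form of astype() mentioned above can be sketched as follows. This is a minimal illustration that rebuilds a reduced version of the article's DataFrame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

# Map each column label to its target dtype; unlisted columns are untouched
df = df.astype({'Fee': 'int64', 'Discount': 'int64'})
print(df.dtypes)
```

Spelling out 'int64' rather than int is an optional choice here. It keeps the result the same on platforms where int maps to a 32-bit integer, as in the int32 output above.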
4. Using pandas.to_numeric()
Alternatively, you can convert a string column to a numeric type in pandas using to_numeric(), which infers the result dtype from the values. For instance, use df['Fee'] = pd.to_numeric(df['Fee']) to convert the 'Fee' column to int.
# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)
Yields below output.
# Output:
Courses object
Fee int64
Duration object
Discount object
dtype: object
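If a column contains values with letters in them and you don't want to lose those values, use str.replace() with a regex pattern to drop the non-digit characters before casting.
# Convert the strings to integers using str.replace() & astype()
# astype(str) keeps the .str accessor usable even if 'Fee' was already converted above
df['Fee'] = df['Fee'].astype(str).str.replace('[^0-9]', '', regex=True).astype('int64')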
print(df.dtypes)
Yields the same output as above.
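The Fee values contain only digits, so the effect of the regex is easier to see on the Duration column, whose values mix digits and letters. The following is a small sketch under that assumption; the Days column name is hypothetical and used only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['30days', '50days', '40days', '60days']})

# Drop every non-digit character, then cast what is left to int64.
# 'Days' is a hypothetical column name used only for this sketch.
df['Days'] = df['Duration'].str.replace(r'[^0-9]', '', regex=True).astype('int64')
print(df)
```

Note that a value with no digits at all would be reduced to an empty string, and the final astype() call would then raise a ValueError.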
5. Complete Example of Convert String to Integer
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)
# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
# Convert the strings to integers use to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)
# Convert the strings to integers using str.replace() & astype()
# 'Fee' is already an integer column here, so cast back to str before using .str
df['Fee'] = df['Fee'].astype(str).str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
Frequently Asked Questions on Pandas Convert String to Integer
To convert a single column from string to integer in a pandas DataFrame, you can use the astype method. For example, if the values in a column such as 'Column1' are initially strings, applying astype(int) to that column converts them to integers. Make sure that the string values in the column can be safely converted to integers; otherwise, astype raises a ValueError.
You can convert multiple string columns to integers in one go in a pandas DataFrame. You can either use the astype method on the selected columns or use the apply function with pd.to_numeric to apply the conversion to several string columns at once.
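One way to do this with apply, sketched here on a reduced copy of the article's DataFrame, is to pass pd.to_numeric so that it runs once per selected column. The cols list is an illustrative name, not something from the original examples:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

# Hypothetical list of the columns we want to convert
cols = ['Fee', 'Discount']

# apply() calls pd.to_numeric once per column (each column is a Series)
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```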
You can handle non-numeric values or missing values by using the pd.to_numeric function with the errors parameter. Setting errors='coerce' will replace non-convertible values with NaN, which makes the resulting column float64 rather than an integer dtype.
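A minimal sketch of errors='coerce' follows, using made-up values that include one non-numeric string and one missing entry. The extra cast to the nullable 'Int64' dtype is an optional step, not something the article prescribes, for cases where you want integers while keeping the missing entries:

```python
import pandas as pd

# Hypothetical data with a non-numeric string and a missing value
raw = pd.Series(['22000', 'n/a', '24000', None])

# Unparsable and missing entries become NaN, so the result is float64
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric.dtype)

# Optional: the nullable 'Int64' dtype stores integers alongside <NA>
nullable = numeric.astype('Int64')
print(nullable)
```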
If you want to convert all string columns to integers for the entire DataFrame, you can use the applymap function (deprecated since pandas 2.1 in favor of DataFrame.map), which applies a function to every element in the DataFrame. For example, a lambda function can check whether each element consists only of digits using isdigit() and convert it to an integer if it does. Other non-numeric elements remain unchanged.
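The element-wise idea can be sketched as below. To avoid depending on either applymap or DataFrame.map, this version applies Series.map to each column, which should behave the same way across pandas versions; the helper name to_int_if_digits is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
})

def to_int_if_digits(value):
    # Hypothetical helper: convert digit-only strings, leave the rest alone
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value

# Run the helper over every element, one column at a time
converted = df.apply(lambda col: col.map(to_int_if_digits))
print(converted)
```

Because 'Duration' values such as '30days' are not purely digits, they are left as strings, while the 'Fee' values become integers.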
Potential issues include handling non-numeric values or missing values. It's important to ensure that the string values in the columns can be safely converted to integers. If not, a ValueError may occur. Additionally, be aware that when a column ends up as float64 (for example, because errors='coerce' introduced NaN), integers larger than 2**53 can lose precision. Always check the resulting data type and handle any errors appropriately.
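One concrete way precision can be lost is when a large integer string ends up in a float64 column. The sketch below uses a made-up value just above 2**53 to illustrate this; the variable names are illustrative only:

```python
import pandas as pd

big = 9007199254740993  # 2**53 + 1, not exactly representable as float64

# Hypothetical data: one large integer string plus one non-numeric value
raw = pd.Series([str(big), 'n/a'])

# The NaN produced by errors='coerce' forces a float64 result
as_float = pd.to_numeric(raw, errors='coerce')
print(as_float.dtype)
print(int(as_float.iloc[0]) == big)

# Without the non-numeric value the column stays int64 and the value is exact
exact = pd.to_numeric(pd.Series([str(big)]))
print(exact.iloc[0] == big)
```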
Conclusion
In this article, I have explained how to convert a single column and multiple columns from string to integer type in a pandas DataFrame using the Series.astype(int) and pandas.to_numeric() functions.
Happy Learning !!