To convert a string column to an integer in a Pandas DataFrame or Series, you can use Series.astype(int) or the pandas.to_numeric() function. In this article, I will explain the astype() function, its syntax and parameters, and how to use it to convert single or multiple string columns to integer types, with examples.
Quick Examples of Converting String to Integer
The following are quick examples of converting or casting a string column to an integer dtype.
# Quick examples of converting string to integer
# Each example assumes 'Fee' still holds strings, as in the DataFrame created below
# Example 1: Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)
# Example 2: Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Example 3: Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].astype(int)
print(df.dtypes)
# Example 4: Convert the strings
# to integers using to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
# Example 5: Convert the strings to integers
# using str.replace() & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
Series.astype() Syntax
The following is the syntax of Series.astype(). This function takes the dtype, copy, and errors parameters.
# astype() syntax
Series.astype(dtype, copy=True, errors='raise')
Parameters of astype()
The following are the parameters of the astype() function.
- dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type.
- copy – Default True. Returns a copy when copy=True.
- errors – Default 'raise'.
  - Use 'raise' to raise an exception when the data cannot be cast to the given type.
  - Use 'ignore' to suppress exceptions. On error, the original object is returned.
Return value of astype()
It returns a Series with the changed data type.
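A short sketch of the errors parameter in action, assuming a small hypothetical Series `s` that mixes digit-only strings with a value such as '30days': the default errors='raise' stops with a ValueError, while errors='ignore' hands back the original, unconverted Series.

```python
import pandas as pd

# Hypothetical data: '30days' cannot be parsed as an integer
s = pd.Series(['22000', '30days', '24000'])

# errors='raise' (the default) raises ValueError on an unparseable value
try:
    s.astype(int)
    raised = False
except ValueError:
    raised = True
print("ValueError raised:", raised)

# errors='ignore' suppresses the error and returns the original values
unchanged = s.astype(int, errors='ignore')
print(unchanged.tolist())
```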
To run some examples of converting a string column to an integer column, let's create a Pandas DataFrame using data from a dictionary.
# Create the DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:")
print(df.dtypes)
Yields below output.
# Output:
# Create DataFrame:
# Courses object
# Fee object
# Duration object
# Discount object
# dtype: object
Convert String to Integer
You can use Pandas Series.astype() to convert or cast a string to an integer in a specific DataFrame column or Series. Because each column of a DataFrame is a Pandas Series, selecting a single column returns a Series object. For instance, retrieving the Fee column from DataFrame df with either df.Fee or df['Fee'] returns a Series.
To cast one or more DataFrame columns in a single call, pass a dictionary {col: dtype, …} to DataFrame.astype(), where col is a column label and dtype is a numpy.dtype or Python type.
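A minimal sketch of the dictionary form, assuming the same technologies data as above: only the listed columns are cast, and every other column keeps its original dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

# Cast only the listed columns; 'Courses' and 'Duration' are left untouched
df = df.astype({'Fee': 'int64', 'Discount': 'int64'})
print(df.dtypes)
```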
# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)
# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Output:
# Courses object
# Fee int32
# Duration object
# Discount object
# dtype: object
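The int32 in the output above is platform dependent: Python's int maps to NumPy's default integer, which is 32-bit on Windows with NumPy 1.x and 64-bit on most other setups. A sketch that pins the width explicitly, assuming a Fee column of digit-only strings as above, passes np.int64 (the string alias 'int64' works the same way).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Fee': ['22000', '25000', '24000', '26000']})

# np.int64 (or 'int64') gives the same dtype on every platform,
# unlike plain int, whose width follows NumPy's default integer
df['Fee'] = df['Fee'].astype(np.int64)
print(df['Fee'].dtype)  # int64
```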
Multiple Columns Integer Conversion
To convert multiple string columns to integers at once, select them with a list of column labels and call astype() on the resulting DataFrame.
# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
# Output:
# Courses object
# Fee int32
# Duration object
# Discount int32
# dtype: object
The above example converts the Fee and Discount columns of DataFrame df from string type to integer type. The print(df.dtypes) statement then prints the data type of each column after the conversion.
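As an alternative sketch for several columns, DataFrame.apply() can pass each selected column to pd.to_numeric(), which infers the integer dtype from the values instead of taking it as an argument. The cols list here is just an illustrative selection.

```python
import pandas as pd

df = pd.DataFrame({
    'Fee': ['22000', '25000', '24000', '26000'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

cols = ['Fee', 'Discount']
# apply() calls pd.to_numeric once per column (each column is a Series)
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```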
Use pandas.to_numeric() to Convert a Single String Column
Similarly, to convert a single string column to an integer with pd.to_numeric(), apply it directly to that column. For instance, df['Fee'] = pd.to_numeric(df['Fee']) converts the 'Fee' column to int64, because to_numeric() infers a suitable numeric dtype from the values.
# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
# Output:
# Courses object
# Fee int64
# Duration object
# Discount object
# dtype: object
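pd.to_numeric() also accepts a downcast parameter. A small sketch, assuming the same Fee strings: downcast='integer' picks the smallest signed integer dtype that can hold every value, which can save memory on large columns.

```python
import pandas as pd

fee = pd.Series(['22000', '25000', '24000', '26000'])

# Every fee is below 32767 (the int16 maximum), so int16 is the
# smallest integer dtype that fits all of them
small = pd.to_numeric(fee, downcast='integer')
print(small.dtype)  # int16
```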
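If the strings contain non-digit characters (for example, '30days' in the Duration column), astype(int) raises a ValueError. To keep those values, use str.replace() with a regex pattern to drop the non-digit characters, then cast the result. The .str accessor only works while the column still holds strings.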
# Convert the strings to integers using str.replace() & astype()
# ('Fee' must still hold strings here)
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
# Output:
# Courses object
# Fee int64
# Duration object
# Discount object
# dtype: object
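The same pattern applied to the Duration column, whose values really do contain letters, is sketched below. Keep in mind that [^0-9] removes every non-digit character, so '1,500' becomes 1500, but '12.5' would become 125 and '-5' would become 5.

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['30days', '50days', '40days', '60days']})

# astype(int) alone would raise ValueError on '30days'; strip every
# non-digit character first, then cast the remaining digits
df['Duration'] = df['Duration'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df['Duration'].tolist())  # [30, 50, 40, 60]
```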
Complete Example of Converting String to Integer
import pandas as pd

technologies = {
    'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].astype(int)
print(df.dtypes)

# Convert the strings to integers using to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)

# Re-create df so that 'Fee' holds strings again;
# the .str accessor raises AttributeError on an integer column
df = pd.DataFrame(technologies)
# Convert the strings to integers using str.replace() & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
Frequently Asked Questions on Pandas Convert String to Integer
To convert a single column from string to integer in a Pandas DataFrame, use the astype() method. For instance, if the values in 'Column1' are strings, df['Column1'] = df['Column1'].astype(int) converts them to integers. Make sure every string value in the column can be safely converted to an integer; otherwise, astype() raises a ValueError.
You can handle non-numeric or missing values by calling pd.to_numeric() with the errors parameter. Setting errors='coerce' replaces values that cannot be converted with NaN. Because NaN is a float, the resulting column has a float64 dtype rather than an integer one.
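To get back to an integer column after coercion, one option is pandas' nullable 'Int64' dtype (capital I), which stores whole numbers alongside missing values shown as <NA>. A minimal sketch with hypothetical input values:

```python
import pandas as pd

s = pd.Series(['22000', 'abc', None])

# 'abc' and None both become NaN, so the result is float64
coerced = pd.to_numeric(s, errors='coerce')

# The nullable Int64 dtype keeps integers and marks gaps as <NA>
as_int = coerced.astype('Int64')
print(as_int)
```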
If you want to convert the digit-only strings across the entire DataFrame, you can apply an element-wise function with applymap(). In pandas 2.1 and later, applymap() is deprecated in favor of the equivalent DataFrame.map(). For example, a lambda such as lambda x: int(x) if isinstance(x, str) and x.isdigit() else x uses isdigit() to check whether each element consists of digits and converts it to an integer if it does. Other, non-numeric elements remain unchanged.
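A sketch of that element-wise approach, using a named helper (to_int_if_digits, a hypothetical name) in place of the lambda and falling back to applymap() on pandas versions older than 2.1. Note that str.isdigit() also accepts some Unicode digit characters, such as '²', that int() rejects, so a stricter check may be needed for untrusted input.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': ['22000', '25000'],
    'Discount': ['1000', '2300'],
})

def to_int_if_digits(value):
    # Convert digit-only strings; leave every other value unchanged
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value

# DataFrame.map replaced applymap in pandas 2.1; fall back for older versions
if hasattr(pd.DataFrame, 'map'):
    converted = df.map(to_int_if_digits)
else:
    converted = df.applymap(to_int_if_digits)
print(converted.dtypes)
```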
Potential issues include non-numeric values and missing values. Make sure the string values in the columns can be safely converted to integers; if not, a ValueError may occur. Also watch out for precision. Integer strings beyond the int64 range cannot be stored as int64. If a column ends up as float64 (for example, after errors='coerce' introduces NaN), integers larger than 2**53 can lose precision. Always check the resulting data types and handle errors appropriately.
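The float64 precision issue can be reproduced with a single large value. In this sketch, the hypothetical input holds 2**53 + 1 plus one unparseable entry. Coercion forces float64, which cannot represent that integer exactly, while casting the clean string directly to int64 keeps it intact.

```python
import pandas as pd

# 9007199254740993 is 2**53 + 1; 'n/a' is a hypothetical bad value
big = pd.Series(['9007199254740993', 'n/a'])

# errors='coerce' turns 'n/a' into NaN, which forces float64 and
# rounds the large value to a nearby representable float
coerced = pd.to_numeric(big, errors='coerce')
print(coerced.dtype, int(coerced.iloc[0]))

# Casting the clean string directly to int64 keeps the exact integer
exact = big.iloc[:1].astype('int64')
print(exact.iloc[0])
```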
Conclusion
In this article, you have learned how to convert single and multiple columns from string to integer type in a Pandas DataFrame using Series.astype(int) and the pandas.to_numeric() function.
Happy Learning !!
Related Articles
- Pandas Handle Missing Data in Dataframe
- pandas convert column to numpy array
- Pandas Convert String to Integer
- Get First Row of Pandas DataFrame
- Pandas Get Last Row from DataFrame
- How to Convert Pandas DataFrame to List?
- How to Get an Index from Pandas DataFrame
- Convert NumPy Array to Pandas DataFrame
- Change the Order of Pandas DataFrame Columns
- Pandas Get First Column of DataFrame as Series
- Pandas Convert DataFrame to Dictionary (Dict)