To convert a string column to an integer (int) in a pandas DataFrame or Series, use the Series.astype(int) or pandas.to_numeric() functions. In this article, I will explain how to convert one or multiple string columns to integer type with examples.
1. Quick Examples of Convert String to Integer
If you are in a hurry, below are some quick examples of how to convert or cast string to integer dtype.
# Below are quick examples
# Example 1: convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)
# Example 2: Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Example 3: Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
# Example 4: convert the strings to integers using to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
# Example 5: convert the strings to integers using str.replace & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
2. Series.astype() Syntax
Following is the syntax of Series.astype(). This function takes the dtype, copy, and errors params.
# astype() Syntax
Series.astype(dtype, copy=True, errors='raise')
2.1 Parameters of astype()
Following are the parameters of the astype() function.
- dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type.
- copy – Default True. Returns a copy when copy=True.
- errors – Default 'raise'.
  - Use 'raise' to generate an exception when the data cannot be cast to the requested type.
  - Use 'ignore' to suppress exceptions. On error, the original object is returned.
2.2 Return value of astype()
It returns a Series with the changed data type.
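As a minimal sketch of how the errors parameter behaves, the snippet below uses a small hypothetical Series named s (it is not part of the article's DataFrame) whose last value cannot be parsed as an integer.

```python
import pandas as pd

# Hypothetical sample data: the last value is not a valid integer
s = pd.Series(['100', '200', 'abc'])

# errors='raise' (the default) raises ValueError on the invalid value
try:
    s.astype(int)
    raised = False
except ValueError:
    raised = True
print(raised)

# errors='ignore' suppresses the exception and hands back the original data
unchanged = s.astype(int, errors='ignore')
print(unchanged.tolist())
```

Note that with errors='ignore' nothing is converted when any value fails, so it is worth checking the resulting dtype before relying on it.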
Now, let's create a pandas DataFrame using data from a Python dictionary, where the columns are Courses, Fee, Duration and Discount.
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies)
print(df.dtypes)
Yields below output.
# Output:
Courses object
Fee object
Duration object
Discount object
dtype: object
2. Pandas Convert String to Integer
We can use pandas Series.astype() to convert or cast a string to an integer in a specific DataFrame column or Series. Since each column of a DataFrame is a pandas Series, I will get the column from the DataFrame as a Series and use the astype() function. In the below example, df.Fee or df['Fee'] returns a Series object.
To cast one or more DataFrame columns at once, pass a dict of the form {col: dtype, …} to DataFrame.astype(), where col is a column label and dtype is a numpy.dtype or Python type.
# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)
# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
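Yields below output. The integer width depends on the platform: astype(int) gives int64 on most Linux and macOS systems, and int32 on Windows with NumPy versions before 2.0.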
# Output:
Courses object
Fee int32
Duration object
Discount object
dtype: object
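The width that astype(int) produces follows NumPy's default integer for the platform. If a fixed width is wanted, one option is to name it explicitly. The sketch below assumes a small stand-in DataFrame called sample rather than the article's df.

```python
import pandas as pd

# Stand-in frame with the same Fee values as the article's example
sample = pd.DataFrame({'Fee': ['22000', '25000', '24000', '26000']})

# Naming the width explicitly avoids the platform-dependent default
sample['Fee'] = sample['Fee'].astype('int64')
print(sample.dtypes)
```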
3. Convert Multiple String Columns to Integer
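We can also convert multiple string columns to integers by selecting them with a list of column names and calling astype() on the result. Passing a dict of column name to data type to DataFrame.astype() works as well. The below example converts the columns 'Fee' and 'Discount' from string to integer dtype.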
# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
Yields below output.
# Output:
Courses object
Fee int32
Duration object
Discount int32
dtype: object
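The dict form of DataFrame.astype() can be sketched as follows. The frame sample and the result name converted are illustrative names chosen here, and 'int64' is used so the output does not depend on the platform.

```python
import pandas as pd

# Illustrative frame with two string columns to convert
sample = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': ['22000', '25000'],
    'Discount': ['1000', '2300'],
})

# Map each column label to its target dtype; unlisted columns are left alone
converted = sample.astype({'Fee': 'int64', 'Discount': 'int64'})
print(converted.dtypes)
```

The dict form is handy when different columns need different target types, since each column gets its own entry.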
4. Using pandas.to_numeric()
Alternatively, you can convert a string column to a numeric type in pandas using to_numeric(), which works on one Series (or other 1-d input) at a time and picks the numeric dtype from the data. For example, use df['Fee'] = pd.to_numeric(df['Fee']) to convert the 'Fee' column to int.
# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)
Yields below output.
# Output:
Courses object
Fee int64
Duration object
Discount object
dtype: object
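By default to_numeric() raises an error on a value it cannot parse. A minimal sketch of the errors='coerce' option is shown below, using a hypothetical Series fees with one invalid entry. The coerced result is typically float64 because the missing value is stored as NaN; casting to the nullable 'Int64' dtype afterwards is one way to get integers back.

```python
import pandas as pd

# Hypothetical data: the last entry cannot be parsed as a number
fees = pd.Series(['22000', '25000', 'unknown'])

# errors='coerce' turns unparseable entries into missing values instead of raising
numeric = pd.to_numeric(fees, errors='coerce')

# The nullable Int64 dtype holds integers alongside missing values
nullable = numeric.astype('Int64')
print(numeric.isna().tolist())
print(nullable.dtype)
```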
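If a column holds values with letters in them, such as '30days', to_numeric() raises an error by default. If you don't want to lose these values, use str.replace() with a regex pattern to drop the non-digit characters before casting.
# Convert the strings to integers using str.replace & astype()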
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
Yields the same output as above.
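The same pattern is more visible on values that actually contain letters. The sketch below applies it to a standalone Series durations holding the article's Duration values; the names durations and days are chosen here for illustration.

```python
import pandas as pd

# Values taken from the article's Duration column
durations = pd.Series(['30days', '50days', '40days', '60days'])

# Remove every non-digit character, then cast the remaining digits to int64
days = durations.str.replace('[^0-9]', '', regex=True).astype('int64')
print(days.tolist())
```

A value with no digits at all would become an empty string and fail the cast, so this approach assumes every entry contains at least one digit.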
5. Complete Example of Convert String to Integer
import pandas as pd
import numpy as np
technologies= ({
'Courses':["Spark","PySpark","Hadoop","Pandas"],
'Fee' :['22000','25000','24000','26000'],
'Duration':['30days','50days','40days','60days'],
'Discount':['1000','2300','2500','1400']
})
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)
# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)
# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)
# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)
# Convert the strings to integers using to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)
# Convert the strings to integers using str.replace & astype()
# Fee is already an integer column at this point, so cast it back to str
# before using the .str accessor
df['Fee'] = df['Fee'].astype(str).str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)
6. Conclusion
In this article, I have explained how to convert a single column and multiple columns from string to integer type in a pandas DataFrame using the Series.astype(int) and pandas.to_numeric() functions.
Happy Learning !!