Pandas Convert String to Integer

Use pandas Series.astype(int) or pandas.to_numeric() functions to convert or cast a Series or DataFrame column from a string to an integer data type (dtype). In this article, I will explain how to convert one or multiple string columns to integer type with examples.

1. Quick Examples of Converting String to Integer

If you are in a hurry, below are some quick examples of how to convert or cast string to integer dtype.


# Below are some quick examples
# (each example assumes df still holds the original string columns)

# Example 1: Convert a string column to integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)

# Example 2: Change a specific column type using attribute access
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Example 3: Convert multiple columns to integer
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].astype(int)
print(df.dtypes)

# Example 4: Convert strings to integers using to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)

# Example 5: Strip non-digit characters with str.replace(), then cast with astype()
df['Duration'] = df['Duration'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

2. Series.astype() Syntax

Following is the syntax of Series.astype(). The function takes the dtype, copy, and errors parameters.


# astype() Syntax
Series.astype(dtype, copy=True, errors='raise')

2.1 Parameters of astype()

Following are the parameters of astype() function.

  • dtype – A numpy.dtype or Python type (for example int or 'int64') to cast the entire pandas object to.
  • copy – Default True. Returns a copy when copy=True.
  • errors – Default 'raise'.
    • Use 'raise' to raise an exception when the data cannot be cast to the requested type.
    • Use 'ignore' to suppress exceptions; on error, the original object is returned.
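To see how the errors parameter behaves, here is a minimal sketch assuming pandas 2.x. The Series and its value 'thirty' are made-up illustration data, not part of the article's DataFrame.

```python
import pandas as pd

# Illustrative data: 'thirty' is not a valid integer string
s = pd.Series(['10', '20', 'thirty'])

# errors='raise' (the default): the cast stops with a ValueError
try:
    s.astype(int)
except ValueError as exc:
    print("errors='raise' ->", exc)

# errors='ignore': no exception, the original data is returned unchanged
result = s.astype(int, errors='ignore')
print(result.tolist())  # ['10', '20', 'thirty']
```

With 'ignore', the cast silently does nothing, so check the resulting dtype afterwards if you rely on it.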

2.2 Return value of astype()

It returns a Series with the changed data type.
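Because astype() returns a new object rather than modifying the column in place, the result has to be assigned back. A small sketch, assuming a two-row DataFrame built only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Fee': ['22000', '25000']})

# astype() returns a new Series; calling it on its own leaves df untouched
converted = df['Fee'].astype('int64')
print(converted.dtype)   # int64

# Assign the result back to keep the conversion in the DataFrame
df['Fee'] = converted
print(df['Fee'].dtype)   # int64
```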

Now, let's create a pandas DataFrame from a Python dictionary with the columns Courses, Fee, Duration, and Discount. All values are stored as strings.


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': ['1000', '2300', '2500', '1400']
}
df = pd.DataFrame(technologies)

print(df.dtypes)

Yields below output.


# Output
Courses     object
Fee         object
Duration    object
Discount    object
dtype: object

2.3 Pandas Convert String to Integer Using astype()

You can use Series.astype() to cast the strings in a specific DataFrame column (or in a standalone Series) to integers. Since each DataFrame column is a pandas Series, I will select the column as a Series and call astype() on it. In the example below, both df.Fee and df['Fee'] return that Series object.

To cast one or more columns from the DataFrame itself, pass a dict {col: dtype, …} to DataFrame.astype(), where col is a column label and dtype is a numpy.dtype or Python type.


# convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

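Yields below output. Here int means NumPy's default integer type, which is int64 on most platforms but int32 on Windows with NumPy versions earlier than 2.0, so depending on your setup you may see int64 instead of int32.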


# Output
Courses     object
Fee          int32
Duration    object
Discount    object
dtype: object
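The dict form of DataFrame.astype() converts only the columns named as keys and leaves the others alone. A minimal sketch, using a trimmed copy of the article's data and an explicit 'int64' so the result does not depend on the platform default:

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark', 'Hadoop', 'Pandas'],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

# Only the columns named in the dict are cast; the rest keep their dtype
df = df.astype({'Fee': 'int64'})
print(df.dtypes)
```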

3. Convert Multiple String Columns to Integer

You can also convert multiple string columns to integers in one step: select the columns with a list of labels and call astype() on the resulting DataFrame. The example below converts the columns 'Fee' and 'Discount' from string to integer dtype.


# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

Yields below output.


# Output
Courses     object
Fee          int32
Duration    object
Discount     int32
dtype: object
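If the columns should end up with different integer widths, a dict passed to DataFrame.astype() can assign one per column. This is a sketch; the choice of int32 for Discount is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Fee': ['22000', '25000', '24000', '26000'],
    'Discount': ['1000', '2300', '2500', '1400'],
})

# Each column gets its own explicit width instead of the platform default for int
df = df.astype({'Fee': 'int64', 'Discount': 'int32'})
print(df.dtypes)
```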

4. Using pandas.to_numeric()

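Alternatively, you can convert a string column to a numeric type using pandas.to_numeric(), which works on one Series at a time and infers the resulting dtype (int64 here, since every value is a whole number). For example, use df['Fee'] = pd.to_numeric(df['Fee']) to convert the 'Fee' column to int.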


# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

Yields below output.


# Output
Courses     object
Fee          int64
Duration    object
Discount    object
dtype: object
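When some strings are not numbers, pd.to_numeric() raises by default. The sketch below, assuming pandas 2.x and a made-up 'N/A' entry, shows errors='coerce' for turning bad values into NaN, the nullable 'Int64' dtype for keeping integers alongside missing values, and downcast='integer' for picking a smaller integer type.

```python
import pandas as pd

s = pd.Series(['22000', '25000', 'N/A', '26000'])

# errors='coerce' turns unparseable strings into NaN; because NaN is a
# float, the result is float64 rather than an integer dtype
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.dtype)  # float64

# The nullable 'Int64' extension dtype keeps whole numbers and marks the gap as <NA>
as_int = coerced.astype('Int64')
print(as_int.tolist())

# downcast='integer' picks the smallest integer dtype that fits the values
small = pd.to_numeric(pd.Series(['1', '2', '3']), downcast='integer')
print(small.dtype)  # int8
```

Note the capital I in 'Int64': lowercase 'int64' cannot hold missing values, so casting the coerced column to it would fail.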

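If some values contain letters, such as '30days' in the Duration column, pd.to_numeric() and astype(int) cannot parse them. To keep those values, use str.replace() with a regex pattern to drop the non-digit characters first, then cast the result with astype().


# Convert the strings to integers using str.replace() & astype()
df['Duration'] = df['Duration'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

After this, the Duration column holds the integers 30, 50, 40, and 60 with dtype int64.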
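Another option for values like '30days' in the Duration column is str.extract(), which keeps the digits matched by a capture group instead of deleting everything else. This is a sketch of that alternative technique, not the approach used in the article:

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['30days', '50days', '40days', '60days']})

# str.extract() returns the first match of the capture group (\d+);
# expand=False gives back a Series instead of a one-column DataFrame
df['Duration'] = df['Duration'].str.extract(r'(\d+)', expand=False).astype('int64')
print(df['Duration'].tolist())  # [30, 50, 40, 60]
```

The two approaches differ when a value contains more than one group of digits: for '1h30m', the replace pattern [^0-9] yields 130, while this extract pattern yields 1.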

5. Complete Example of Converting String to Integer


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': ['1000', '2300', '2500', '1400']
}
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee', 'Discount']].astype(int)
print(df.dtypes)

# Convert the strings to integers using to_numeric()
# (Fee is already numeric at this point, so its dtype is left unchanged)
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)

# Strip non-digit characters with str.replace(), then cast with astype()
# (.str works only on string columns, so this targets Duration: '30days' -> 30)
df['Duration'] = df['Duration'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

6. Conclusion

In this article, I have explained how to convert a single column and multiple columns from string to integer type in a Pandas DataFrame using Series.astype(int) and pandas.to_numeric(), and how to strip non-digit characters with str.replace() before casting.

Happy Learning !!
