Pandas Convert String to Integer

  • Post category:Pandas
  • Post last modified:January 10, 2024

In Pandas, you can convert a string column of a DataFrame or Series to an integer column using Series.astype(int) or pandas.to_numeric(). In this article, I will explain how to convert one or multiple string columns to integer type with examples.

1. Quick Examples of Convert String to Integer

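If you are in a hurry, below are some quick examples of how to convert (cast) a string column to an integer dtype. Each example assumes df is a DataFrame whose Fee and Discount columns hold numeric strings.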


# Quick examples of converting string to integer

# Example 1: Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print(df.dtypes)

# Example 2: Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Example 3: Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

# Example 4: Convert the strings to integers
# using to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print(df.dtypes)

# Example 5: Convert the strings to integers
# using str.replace() & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

2. Series.astype() Syntax

Following is the syntax of Series.astype(). This function takes the dtype, copy, and errors parameters.


# astype() Syntax
Series.astype(dtype, copy=True, errors='raise')

2.1 Parameters of astype()

Following are the parameters of the astype() function.

  • dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type.
  • copy – Default True. Returns a copy when copy=True.
  • errors – Default 'raise'.
    • Use 'raise' to generate an exception when the data cannot be cast to the requested type.
    • Use 'ignore' to suppress exceptions. On error, the original object is returned.
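As a minimal sketch of the default errors='raise' behaviour, the snippet below tries to cast a Series that contains one value that is not a number. The sample values and the variable names (s, failed) are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the last value cannot be parsed as an integer
s = pd.Series(["100", "200", "abc"])

try:
    s.astype(int)      # errors='raise' is the default
    failed = False
except ValueError as exc:
    failed = True
    print("Conversion failed:", exc)
```

Catching ValueError like this is one way to detect bad data before deciding whether to clean it or to fall back to pandas.to_numeric().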

2.2 Return value of astype()

It returns a Series with the changed data type.
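Because astype() returns a new object, the original Series keeps its string values unless you assign the result back. A small sketch, with made-up sample values:

```python
import pandas as pd

fee = pd.Series(["22000", "25000"], name="Fee")

# astype() returns a converted copy; `fee` itself is not modified
converted = fee.astype("int64")

print(fee.dtype)        # unchanged string dtype
print(converted.dtype)  # int64
```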

Now, let’s create a Pandas DataFrame from a Python dictionary, where the columns are Courses, Fee, Duration, and Discount.


# Create the DataFrame
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Pandas"],
    'Fee': ['22000', '25000', '24000', '26000'],
    'Duration': ['30days', '50days', '40days', '60days'],
    'Discount': ['1000', '2300', '2500', '1400']
}
df = pd.DataFrame(technologies)

print("Column dtypes:")
print(df.dtypes)

Yields below output.

# Output:
Courses     object
Fee         object
Duration    object
Discount    object
dtype: object

2.3 Convert String to Integer Using astype()

We can use Series.astype() to convert or cast strings to integers in a specific DataFrame column or Series. Since each DataFrame column is a pandas Series, I will select the column from the DataFrame as a Series and call astype() on it. In the example below, both df.Fee and df['Fee'] return a Series object.

To cast one or more DataFrame columns at once, DataFrame.astype() also accepts a dict of the form {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type.


# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

Yields below output.


# Output:
Courses     object
Fee          int32
Duration    object
Discount    object
dtype: object
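The int32 in the output above is what astype(int) produces where NumPy's default integer is 32 bits wide (for example, Windows with NumPy older than 2.0); on most other setups the same call gives int64. A sketch, assuming you want the same dtype everywhere, is to name the width explicitly:

```python
import pandas as pd

df = pd.DataFrame({"Fee": ["22000", "25000", "24000", "26000"]})

# Naming the width avoids the platform-dependent default of astype(int)
df["Fee"] = df["Fee"].astype("int64")
print(df["Fee"].dtype)  # int64
```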

3. Convert Multiple String Columns to Integer

We can also convert multiple string columns to integers, either by selecting the columns as a list and calling astype(int) on the selection, or by passing astype() a dict that maps column names to data types. The example below converts the 'Fee' and 'Discount' columns from string to integer dtype.


# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

Yields below output.


# Output:
Courses     object
Fee          int32
Duration    object
Discount     int32
dtype: object
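The dict form of DataFrame.astype() can be sketched as follows. It uses a shortened copy of the sample data, and the choice of int64 and int32 is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["22000", "25000"],
    "Discount": ["1000", "2300"],
})

# Map each column label to its target dtype; columns not listed are left alone
df = df.astype({"Fee": "int64", "Discount": "int32"})
print(df.dtypes)
```

The dict form is handy when the columns need different target types, since each column gets its own dtype in a single call.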

4. Using pandas.to_numeric()

Alternatively, you can convert a string column to a numeric type using pandas.to_numeric(), which infers the dtype from the values. For instance, df['Fee'] = pd.to_numeric(df['Fee']) converts the 'Fee' column to int64 because every value is a whole number.


# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

Yields below output.


# Output:
Courses     object
Fee          int64
Duration    object
Discount    object
dtype: object
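pd.to_numeric() works on one Series at a time, so converting several columns needs one call per column. A minimal sketch using DataFrame.apply(), with a shortened copy of the sample data (the cols list is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["22000", "25000"],
    "Discount": ["1000", "2300"],
})

# apply() calls pd.to_numeric once per selected column
cols = ["Fee", "Discount"]
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```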

If a column mixes digits with other characters and you don’t want to lose those values, use str.replace() with a regex pattern to drop the non-digit characters before casting.


# Convert the strings to integers using str.replace() & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

Yields the same output as above.
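The same pattern is more useful on a column that really contains letters, such as Duration in the sample data. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({"Duration": ["30days", "50days", "40days", "60days"]})

# \D matches any non-digit character; removing them leaves only the number
df["Duration"] = df["Duration"].str.replace(r"\D", "", regex=True).astype("int64")
print(df["Duration"].tolist())  # [30, 50, 40, 60]
```

Note that stripping every non-digit character also removes minus signs and decimal points, so this approach suits only non-negative whole numbers.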

5. Complete Example of Convert String to Integer


import pandas as pd
import numpy as np
technologies= ({
   'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

# Convert the strings to integers use to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

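# Convert the strings to integers using str.replace() & astype()
# (Duration is used here because Fee is already an integer at this point,
# and the .str accessor only works on string values)
df['Duration'] = df['Duration'].str.replace('[^0-9]', '', regex=True).astype('int64')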
print(df.dtypes)

Frequently Asked Questions on Pandas Convert String to Integer

How do I convert a single column from string to integer in a Pandas DataFrame?

To convert a single column from string to integer in a Pandas DataFrame, call the astype() method on that column, for example df['Fee'] = df['Fee'].astype(int). Make sure every string value in the column can be parsed as an integer; otherwise, astype() raises a ValueError.

Can I convert multiple string columns to integers in one go?

Yes. You can select the columns as a list and call astype(int) on the selection, pass astype() a dict that maps column names to dtypes, or use DataFrame.apply() with pd.to_numeric to convert several columns at once.

How do I handle non-numeric values or missing values during the conversion?

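You can handle non-numeric values or missing values by using the pd.to_numeric() function with the errors parameter. Setting errors='coerce' replaces non-convertible values with NaN. Because NaN is a floating-point value, the result is float64 rather than an integer dtype; fill or drop the NaN values before casting to int, or cast to the nullable 'Int64' dtype.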
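A minimal sketch of this approach, using made-up values. The fill value 0 and the variable names are assumptions for illustration; choose whatever suits your data.

```python
import pandas as pd

# Hypothetical column with one invalid and one missing entry
s = pd.Series(["100", "abc", None, "300"])

numeric = pd.to_numeric(s, errors="coerce")   # invalid/missing become NaN

as_nullable = numeric.astype("Int64")         # nullable integer, keeps <NA>
filled = numeric.fillna(0).astype("int64")    # or replace NaN, then cast

print(as_nullable.isna().sum())  # 2
print(filled.tolist())           # [100, 0, 0, 300]
```

The nullable 'Int64' dtype (capital I) keeps the missing entries visible, while fillna() followed by astype('int64') gives a plain NumPy integer column.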

What if I want to convert string columns to integers for the entire DataFrame?

If you want to convert every string element in the DataFrame, you can apply a function element-wise with applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()). For example, a lambda function can check whether each element consists only of digits using isdigit() and convert it to an integer if it does. Other non-numeric elements remain unchanged.
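The element-wise idea can be sketched as follows. To stay independent of the applymap()/DataFrame.map() rename, this version calls Series.map() on each column; the helper name to_int_if_digits is hypothetical.

```python
import pandas as pd

def to_int_if_digits(value):
    """Return int(value) for all-digit strings; leave anything else unchanged."""
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": ["22000", "25000"],
    "Duration": ["30days", "50days"],
})

# Apply the helper to every element, one column at a time
converted = df.apply(lambda col: col.map(to_int_if_digits))
print(converted.dtypes)
```

Columns whose values are all digits end up with an integer dtype, while columns such as Courses and Duration keep their original strings.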

Are there any potential issues when converting strings to integers in Pandas?

Potential issues include non-numeric values and missing values. It’s important to ensure that the string values in the columns can be parsed as integers; if not, astype(int) raises a ValueError. Additionally, be aware that values coerced to NaN turn the column into float64, which cannot represent every integer above 2^53 exactly, so very large values may lose precision. Always check the resulting data type and handle any errors appropriately.
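One precision pitfall is worth a small illustration: when errors='coerce' introduces NaN, pd.to_numeric() returns float64, and float64 cannot hold every integer above 2**53. The values below are made up to trigger that case.

```python
import pandas as pd

big = 2**53 + 1   # 9007199254740993, not representable as float64

# With an invalid value present, coercion yields NaN and therefore float64
mixed = pd.Series([str(big), "abc"])
as_float = pd.to_numeric(mixed, errors="coerce")
print(int(as_float.iloc[0]) == big)   # False: the value was rounded

# With only valid integer strings, the result stays int64 and exact
clean = pd.Series([str(big)])
exact = pd.to_numeric(clean)
print(int(exact.iloc[0]) == big)      # True
```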

Conclusion

In this article, I have explained how to convert a single column and multiple columns from string to integer type in a Pandas DataFrame using the Series.astype(int) and pandas.to_numeric() functions.

Happy Learning !!

Malli

Malli is an experienced technical writer with a passion for translating complex Python concepts into clear, concise, and user-friendly articles. Over the years, he has written hundreds of articles on Pandas, NumPy, and Python, and takes pride in his ability to bridge the gap between technical experts and end-users.
