  • Post last modified:May 18, 2024
Pandas Convert String to Integer

To convert a string column to an integer in a Pandas DataFrame or Series, use the Series.astype(int) method or the pandas.to_numeric() function. In this article, I will explain the astype() function, its syntax and parameters, and how to convert single or multiple string columns into integer types, with some examples.


Quick Examples of Converting String to Integer

Following are quick examples of converting or casting a string to integer dtype.


# Quick examples of converting string to integer

# Example 1: Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Example 2: Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Example 3: Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

# Example 4: Convert the strings
# to integers using to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

# Example 5: Convert the strings to integers
# using str.replace & astype()
df['Fee'] = df['Fee'].str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

Series.astype() Syntax

Following is the syntax of Series.astype(). This function takes the dtype, copy, and errors parameters.


# astype() syntax
Series.astype(dtype, copy=True, errors='raise')

Parameters of astype()

Following are the parameters of astype() function.

  • dtype – Accepts a numpy.dtype or Python type to cast the entire pandas object to the same type.
  • copy – Default True. Returns a copy when copy=True.
  • errors – Default 'raise'.
    • Use 'raise' to generate an exception when the cast fails because the data is invalid for the target type.
    • Use 'ignore' to suppress exceptions. On error, the original object is returned unchanged.
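As a minimal sketch of the default errors='raise' behavior described above, the snippet below casts a Series that contains one non-numeric string. The Series name fees and its values are made up for illustration and are not part of the article's DataFrame.

```python
import pandas as pd

# Hypothetical data: the last value cannot be parsed as an integer
fees = pd.Series(['22000', '25000', 'unknown'])

try:
    # errors='raise' is the default, so invalid data raises an exception
    fees.astype(int, errors='raise')
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("Conversion failed:", exc)
```

Wrapping the cast in try/except like this is one way to report bad data instead of stopping the whole script.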

Return value of astype()

It returns a Series with the changed data type.

To run some examples of converting a string column to an integer column, let’s create a Pandas DataFrame using data from a dictionary.


# Create the DataFrame
import pandas as pd
import numpy as np
technologies= ({
   'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies)

print("Create DataFrame:")
print(df.dtypes)

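Yields the output below. Every column, including Fee and Discount, starts out with the object (string) dtype.

# Output:
# Courses     object
# Fee         object
# Duration    object
# Discount    object
# dtype: object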

Convert String to Integer

You can use Pandas Series.astype() to convert or cast a string to an integer in a specific DataFrame column or Series. Each column in a DataFrame is a Pandas Series, so accessing a column returns a Series object. For instance, retrieving the Fee column from DataFrame df using either df.Fee or df['Fee'] returns a Series object.

To cast one or more DataFrame columns in a single call, pass DataFrame.astype() a mapping of the form {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type.


# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Output:
# Courses     object
# Fee          int32
# Duration    object
# Discount    object
# dtype: object
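The {col: dtype, …} mapping mentioned above can be sketched as follows. This is a minimal example that uses a cut-down stand-in for the article's DataFrame. Columns that are not listed in the mapping keep their original type.

```python
import pandas as pd

# Small stand-in for the article's DataFrame
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': ['22000', '25000'],
    'Discount': ['1000', '2300'],
})

# Map each column label to its target dtype; unlisted columns are unchanged
df = df.astype({'Fee': int, 'Discount': 'int64'})
print(df.dtypes)
```

Passing 'int64' rather than int pins the integer width explicitly, which can be useful when the same script runs on different platforms.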

Multiple Columns Integer Conversion

To convert multiple string columns to integers in a Pandas DataFrame, select the columns with a list of labels and call astype() on the selection.


# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

# Output:
# Courses     object
# Fee          int32
# Duration    object
# Discount     int32
# dtype: object

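The example above converts the Fee and Discount columns from string type to integer type in the DataFrame df. The print(df.dtypes) statement then prints the data type of each column after the conversion. Note that astype(int) uses NumPy's default integer type, so the output may show int32 or int64 depending on your platform and NumPy version.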
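As an alternative sketch for several columns, you can apply pd.to_numeric() column by column with DataFrame.apply(). The variable name cols and the two-row DataFrame are only illustrative.

```python
import pandas as pd

# Small stand-in for the article's DataFrame
df = pd.DataFrame({
    'Courses': ['Spark', 'PySpark'],
    'Fee': ['22000', '25000'],
    'Discount': ['1000', '2300'],
})

cols = ['Fee', 'Discount']
# apply() calls pd.to_numeric once per selected column
df[cols] = df[cols].apply(pd.to_numeric)
print(df.dtypes)
```

Unlike astype(int), pd.to_numeric() chooses the numeric type from the data, so a column that contains decimal strings would become a float column rather than raise an error.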

Use pandas.to_numeric() to Convert a Single String Column

Similarly, to convert a single string column to an integer with pd.to_numeric(), apply the function directly to that column. For instance, df['Fee'] = pd.to_numeric(df['Fee']) converts the ‘Fee’ column to int.


# Using pandas.to_numeric()
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

# Output:
# Courses     object
# Fee          int64
# Duration    object
# Discount    object
# dtype: object

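By default, pd.to_numeric() raises an error for values that contain letters, and with errors='coerce' it turns them into NaN. If you don’t want to lose those values, use str.replace() with a regex pattern to drop the non-digit characters before casting.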


# Convert the strings to integers using str.replace & astype()
# astype(str) keeps .str usable if Fee was already converted to int above
df['Fee'] = df['Fee'].astype(str).str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

# Output:
# Courses     object
# Fee          int64
# Duration    object
# Discount    object
# dtype: object
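The same pattern is easier to see on the Duration column, whose values really do contain letters. The sketch below rebuilds only that column from the article's data and prints [30, 50, 40, 60].

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['30days', '50days', '40days', '60days']})

# Remove every character that is not a digit, then cast the remainder
df['Duration'] = (
    df['Duration']
    .str.replace('[^0-9]', '', regex=True)
    .astype('int64')
)
print(df['Duration'].tolist())
```

Keep in mind that a value with no digits at all becomes an empty string, which astype() cannot convert, so this approach assumes every value contains at least one digit.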

Complete Example of Converting String to Integer


import pandas as pd
import numpy as np
technologies= ({
   'Courses':["Spark","PySpark","Hadoop","Pandas"],
    'Fee' :['22000','25000','24000','26000'],
    'Duration':['30days','50days','40days','60days'],
    'Discount':['1000','2300','2500','1400']
              })
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

# Convert string to an integer
df["Fee"] = df["Fee"].astype(int)
print (df.dtypes)

# Change specific column type
df.Fee = df['Fee'].astype('int')
print(df.dtypes)

# Multiple columns integer conversion
df[['Fee', 'Discount']] = df[['Fee','Discount']].astype(int)
print(df.dtypes)

# Convert the strings to integers using to_numeric
df['Fee'] = pd.to_numeric(df['Fee'])
print (df.dtypes)

# Convert the strings to integers using str.replace & astype()
# Fee is already an integer at this point, so cast it back to str before using .str
df['Fee'] = df['Fee'].astype(str).str.replace('[^0-9]', '', regex=True).astype('int64')
print(df.dtypes)

Frequently Asked Questions on Pandas Convert String to Integer

How do I convert a single column from string to integer in a Pandas DataFrame?

To convert a single column from string to integer in a Pandas DataFrame, you can use the astype() method. For instance, if the values in a column named ‘Column1’ are initially strings, applying astype(int) to that column converts them to integers. Make sure that the string values in the column can be safely converted to integers; otherwise, you may encounter errors.
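A minimal sketch of this answer, assuming a hypothetical DataFrame with a single ‘Column1’ column of numeric strings:

```python
import pandas as pd

# Hypothetical DataFrame whose 'Column1' holds numeric strings
df = pd.DataFrame({'Column1': ['1', '2', '3']})

# Cast the column and assign it back to the DataFrame
df['Column1'] = df['Column1'].astype(int)

# Arithmetic now works on numbers instead of concatenating strings
print(df['Column1'].sum())
```

After the conversion the sum is 6, whereas summing the original strings would have concatenated them.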

How do I handle non-numeric values or missing values during the conversion?

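You can handle non-numeric values or missing values by using the pd.to_numeric() function with the errors parameter. Setting errors='coerce' replaces non-convertible values with NaN. Because NaN is a floating-point value, the result is a float column until you fill or drop the missing values, or cast to the nullable Int64 dtype.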
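The following sketch shows one way to go from coerced values back to integers. The Series raw and the placeholder value 0 are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical column with one invalid string and one missing value
raw = pd.Series(['22000', 'N/A', None, '26000'])

# Unparseable entries become NaN, so the result is floating point, not integer
as_float = pd.to_numeric(raw, errors='coerce')

# Option 1: cast to the nullable integer dtype, keeping missing values as <NA>
as_nullable_int = as_float.astype('Int64')

# Option 2: substitute a placeholder (0 here) and cast to a plain integer
as_plain_int = as_float.fillna(0).astype('int64')

print(as_nullable_int)
print(as_plain_int)
```

Which option fits depends on whether a missing fee should stay visibly missing or be treated as a concrete number.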

What if I want to convert string columns to integers for the entire DataFrame?

If you want to convert all string columns to integers for the entire DataFrame, you can apply a conversion to every element with the applymap() function (renamed to DataFrame.map() in pandas 2.1, where applymap() is deprecated). For example, a lambda function can check whether each element consists only of digits using isdigit() and convert it to an integer if it does. Other non-numeric elements remain unchanged.
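A minimal sketch of this element-wise approach is shown below. The helper name to_int_if_digits is hypothetical, and the hasattr() check is just one way to support both older and newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'Fee': ['22000', '25000'],
    'Duration': ['30days', '50days'],
})

def to_int_if_digits(value):
    # Convert only strings made entirely of digits; leave everything else alone
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return value

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
converted = elementwise(to_int_if_digits)
print(converted)
```

Note that isdigit() returns False for strings such as '-5' or '1.5', so negative and decimal values are left as strings by this sketch.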

Are there any potential issues when converting strings to integers in Pandas?

Potential issues include non-numeric values and missing values. It’s important to ensure that the string values in the columns can be safely converted to integers. If not, a ValueError may occur. Additionally, be aware that values outside the int64 range do not fit in a 64-bit integer column, and that passing very large integers through float64 (for example, when errors='coerce' introduces NaN) can lose precision. Always check the resulting data type and handle any errors appropriately.

Conclusion

In this article, you have learned how to convert single and multiple columns from string to integer type in a Pandas DataFrame using the Series.astype(int) method and the pandas.to_numeric() function.

Happy Learning !!
