Pandas Convert Column to Int in DataFrame

  • Post category:Pandas / Python
  • Post last modified:January 25, 2022

Use the pandas DataFrame.astype(int) and apply() methods to convert a column to an integer data type (for example, string or float to int64 or int32). Keep in mind that a float can hold a fractional part that an integer cannot, so converting a float column to int discards everything after the decimal point.

Note that converting a float to int does not round or floor the value; it simply truncates the fractional part (anything after the decimal point), so 2300.15 becomes 2300. In this article, I will explain different ways to convert columns with string or float values to integer values.
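To make the truncation behavior concrete, here is a minimal sketch comparing astype() with flooring and rounding. The sample values are hypothetical and were chosen to include negative numbers, where truncating and flooring give different results.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values (not from the article's DataFrame)
s = pd.Series([2.7, -2.7, 1000.5, -0.9])

truncated = s.astype("int64")          # drops the fraction, moving toward zero
floored = np.floor(s).astype("int64")  # always moves down to the lower integer
rounded = s.round().astype("int64")    # nearest integer; exact .5 ties go to the even value

print(truncated.tolist())  # [2, -2, 1000, 0]
print(floored.tolist())    # [2, -3, 1000, -1]
print(rounded.tolist())    # [3, -3, 1000, -1]
```

If you need rounded values rather than truncated ones, call round() before astype().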

1. Quick Examples of pandas Convert Column to Int

If you are in a hurry, below are some quick examples of how to convert a column to integer dtype in a DataFrame.


# Below are quick examples

# convert "Fee" from String to int
df = df.astype({'Fee':'int'})

# Convert all columns to int dtype.
# This raises an error on our DataFrame
# df = df.astype('int')

# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')

# convert "Discount" from Float to int
df = df.astype({'Discount':'int'})

# Converting Multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({"Fee":"int","Discount":"int"})

# convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)

Now, let’s create a DataFrame with a few rows and columns, execute some examples and validate the results. Our DataFrame contains column names Courses, Fee, Duration and Discount.


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :["22000","25000","23000","24000","26000"],
    'Duration':['30days','50days','35days', '40days','55days'],
    'Discount':[1000.10,2300.15,1000.5,1200.22,2500.20]
          }
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

Yields below output. Note that the Fee column is string/object type holding integer values and Discount is float64 type.


   Courses    Fee Duration  Discount
0    Spark  22000   30days   1000.10
1  PySpark  25000   50days   2300.15
2   Hadoop  23000   35days   1000.50
3   Python  24000   40days   1200.22
4   Pandas  26000   55days   2500.20

Courses      object
Fee          object
Duration     object
Discount    float64
dtype: object

2. Convert Column to int (Integer)

Use the pandas DataFrame.astype() function to convert a column to int (integer). You can apply it to a specific column or to an entire DataFrame. To cast to a 64-bit signed integer, use numpy.int64 or int64 as the param; int and numpy.int_ map to the platform's default integer, which is 64-bit on most systems. To cast to a 32-bit signed integer, use numpy.int32 or int32.

The below example converts the Fee column from string dtype to int64. You can also use numpy.int64 as a param to this method.


# convert "Fee" from String to int
df = df.astype({'Fee':'int'})
print(df.dtypes)

Yields below output.


Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
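Because plain int follows NumPy's default integer, the resulting width can differ between platforms (for example, Windows with NumPy versions before 2.0 uses 32 bits). The following sketch names the width explicitly; it uses a small stand-in DataFrame with only the Fee column rather than the full DataFrame above.

```python
import numpy as np
import pandas as pd

# Small stand-in for the article's DataFrame (only the "Fee" column)
df = pd.DataFrame({"Fee": ["22000", "25000", "23000"]})

fee64 = df["Fee"].astype("int64")   # equivalent to astype(np.int64)
fee32 = df["Fee"].astype(np.int32)  # equivalent to astype("int32")

print(fee64.dtype)  # int64
print(fee32.dtype)  # int32
```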

If you have a DataFrame in which every column is a string column holding integer values, you can convert all of them to int dtype at once as shown below. If any column contains alpha-numeric values, this raises an error, so running it on our DataFrame fails.


# Convert all columns to int dtype.
# Raises an error on our DataFrame because Courses and Duration are not numeric.
df = df.astype('int')
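One possible way to handle this situation is sketched below: catch the error, then convert only the columns whose values are all digit strings. The reduced DataFrame and the numeric_cols name are assumptions made for illustration, and the check assumes every column holds strings.

```python
import pandas as pd

# Reduced, hypothetical version of the article's DataFrame
df = pd.DataFrame({
    "Fee": ["22000", "25000"],
    "Duration": ["30days", "50days"],
})

try:
    df.astype("int")
except ValueError as err:
    # "30days" cannot be parsed as an integer
    print("Conversion failed:", err)

# Keep only the columns where every value consists of digits
numeric_cols = [c for c in df.columns if df[c].str.isdigit().all()]
df = df.astype({c: "int64" for c in numeric_cols})
print(df.dtypes)
```

Note that str.isdigit() rejects signs and decimal points, so this simple check only suits columns of non-negative whole numbers.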

You can also use Series.astype() to convert a specific column. Since each column of a DataFrame is a pandas Series, I will get the column from the DataFrame as a Series and call astype() on it. In the below example, df.Fee or df['Fee'] returns a Series object.


# Convert single column to int dtype.
df['Fee'] = df['Fee'].astype('int')

3. Convert Float to Int dtype

Now let's use the same astype() approach to convert a float column to int (integer) type in a pandas DataFrame. As noted earlier, the conversion does not round or floor the values; it truncates the fractional part (anything after the decimal point).

The below example converts the Discount column holding float values to int using the DataFrame.astype() function.


# convert "Discount" from Float to int
df = df.astype({'Discount':'int'})
print(df.dtypes)

Yields below output.


Courses     object
Fee          int64
Duration    object
Discount     int64
dtype: object

Similarly, you can also cast all columns or a single column. Refer to the examples in the section above for details.

4. Casting Multiple Columns to Integer

You can also convert multiple columns to integer by passing a dict that maps column names to data types to the astype() method. The below example converts the Fee column from string to int and the Discount column from float to int.


# Converting Multiple columns to int
df = pd.DataFrame(technologies)
df = df.astype({"Fee":"int","Discount":"int"})
print(df.dtypes)

Yields below output.


Courses     object
Fee          int64
Duration    object
Discount     int64
dtype: object

5. Using apply(np.int64) to Cast to Integer

You can also use the apply() method on a column to convert Fee from string to integer in pandas. As you can see in this example, we pass numpy.int64 as the function to apply.


import numpy as np
# convert "Fee" from string to int using apply(np.int64)
df["Fee"] = df["Fee"].apply(np.int64)
print(df.dtypes)

Yields below output.


Courses      object
Fee           int64
Duration     object
Discount    float64
dtype: object
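For comparison, the sketch below runs apply(np.int64) and astype("int64") on the same string values and checks that they produce the same integers. The standalone Series is a stand-in for the Fee column. apply() calls the function once per element, while astype() converts the column in a single call, which is usually the more direct choice.

```python
import numpy as np
import pandas as pd

# Stand-in for the "Fee" column of the article's DataFrame
fee = pd.Series(["22000", "25000", "23000"], name="Fee")

via_apply = fee.apply(np.int64)   # calls np.int64 on each element
via_astype = fee.astype("int64")  # converts the whole column in one call

print(via_apply.tolist() == via_astype.tolist())
print(via_apply.dtype, via_astype.dtype)
```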

6. Convert Column Containing NaNs to astype(int)

In order to demonstrate some NaN/Null values, let’s create a DataFrame using NaN Values. To convert a column that includes a mixture of float and NaN values to int, first replace NaN values with zero on pandas DataFrame and then use astype() to convert.


import pandas as pd
import numpy as np
technologies= {
    'Fee' :[22000.30,25000.40,np.nan,24000.50,26000.10,np.nan]
          }
df = pd.DataFrame(technologies)
print(df)
print(df.dtypes)

Use DataFrame.fillna() to replace the NaN values with integer value zero.


# convert "Fee" from float to int and replace NaN values
df['Fee'] = df['Fee'].fillna(0).astype(int)
print(df)
print(df.dtypes)

Yields below output.


     Fee
0  22000
1  25000
2      0
3  24000
4  26000
5      0
Fee    int64
dtype: object
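Replacing NaN with zero changes the data, which may not always be what you want. As an alternative sketch, not part of the approach above, pandas also offers the nullable "Int64" dtype (capital I), which keeps missing values as <NA>. The fraction is dropped first with numpy.trunc() because casting non-whole floats directly to "Int64" raises an error.

```python
import numpy as np
import pandas as pd

# Shortened version of the article's NaN example
df = pd.DataFrame({"Fee": [22000.30, np.nan, 24000.50]})

# Truncate the fraction first, then cast to the nullable integer dtype
df["Fee"] = np.trunc(df["Fee"]).astype("Int64")

print(df["Fee"].isna().tolist())  # [False, True, False]
print(df.dtypes)
```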

Conclusion

In this article, you have learned how to convert a column from string to int and from float to int using the DataFrame.astype() and apply() methods. You have also learned how to convert a float column to integers when it contains NaN/null values.

Happy Learning !!

NNK