Pandas – Drop Infinite Values From DataFrame

You can remove infinite values from the rows and columns of a pandas DataFrame by combining the replace() and dropna() methods. NumPy represents positive infinity as np.inf and negative infinity as -np.inf; np is the conventional alias created by the statement import numpy as np.

In this article, I will explain how to drop/remove infinite values from a pandas DataFrame. You can either replace the infinite values with NaN and then remove the NaN values from the DataFrame, or use pd.set_option('mode.use_inf_as_na', True) so that pandas treats all infinite values as NaN. Note that the use_inf_as_na option is deprecated in recent pandas releases (2.1 and later), so the replace() approach is the safer choice for new code.

1. Create a Pandas DataFrame With Sample Data

Let’s create a DataFrame with a few rows and columns, run some examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount, each of which holds some infinite values.


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", np.inf, "Python", -np.inf],
    'Fee': [22000, 25000, 23000, np.inf, 26000, 25000, -np.inf, 24000],
    'Duration': ['30day', '50days', '55days', '40days', '60days', -np.inf, '55days', np.inf],
    'Discount': [1000, 2300, 1200, np.inf, 2500, -np.inf, 2000, 1500]
}
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0   50days    2300.0
2   Hadoop  23000.0   55days    1200.0
3   Python      inf   40days       inf
4   pandas  26000.0   60days    2500.0
5      inf  25000.0     -inf      -inf
6   Python     -inf   55days    2000.0
7     -inf  24000.0      inf    1500.0
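Before dropping anything, it can help to see how many infinite values each column holds and which rows are affected. The sketch below is one way to do that with DataFrame.isin(); it rebuilds the sample DataFrame so that it runs on its own, and the names inf_mask, inf_counts, and rows_with_inf are illustrative only.

```python
import numpy as np
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", np.inf, "Python", -np.inf],
    'Fee': [22000, 25000, 23000, np.inf, 26000, 25000, -np.inf, 24000],
    'Duration': ['30day', '50days', '55days', '40days', '60days', -np.inf, '55days', np.inf],
    'Discount': [1000, 2300, 1200, np.inf, 2500, -np.inf, 2000, 1500]
}
df = pd.DataFrame(technologies)

# True wherever a cell is +inf or -inf
inf_mask = df.isin([np.inf, -np.inf])

# Number of infinite values in each column
inf_counts = inf_mask.sum()
print(inf_counts)

# Index labels of the rows that contain at least one infinite value
rows_with_inf = df.index[inf_mask.any(axis=1)].tolist()
print(rows_with_inf)
```

The rows listed in rows_with_inf are the ones the examples in the following sections remove.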

2. Replace Infinite By NaN & Drop Rows With NaN in pandas

Use df.replace() to replace the infinite values with NaN, and then call df.dropna(inplace=True) to remove the rows that contain NaN or None values. Together, these two steps remove the rows with infinite values from the pandas DataFrame. inplace=True updates the existing DataFrame instead of returning a new one.


# Replace infinite values with NaN
df.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop rows with NaN
df.dropna(inplace=True)
print(df)

This yields the output below. df.replace([np.inf, -np.inf], np.nan, inplace=True) replaces all np.inf and -np.inf values with NaN in the current DataFrame.


   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0   50days    2300.0
2   Hadoop  23000.0   55days    1200.0
4   pandas  26000.0   60days    2500.0
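If infinite values only matter in some columns, dropna() also accepts a subset argument that limits the check to those columns. The following is a minimal sketch under that assumption; the small DataFrame and the name cleaned are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: infinite values appear in both Fee and Discount
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python"],
    'Fee': [22000, np.inf, 23000, -np.inf],
    'Discount': [1000, 2300, np.inf, 1500],
})

# Replace infinite values with NaN, then drop rows only where Fee is NaN
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(subset=['Fee'])
print(cleaned)
```

Rows whose Fee was infinite are removed, while a row with an infinite Discount is kept and now shows NaN in that column.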

3. Using pandas.option_context() to Consider Infinite as NaN

You can use with pd.option_context('mode.use_inf_as_na', True): to treat all inf values as NaN within a block of code. In Python, the with statement defines the scope in which the option applies, and the previous setting is restored when the block exits. If you want to treat all inf values as NaN throughout the program, use pd.set_option('mode.use_inf_as_na', True) instead.

Note: In older pandas versions, the option is named use_inf_as_null instead of use_inf_as_na.


# Changing option context to use infinite as nan
# Drop the rows with nan or infinite values
with pd.option_context('mode.use_inf_as_na', True):
    df.dropna(inplace=True)
print(df)

This yields the same output as above.

4. Using pandas replace() To Drop Rows or Columns With Infinite Values

Use df.replace() to replace all infinite values with np.nan, and then call dropna(axis=0) on the resulting DataFrame to drop the affected rows. Set axis=1 to drop the affected columns instead.


# Replace infinite values with NaN, then drop the rows that contain them
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
print(df)
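The example above uses axis=0. To illustrate the axis=1 case, here is a small sketch with a hypothetical DataFrame in which only one column contains an infinite value (in the sample data every column has one, so axis=1 would drop them all).

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the Fee column contains an infinite value
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [22000.0, np.inf, 23000.0],
    'Discount': [1000.0, 2300.0, 1200.0],
})

# axis=1 drops every column that holds at least one NaN after the replacement
cols_dropped = df.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
print(cols_dropped)
```

All three rows are kept, but the Fee column is removed because one of its values was infinite.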

5. Pandas Changing Option to Consider Infinite as NaN

You can also use pd.set_option() to turn on the pandas option that treats infinite values as NaN. Unlike option_context(), this setting applies to the entire pandas session rather than a single block. Then use the df.dropna() method to remove the rows with infinite values.


# Changing option to consider infinite as nan
pd.set_option('mode.use_inf_as_na', True)
df.dropna(inplace=True)
print(df)

This yields the same output as above.

6. Using DataFrame.isin() to Create Filter

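DataFrame.isin() returns a boolean DataFrame that is True wherever a value is NaN or infinite. Use this filter as df = df[~df_filter] to mask those values as NaN, and then call dropna() to remove the affected rows.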


# Using DataFrame.isin() to Create Filter
df_filter = df.isin([np.nan, np.inf, -np.inf])
# Mask df with the filter
df = df[~df_filter]
df.dropna(inplace=True)
print(df)

This yields the same output as above.
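To make the masking step easier to follow, the sketch below shows the intermediate result on a small hypothetical DataFrame. Indexing with the inverted filter does not remove any rows by itself; it only turns the matching cells into NaN, which is why dropna() is still needed afterwards. The names masked and result are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one infinite value per column
df = pd.DataFrame({
    'Fee': [22000.0, np.inf, 23000.0],
    'Discount': [1000.0, 2300.0, -np.inf],
})

# True where a cell is NaN, +inf, or -inf
df_filter = df.isin([np.nan, np.inf, -np.inf])

# Cells where the filter is True become NaN; the shape is unchanged
masked = df[~df_filter]
print(masked)

# Dropping NaN rows removes every row that had an infinite value
result = masked.dropna()
print(result)
```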

7. Select Non-Null Rows Using DataFrame.replace()

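You can use df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] to replace positive and negative infinity with NaN and then select only the rows with no null values. Here axis=1 makes all() check across the columns of each row, so a row is kept only when every one of its values is non-null.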


# Using replace method to select non-null rows
df = df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] 
print(df)

8. Complete Example to Drop Infinite Values From DataFrame


import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas", np.inf, "Python", -np.inf],
    'Fee': [22000, 25000, 23000, np.inf, 26000, 25000, -np.inf, 24000],
    'Duration': ['30day', '50days', '55days', '40days', '60days', -np.inf, '55days', np.inf],
    'Discount': [1000, 2300, 1200, np.inf, 2500, -np.inf, 2000, 1500]
}
df = pd.DataFrame(technologies)
print(df)

# Display the data with infinite values replaced by NaN (df itself is unchanged)
print(df.replace([np.inf, -np.inf], np.nan))

# Replace infinite values with NaN
df.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop rows with NaN
df.dropna(inplace=True)
print(df)

# Changing option context to use infinite as nan
with pd.option_context('mode.use_inf_as_na', True):
    # Drop the rows with nan or infinite values
    df.dropna(inplace=True)
print(df)

# Replace infinite values with NaN, then drop the rows that contain them
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
print(df)

# Changing option to consider infinite as nan
pd.set_option('mode.use_inf_as_na', True)
df.dropna(inplace=True)
print(df)

# Using DataFrame.isin() to Create Filter
df_filter = df.isin([np.nan, np.inf, -np.inf])
# Mask df with the filter
df = df[~df_filter]
df.dropna(inplace=True)
print(df)

# Using replace method to select non-null rows
df = df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] 
print(df)

Conclusion

In this article, you have learned how to drop infinite values from a pandas DataFrame using the DataFrame.replace(), DataFrame.dropna(), and DataFrame.isin() methods. You have also learned how to replace all infinite values with NaN or any other specific value.

Happy Learning !!

NNK
