Pandas – Drop Infinite Values From DataFrame

  • Post last modified:October 5, 2023

By using the replace() and dropna() methods, you can remove infinite values from the rows and columns of a pandas DataFrame. NumPy represents positive infinity as np.inf and negative infinity as -np.inf; np is available after the statement import numpy as np.

In this article, I will explain how to drop/remove infinite values from a pandas DataFrame. To remove infinite values, you can either replace them with NaN and then drop the NaN values, or use pd.set_option('mode.use_inf_as_na', True) to make pandas treat all infinite values as NaN. Note that the use_inf_as_na option is deprecated as of pandas 2.1, so the replace() approach is the safer choice on recent versions.
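Before removing anything, it can help to check where the infinite values are. The following is a minimal sketch, assuming a small hypothetical DataFrame named sample (not the DataFrame used in the rest of this article). It counts infinite values per numeric column with np.isinf().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Fee": [22000.0, np.inf, 26000.0, -np.inf],
    "Discount": [1000.0, 2300.0, np.inf, 1500.0],
    "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
})

# np.isinf() only accepts numeric data, so select the numeric columns first
numeric = sample.select_dtypes(include="number")

# Count inf and -inf values in each numeric column
inf_counts = np.isinf(numeric).sum()
print(inf_counts)
```

Non-numeric columns are skipped here on purpose, because np.isinf() raises an error on string data.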

1. Create a Pandas DataFrame With Sample Data

Let’s create a DataFrame with a few rows and columns, run some examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount, each of which includes infinite values.


# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas",np.inf,"Python",-np.inf],
    'Fee' :[22000,25000,23000,np.inf,26000,25000,-np.inf,24000],
    'Duration':['30day','50days','55days', '40days','60days',-np.inf,'55days',np.inf],
    'Discount':[1000,2300,1200,np.inf,2500,-np.inf,2000,1500]
                }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0   50days    2300.0
2   Hadoop  23000.0   55days    1200.0
3   Python      inf   40days       inf
4   pandas  26000.0   60days    2500.0
5      inf  25000.0     -inf      -inf
6   Python     -inf   55days    2000.0
7     -inf  24000.0      inf    1500.0

2. Pandas Drop Infinite Values

Use df.replace() to replace the infinite values with NaN, and then use the pandas.DataFrame.dropna() method to remove the rows that contain NaN or None values. Together, these two steps drop the infinite values from the pandas DataFrame. inplace=True updates the existing DataFrame instead of returning a new one.


# Replace infinite values with NaN
df.replace([np.inf, -np.inf], np.nan, inplace=True)

# Drop rows with NaN
df.dropna(inplace=True)
print(df)

Yields below output. df.replace([np.inf, -np.inf], np.nan, inplace=True) replaces all np.inf and -np.inf values with NaN in the current DataFrame.


# Output:
   Courses      Fee Duration  Discount
0    Spark  22000.0    30day    1000.0
1  PySpark  25000.0   50days    2300.0
2   Hadoop  23000.0   55days    1200.0
4   pandas  26000.0   60days    2500.0
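If only some columns matter, dropna() accepts a subset argument. The following is a minimal sketch, assuming a hypothetical DataFrame named sample. It drops a row only when the Fee column is infinite or missing and leaves the original DataFrame unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
    "Fee": [22000.0, np.inf, 23000.0, 24000.0],
    "Discount": [1000.0, 2300.0, -np.inf, 1500.0],
})

# Convert infinities to NaN, then drop rows only where 'Fee' is missing.
# Without inplace=True, replace() and dropna() return new DataFrames.
cleaned = sample.replace([np.inf, -np.inf], np.nan).dropna(subset=["Fee"])
print(cleaned)
```

Row 2 is kept because its Fee is finite; its Discount value is now NaN rather than -inf.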

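3. Using pandas.option_context() to Consider Infinite as NaN

You can use with pd.option_context('mode.use_inf_as_na', True): to treat all inf values as NaN within a block of code. In Python, the with statement limits the setting to that block; the previous value is restored when the block exits. If you want to treat all inf values as NaN for the entire program, use pd.set_option('mode.use_inf_as_na', True) instead.

Note: For older versions, replace use_inf_as_na with use_inf_as_null. The use_inf_as_na option itself is deprecated as of pandas 2.1, so on recent versions prefer converting infinite values to NaN with replace().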


# Temporarily treat infinite values as NaN inside the with block,
# then drop the rows with NaN or infinite values
with pd.option_context('mode.use_inf_as_na', True):
    df.dropna(inplace=True)
print(df)

Yields same output as above.

4. Using pandas replace() & dropna() To Drop Infinite Values

Use df.replace() to replace all infinite values with np.nan, and chain pd.DataFrame.dropna(axis=0) to drop the affected rows. Because neither call uses inplace=True, assign the result back to df. This drops every row that contains an infinite value.


# Replace infinite values with NaN, then drop the affected rows
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
print(df)
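The same chain can drop columns instead of rows by passing axis=1 to dropna(). The following is a minimal sketch, assuming a hypothetical DataFrame named sample in which only the Discount column contains an infinite value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000.0, 25000.0, 23000.0],
    "Discount": [1000.0, np.inf, 1200.0],
})

# axis=1 drops every column that contains at least one NaN (formerly inf)
cols_cleaned = sample.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
print(cols_cleaned.columns.tolist())
```

All three rows remain; only the Discount column is removed.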

5. Pandas Changing Option to Consider Infinite as NaN

You can also use pd.set_option() to set the pandas option that treats infinite values as NaN. This makes the entire pandas module consider infinite values as NaN for the rest of the program. Then use the pandas.DataFrame.dropna() method to drop the rows with infinite values.


# Changing option to consider infinite as nan
pd.set_option('mode.use_inf_as_na', True)
df.dropna(inplace=True)
print(df)

Yields same output as above.
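A global option changes the behavior of every later pandas call, and use_inf_as_na is deprecated as of pandas 2.1. One alternative is to wrap the replace-and-drop steps in a small helper. The following is a minimal sketch; the function name drop_infinite_rows and the sample data are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def drop_infinite_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame without rows that contain NaN, inf, or -inf."""
    # replace() and dropna() both return new objects, so frame is not modified
    return frame.replace([np.inf, -np.inf], np.nan).dropna()


# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000.0, -np.inf, 23000.0],
})

result = drop_infinite_rows(sample)
print(result)
```

The helper does not touch any pandas option, so code elsewhere in the program keeps the default handling of infinite values.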

6. Using DataFrame.isin() to Create Filter

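DataFrame.isin() returns a boolean DataFrame that is True wherever a cell holds NaN, inf, or -inf. Indexing with the inverted filter, df = df[~df_filter], masks those cells as NaN, and dropna() then removes the affected rows.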


# Using DataFrame.isin() to Create Filter
df_filter = df.isin([np.nan, np.inf, -np.inf])

# Mask df with the filter
df = df[~df_filter]
df.dropna(inplace=True)
print(df)

Yields same output as above.
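The boolean filter can also be reduced to one value per row with any(axis=1), which selects the rows directly without the intermediate NaN masking step. The following is a minimal sketch, assuming a hypothetical DataFrame named sample.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
    "Fee": [22000.0, np.inf, 23000.0, 24000.0],
    "Discount": [1000.0, 2300.0, 1200.0, -np.inf],
})

# True wherever a cell is inf or -inf
inf_mask = sample.isin([np.inf, -np.inf])

# Collapse the mask to one boolean per row: True if any cell is infinite
rows_with_inf = inf_mask.any(axis=1)

# Keep only the rows without infinite values
filtered = sample[~rows_with_inf]
print(filtered)
```

Unlike the version above, this sketch only checks for infinite values, so rows containing NaN would be kept.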

7. Select Non-Null Rows Using DataFrame.replace()

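You can use df[df.replace([np.inf,-np.inf],np.nan).notnull().all(axis=1)] to replace inf and -inf with NaN and then select only the rows in which every value is non-null. Here axis=1 makes all() evaluate across the columns of each row, so the result is a row filter; no columns are dropped.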


# Using replace method to select non-null rows
df = df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] 
print(df)
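When only the numeric columns need checking, np.isfinite() offers a similar row filter without calling replace(). The following is a minimal sketch, assuming a hypothetical DataFrame named sample. Note that np.isfinite() is False for NaN as well as for inf and -inf.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
    "Fee": [22000.0, np.inf, 23000.0, np.nan],
    "Discount": [1000.0, 2300.0, -np.inf, 1500.0],
})

# np.isfinite() needs numeric input, so check the numeric columns only
numeric = sample.select_dtypes(include="number")

# True for rows in which every numeric value is finite (not NaN, inf, or -inf)
finite_rows = np.isfinite(numeric).all(axis=1)

finite_only = sample[finite_rows]
print(finite_only)
```

This does not inspect string or object columns, so an infinite value stored in such a column would not be detected.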

8. Complete Example of pandas Drop Infinite Values


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas",np.inf,"Python",-np.inf],
    'Fee' :[22000,25000,23000,np.inf,26000,25000,-np.inf,24000],
    'Duration':['30day','50days','55days', '40days','60days',-np.inf,'55days',np.inf],
    'Discount':[1000,2300,1200,np.inf,2500,-np.inf,2000,1500]
                }
df = pd.DataFrame(technologies)
print(df)

# Display the data with infinite values replaced by NaN (df itself is unchanged)
print(df.replace([np.inf, -np.inf], np.nan))

# Replace infinite values with NaN in the existing DataFrame
df.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop rows with NaN
df.dropna(inplace=True)
print(df)

# Temporarily treat infinite values as NaN inside the with block,
# then drop the rows with NaN or infinite values
with pd.option_context('mode.use_inf_as_na', True):
    df.dropna(inplace=True)
print(df)

# Replace infinite values with NaN, then drop the affected rows
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
print(df)

# Changing option to consider infinite as nan
pd.set_option('mode.use_inf_as_na', True)
df.dropna(inplace=True)
print(df)

# Using DataFrame.isin() to Create Filter
df_filter = df.isin([np.nan, np.inf, -np.inf])
# Mask df with the filter
df = df[~df_filter]
df.dropna(inplace=True)
print(df)

# Using replace method to select non-null rows
df = df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] 
print(df)

Conclusion

In this article, you have learned how to drop infinite values from a pandas DataFrame using the DataFrame.replace(), DataFrame.dropna(), and DataFrame.isin() methods. You have also learned how to replace all infinite values with NaN or any other specific value.

Happy Learning !!

Naveen

I am a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, I have honed my expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. My journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. I have started this SparkByExamples.com to share my experiences with the data as I come across. You can learn more about me at LinkedIn
