Pandas Replace Blank Values (empty) with NaN

You can replace black values or empty string with NAN in pandas DataFrame by using DataFrame.replace(), DataFrame.apply(), and DataFrame.mask() methods. In this article, I will explain how to replace blank values with NAN on the entire DataFrame and selected columns with some examples

1. Quick Examples of Replace Blank or Empty Values With NAN

If you are in hurry, below are some quick examples of how to replace blank values or empty string with NAN on pandas DataFrame.


# Below are some quick examples.
# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)

# Using DataFrame.mask() method.
df2=df.mask(df == '')

# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)

# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)

Now, Let’s create a pandas DataFrame with a few rows and columns, execute these examples and validate results to replace blank values with NAN. Our DataFrame contains column names CoursesFee and Duration.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","","Spark","","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','','30days','','35days']
          }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration
0    Spark  22000   30days
1           25000         
2    Spark  23000   30days
3           24000         
4  PySpark  26000   35days

2. Pands Replace Blank Values with NaN using replace() Method

You can replace blank/empty values with DataFrame.replace() methods. The replace() method replaces the specified value with another specified value on a specified column or on all columns of a DataFrame; replaces every case of the specified value.


# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print(df2)

Yields below output.


   Courses    Fee Duration
0    Spark  22000   30days
1      NaN  25000      NaN
2    Spark  23000   30days
3      NaN  24000      NaN
4  PySpark  26000   35days

3. Pandas Replace Blank Values with NaN using mask()

You can also replace blank values with NAN with DataFrame.mask() methods. The mask() method replaces the values of the rows where the condition evaluates to True.


# Using DataFrame.mask() method.
df2=df.mask(df == '')
print(df2)

Yields below output.


   Courses    Fee Duration
0    Spark  22000   30days
1      NaN  25000      NaN
2    Spark  23000   30days
3      NaN  24000      NaN
4  PySpark  26000   35days

4. Pandas Replace Empty String with NaN on Single Column

Using replace() method you can also replace empty string or blank values to a NaN on a single selected column.


# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)
print(df2)

Yields below output


0      Spark
1        NaN
2      Spark
3        NaN
4    PySpark
Name: Courses, dtype: object

5. Replace Blank Values with NAN by Using DataFrame.apply() Method

Another method to replace blank values with NAN is by using DataFrame.apply() method and lambda functions. The apply() method allows you to apply a function along with one of the axis of the DataFrame, default 0, which is the index (row) axis.

In order to use this, you need to have all columns as String type. If you have any non-string column this gives an error. Since I have a non-string column, I have selected only string columns and used apply function.


# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print(df2)

Yields below output


   Courses Duration
0    Spark   30days
1      NaN      NaN
2    Spark   30days
3      NaN      NaN
4  PySpark   35days

6. Complete Example of Replace Blank values (Empty String) with NaN

Below is a complete example of how to replace blank values with Nan values on DataFrame.


# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark","","Spark","","PySpark"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','','30days','','35days']
          }
df = pd.DataFrame(technologies)
print(df)


# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print(df2)

# Using DataFrame.mask() method.
df2=df.mask(df == '')
print(df2)

# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)
print(df2)

# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print(df2)

Conclusion

In this article, You have learned how to replace blank values with NAN of pandas DataFrame by using DataFrame.replace(), DataFrame.apply(), DataFrame.mask() methods with the examples.

You May Also Like

References

Leave a Reply

You are currently viewing Pandas Replace Blank Values (empty) with NaN