In pandas, you can replace blank values (empty strings) with NaN using the replace() method. In this article, I will explain how to replace blank values or empty strings with NaN across a pandas DataFrame, and on selected columns, by using the replace(), apply(), or mask() functions.
Key Points –
- Blank values include empty strings and whitespace characters.
- Pandas provides multiple methods to replace blank values, such as replace(), mask(), and apply().
- DataFrame.replace() can be used with regular expressions to match blank values and replace them with NaN.
- mask() allows for conditional replacement, where values that meet a condition (e.g., empty strings) are replaced with NaN.
- Blank spaces (e.g., ' ') should be stripped before replacing if you want to treat them as empty values.
- Use apply() with a lambda function to process multiple columns or perform custom replacements.
Related: You can also replace NaN values with blank/empty string.
Quick Examples of Replacing Blank or Empty Values With NaN
Following are quick examples of replacing blank values or empty strings with NaN.
# Quick examples of replacing blank or empty values with NaN
# Replace blank values with DataFrame.replace()
df2 = df.replace(r'^\s*$', np.nan, regex=True)
# Using DataFrame.mask()
df2 = df.mask(df == '')
# Replace on a single column (exact match on the empty string, so no regex is needed)
df2 = df.Courses.replace('', np.nan)
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
To run the examples of replacing blank values or empty strings with NaN, let's first create a pandas DataFrame.
# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","","Spark","","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','','30days','','35days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
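This creates a DataFrame with five rows in which the Courses and Duration columns contain empty strings at index 1 and 3.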
Pandas Replace Blank Values with NaN using replace()
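You can replace blank/empty values with the DataFrame.replace() method. It replaces every occurrence of a given value with another value, either across all columns of a DataFrame or on a selected column. With regex=True, the value to replace is treated as a regular expression, so the pattern ^\s*$ matches both empty strings and strings that contain only whitespace.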
# Replace blank values with DataFrame.replace() methods
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print("After replacing blank values with NaN:\n", df2)
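This yields the same result as the output shown in the mask() section that follows: the blank entries in Courses and Duration become NaN, and Fee is unchanged.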
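To make the difference between an exact match and the regular expression concrete, here is a minimal sketch. The DataFrame named sample and its values are made up for illustration and are not part of the article's data; index 1 holds an empty string and index 3 holds a whitespace-only string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: index 1 is an empty string, index 3 is whitespace only
sample = pd.DataFrame({
    "Courses": ["Spark", "", "Hadoop", "   ", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# Exact match: only the truly empty string is replaced
exact = sample.replace("", np.nan)

# Regex match: empty and whitespace-only strings are both replaced
by_regex = sample.replace(r"^\s*$", np.nan, regex=True)

print(exact["Courses"].isna().tolist())     # [False, True, False, False, False]
print(by_regex["Courses"].isna().tolist())  # [False, True, False, True, False]
```

If your data may contain stray spaces, the regex form is usually the safer choice of the two.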
Pandas Replace Blank Values with NaN using mask()
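You can also replace blank values with NaN by using the DataFrame.mask() method. The mask() method replaces each value for which the condition evaluates to True; when no replacement value is given, it uses NaN. Note that the condition df == '' matches only exactly empty strings, so whitespace-only values are left unchanged.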
# Using DataFrame.mask() method
df2=df.mask(df == '')
print("After replacing blank values with NaN:\n", df2)
Yields below output.
# Output:
# After replacing blank values with NaN:
Courses Fee Duration
0 Spark 22000 30days
1 NaN 25000 NaN
2 Spark 23000 30days
3 NaN 24000 NaN
4 PySpark 26000 35days
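If whitespace-only values should be masked too, one possible approach is to build the condition from a stripped text view of the data. This is a sketch under assumptions: the names sample and is_blank are hypothetical, and converting every column with astype(str) is just one way to make the .str accessor usable on all columns.

```python
import pandas as pd

# Hypothetical sample: index 1 is an empty string, index 3 is whitespace only
sample = pd.DataFrame({
    "Courses": ["Spark", "", "Hadoop", "   ", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
})

# True wherever the value, viewed as text, is empty after stripping whitespace
is_blank = sample.astype(str).apply(lambda col: col.str.strip()) == ""

# mask() replaces the flagged cells with NaN and keeps everything else
masked = sample.mask(is_blank)
print(masked)
```

The text conversion is used only to build the condition; mask() is applied to the original DataFrame, so the values that are kept are not turned into strings.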
Pandas Replace Empty String with NaN on Single Column
Using the replace() method, you can also replace empty strings or blank values with NaN on a single selected column. Calling replace() on one column returns a Series.
# Replace on a single column (exact match on the empty string, so no regex is needed)
df2 = df.Courses.replace('', np.nan)
print("After replacing blank values with NaN:\n", df2)
Yields below output.
# Output:
# After replacing blank values with NaN:
0 Spark
1 NaN
2 Spark
3 NaN
4 PySpark
Name: Courses, dtype: object
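Because replace() on a single column returns a new Series, the DataFrame itself is not changed. To keep the result inside the DataFrame, you can assign it back to the column. The following is a minimal sketch; the name df_single is made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "", "Spark", "", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "", "30days", "", "35days"],
})

# Work on a copy so the original DataFrame keeps its blank values
df_single = df.copy()

# Assign the cleaned Series back to the same column
df_single["Courses"] = df_single["Courses"].replace("", np.nan)

print(df_single)
```

Only Courses is cleaned here; the empty strings in Duration are left as they were.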
Replace Blank Values with NaN by Using DataFrame.apply()
Another way to replace blank values with NaN is to use the DataFrame.apply() method together with a lambda function. apply() applies a function along one axis of the DataFrame; with the default axis=0, the function is called once per column and receives that column as a Series.
The lambda below uses the .str accessor, which works only on columns that hold string values; calling it on a non-string column such as Fee raises an error. Since this DataFrame has a non-string column, I have selected only the string columns before calling apply().
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print("After replacing blank values with NaN:\n", df2)
Yields below output.
# Output:
# After replacing blank values with NaN:
Courses Duration
0 Spark 30days
1 NaN NaN
2 Spark 30days
3 NaN NaN
4 PySpark 35days
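Instead of listing the string columns by hand, you could detect them and write the cleaned values back so that the non-string columns stay in the result. This is only a sketch: the names string_cols and cleaned are hypothetical, the sample values with stray spaces are made up, and pandas.api.types.is_string_dtype is used here as one possible way to find the string columns.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical sample with both empty and whitespace-only strings
df = pd.DataFrame({
    "Courses": ["Spark", "  ", "Spark", "", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "", "30days", " ", "35days"],
})

# Collect the columns that hold string values; Fee is skipped
string_cols = [c for c in df.columns if is_string_dtype(df[c])]

# Strip whitespace in those columns, then turn the remaining empty strings into NaN
cleaned = df.copy()
cleaned[string_cols] = (
    df[string_cols].apply(lambda col: col.str.strip()).replace("", np.nan)
)

print(cleaned)
```

Unlike the example above, the result keeps the Fee column next to the cleaned Courses and Duration columns.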
Complete Example of Replacing Blank Values (Empty String) with NaN
# Create a Pandas DataFrame
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","","Spark","","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','','30days','','35days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print(df2)
# Using DataFrame.mask() method.
df2=df.mask(df == '')
print(df2)
# Replace on a single column (exact match on the empty string, so no regex is needed)
df2 = df.Courses.replace('', np.nan)
print(df2)
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print(df2)
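After running a replacement, a quick way to confirm that it worked is to count the missing values per column. The sketch below reuses the article's data; the variable names missing_per_column and complete_rows are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "", "Spark", "", "PySpark"],
    "Fee": [22000, 25000, 23000, 24000, 26000],
    "Duration": ["30days", "", "30days", "", "35days"],
})

df2 = df.replace(r"^\s*$", np.nan, regex=True)

# Number of missing values in each column after the replacement
missing_per_column = df2.isna().sum()
print(missing_per_column)

# Rows that still have a value in every column
complete_rows = df2.dropna()
print(complete_rows)
```

With this data, Courses and Duration each report two missing values, and the rows at index 0, 2, and 4 remain after dropna().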
Conclusion
In this article, I have explained how to replace blank values with NaN in a pandas DataFrame by using the replace(), apply(), and mask() methods, with examples.
Related Articles
- Rename Index of Pandas DataFrame
- Remove NaN From Pandas Series
- Count NaN Values in Pandas DataFrame
- Check Any Value is NaN in DataFrame
- How to Replace String in pandas DataFrame
- Pandas DataFrame.fillna() function explained
- Pandas Series.fillna() function explained
- Pandas Drop Columns with NaN or None Values
- Pandas Drop Rows with NaN Values in DataFrame
- Pandas – Replace NaN Values with Zero in a Column