You can replace black values or an empty string with NAN in pandas DataFrame by using DataFrame.replace()
, DataFrame.apply()
, and DataFrame.mask()
methods. In this article, I will explain how to replace blank values with NAN on the entire DataFrame and selected columns with multiple examples.
Related: You can also replace NaN values with blank/empty string.
1. Quick Examples of Replace Blank or Empty Values With NAN
If you are in a hurry, below are some quick examples of how to replace blank values or an empty string with NAN on Pandas DataFrame.
# Below are some quick examples.
# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
# Using DataFrame.mask() method.
df2=df.mask(df == '')
# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
Now, Let’s create a pandas DataFrame with a few rows and columns, execute these examples and validate results to replace blank values with NAN. Our DataFrame contains column names Courses
, Fee
and Duration
.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","","Spark","","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','','30days','','35days']
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
Yields below output.
2. Pands Replace Blank Values with NaN using replace() Method
You can replace blank/empty values with DataFrame.replace() methods. This method replaces the specified value with another specified value on a specified column or on all columns of a DataFrame; replaces every case of the specified value.
# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print("After replacing blank values with NaN:\n", df2)
Yields below output.
3. Pandas Replace Blank Values with NaN using mask()
You can also replace blank values with NAN with DataFrame.mask()
methods. The mask()
method replaces the values of the rows where the condition evaluates to True.
# Using DataFrame.mask() method.
df2=df.mask(df == '')
print("After replacing blank values with NaN:\n", df2)
Yields below output.
# Output:
# After replacing blank values with NaN:
Courses Fee Duration
0 Spark 22000 30days
1 NaN 25000 NaN
2 Spark 23000 30days
3 NaN 24000 NaN
4 PySpark 26000 35days
4. Pandas Replace Empty String with NaN on Single Column
Using replace()
method you can also replace empty string or blank values to a NaN on a single selected column.
# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)
print("After replacing blank values with NaN:\n", df2)
Yields below output
# Output:
# After replacing blank values with NaN:
0 Spark
1 NaN
2 Spark
3 NaN
4 PySpark
Name: Courses, dtype: object
5. Replace Blank Values with NAN by Using DataFrame.apply() Method
Another method to replace blank values with NAN is by using the DataFrame.apply()
method along with lambda
functions. The apply()
method allows you to apply a function along with one of the axes of the DataFrame, default 0
, which is the index (row) axis.
In order to use this, you need to have all columns as String type. If you have any non-string column this gives an error. Since I have a non-string column, I have selected only string columns and used the apply function.
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print("After replacing blank values with NaN:\n", df2)
Yields below output
# Output:
# After replacing blank values with NaN:
Courses Duration
0 Spark 30days
1 NaN NaN
2 Spark 30days
3 NaN NaN
4 PySpark 35days
6. Complete Example of Replace Blank values (Empty String) with NaN
Below is a complete example of how to replace blank values with Nan values on DataFrame.
# Create a Pandas DataFrame.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark","","Spark","","PySpark"],
'Fee' :[22000,25000,23000,24000,26000],
'Duration':['30days','','30days','','35days']
}
df = pd.DataFrame(technologies)
print(df)
# Replace Blank values with DataFrame.replace() methods.
df2 = df.replace(r'^\s*$', np.nan, regex=True)
print(df2)
# Using DataFrame.mask() method.
df2=df.mask(df == '')
print(df2)
# Replace on single column
df2 = df.Courses.replace('',np.nan,regex = True)
print(df2)
# Replace on all selected columns
df2 = df[['Courses','Duration']].apply(lambda x: x.str.strip()).replace('', np.nan)
print(df2)
Conclusion
In this article, You have learned how to replace blank values with NAN of pandas DataFrame by using DataFrame.replace()
, DataFrame.apply()
, DataFrame.mask()
methods with the examples.
Related Articles
- Filter Rows with NAN Value from Pandas DataFrame Column
- Create Test and Train Samples from Pandas DataFrame
- How to Print Pandas DataFrame without Index
- Rename Index Values of Pandas DataFrame
- Rename Index of Pandas DataFrame
- Pandas Drop Columns with NaN or None Values
- Pandas Drop Rows with NaN Values in DataFram
- Remove NaN From Pandas Series
- Pandas – Replace NaN Values with Zero in a Column
- Count NaN Values in Pandas DataFrame
- Pandas – Check Any Value is NaN in DataFrame
- Pandas Replace Column value in DataFrame
- pandas replace values based on condition
- Pandas Replace substring in DataFrame
- How to Replace String in pandas DataFrame
- Pandas – Replace NaN Values with Zero in a Column
- Pandas DataFrame.fillna() function explained
Pandas Series.fillna() function explained