How do you remove rows with NA values (missing values) from an R data frame (data.frame)? NA stands for Not Available and is the marker R uses for a missing value; it is distinct from NaN (Not a Number). In this article, we will see how to remove rows that contain at least one NA and rows that contain only NA values.
If a row contains some NA's, the following methods can be used to drop it. Alternatively, you can replace NA with 0 or with an empty string instead of dropping the row.
na.omit()
complete.cases()
rowSums()
drop_na()
If a row contains all NA's, these two methods are used.
rowSums() with ncol()
filter() with rowSums()
1. Quick Examples of Removing Rows with NA Values
Following are quick examples of how to remove/delete rows with NA from an R data frame (data.frame).
# Quick Examples
#Remove rows with NA's using na.omit()
df <- na.omit(df)
#Remove rows with NA's using complete.cases
df <- df[complete.cases(df), ]
#Remove rows with NA's using rowSums()
df <- df[rowSums(is.na(df)) == 0, ]
#Import the tidyr package
library("tidyr")
#Remove rows with NA's using drop_na()
df <- df %>% drop_na()
#Remove rows that contain all NA's
df <- df[rowSums(is.na(df)) != ncol(df), ]
#Load the dplyr package
library("dplyr")
#Remove rows that contain all NA's using filter()
df <- filter(df, rowSums(is.na(df)) != ncol(df))
Let’s create a data frame with 5 rows and 3 columns such that one row contains all NA and some rows contain at least one NA.
#Create a data frame with 5 rows and 3 columns
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))
#Display the data frame
print(df)
Output:
# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
5 NA     <NA>   <NA>
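Before removing anything, it can help to check where the missing values are. The following is a minimal sketch using base R only; the variable names `na_per_column` and `has_any_na` are illustrative and not part of the original examples.

```r
# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Count the NA values in each column
na_per_column <- colSums(is.na(df))
print(na_per_column)

# TRUE if the data frame contains at least one NA anywhere
has_any_na <- anyNA(df)
print(has_any_na)
```

Here `id` has 1 NA, `name` has 2, and `gender` has 3, so every column is affected.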
2. Remove Rows with NA From R Dataframe
By using the na.omit(), complete.cases(), rowSums(), and drop_na() methods, you can remove rows that contain NA (missing values) from an R data frame. Let's see an example for each of these methods.
2.1. Remove Rows with NA using na.omit()
In this method, we will use na.omit() to delete rows that contain some NA values.
Syntax:
# Syntax
na.omit(df)
Where df is the input data frame.
Example:
In this example, we will apply na.omit() to drop rows with some NA's.
#Remove rows with NA's using na.omit()
print(na.omit(df))
Output:
# Output
  id     name gender
4  4 shivgami      f
Notice that the above resultant data frame has no rows with NA values.
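Note that the remaining row keeps its original row name (4). As a small sketch of what else na.omit() gives you, the result also carries an "na.action" attribute recording which rows were dropped. The variable names `clean` and `removed` are illustrative.

```r
# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

clean <- na.omit(df)

# The positions of the dropped rows are stored in the "na.action" attribute
removed <- attr(clean, "na.action")
print(as.integer(removed))

# Optionally renumber the remaining rows starting from 1
rownames(clean) <- NULL
print(clean)
```

Resetting the row names is optional; it only matters if later code relies on rows being numbered consecutively.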
2.2. Remove Rows with NA’s using complete.cases()
In this method, we will use complete.cases() to remove rows that contain some NA values.
Syntax:
# Syntax
df[complete.cases(df), ]
Example:
In this example, we will apply complete.cases() to remove rows with some NA's.
#Remove rows with NA's using complete.cases
print(df[complete.cases(df), ] )
Output:
# Output
  id     name gender
4  4 shivgami      f
We can see that the above row has no NAs.
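Because complete.cases() returns a logical vector, it can also be applied to a subset of columns. The sketch below assumes you only care about missing values in id and name and want to keep rows even if gender is NA; the variable name `kept` is illustrative.

```r
# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Check only the id and name columns for NA; gender is ignored
kept <- df[complete.cases(df[, c("id", "name")]), ]
print(kept)
```

With this data, rows 1, 3, and 4 are kept, since only rows 2 and 5 have an NA in id or name.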
2.3. Remove rows with NA’s using rowSums()
In this method, we will use rowSums() to remove rows that contain some NA values. is.na(df) returns a logical matrix that is TRUE wherever a value is NA, and rowSums() adds up the TRUE values in each row, which gives the number of NA's per row. Rows where this count is equal to 0 contain no NA's and are kept.
Syntax:
# Syntax
df[rowSums(is.na(df)) == 0, ]
Example:
In this example, we will apply rowSums() to remove rows with some NA's.
#Remove rows with NA's using rowSums()
print(df[rowSums(is.na(df)) == 0, ] )
Output:
# Output
  id     name gender
4  4 shivgami      f
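To make the one-liner above easier to follow, here is a sketch that breaks it into its intermediate steps. The variable names `na_flags`, `na_count`, and `keep` are illustrative.

```r
# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Step 1: logical matrix, TRUE wherever a value is NA
na_flags <- is.na(df)

# Step 2: TRUE counts as 1, so this is the number of NA's in each row
na_count <- rowSums(na_flags)
print(na_count)

# Step 3: keep only the rows with zero NA's
keep <- na_count == 0
print(df[keep, ])
```

The per-row counts are 1, 1, 1, 0, and 3, so only row 4 passes the `== 0` test.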
2.4. Remove rows with NA’s using drop_na()
drop_na() will drop the rows that contain at least one NA value. It is available in the tidyr package. tidyr is a third-party library; hence, in order to use it, you need to first install it by using install.packages('tidyr'). Once installation completes, load the library with library("tidyr") in order to use the drop_na() method.
Syntax:
# Syntax
df %>% drop_na()
where df is the input data frame and %>% is the pipe operator, which passes df as the first argument to drop_na().
Example:
In this example, we will apply drop_na() to remove rows with some NA's.
#import the tidyr package
library("tidyr")
#remove rows with NA's using drop_na()
print(df %>% drop_na())
Output:
# Output
  id     name gender
4  4 shivgami      f
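drop_na() also accepts column names, in which case only those columns are checked for NA. The sketch below assumes tidyr is installed and that you want to ignore missing values in gender; the variable name `kept` is illustrative.

```r
# Import the tidyr package (install it first with install.packages('tidyr'))
library("tidyr")

# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Drop rows only when id or name is NA; NA in gender is allowed
kept <- drop_na(df, id, name)
print(kept)
```

With this data, three rows remain (ids 2, 3, and 4), two of which still have NA in gender.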
3. Remove Rows That Contain All NA Values in R Dataframe
In the above examples, we have seen how to remove rows that have NA in any column. In this section, we will remove the rows that have NA in all columns of an R data frame (data.frame).
3.1. Remove Rows with All NA’s using rowSums() with ncol()
Here, we compare the rowSums() count of NA values in each row with the ncol() count of columns. If they are not equal, the row doesn’t contain all NA values and is kept. When the counts are equal, every value in the row is NA and the row is removed from the R data frame.
Syntax:
# Syntax
df[rowSums(is.na(df)) != ncol(df), ]
Example:
In this example, we will apply the rowSums() and ncol() methods to remove rows with all NA’s.
#Remove rows that contain all NA's
print(df[rowSums(is.na(df)) != ncol(df), ])
Output:
# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
We can see that row 5 is deleted since it contains all NA values.
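The same condition can be written the other way around: instead of checking that the NA count differs from the number of columns, check that the row has at least one non-NA value. This is a sketch of an equivalent formulation, not a different method; the variable names `keep` and `result` are illustrative.

```r
# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Count the non-NA values per row and keep rows that have at least one
keep <- rowSums(!is.na(df)) > 0
result <- df[keep, ]
print(result)
```

Both forms select the same rows; which one reads better is a matter of preference.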
3.2. Delete Rows with All NA’s using filter() with rowSums()
This is similar to the above method: we compare the rowSums() count with the ncol() count, and if they are not equal, the row doesn’t contain all NA values. Hence the row that contains all NA will not be selected. To do this, we use the filter() method from the dplyr package.
Syntax:
# Syntax
filter(df, rowSums(is.na(df)) != ncol(df))
Example:
In this example, we will apply filter() with rowSums() to remove rows with all NA’s.
#Load the dplyr package
library("dplyr")
#Remove rows that contain all NA's
print(filter(df, rowSums(is.na(df)) != ncol(df)))
Output:
# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
We can see that row 5 is deleted since it contains all NA.
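As an alternative sketch that stays within dplyr, the same filter can be expressed with if_any(), which keeps a row when at least one column satisfies the condition. This assumes a dplyr version that provides if_any() (1.0.4 or later); it is not used in the original examples, and the variable name `result` is illustrative.

```r
# Load the dplyr package (install it first with install.packages('dplyr'))
library("dplyr")

# Same sample data frame as above
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Keep rows where at least one column is not NA
result <- filter(df, if_any(everything(), ~ !is.na(.x)))
print(result)
```

This avoids referring to df twice inside filter(), which can be convenient in a longer pipeline.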
4. Conclusion
In this article, we have seen how to remove the rows that contain NA values from an R data frame. If you want to remove the rows that contain all NA values, you can use rowSums() with ncol(), or filter() from the dplyr package with rowSums(). If you want to delete the rows that contain some NA values, you can use na.omit(), complete.cases(), rowSums(), or drop_na() from the tidyr package.