How to Remove Rows with NA in R

How do you remove rows with NA values (missing values) from an R data frame (data.frame)? NA stands for Not Available and is the marker R uses for a missing value; it is distinct from NaN (Not a Number). Our task is to remove rows that contain NA values from an R data frame. We will cover two cases: rows that contain at least one NA, and rows in which every value is NA.
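
As a small aside on what NA means in practice, the sketch below uses a hypothetical vector x (it is not part of the data frame used later) to show why is.na() is used to detect missing values instead of a comparison with ==.

```r
# Hypothetical vector used only for this illustration
x <- c(1, NA, 3)

# Comparing with == does not detect NA: every result is itself NA
print(x == NA)

# is.na() is the reliable test for missing values
print(is.na(x))

# NaN ("not a number") is a different value, although is.na() is TRUE for it too
print(is.na(NaN))
print(is.nan(NA))
```

This is why all the methods in this article rely on is.na() or on functions built on top of it.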

In this article, we will see how to remove rows with some or all NAs from an R data frame. If a row contains at least one NA, the following methods can be used to drop it. Alternatively, you can replace NA with 0 or with an empty string instead of dropping the row.

  1. na.omit()
  2. complete.cases()
  3. rowSums()
  4. drop_na()

If you only want to drop rows in which every value is NA, these two methods are used.

  1. rowSums() with ncol
  2. filter() with rowSums()

1. Quick Examples of Removing Rows with NA Values

The following are quick examples of how to remove rows with NA from an R data frame (data.frame).


# Quick Examples

# Remove rows with NA's using na.omit()
df <- na.omit(df)

# Remove rows with NA's using complete.cases()
df <- df[complete.cases(df), ]

# Remove rows with NA's using rowSums()
df <- df[rowSums(is.na(df)) == 0, ]

# Import the tidyr package
library("tidyr")

# Remove rows with NA's using drop_na()
df <- df %>% drop_na()

# Remove rows that contain all NA's
df <- df[rowSums(is.na(df)) != ncol(df), ]

# Load the dplyr package
library("dplyr")

# Remove rows that contain all NA's
df <- filter(df, rowSums(is.na(df)) != ncol(df))

Let’s create a data frame with 5 rows and 3 columns in which one row contains only NA and three other rows contain one NA each.


# Create a data frame with 5 rows and 3 columns
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Display the data frame
print(df)

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
5 NA     <NA>   <NA>

2. Remove Rows with NA From R Dataframe

By using the na.omit(), complete.cases(), rowSums(), and drop_na() methods, you can remove rows that contain NA (missing values) from an R data frame. Let’s see an example for each of these methods.

2.1. Remove Rows with NA using na.omit()

In this method, we will use na.omit() to delete rows that contain some NA values.

Syntax:


# Syntax
na.omit(df)

Here, df is the input data frame.

Example:

In this example, we will apply na.omit() to drop rows with some NA’s.


#Remove rows with NA's using na.omit()
print(na.omit(df))

Output:


# Output
  id     name gender
4  4 shivgami      f

Notice that the resulting data frame has no rows with NA values; only row 4 remains.
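
If you also want to know which rows were dropped, na.omit() keeps that information on its result. The following is a minimal sketch that re-creates the data frame from this article; the variable names clean and dropped are only illustrative.

```r
# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

clean <- na.omit(df)

# Positions of the removed rows, stored in the "na.action" attribute
dropped <- attr(clean, "na.action")
print(as.integer(dropped))

# Number of rows before and after
print(c(before = nrow(df), after = nrow(clean)))
```

Checking the dropped positions is a quick way to confirm that na.omit() removed only the rows you expected.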

2.2. Remove Rows with NA’s using complete.cases()

In this method, we will use complete.cases() to remove rows that contain some NA values.

Syntax:


# Syntax
df[complete.cases(df), ] 

Example:

In this example, we will apply complete.cases() to remove rows with some NA’s.


#Remove rows with NA's using complete.cases()
print(df[complete.cases(df), ])

Output:


# Output
  id     name gender
4  4 shivgami      f

We can see that the above row has no NAs.
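
complete.cases() can also be given a subset of columns, which is useful when only some columns must be non-missing. Below is a minimal sketch, assuming you only care about the id and name columns; the names keep and partial are illustrative.

```r
# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

# TRUE for rows where both id and name are present
keep <- complete.cases(df[, c("id", "name")])

# gender may still be NA in the rows that are kept
partial <- df[keep, ]
print(partial)
```

Here rows 1, 3, and 4 are kept, because a missing gender no longer disqualifies a row.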

2.3. Remove rows with NA’s using rowSums()

In this method, we will use rowSums() to remove rows that contain some NA values. is.na(df) returns a logical matrix that is TRUE wherever a value is NA. Because TRUE counts as 1, rowSums() then gives the number of NAs in each row. A row is kept only if that count is equal to 0.

Syntax:


# Syntax
df[rowSums(is.na(df)) == 0, ] 

Example:

In this example, we will apply rowSums() to remove rows with some NA’s.


#Remove rows with NA's using rowSums()
print(df[rowSums(is.na(df)) == 0, ])

Output:


# Output
  id     name gender
4  4 shivgami      f
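
To make the rowSums() approach easier to follow, the sketch below breaks the one-liner into its intermediate steps. The variable names na_matrix and na_count are only illustrative.

```r
# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

# Step 1: logical matrix, TRUE where a cell is NA
na_matrix <- is.na(df)
print(na_matrix)

# Step 2: TRUE counts as 1, so rowSums() gives the number of NAs per row
na_count <- rowSums(na_matrix)
print(na_count)

# Step 3: keep only the rows whose NA count is 0
print(df[na_count == 0, ])
```

The same na_count vector can be reused with other conditions, which is what the all-NA examples later in this article do.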

2.4. Remove rows with NA’s using drop_na()

drop_na() drops the rows that contain at least one NA value. It is available in the tidyr package. tidyr is a third-party package, so you first need to install it by using install.packages('tidyr'). Once installation completes, load the package with library("tidyr") in order to use the drop_na() method.

Syntax:


# Syntax
df %>% drop_na()

where df is the input data frame and %>% is the pipe operator, which passes df as the first argument to drop_na().

Example:

In this example, we will apply drop_na() to remove rows with some NA’s.


#import the tidyr package                 
library("tidyr")

#remove rows with NA's using drop_na()
print(df %>% drop_na())

Output:


# Output
  id     name gender
4  4 shivgami      f
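
drop_na() also accepts column names, in which case only NAs in those columns cause a row to be dropped. The sketch below assumes the tidyr package is installed; the choice of the name column and the variable by_name are just examples.

```r
library(tidyr)

# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

# Drop rows only when name is NA; NAs in other columns are allowed
by_name <- drop_na(df, name)
print(by_name)
```

Rows 2 and 5 are dropped because their name is missing, while rows 1 and 3 stay even though gender is NA.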

3. Remove Rows That Contain All NA Values in R Dataframe

In the above examples, we have seen how to remove rows that have NA in any column. In this section, we will remove only the rows that have NA in all columns of an R data frame (data.frame).

3.1. Remove Rows with All NA’s using rowSums() with ncol

Here, we compare the rowSums() count of NAs in each row with the ncol() count of columns. If the two are not equal, the row does not contain all NA values and is kept. If they are equal, every value in the row is NA, so the row is removed from the R data frame.

Syntax:


# Syntax
df[rowSums(is.na(df)) != ncol(df), ]

Example:

In this example, we will apply rowSums() and ncol() methods to remove rows with all NA’s.


#Remove rows that contain all NA's
print(df[rowSums(is.na(df)) != ncol(df), ])

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f

We can see that row 5 is deleted since it contains all NA values.
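
An equivalent way to express the same condition is to look at the share of NAs in each row with rowMeans(). This is a sketch of an alternative, not the method used above; the names na_share and not_all_na are illustrative.

```r
# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

# Share of NA values in each row (1 means the whole row is NA)
na_share <- rowMeans(is.na(df))

# Keep rows that are not entirely NA; drop = FALSE keeps the result
# a data frame even if df had only one column
not_all_na <- df[na_share < 1, , drop = FALSE]
print(not_all_na)
```

Changing the threshold (for example na_share < 0.5) would let you drop rows that are mostly, rather than entirely, missing.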

3.2. Delete Rows with All NA’s using filter() with rowSums()

This is similar to the above method: we compare the rowSums() count with the ncol() count, and if they are not equal, the row doesn’t contain all NA values and is kept. The row that contains all NA is not selected. Here, the condition is passed to the filter() method from the dplyr package.

Syntax:


# Syntax
filter(df, rowSums(is.na(df)) != ncol(df))

Example:

In this example, we will apply filter() with rowSums() to remove rows with all NA’s.


#Load the dplyr package                      
library("dplyr") 

#Remove rows that contain all NA's
print(filter(df, rowSums(is.na(df)) != ncol(df)))

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f

We can see that row 5 is deleted since it contains all NA.
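
With a reasonably recent version of dplyr, the same filter can be written without rowSums() by using if_any(). This is a sketch of an alternative and assumes your installed dplyr provides if_any() (it was added in the 1.0.x series); the variable kept is illustrative.

```r
library(dplyr)

# Same data frame as in this article
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA))

# Keep a row if any of its columns holds a non-missing value
kept <- filter(df, if_any(everything(), ~ !is.na(.x)))
print(kept)
```

Replacing everything() with a selection of columns would apply the same check to only those columns.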

4. Conclusion

In this article, we have seen how to remove rows that contain NA values from an R data frame. To remove rows in which all values are NA, you can use rowSums() with ncol(), either with base R indexing or inside filter() from the dplyr package. To delete rows that contain at least one NA, you can use na.omit(), complete.cases(), rowSums(), or drop_na() from the tidyr package.

NNK
