R – Remove Rows with NA Values (missing values)

How do you remove rows with NA values (missing values) from an R data frame (data.frame)? NA stands for Not Available and is the marker R uses for a missing value. Our task is to remove rows that contain NA values from an R data frame. We will cover two cases: rows in which some values are NA, and rows in which every value is NA.

In this article, we will see how to remove rows with some or all NAs from an R data frame. If a row contains some NAs, the following methods can be used to drop it. Alternatively, you can replace NA with 0 or replace NA with an empty string.

  1. na.omit()
  2. complete.cases()
  3. rowSums()
  4. drop_na()

If you only want to drop rows in which every value is NA, these two methods are used.

  1. rowSums() with ncol
  2. filter() with rowSums()

1. Quick Examples of Remove Rows with NA Values

Following are quick examples of how to remove/delete rows with NA from an R data frame (data.frame).


# Quick Examples

#Remove rows with NA's using na.omit()
df <- na.omit(df)

#Remove rows with NA's using complete.cases
df <- df[complete.cases(df), ] 

#Remove rows with NA's using rowSums()
df <- df[rowSums(is.na(df)) == 0, ] 

#Import the tidyr package                 
library("tidyr")

#Remove rows with NA's using drop_na()
df <- df %>% drop_na()

#Remove rows that contain only NA's
df <- df[rowSums(is.na(df)) != ncol(df), ]

#Load the dplyr package                      
library("dplyr") 

#Remove rows that contain only NA's
df <- filter(df, rowSums(is.na(df)) != ncol(df))

Let’s create a dataframe with 5 rows and 3 columns such that one row contains all NA and some rows contain at least one NA.


#create dataframe with 5 rows and 3 columns
df <- data.frame(id=c(2,1,3,4,NA),
                 name=c('sravan',NA,'chrisa','shivgami',NA),
                 gender=c(NA,'m',NA,'f',NA))

#display dataframe
print(df)

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f
5 NA     <NA>   <NA>
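
Before removing anything, it can help to see where the missing values are. The following is a minimal sketch that counts NA values per row and per column of the example data frame; the variable names na_per_row and na_per_col are chosen here for illustration only.

```r
# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# is.na(df) is a logical matrix; TRUE counts as 1 when summed
na_per_row <- rowSums(is.na(df))   # number of NA values in each row
na_per_col <- colSums(is.na(df))   # number of NA values in each column

print(na_per_row)
print(na_per_col)
```

Rows 1 to 3 each have one NA, row 4 has none, and row 5 has three, which is why only row 4 survives the "some NA" methods and only row 5 is removed by the "all NA" methods.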

2. Remove Rows with NA Values From R Dataframe

By using the na.omit(), complete.cases(), rowSums(), and drop_na() functions, you can remove rows that contain NA (missing values) from an R data frame. Let’s see an example of each.

2.1. Remove Rows with NA using na.omit()

In this method, we will use na.omit() to delete rows that contain some NA values.

Syntax:


# Syntax
na.omit(df)

where df is the input dataframe

Example:

In this example, we will apply na.omit() to drop rows with some NA’s.


#Remove rows with NA's using na.omit()
print(na.omit(df))

Output:


# Output
  id     name gender
4  4 shivgami      f

Notice that the resulting data frame has no rows with NA values; only row 4, the one complete row, remains.
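
If you need to know which rows were dropped, na.omit() records them in the "na.action" attribute of its result. The sketch below reads that attribute back; the variable names clean and dropped are illustrative.

```r
# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

clean <- na.omit(df)

# Positions of the rows that na.omit() removed
dropped <- attr(clean, "na.action")
print(as.integer(dropped))
```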

2.2. Remove Rows with NA’s using complete.cases()

In this method, we will use complete.cases() to remove rows that contain some NA values.

Syntax:


# Syntax
df[complete.cases(df), ] 

where df is the input dataframe

Example:

In this example, we will apply complete.cases() to remove rows with some NA’s.


#Remove rows with NA's using complete.cases
print(df[complete.cases(df), ] )

Output:


# Output
  id     name gender
4  4 shivgami      f

We can see that the above row has no NAs.
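
complete.cases() can also be applied to a subset of columns, which is useful when NA is acceptable in some columns but not in others. As a sketch, the following keeps every row that has both id and name, regardless of gender; the choice of columns is only an example.

```r
# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Check completeness on id and name only, then index the full data frame
kept <- df[complete.cases(df[, c("id", "name")]), ]
print(kept)
```

Here rows 1, 3, and 4 are kept, because their only missing value (if any) is in the gender column.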

2.3. Remove rows with NA’s using rowSums()

In this method, we will use rowSums() to remove rows that contain some NA values. is.na(df) returns a logical matrix that is TRUE wherever a value is NA, and rowSums() counts the TRUE values in each row. A row whose count equals 0 has no missing values and is kept.

Syntax:


# Syntax
df[rowSums(is.na(df)) == 0, ] 

where df is the input dataframe

Example:

In this example, we will apply rowSums() to remove rows with some NA’s.


#Remove rows with NA's using rowSums()
print(df[rowSums(is.na(df)) == 0, ]  )

Output:


# Output
  id     name gender
4  4 shivgami      f
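
Because rowSums(is.na(df)) yields a count rather than a yes/no answer, the same pattern can tolerate a limited number of missing values. The sketch below keeps rows with at most one NA; the threshold max_na is a hypothetical name introduced for this example.

```r
# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

max_na <- 1  # illustrative threshold: allow at most one NA per row

# Keep rows whose NA count does not exceed the threshold
kept <- df[rowSums(is.na(df)) <= max_na, ]
print(kept)
```

With max_na set to 0 this is the same as the example above; with 1 it keeps rows 1 to 4 and drops only row 5.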

2.4. Remove rows with NA’s using drop_na()

drop_na() drops the rows that contain at least one NA value. It is available in the tidyr package. tidyr is a third-party package, so you need to install it first with install.packages('tidyr'). Once the installation completes, load the package with library("tidyr") in order to use drop_na().

Syntax:


# Syntax
df %>% drop_na()

where df is the input dataframe and %>% is the pipe operator, which passes df as the first argument to drop_na().

Example:

In this example, we will apply drop_na() to remove rows with some NA’s.


#import the tidyr package                 
library("tidyr")

#remove rows with NA's using drop_na()
print(df %>% drop_na())

Output:


# Output
  id     name gender
4  4 shivgami      f
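
drop_na() also accepts column names, in which case only those columns are checked for NA. The following sketch, which assumes the tidyr package is installed, drops rows where gender is missing and ignores NA in the other columns.

```r
library("tidyr")

# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Only the gender column is inspected for missing values
kept <- drop_na(df, gender)
print(kept)
```

Row 2 is kept here even though its name is NA, because only gender was named.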

3. Remove Rows That Contain All NA Values in R Dataframe

In the examples above, we have seen how to remove rows that have NA in any column. In this section, we will drop only the rows in which every column is NA in an R dataframe (data.frame).

3.1. Delete Rows with All NA’s using rowSums() with ncol

Here, we compare the per-row NA count from rowSums(is.na(df)) with the column count from ncol(df). If the two are not equal, the row has at least one non-NA value and is kept. If they are equal, every value in the row is NA and the row is dropped from the result.

Syntax:


# Syntax
df[rowSums(is.na(df)) != ncol(df), ]

where df is the input dataframe

Example:

In this example, we will apply rowSums() and ncol() methods to remove rows with all NA’s.


#Remove rows that contain only NA's
print(df[rowSums(is.na(df)) != ncol(df), ])

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f

We can see that row 5 is deleted since it contains all NA values.
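
One caveat with base R indexing: when the data frame has a single column, df[rows, ] returns a plain vector instead of a data frame. Adding drop = FALSE preserves the data frame. The sketch below shows the difference on a small one-column data frame (one_col) made up for this illustration.

```r
# Hypothetical one-column data frame, used only to illustrate the caveat
one_col <- data.frame(id = c(1, NA, 3))

keep <- rowSums(is.na(one_col)) != ncol(one_col)

# Without drop = FALSE the result collapses to a numeric vector
as_vector <- one_col[keep, ]

# With drop = FALSE the result stays a data frame
as_df <- one_col[keep, , drop = FALSE]

print(class(as_vector))
print(class(as_df))
```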

3.2. Delete Rows with All NA’s using filter() with rowSums()

This is similar to the method above: we compare the per-row NA count from rowSums() with ncol(), and a row is kept only when the two are not equal, that is, when it does not consist entirely of NA values. The difference is that the row selection is done with the filter() function from the dplyr package.

Syntax:


# Syntax
filter(df, rowSums(is.na(df)) != ncol(df))

where df is the input dataframe

Example:

In this example, we will apply filter() with rowSums() to remove rows with all NA’s.


#Load the dplyr package                      
library("dplyr") 

#Remove rows that contain only NA's
print(filter(df, rowSums(is.na(df)) != ncol(df)))

Output:


# Output
  id     name gender
1  2   sravan   <NA>
2  1     <NA>      m
3  3   chrisa   <NA>
4  4 shivgami      f

We can see that row 5 is deleted since it contains all NA.
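
As an alternative that stays within dplyr, the same condition can be written with if_all(), which is TRUE for a row when the predicate holds in every selected column. This is a sketch that assumes dplyr 1.0.4 or later, where if_all() is available; it is not part of the original examples.

```r
library("dplyr")

# Recreate the example data frame so this snippet runs on its own
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                 gender = c(NA, 'm', NA, 'f', NA))

# Keep rows where it is NOT the case that every column is NA
kept <- filter(df, !if_all(everything(), is.na))
print(kept)
```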

4. Conclusion

In this article, we have seen how to remove rows that contain NA values from an R dataframe. To delete rows in which every value is NA, you can use rowSums() with ncol(), either with base R indexing or with filter() from the dplyr package. To delete rows that contain some NA values, you can use na.omit(), complete.cases(), rowSums(), or drop_na() from the tidyr package.


Naveen (NNK)