The is.na() function is a built-in R function that checks for missing (NA) values in an R object. It returns a logical vector or array of the same shape as the input, where each element is TRUE if the corresponding input element is NA and FALSE otherwise. Missing values can reduce the quality of your data and distort analysis results, so R provides is.na() as the basic tool for finding and handling them.

In this article, I will explain the base R is.na() function, including its syntax, parameters, and usage. I will demonstrate how to handle missing data by checking for, removing, replacing, and counting NA values with this function.

Key points:

• The is.na() function in R is used to check for missing values (NAs) in various data structures such as vectors, matrices, data frames, and lists.
• This function returns a logical vector, matrix, or array of the same shape as the input, indicating TRUE for NAs and FALSE for non-NAs.
• Placing an exclamation mark (!) before is.na() reverses its effect, identifying non-NA values instead. In this case, TRUE indicates a value that is not NA in R.
• You can remove NA values from vectors or data frames by using !is.na().
• NA values can be replaced with a specified value, such as zero or an empty string, by using is.na() to select them.
• Use sum(is.na(x)) to count the number of NA values in an object, and colSums(is.na(df)) to count NAs in each column of a data frame.

## R is.na() Function

The is.na() function in R is used to detect missing values (NAs) in vectors, data frames, and other R objects. Since missing values can greatly impact data analysis and results, this function is essential for data cleaning and preprocessing when dealing with incomplete data sets.

### Syntax of is.na() Function

The is.na() function has the following syntax.

# Syntax of is.na() function
is.na(x)

### Parameters

The function takes a single parameter.

<strong>x</strong>: The R object to be checked for NA values. This can be a vector, matrix, data frame, list, or any other R object.

### Return Value

The is.na() function returns a logical vector, matrix, or array of the same shape as the input x. Each element of the output is TRUE if the corresponding element of x is NA, and FALSE otherwise.
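As a quick illustration of the return shapes described above, the sketch below applies is.na() to a vector and to a matrix. The object names v and m are made up for this example and do not appear elsewhere in the article. It also shows that is.na() reports TRUE for NaN as well as for NA.

```r
# Hypothetical example objects (not part of the article's data)
v <- c(1, NA, 3, NaN)
m <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2)

res_v <- is.na(v)   # logical vector with the same length as v
res_m <- is.na(m)   # logical matrix with the same dimensions as m

print(res_v)
print(dim(res_m))
```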

## Check NA Values in Vector using R is.na() Function

Let’s create a vector containing NA values and apply the is.na() function to it. This function will check each element of the vector to verify if it is an NA (missing value), and return a logical vector of the same length. In this resulting vector, each element will be TRUE if the corresponding element in the original vector is NA, and FALSE otherwise.

# create vector having NA values.
x = c(10, 20, NA, 60, NA, 40)
print("Check the presence of NA values in a Vector ")
is.na(x)

# Output:
# [1] "Check the presence of NA values in a Vector "
# [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
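The logical vector can also be summarized rather than read element by element. The following is a minimal sketch, assuming the same vector x as above; the names has_na and na_pos are illustrative only. It uses any() to ask whether at least one NA exists and which() to get the positions of the NA values.

```r
# Same vector as in the example above
x <- c(10, 20, NA, 60, NA, 40)

has_na <- any(is.na(x))    # TRUE if at least one element is NA
na_pos <- which(is.na(x))  # integer positions of the NA elements

print(has_na)
print(na_pos)
```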

## Check NA Values in R DataFrame

In this example, you can use the is.na() function to check for the presence of NA values in a data frame. Let’s create a data frame with three columns, each containing some NA (missing) values. The is.na() function will return a logical matrix of the same size as the input data frame. In this resulting matrix, each TRUE value represents an NA value, and each FALSE value represents a non-NA value.

# Create a data frame with 5 rows and 3 columns
df = data.frame(id = c(2, 1, 3, 4, NA),
                name = c('sravan', NA, 'chrisa', 'shivgami', NA),
                gender = c(NA, 'm', NA, 'f', NA))

# Check each cell of the data frame for NA values
print("Check the presence of NA values in a data frame ")
is.na(df)

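# Output:
# [1] "Check the presence of NA values in a data frame "
#         id  name gender
# [1,] FALSE FALSE   TRUE
# [2,] FALSE  TRUE  FALSE
# [3,] FALSE FALSE   TRUE
# [4,] FALSE FALSE  FALSE
# [5,]  TRUE  TRUE   TRUE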
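For larger data frames it is often more practical to list where the NA values are instead of scanning the whole logical matrix. The sketch below is one possible way to do that, assuming the same data frame as above (recreated here so the block runs on its own). It passes the logical matrix to which() with arr.ind = TRUE, which returns the row and column index of every TRUE cell. The names na_cells and na_columns are made up for this example.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA),
                 stringsAsFactors = FALSE)

# Row and column index of every NA cell
na_cells <- which(is.na(df), arr.ind = TRUE)

# Translate the column indices into column names
na_columns <- names(df)[na_cells[, "col"]]

print(na_cells)
print(na_columns)
```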

## Remove NA Values from Vector using is.na()

To remove NA values from a vector in R, index the vector with !is.na(x). The negation operator ! flips the logical vector so that TRUE marks the non-NA elements, and the subset then keeps only those values.

# Remove NA values from vector using is.na() function
clean_vec = x[!is.na(x)]
print("After removing NA values from vector")
print(clean_vec)

# Output:
# [1] "After removing NA values from vector"
# [1] 10 20 60 40
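To see why removing NA values matters, consider a simple summary statistic. This is a small sketch, assuming the same vector x as above; the names mean_raw and mean_clean are illustrative. The mean of a vector that contains NA is itself NA, while the mean of the filtered vector is an ordinary number.

```r
# Same vector as in the example above
x <- c(10, 20, NA, 60, NA, 40)

mean_raw <- mean(x)               # NA, because x contains missing values
mean_clean <- mean(x[!is.na(x)])  # mean of the non-NA values only

print(mean_raw)
print(mean_clean)
```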

## Remove NA Values from Dataframe

You can also index a data frame with !is.na(df). Note that this does not preserve the data frame's structure: it returns a plain vector of the non-NA values, read column by column. Because the columns here have mixed types, every value is converted to a character string, which is why the numbers appear in quotes in the output below.

# Remove NA values using is.na() function
df1 = df[!is.na(df)]
print("After removing NA values from data frame ")
print(df1)

# Output:
# [1] "After removing NA values from data frame "
# [1] " 2"       " 1"       " 3"       " 4"       "sravan"   "chrisa"   "shivgami" "m"
# [9] "f"
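If you want to keep the data frame structure, one common alternative is to drop the rows that contain NA values instead of flattening the data. The following is a minimal sketch of that idea, not the approach used above. It assumes the same data frame (recreated here), and the names complete_rows, df_complete, and df_with_id are hypothetical. It counts the NA values per row with rowSums(is.na(df)) and keeps the rows where that count is zero, then shows the same pattern for a single column.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA),
                 stringsAsFactors = FALSE)

# TRUE for rows that contain no NA in any column
complete_rows <- rowSums(is.na(df)) == 0
df_complete <- df[complete_rows, ]

# Keep rows where one specific column (id) is not NA
df_with_id <- df[!is.na(df$id), ]

print(df_complete)
print(df_with_id)
```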

## Replace NA Values of the Data frame using is.na()

For data cleaning, you can not only remove NA values but also replace them with a specified value. In this example, I will replace NA values in a data frame with zero using the is.na() function.

# Replacing NA values with zero in a data frame
# (work on a copy so df keeps its NA values for the later examples)
df_zero = df
df_zero[is.na(df_zero)] <- 0
print("After replacing NA values with zero")
print(df_zero)

# Output:
# [1] "After replacing NA values with zero"
#   id     name gender
# 1  2   sravan      0
# 2  1        0      m
# 3  3   chrisa      0
# 4  4 shivgami      f
# 5  0        0      0

You can also replace the NA values in specified columns of a data frame using the is.na() function. Let’s see how to replace the NA values in a specified column with zero.

# Replacing NA values with zero in a specified column
# (again on a copy, so df itself is left unchanged)
df_gender = df
df_gender$gender[is.na(df_gender$gender)] <- 0
print("After replacing NA values with zero of gender column")
print(df_gender)

# Output:
# [1] "After replacing NA values with zero of gender column"
#   id     name gender
# 1  2   sravan      0
# 2  1     <NA>      m
# 3  3   chrisa      0
# 4  4 shivgami      f
# 5 NA     <NA>      0
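Zero is not always a sensible replacement, especially for text columns. The sketch below shows two other replacements with the same is.na() indexing pattern: an empty string for a character column and the column mean for a numeric column. It assumes the original data frame (recreated here), and the name df_filled is made up for this example. Whether the mean is an appropriate fill value depends on your data.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA),
                 stringsAsFactors = FALSE)

# Work on a copy so the original df is left unchanged
df_filled <- df

# Replace NA in a character column with an empty string
df_filled$name[is.na(df_filled$name)] <- ""

# Replace NA in a numeric column with the mean of its non-NA values
df_filled$id[is.na(df_filled$id)] <- mean(df_filled$id, na.rm = TRUE)

print(df_filled)
```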

## Count NA Values using is.na() Function

You can easily count NA values by combining is.na() with sum(). Because TRUE counts as 1 and FALSE as 0, applying sum() to the logical matrix returned by is.na() gives the number of NA values in the entire data frame.

# Count of NA values of data frame
na_count <- sum(is.na(df))
print("Get the count of NA values of data frame:")
print(na_count)

# Output:
# [1] "Get the count of NA values of data frame:"
# [1] 6
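The same trick works with mean() if you want a proportion rather than a count. This is a small sketch, assuming the original data frame with its 6 NA values in 15 cells (recreated here); the name na_share is illustrative.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA),
                 stringsAsFactors = FALSE)

# Share of cells that are NA: number of TRUE values divided by number of cells
na_share <- mean(is.na(df))

print(na_share)
```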

### Count NA values of Specified Column

To count the NA values in a single column, apply sum() to is.na() of that column. Note that colSums() cannot be used here, because it expects a matrix or data frame rather than a single vector.

# Get the count of NA values of specified column
na_count <- sum(is.na(df$gender))
print("Get the count of NA values of specified column:")
print(na_count)

# Output:
# [1] "Get the count of NA values of specified column:"
# [1] 3

### Count NA Values of Each Column using is.na()

Finally, you can count the missing values in each column of a data frame by combining is.na() with colSums(). Pass the entire data frame to is.na() to get a logical matrix, then let colSums() add up the TRUE values in each column.

# Get the count of NA values in each column using colSums()
na_count <- colSums(is.na(df))
print("Get the count of NA values in each column:")
print(na_count)

# Output:
# [1] "Get the count of NA values in each column:"
#     id   name gender
#      1      2      3
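The row-wise counterpart follows the same pattern. The sketch below, assuming the original data frame (recreated here), uses rowSums() on the logical matrix to count the NA values in each row; the name na_per_row is made up for this example.

```r
# Recreate the example data frame so this block is self-contained
df <- data.frame(id = c(2, 1, 3, 4, NA),
                 name = c("sravan", NA, "chrisa", "shivgami", NA),
                 gender = c(NA, "m", NA, "f", NA),
                 stringsAsFactors = FALSE)

# Number of NA values in each row
na_per_row <- rowSums(is.na(df))

print(na_per_row)
```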

## Conclusion

In this article, I have explained how the is.na() function serves as a versatile tool for handling missing data in R. By identifying, removing, replacing, and counting NA values with it, you can improve the quality of your data sets and the reliability of your analytical results.