You are currently viewing R Filter DataFrame by Column Value


To filter a data frame by column value in R, you can use the filter() function from the dplyr package. The filter() function, a key verb in dplyr’s grammar of data manipulation, is commonly used by data science analysts for various data manipulation tasks. All dplyr verbs, including, the filter() function accept a data frame as input and return a data frame object.

Advertisements

To use dplyr filter() function, you have to install it first using install.packages('dplyr') and load it using library(dplyr). Alternatively, you can also use the R subset() function to get the same result.

Quick Examples of Filter Dataframe by Column Value

Following are quick examples of how to filter the data frame rows by column value.


# Quick Examples

# Filter Rows by column value
filter(df, gender == 'M')

# Filter Rows by list of column Values
filter(df, state %in% c('CA','AZ','DE'))

# Filter Rows by Checking values on Multiple Columns
filter(df, gender == 'M' & id >11)

# Filter DataFrame by column name id and name.
subset(df,gender == 'M',select = c('id','name'))

Let’s create an R data frame with different types of columns.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

Yields below output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE
r4 13 sahithi      F 1985-08-16  <NA>

Filter R Rows by Column Value

Let’s use the filter() function to get the data frame rows based on a column value. The following example gets all rows where the column gender is equal to the value 'M'. Note that the filter() takes the input data frame as the first argument and the second should be a condition you want to apply.


# Load dplyr package
library(dplyr)

# Using filter()
filter(df, gender == 'M')

Yields below output.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r2 11  ram      M 1981-03-24    NY

Filter Dataframe Based on Specific Values

Alternatively, you can use the filter() function to filter the data frame rows based on multiple column values. To do this, you can use the %in% operator to specify the column values you want to use for the filtering process. Let’s apply the function to the data frame along with specified column values to get the new data frame with filtered rows.


# Filter Rows by list of Column Values
filter(df, state %in% c("CA", "NY",'DE'))

For example, the code below has filtered all rows where the state column contains any of the values in the vector c('CA', 'AZ', 'PH').

Yields below output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE

Filter Dataframe by R subset() Function

So far, we have learned how to filter the data frame in multiple ways using the dplyr package. Now we will learn how to filter a data frame based on multiple conditions using R base functions.

You can pass the data frame and multiple conditions, combined using the logical AND (&) operator, into this function. It will filter the rows of the data frame according to the specified logical conditions.


# Filter Rows based on Multiple Columns
subset(df, gender == 'M' & id >10)

The above code filters the data frame rows where the gender column is 'M' and the id column value is greater than 10.

Yields below output.


# Output
   id name gender        dob state
r2 11  ram      M 1981-03-24    NY

Conclusion

In this article, I have explained the process of filtering the data frame(data.frame) by column value in R. You can do this by using the filter() function from the dplyr package. dplyr is a package that provides a grammar of data manipulation, and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation. All dplyr verbs take input as data frame and return data frame object.

References