R Filter DataFrame by Column Value

How do you filter a data frame (DataFrame) by column value in R? You can use the base R df[] notation or filter() from dplyr to filter a DataFrame (data.frame) by column value. filter() is a verb from the dplyr package.

dplyr is a package that provides a grammar of data manipulation: a set of commonly used verbs that help data analysts solve the most common data manipulation tasks. All dplyr verbs take a data.frame as input and return a data.frame object.

To use the dplyr filter() function, first install the package with install.packages('dplyr') and then load it with library(dplyr). Alternatively, you can use the base R subset() function to get the same result.
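The base R df[] notation and subset() mentioned above can be sketched as follows. This is a minimal illustration, not taken from the article's own examples; the people data frame and its values are hypothetical.

```r
# Small illustrative data frame (hypothetical values)
people <- data.frame(
  id = c(1, 2, 3),
  gender = c('M', 'F', 'M')
)

# Bracket notation: a logical vector selects rows; the empty column
# position after the comma keeps all columns
males_bracket <- people[people$gender == 'M', ]

# subset(): the condition is evaluated using the data frame's columns,
# so the people$ prefix is not needed
males_subset <- subset(people, gender == 'M')

males_bracket
males_subset
```

Both calls keep the rows whose gender is 'M'. Neither requires any package beyond base R.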

1. Quick Examples of Filter DataFrame by Column Value

The following quick examples show how to filter DataFrame rows by column value and how to select columns by name in R.


# Quick Examples

# Filter Rows by column value
filter(df, gender == 'M')

# Filter Rows by list of column Values
filter(df, state %in% c('CA','AZ','DE'))

# Filter Rows by Checking values on Multiple Columns
filter(df, gender == 'M' & id > 10)

# Filter rows by column value and keep only the id and name columns
subset(df, gender == 'M', select = c('id','name'))

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

This yields the output below.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE
r4 13 sahithi      F 1985-08-16  <NA>

2. Filter Rows by Column Value

Let’s use the filter() function to get the data frame rows based on a column value. The following example gets all rows where the column gender is equal to the value 'M'. Note that filter() takes the input data frame as its first argument and the condition you want to apply as its second.


# Load dplyr package
library(dplyr)

# Using filter()
filter(df, gender == 'M')

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r2 11  ram      M 1981-03-24    NY
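The same call is often written with the pipe, or with an explicit dplyr:: prefix so it is not confused with the unrelated stats::filter() function that ships with R. The sketch below assumes dplyr is installed; the people data frame is a hypothetical stand-in for df.

```r
library(dplyr)

# Hypothetical data frame for illustration
people <- data.frame(
  id = c(10, 11, 12),
  gender = c('M', 'M', 'F')
)

# Pipe form: the data frame on the left becomes filter()'s first argument
males_piped <- people %>% filter(gender == 'M')

# Explicit namespace: always calls dplyr's filter(), whatever else is loaded
males_explicit <- dplyr::filter(people, gender == 'M')

males_piped
```

Both forms return the two rows where gender is 'M'.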

3. Filter Rows by list of Column Values

In the same way, you can use the %in% operator to filter the DataFrame rows based on a list of values. The following example returns all rows where the state value is present in the vector c('CA','NY','DE'). The row whose state is missing (NA) is not returned.


# Filter Rows by list of Column Values
filter(df, state %in% c('CA','NY','DE'))

This yields the output below.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE
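The row with a missing state does not appear above because %in% returns FALSE for NA, and filter() keeps only rows where the condition is TRUE. With base R bracket notation the difference between == and %in% becomes visible, as this minimal sketch shows; the states data frame is a hypothetical cut-down version of df.

```r
# Hypothetical data frame with one missing state
states <- data.frame(
  id = c(10, 11, 12, 13),
  state = c('CA', 'NY', 'DE', NA)
)

# == returns NA for the missing state, and an NA row index
# produces an extra row filled with NA
with_eq <- states[states$state == 'CA', ]
nrow(with_eq)     # 2

# %in% returns FALSE for NA, so only the real match is kept
with_in <- states[states$state %in% 'CA', ]
nrow(with_in)     # 1

# which() drops NA from the logical index, giving the same result
with_which <- states[which(states$state == 'CA'), ]
nrow(with_which)  # 1
```

When a column may contain NA, %in% or which() is the safer choice with bracket notation.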

4. Filter Rows based on Multiple Columns

If you want to check conditions on multiple columns and filter the rows based on the result, combine the conditions with the & operator. This example uses two conditions, each on a separate column, with the base R subset() function; filter() accepts the same expression. It returns rows where gender is equal to 'M' and id is greater than 10.


# Filter Rows based on Multiple Columns
subset(df, gender == 'M' & id > 10)

This yields the output below.


# Output
   id name gender        dob state
r2 11  ram      M 1981-03-24    NY
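Conditions can also be combined with | (OR), and subset() can narrow the columns at the same time through its select argument, as in the last quick example. The sketch below is illustrative only; the people data frame is a hypothetical copy of a few columns of df.

```r
# Hypothetical data frame for illustration
people <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F')
)

# AND: both conditions must hold
both <- subset(people, gender == 'M' & id > 10)

# OR: at least one condition must hold
either <- subset(people, gender == 'M' | id > 12)

# select keeps only the named columns of the matching rows
narrow <- subset(people, gender == 'M', select = c(id, name))

both
either
narrow
```

Here both contains only id 11, either contains ids 10, 11, and 13, and narrow holds the id and name columns for the two 'M' rows.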

5. Conclusion

In this article, you have learned how to filter a data frame (data.frame) by column value in R. You can do this with the filter() function from the dplyr package, matching a single value with ==, a list of values with %in%, or conditions on several columns combined with &. The base R subset() function gives the same results without loading dplyr.

Naveen Nelamali