Select Rows based on Column Value in R

Spread the love

How do I select rows based on column value in R? In this article, I will explain how to select rows based on column values by using R base function subset(), squate bracket notation, filter() from dplyr package and finally using data.table.

1. Quick Examples

Following are quick examples of how to select DataFrame rows based on column value. For more examples of select refer to select rows in R.


# Quick Examples

# Select Rows by column value
df[df$gender == 'M',]

# Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]

# Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]

# Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Let’s create an R DataFrame, run these examples and explore the output. If you already have data in CSV you can easily import CSV file to R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

Yields below output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH

2. Select Rows based on Column Value

Let’s use the R base square bracket notation df[] to select rows based on a single column value. The following example selects all rows where the vector gender is equal to the value 'M'.


# Select Rows by column value
df[df$gender == 'M',]

Yields below output.


# Output
   id  name gender        dob state
r1 16   Don      M 1986-03-24    AZ
r2 10   sai      M 1990-10-02    CA
r3 14 kumar      M 1995-03-02    DC
r4 15 scott      M 1991-06-21    DW
r5 11   ram      M 1981-03-24    NY

3. Select Rows by list of Column Values

By using the same notation you can also use an operator %in% to select the DataFrame rows based on a list of values. The following example returns all rows when state values are present in vector values c('CA','AZ','PH').


# Select Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ
r2 10  sai      M 1990-10-02    CA
r3 17  Lin      F 1990-08-26    PH

4. Select Rows based on Multiple Columns

If you wanted to check the conditions of multiple columns and select the rows based on the result, use the below approach. In this example, I am using multiple conditions, each one with the separate column.


# Select Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ

5. Using is.element() to Select Rows by Column Values

is.element() is R base function that is used to check if the value of a column from the first argument is present in the second argument if it is present it returns the row.


# Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

Yields below output.


# Output
   id name gender        dob state
r1 16  Don      M 1986-03-24    AZ
r2 10  sai      M 1990-10-02    CA
r3 17  Lin      F 1990-08-26    PH

6. Using subset()

subset() is also a R base function that also mostly used to select rows from the DataFrame. This function takes the DataFrame object as input and the condition to select rows.


# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

7. Using dplyr::filter()

dplyr is a package that provides a grammar of data manipulation, and provides a most used set of verbs that helps data science analysts to solve the most common data manipulation.

In order to use this, you have to install it first using install.packages('dplyr') and load it using library(dplyr). Here, I will be using method filter() from this package to select rows based on column values.


# Using dplyr::filter
library(dplyr)
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

8. Using data.table

data.table is a package that is used to work with tabular data in R. It provides the efficient data.table object which is a much improved and better performance version of the default data.frame.

In order to use this, you have to install it first using install.packages('data.table') and load it using library(data.table). Here, I will be using methods from this package to select rows based on column values.


# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Conclusion

In this article, you have learned how to select rows based on single column value and multiple column values in R by using r base function subset(), square bracket notation, function() from dplyr package and finally using data.table.

Related Articles

References

Naveen (NNK)

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing Select Rows based on Column Value in R