How to Select Rows in R with Examples

By using bracket notation on an R DataFrame (data.frame), we can select rows by index, by name, by column value, by condition, etc. You can also use the base R function subset() to get the same results. Besides these, the dplyr package provides the filter() function to get rows from a DataFrame. If you work with the data.table package, you can use its data.table object and keyed lookups, which are generally faster on large data.

Some of the methods explained in this article are also used to select columns from an R data frame.

1. Quick Examples of Selecting Rows

The following are quick examples of how to select rows from a DataFrame (data.frame) in R.


# Quick Examples

# Select Rows by index
df[3,]

# Select Rows by list of index values
df[c(3,4,6),]

# Select Rows by index range
df[3:6,]

# Select first N rows
head(df,3)

# Select last N rows
tail(df,3)

# Select Rows by name (assumes custom row names such as 'r3' have been set)
df['r3',]

# Select Rows by list of names
df[c('r3','r6'),]

# Select Rows by column value
df[df$gender == 'M',]

# Select Rows by checking values on multiple columns
df[df$gender == 'M' & df$id > 15,]

# Select Rows by list of column values
df[df$state %in% c('CA','AZ','PH'),]

# Using is.element()
df[is.element(df$state, c('CA','AZ','PH')),]

# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))

# Using data.table (setDT() converts df to a data.table in place)
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in a CSV file, you can import it into an R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-3-2','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)
df

This yields the output below.


# Output
  id    name gender        dob state
1 10     sai      M 1990-10-02    CA
2 11     ram      M 1981-03-24    NY
3 12 deepika      F 1987-06-14  <NA>
4 13 sahithi      F 1985-08-16  <NA>
5 14   kumar      M 1995-03-02    DC
6 15   scott      M 1991-06-21    DW
7 16     Don      M 1986-03-24    AZ
8 17     Lin      F 1990-08-26    PH

2. Using Base R to Select Rows

You can select rows from a DataFrame in R by using bracket notation. In this section, I will cover how to select rows by index, by name, and by checking column values. All of these return a DataFrame containing the selected rows; hence, you can use them to create a new R DataFrame from an existing one.

2.1 By Index

Every row (observation) in a DataFrame has a positional index, and you can use this index to get rows. The following are some commonly used ways to select rows by index in R.


# Select Rows by Index
df[3,]

# Select Rows by List of Index Values
df[c(3,4,6),]

# Select Rows by Index Range
df[3:6,]

# Select first N rows
head(df,3)

# Select last N rows
tail(df,3)
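Positional indexing also accepts negative numbers and computed index vectors. The sketch below rebuilds a trimmed copy of the sample DataFrame (the dob column is omitted for brevity) and shows how a negative index drops rows and how seq() can generate the positions to keep. Variable names such as df_rest are only illustrative.

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Negative index: every row except the first two
df_rest <- df[-c(1, 2), ]

# Computed positions: rows 1, 3, 5 and 7
df_odd <- df[seq(1, nrow(df), by = 2), ]

# A single row is still returned as a one-row data.frame
row3 <- df[3, ]
```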

2.2 By Row Name

If your DataFrame has row names and you want to select rows by row name in R, use the approach below. By default, row names are sequential numbers (1, 2, 3, and so on) assigned when the R DataFrame is created. R also lets you assign custom row names, either while creating the DataFrame (through the row.names argument of data.frame()) or on an existing DataFrame by using the rownames() function. To set column names, use the colnames() function.

To get multiple rows by name, pass a vector of the names you want to return.


# Select Rows by Name (assumes custom row names such as 'r3' have been set)
df['r3',]

# Select Rows by list of names
df[c('r3','r6'),]
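The row-name examples assume the DataFrame has custom row names, which the sample df does not have. A minimal sketch, assuming hypothetical names 'r1' through 'r8' generated with paste0(), sets them with rownames() and then selects by name. It also shows that a name that does not exist returns a row of NA values rather than raising an error.

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Hypothetical custom row names: 'r1', 'r2', ..., 'r8'
rownames(df) <- paste0('r', seq_len(nrow(df)))

# Select a single row and several rows by name
r3 <- df['r3', ]
r3_r6 <- df[c('r3', 'r6'), ]

# An unknown name does not raise an error; it returns an all-NA row
missing_row <- df['r99', ]
```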

2.3 By Checking Column Values

Let’s see some examples of how to select rows by condition in R, such as equal and not-equal comparisons, as well as rows that satisfy multiple conditions. To select rows whose column value matches any value in a list, use the %in% operator.


# Select Rows by equal condition
df[df$gender == 'M',]

# Select Rows by not equal condition
df[df$gender != 'M',]

# Select Rows by Multiple Conditions
df[df$gender == 'M' & df$id > 15,]

# Select rows based on list
df[df$id %in% c(13,14,15),]
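The state column contains NA values, and this affects bracket selection: a comparison with NA yields NA, and a logical NA index returns an all-NA row instead of dropping it. The sketch below, on a trimmed copy of the sample data, compares ==, which(), and %in%, and uses is.na() to select the rows with a missing state.

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# == returns NA for the two missing states, and each NA index
# becomes an all-NA row in the result (3 rows in total here)
by_eq <- df[df$state == 'CA', ]

# which() keeps only the positions where the condition is TRUE
by_which <- df[which(df$state == 'CA'), ]

# %in% returns FALSE (never NA) for missing values
by_in <- df[df$state %in% c('CA'), ]

# Select the rows where state is missing
no_state <- df[is.na(df$state), ]
```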

3. Select Rows with head() and tail()

Use the head() function to get the first N rows and tail() to get the last N rows. Both functions take the DataFrame as the first argument and, as the second argument (n), an integer that specifies how many rows to return.


# Select first N rows
head(df,3)

# Select last N rows
tail(df,3)
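The n argument of head() and tail() can also be negative, in which case it counts the rows to leave out rather than the rows to return; when n is omitted, both functions default to 6 rows. A short sketch on a trimmed copy of the sample data:

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

first3 <- head(df, 3)
last3 <- tail(df, 3)

# Negative n: drop the last 2 rows / drop the first 2 rows
all_but_last2 <- head(df, -2)
all_but_first2 <- tail(df, -2)

# Without n, head() returns 6 rows
first6 <- head(df)
```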

4. Using subset() to Select Rows from DataFrame

subset() is another base R function that is commonly used to select rows from a DataFrame. It takes the DataFrame as its first argument and a logical condition that selects the rows; inside the condition, you can refer to columns by name without the df$ prefix.


# Using subset
subset(df, state %in% c("CA", "AZ", "PH"))
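subset() can combine several conditions and pick columns at the same time through its select argument. Rows where the condition evaluates to NA are treated as not matching, so subset(df, state == 'CA') returns only the CA row, unlike df[df$state == 'CA', ], which also returns all-NA rows for the missing states. A sketch on a trimmed copy of the sample data:

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

# Two conditions combined with &, keeping only the id and name columns
males_over_15 <- subset(df, gender == 'M' & id > 15, select = c(id, name))

# NA results of the condition count as FALSE, so only the CA row is returned
ca_only <- subset(df, state == 'CA')
```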

5. Using filter() from dplyr Package

dplyr is a package that provides a grammar of data manipulation: a consistent set of verbs that helps data analysts solve the most common data manipulation tasks.

To use it, first install it with install.packages('dplyr') and load it with library(dplyr). Here, I use the dplyr::filter() function from this package to get rows based on column values.


# Load dplyr package
library(dplyr)

# Using dplyr::filter
dplyr::filter(df, state %in% c("CA", "AZ", "PH"))
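dplyr::filter() accepts several conditions separated by commas, which are combined with AND. Like subset(), it drops rows where a condition evaluates to NA. dplyr also provides slice() for selecting rows by position. The sketch below is wrapped in a requireNamespace() check so it only runs when dplyr is installed; the data is a trimmed copy of the sample DataFrame.

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

if (requireNamespace('dplyr', quietly = TRUE)) {
  # Comma-separated conditions are combined with AND
  male_over_15 <- dplyr::filter(df, gender == 'M', id > 15)

  # Rows where the condition is NA (missing state) are dropped
  ca_rows <- dplyr::filter(df, state == 'CA')

  # slice() selects rows by position, like df[3:6, ]
  by_pos <- dplyr::slice(df, 3:6)
}
```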

6. Using setDT() from data.table Package

data.table is a package for working with tabular data in R. It provides the data.table object, an enhanced version of the default data.frame that is designed for fast operations on large data.

To use it, first install it with install.packages('data.table') and load it with library(data.table). Here, I use setDT() to convert the DataFrame into a data.table keyed on state, and then look up a list of state values to select rows. Note that setDT() modifies df in place.


# Using data.table
library(data.table)
setDT(df, key = 'state')[J(c("CA", "AZ", "PH"))]
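Because setDT() converts df by reference, any later code that expects a plain data.frame will receive a data.table instead. A minimal sketch, assuming you want to keep df unchanged: as.data.table() makes a copy, columns can be filtered inside [ ] without the df$ prefix, and setkey() plus J() reproduces the keyed lookup used above. The block only runs when data.table is installed; the data is a trimmed copy of the sample DataFrame.

```r
# Trimmed copy of the article's sample data (dob omitted for brevity)
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH')
)

if (requireNamespace('data.table', quietly = TRUE)) {
  library(data.table)

  # as.data.table() copies, so df remains a plain data.frame
  dt <- as.data.table(df)

  # Inside [ ], columns are referenced by name without the dt$ prefix
  dt_in <- dt[state %in% c('CA', 'AZ', 'PH')]

  # Keyed lookup: setkey() sorts dt by state in place, then J() matches
  # the key values; rows come back in the order of the lookup values
  setkey(dt, state)
  dt_join <- dt[J(c('CA', 'AZ', 'PH'))]
}
```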

7. Conclusion

In this article, you have learned how to use bracket notation on an R DataFrame to select rows by index, by name, by column value, by condition, etc. You can also use the base R function subset() to get the same results. Besides these, the dplyr package provides filter() to return the selected rows from a DataFrame. If you have a data.table, use its own syntax to achieve better performance.


Naveen (NNK)

I am Naveen (NNK), working as a Principal Engineer. I am a seasoned Apache Spark engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and manage Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passionate about sharing my knowledge of Apache Spark, Hive, PySpark, R, etc.
