R subset() Function – Get Rows & Columns

subset() is a generic R function that returns the rows and columns (in R terms, observations and variables) of a data frame that meet the given conditions. It also works on vectors and matrices. In this article, I will explain the syntax and usage of the subset() function, with examples of how to get a subset of rows and columns.

Alternatively, you can also select rows in R using df[] notation.

1. R subset() Function Syntax

The following is the syntax of the subset() function.


# Syntax of subset()
subset(x, subset, select, drop = FALSE, ...)

Arguments

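  • x – Object to be subsetted. Can be a vector, a data frame, or a matrix.
  • subset – Logical expression indicating the rows (or elements) to keep. Missing values are treated as FALSE.
  • select – Expression indicating the columns to select from a data frame or matrix.
  • drop – Passed on to the [ indexing operator for matrices and data frames.
  • ... – Other arguments passed to or from other methods.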
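
Since subset() also works on vectors and matrices, here is a minimal sketch of those two cases. The names v, big, m, and m_sub are made up for this illustration. Note that for a matrix the condition is an ordinary logical vector, so the column is referenced explicitly as m[, "a"].

```r
# Vector: keep elements greater than 4 (the NA element is dropped)
v <- c(3, 8, NA, 12, 5)
big <- subset(v, v > 4)
big     # 8 12 5

# Matrix: column a holds 1, 2, 3 and column b holds 4, 5, 6
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Keep rows where a > 1 and only column b;
# drop = FALSE (the default) keeps the result a matrix
m_sub <- subset(m, m[, "a"] > 1, select = "b")
m_sub
```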

To understand the R subset() function with examples, let’s first create a data frame in R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

This yields the output below.


   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH

2. Rows subset() Example

The subset() function is used to get a subset of rows from a data frame based on row names, a list of values, or conditions.

2.1 subset() by Row Name

Let’s see how to get a specific row by name using the subset() function. Use the subset argument to specify the expression that selects the rows.


# subset by row name
subset(df, subset=rownames(df) == 'r1') 

This yields the output below.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA

2.2 subset() by a list of values

If you have several row names to select, first create a vector of those names and then use the %in% operator in the condition passed to subset().


# subset rows by a vector of row names
subset(df, rownames(df) %in% c('r1','r2','r3'))

This yields the output below.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>

2.3 subset() by Condition or Column Value

Using subset(), you can also select rows based on column values by specifying conditions. The following examples show how to subset a data frame based on a single condition, multiple conditions, and a list of column values.


# subset by condition
subset(df, gender=='M')

# subset by condition
subset(df, state %in% c('CA','DC'))

# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')

# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))

I will leave the above examples for you to run and explore the output.
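
When you run these examples, note how subset() handles missing values: rows where the condition evaluates to NA are dropped. The sketch below rebuilds a reduced copy of df (without the name and dob columns) so that it runs on its own, shows the row names each condition returns, and compares the result with the df[] notation. The variable names by_subset, by_state, either, both, by_bracket, and by_which are assumptions made for this illustration.

```r
# Reduced copy of the article's data frame
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  gender = c('M', 'M', NA, 'F', 'M', 'M', 'M', 'F'),
  state = c('CA', 'NY', NA, NA, 'DC', 'DW', 'AZ', 'PH'),
  row.names = c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
)

# Single condition: r3 has gender NA, so it is dropped
by_subset <- subset(df, gender == 'M')
rownames(by_subset)   # "r1" "r2" "r5" "r6" "r7"

# List of column values
by_state <- subset(df, state %in% c('CA', 'DC'))
rownames(by_state)    # "r1" "r5"

# Multiple conditions with OR and AND
either <- subset(df, gender == 'M' | state == 'PH')
rownames(either)      # "r1" "r2" "r5" "r6" "r7" "r8"
both <- subset(df, gender == 'M' & state %in% c('CA', 'NY'))
rownames(both)        # "r1" "r2"

# Bracket notation keeps an all-NA row for each NA in the condition
by_bracket <- df[df$gender == 'M', ]
nrow(by_bracket)      # 6

# Wrapping the condition in which() drops the NA positions instead
by_which <- df[which(df$gender == 'M'), ]
rownames(by_which)    # "r1" "r2" "r5" "r6" "r7"
```

This difference is worth keeping in mind when switching between subset() and the df[] notation on columns that contain NA.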

3. Columns subset() Example

The subset() function is also used to get columns (variables) from a data frame. To subset columns, use the select argument with either a column name or a vector of column names. To create a vector of values, use the c() function.

3.1 subset() Columns by Name

The following example returns the columns id, name, and gender for the rows where gender is equal to 'M'.


#subset columns
subset(df,gender=='M',select=c('id','name','gender'))

This yields the output below.


# Output
   id  name gender
r1 10   sai      M
r2 11   ram      M
r5 14 kumar      M
r6 15 scott      M
r7 16   Don      M

To subset a single column, use subset(df,gender=='M',select='id').
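
A single-column subset is still a data frame, because drop defaults to FALSE. The following sketch, which rebuilds a reduced copy of df so that it runs on its own, shows how drop = TRUE returns a plain vector instead. The names ids_df and ids_vec are made up for this example.

```r
# Reduced copy of the article's data frame
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai', 'ram', 'deepika', 'sahithi', 'kumar', 'scott', 'Don', 'Lin'),
  gender = c('M', 'M', NA, 'F', 'M', 'M', 'M', 'F'),
  row.names = c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
)

# Default drop = FALSE: the result is a one-column data frame
ids_df <- subset(df, gender == 'M', select = 'id')
class(ids_df)   # "data.frame"

# drop = TRUE: the single column is returned as a vector
ids_vec <- subset(df, gender == 'M', select = 'id', drop = TRUE)
ids_vec         # 10 11 14 15 16
```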

3.2 subset() Columns by Index

If you want to get a subset of columns by index, pass a vector of column indexes to the select argument. The following example selects the rows where gender is 'M' and the columns at indexes 1, 2, and 3.


#subset columns by index
subset(df,gender=='M',select=c(1,2,3))

This yields the same output as the example in section 3.1.
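
Besides quoted names and indexes, the select argument also accepts unquoted column names, ranges of columns written with :, and negative selections that exclude columns. Here is a minimal sketch that rebuilds the same df so that it runs on its own; first_three and no_dates are names made up for this example.

```r
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  name = c('sai', 'ram', 'deepika', 'sahithi', 'kumar', 'scott', 'Don', 'Lin'),
  gender = c('M', 'M', NA, 'F', 'M', 'M', 'M', 'F'),
  dob = as.Date(c('1990-10-02', '1981-3-24', '1987-6-14', '1985-8-16',
                  '1995-03-02', '1991-6-21', '1986-3-24', '1990-8-26')),
  state = c('CA', 'NY', NA, NA, 'DC', 'DW', 'AZ', 'PH'),
  row.names = c('r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8')
)

# Range of columns from id through gender, for rows where gender is 'M'
first_three <- subset(df, gender == 'M', select = id:gender)
names(first_three)   # "id" "name" "gender"

# Exclude columns with a negative selection (all rows are kept)
no_dates <- subset(df, select = -c(dob, state))
names(no_dates)      # "id" "name" "gender"
```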

4. Complete Example


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

# subset of rows
subset(df, rownames(df) == 'r1')

# subset of rows
subset(df, rownames(df) %in% c('r1','r2','r3'))

# subset by condition
subset(df, gender=='M')

# subset by condition
subset(df, state %in% c('CA','DC'))

# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')

# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))

#subset columns
subset(df,gender=='M',select='id')

#subset columns
subset(df,gender=='M',select=c('id','name','gender'))

#subset columns
subset(df,gender=='M',select=c(1,2,3))

5. Conclusion

In this article, you have learned how to use the subset() function to get specific observations (rows) and variables (columns) from a data frame in R, along with its syntax and usage examples.

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.