# R subset() Function – Get Rows & Columns

`subset()` is a generic R function that returns the rows and columns (in R terms, observations and variables) of a data frame that meet the conditions you specify. It also works on vectors and matrices. In this article, I will explain the syntax and usage of the `subset()` function with examples of how to get a subset of rows and columns.

Alternatively, you can select rows in R using the `df[]` bracket notation.

## 1. R subset() Function Syntax

The following is the syntax of the `subset()` function.

``````
# Syntax of subset()
subset(x, subset, select, drop = FALSE, ...)
``````

Arguments

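• `x` – Object to be subsetted. It can be a vector, a data frame, or a matrix.
• `subset` – Logical expression indicating the elements or rows to keep. Missing values are treated as `FALSE`.
• `select` – Expression indicating the columns to select from a data frame or matrix.
• `drop` – Passed on to the indexing method for matrices and data frames.
• `...` – Further arguments passed to or from other methods.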
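Since `x` can also be a vector or a matrix, here is a minimal sketch of both cases. The objects `v` and `m` are made up for illustration and are not part of the data frame used in the rest of this article. For a matrix, the condition is not evaluated inside the matrix's columns, so the sketch indexes the column explicitly.

```r
# Vector: keep elements greater than 4; the NA comparison is treated as FALSE
v <- c(3, 7, NA, 12, 5)
v_big <- subset(v, v > 4)
print(v_big)

# Matrix: 'subset' must be a logical vector, so refer to the column as m[, "a"]
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b")))
m_sub <- subset(m, m[, "a"] > 1, select = b)
print(m_sub)
```

Because `drop = FALSE` is the default, `m_sub` stays a one-column matrix instead of collapsing to a vector.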

To understand the R `subset()` function with examples, let’s first create a data frame in R.

``````
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13,14,15,16,17),
name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
gender = c('M','M',NA,'F','M','M','M','F'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
'1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
``````

This yields the output below.

``````
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH
``````

## 2. Rows subset() Example

The `subset()` function is used to get a subset of rows from a data frame based on row names, a vector of values, or conditions (certain criteria).

### 2.1 subset() by Row Name

Let’s see how to get a specific row by name with the `subset()` function. Use the `subset` argument to specify the expression that selects the rows.

``````
# subset by row name
subset(df, subset=rownames(df) == 'r1')
``````

This yields the output below.

``````
# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
``````

### 2.2 subset() by a list of values

If you have several row names to subset by, first create a vector with those names and use the `%in%` operator in the condition passed to `subset()`.

``````
# subset rows by a vector of row names
subset(df, rownames(df) %in% c('r1','r2','r3'))
``````

This yields the output below.

``````
# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
``````
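The same `%in%` condition can be negated with `!` to keep every row except the listed ones. The sketch below uses a shortened copy of the data frame, and the names `keep`, `kept`, and `dropped` are only for illustration.

```r
# Shortened copy of the article's data frame (illustration only)
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  row.names = c("r1", "r2", "r3", "r4")
)

# Row names to match against
keep <- c("r1", "r3")

# Rows whose names are in the vector
kept <- subset(df, rownames(df) %in% keep)

# Rows whose names are NOT in the vector
dropped <- subset(df, !(rownames(df) %in% keep))

print(kept)
print(dropped)
```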

### 2.3 subset() by Condition or Column Value

Using `subset()`, you can also select rows based on column values by specifying conditions. The following examples show how to subset a data frame based on single and multiple conditions and on a list of column values.

``````
# subset by condition
subset(df, gender=='M')

# subset by condition
subset(df, state %in% c('CA','DC'))

# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')

# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))
``````

I leave the above examples for you to run and explore the output.
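One detail worth checking when you run these examples is how rows with `NA` in the condition are handled. `subset()` treats a missing condition as `FALSE` and drops the row, while plain bracket indexing with a logical vector returns an extra all-`NA` row. The sketch below uses a shortened copy of the data frame; the result names are only for illustration.

```r
# Shortened copy of the article's data frame (illustration only)
df <- data.frame(
  id = c(10, 11, 12, 13),
  gender = c("M", "M", NA, "F"),
  row.names = c("r1", "r2", "r3", "r4")
)

# subset() drops row r3, where gender == "M" evaluates to NA
by_subset <- subset(df, gender == "M")

# Bracket notation keeps an all-NA row for the NA condition
by_bracket <- df[df$gender == "M", ]

# Wrapping the condition in which() makes bracket notation drop NA as well
by_which <- df[which(df$gender == "M"), ]

# To get the rows with missing values, test for NA explicitly
missing_gender <- subset(df, is.na(gender))

print(nrow(by_subset))
print(nrow(by_bracket))
```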

## 3. Columns subset() Example

The `subset()` function is also used to get columns (variables) from a data frame. To subset columns, use the `select` argument with either a column name or a vector of column names. To create a vector of values, use the `c()` function.

### 3.1 subset() Columns by Name

The following example returns the subset of columns by name `id`, `name` and `gender` and rows where `gender` is equal to the value `'M'`.

``````
# subset columns by name
subset(df, gender == 'M', select = c('id', 'name', 'gender'))
``````

This yields the output below.

``````
# Output
   id  name gender
r1 10   sai      M
r2 11   ram      M
r5 14 kumar      M
r6 15 scott      M
r7 16   Don      M
``````

To subset by a single column, use `subset(df,gender=='M',select='id')`.
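The `select` argument also accepts unquoted column names, ranges of adjacent columns, and negative selections, because column names are treated as their positions while `select` is evaluated. The sketch below shows these variants on a shortened copy of the data frame, plus `drop = TRUE` to get a plain vector for a single column. The result names are only for illustration.

```r
# Shortened copy of the article's data frame (illustration only)
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  gender = c("M", "M", NA, "F"),
  dob = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14", "1985-08-16")),
  state = c("CA", "NY", NA, NA),
  row.names = c("r1", "r2", "r3", "r4")
)

# Unquoted column names
cols_unquoted <- subset(df, gender == "M", select = c(id, name))

# Range of adjacent columns, from id through gender
cols_range <- subset(df, gender == "M", select = id:gender)

# Negative selection: every column except dob and state
cols_dropped <- subset(df, gender == "M", select = -c(dob, state))

# Single column returned as a vector instead of a one-column data frame
ids <- subset(df, gender == "M", select = id, drop = TRUE)

print(names(cols_range))
print(ids)
```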

### 3.2 subset() Columns by Index

If you want to get a subset of columns based on their index, pass a vector of column indexes to the `select` argument. The following example subsets the rows where `gender` is `'M'` and the columns with indexes 1, 2, and 3.

``````
# subset columns by index
subset(df, gender == 'M', select = c(1, 2, 3))
``````

Yields the same output as above.
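A quick way to confirm that selecting by name and selecting by index agree is to compare the two results with `all.equal()`. This is a minimal sketch on a shortened copy of the data frame; `by_name` and `by_index` are illustrative names.

```r
# Shortened copy of the article's data frame (illustration only)
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  gender = c("M", "M", NA, "F"),
  state = c("CA", "NY", NA, NA),
  row.names = c("r1", "r2", "r3", "r4")
)

# Same rows, columns chosen by name and by position
by_name <- subset(df, gender == "M", select = c("id", "name", "gender"))
by_index <- subset(df, gender == "M", select = c(1, 2, 3))

# Compare the two results
same_result <- isTRUE(all.equal(by_name, by_index))
print(same_result)
```

Selecting by name tends to be the safer choice, since index-based selection silently picks different columns if the column order of the data frame changes.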

## 4. Complete Example

``````
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13,14,15,16,17),
name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
gender = c('M','M',NA,'F','M','M','M','F'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
'1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

# subset of rows
subset(df, rownames(df) == 'r1')

# subset of rows
subset(df, rownames(df) %in% c('r1','r2','r3'))

# subset by condition
subset(df, gender=='M')

# subset by condition
subset(df, state %in% c('CA','DC'))

# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')

# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))

# subset a single column
subset(df, gender == 'M', select = 'id')

# subset columns by name
subset(df, gender == 'M', select = c('id', 'name', 'gender'))

# subset columns by index
subset(df, gender == 'M', select = c(1, 2, 3))
``````

## 5. Conclusion

In this article, you have learned how to use the `subset()` function to get specified observations (rows) and variables (columns) from a data frame in R, along with its syntax and usage with examples.
