`subset()` is a generic R function that returns the rows and columns (in R terms, observations and variables) of a data frame that meet the conditions you specify. It can also be used to subset vectors and matrices. In this article, I will explain the syntax and usage of the `subset()` function, with examples of how to get a subset of rows and columns.

Alternatively, you can also select rows in R using the `df[]` bracket notation.
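As a minimal sketch of how the two approaches compare, the snippet below selects the same rows with `subset()` and with bracket notation. The `people` data frame and its columns are hypothetical and used only for this illustration.

```r
# Hypothetical data frame used only for this illustration
people <- data.frame(
  name = c("ann", "bob", "cy"),
  age  = c(28, 35, 41)
)

# subset() evaluates the condition inside the data frame,
# so the column can be referenced by its bare name
by_subset <- subset(people, age > 30)

# Bracket notation needs the data frame name repeated for the column
by_bracket <- people[people$age > 30, ]

by_subset
by_bracket
```

Both calls return the rows for `bob` and `cy`. The two forms can differ when the condition evaluates to `NA`, because `subset()` treats missing values in the condition as false.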

## 1. R subset() Function Syntax

The following is the syntax of the `subset()` function.

```
# Syntax of subset()
subset(x, subset, select, drop = FALSE, ...)
```

**Arguments**

- `x`: Object to be subsetted. Can be a vector, a data frame, or a matrix.
- `subset`: Logical expression indicating the elements or rows to keep. Missing values are treated as false.
- `select`: Expression indicating the columns to select from a data frame or matrix.
- `drop`: Passed on to the indexing method for matrices and data frames.
- `...`: Other arguments.
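The arguments above apply to more than data frames. The following is a small sketch, using made-up objects (`v`, `m`, and `d` are hypothetical names), of how `subset()` behaves on a vector and a matrix and what `drop = TRUE` changes for a data frame.

```r
# Vector: keep elements greater than 6; the NA element is dropped
v <- c(5, NA, 12, 8)
big <- subset(v, v > 6)

# Matrix: the condition must be a logical vector over the rows,
# and select can name a column
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
m_rows <- subset(m, m[, "a"] > 1, select = b)

# Data frame: drop = TRUE simplifies a single selected column to a vector
d <- data.frame(id = c(1, 2, 3), score = c(90, 75, 60))
ids <- subset(d, score > 70, select = id, drop = TRUE)

big     # 12 8
m_rows  # a 2 x 1 matrix holding the values 5 and 6 of column b
ids     # 1 2
```

Note that for a matrix the condition is an ordinary logical vector (here `m[, "a"] > 1`), whereas for a data frame the column names can be used directly inside the expression.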

To understand the **R subset() function** with examples, let’s first create a data frame in R.

```
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
```

This yields the following output.

```
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH
```

## 2. Rows subset() Example

The `subset()` function is used to get a subset of rows from a data frame based on row names, a list of values, or conditions (certain criteria).

### 2.1 subset() by Row Name

Let’s see how to get a specific row by name using the `subset()` function. Use the `subset` argument to specify the expression that selects the rows.

```
# subset by row name
subset(df, subset=rownames(df) == 'r1')
```

This yields the following output.

```
# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
```

### 2.2 subset() by a list of values

If you have several row names to select, first create a vector with those names and use the `%in%` operator in the condition passed to the `subset()` function.

```
# subset rows by a vector of row names
subset(df, rownames(df) %in% c('r1','r2','r3'))
```

This yields the following output.

```
# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika   <NA> 1987-06-14  <NA>
```

### 2.3 subset() by Condition or Column Value

Using `subset()`, you can also select rows based on column values by specifying conditions. The following examples show how to subset a data frame based on a single condition, a list of column values, and multiple conditions.

```
# subset by a single condition
subset(df, gender=='M')
# subset by a list of column values
subset(df, state %in% c('CA','DC'))
# subset by multiple conditions (either condition holds)
subset(df, gender=='M' | state == 'PH')
# subset by multiple conditions (both conditions hold)
subset(df, gender=='M' & state %in% c('CA','NY'))
```

I will leave the above examples for you to run and explore the output.
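As a quick check on what those conditions select, here is a sketch that runs them against a trimmed copy of the article's data frame (only the `id`, `gender`, and `state` columns are kept, which is an assumption made to keep the snippet short) and inspects the row names of each result. It also shows how the `NA` values are handled.

```r
# Trimmed copy of the article's data frame (name and dob omitted)
df <- data.frame(
  id = c(10, 11, 12, 13, 14, 15, 16, 17),
  gender = c("M", "M", NA, "F", "M", "M", "M", "F"),
  state = c("CA", "NY", NA, NA, "DC", "DW", "AZ", "PH"),
  row.names = c("r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8")
)

males     <- subset(df, gender == "M")
in_states <- subset(df, state %in% c("CA", "DC"))
either    <- subset(df, gender == "M" | state == "PH")
both      <- subset(df, gender == "M" & state %in% c("CA", "NY"))

rownames(males)      # "r1" "r2" "r5" "r6" "r7"
rownames(in_states)  # "r1" "r5"
rownames(either)     # "r1" "r2" "r5" "r6" "r7" "r8"
rownames(both)       # "r1" "r2"

# Bracket notation keeps a placeholder row where the condition is NA,
# while subset() drops it
with_na <- df[df$gender == "M", ]
nrow(with_na)  # 6
nrow(males)    # 5
```

Rows `r3` and `r4` never appear in the `|` result because their conditions evaluate to `NA`, which `subset()` treats as false.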

## 3. Columns subset() Example

The `subset()` function is also used to get columns (variables) from a data frame. To subset columns, use the `select` argument with either a column name or a vector of column names. To create a vector of values, use the `c()` function.

### 3.1 subset() Columns by Name

The following example returns the columns `id`, `name`, and `gender` for the rows where `gender` is equal to `'M'`.

```
# subset columns by name
subset(df, gender=='M', select=c('id','name','gender'))
```

This yields the following output.

```
# Output
   id  name gender
r1 10   sai      M
r2 11   ram      M
r5 14 kumar      M
r6 15 scott      M
r7 16   Don      M
```

To subset a single column, use `subset(df,gender=='M',select='id')`.
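Besides quoted names, `select` also accepts unquoted column names, ranges of names, and negative selections. The sketch below illustrates these forms on a four-row copy of the article's data frame (trimmed here as an assumption, to keep the example short).

```r
# First four rows of the article's data frame, without the dob column
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  gender = c("M", "M", NA, "F"),
  state = c("CA", "NY", NA, NA),
  row.names = c("r1", "r2", "r3", "r4")
)

# A single selected column is still returned as a data frame
one_col <- subset(df, gender == "M", select = id)

# Unquoted column names
cols_unquoted <- subset(df, gender == "M", select = c(id, name))

# A range of adjacent columns, from id through gender
cols_range <- subset(df, gender == "M", select = id:gender)

# Every column except state
cols_dropped <- subset(df, gender == "M", select = -state)

is.data.frame(one_col)  # TRUE
names(cols_unquoted)    # "id" "name"
names(cols_range)       # "id" "name" "gender"
names(cols_dropped)     # "id" "name" "gender"
```

The range and negative forms work because `subset()` evaluates `select` with the column names standing in for their positions.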

### 3.2 subset() Columns by Index

To get a subset of columns by position, pass a vector of column indexes to the `select` argument. The following example returns the rows where `gender` is `'M'` and the columns at indexes 1, 2, and 3.

```
#subset columns by index
subset(df,gender=='M',select=c(1,2,3))
```

This yields the same output as above.
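To confirm that selecting by name and by index returns the same result, here is a minimal sketch on a four-row copy of the article's data frame (trimmed as an assumption, for brevity).

```r
# First four rows of the article's data frame, without the dob column
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  gender = c("M", "M", NA, "F"),
  state = c("CA", "NY", NA, NA),
  row.names = c("r1", "r2", "r3", "r4")
)

by_name  <- subset(df, gender == "M", select = c("id", "name", "gender"))
by_index <- subset(df, gender == "M", select = c(1, 2, 3))

# all.equal() returns TRUE when the two data frames match
isTRUE(all.equal(by_name, by_index))
```

Selecting by name is usually the safer choice, since indexes silently point at different columns if the column order changes.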

## 4. Complete Example

```
# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M',NA,'F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df
# subset of rows
subset(df, rownames(df) == 'r1')
# subset of rows
subset(df, rownames(df) %in% c('r1','r2','r3'))
# subset by condition
subset(df, gender=='M')
# subset by condition
subset(df, state %in% c('CA','DC'))
# subset by multiple conditions
subset(df, gender=='M' | state == 'PH')
# subset by multiple conditions
subset(df, gender=='M' & state %in% c('CA','NY'))
#subset columns
subset(df,gender=='M',select='id')
#subset columns
subset(df,gender=='M',select=c('id','name','gender'))
#subset columns
subset(df,gender=='M',select=c(1,2,3))
```

## 5. Conclusion

In this article, you have learned how to use the `subset()` function to get specific observations (rows) and variables (columns) from a data frame in R, along with its syntax and usage with examples.

