R select() Function from dplyr – Usage with Examples

select() is a function from the dplyr R package that selects data frame variables by name or by index. It can also rename variables while selecting them and drop variables by name. In this article, I will explain the syntax of the select() function and its usage with examples such as selecting specific variables by name, by position, from a list of names, and many more. Note that in R, columns are referred to as variables and rows as observations.

dplyr is an R package that provides a grammar of data manipulation: a consistent set of verbs that help data analysts solve the most common data manipulation tasks. To use it, install it first with install.packages('dplyr') and load it with library(dplyr).

Sometimes you may need to change variable names; if so, read rename data frame columns in R.
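As mentioned above, select() can also rename variables while selecting them, using new_name = old_name inside the call. The following is a minimal sketch on a small hypothetical data frame (a reduced version of the one created later); the name student_name is purely illustrative. Unlike rename(), select() keeps only the variables you list.

```r
library(dplyr)

# Hypothetical two-row data frame used only for this illustration
people <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M')
)

# Keep id and name, renaming name to student_name in the same call
renamed <- people %>% select(id, student_name = name)
names(renamed)  # "id" "student_name"

# rename() changes the name but keeps every other variable
kept_all <- people %>% rename(student_name = name)
names(kept_all)  # "id" "student_name" "gender"
```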

1. dplyr select() Syntax

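Following is the syntax of the select() function from the dplyr package. Here .data is the input data frame (or tibble) and ... stands for one or more variable selections: names, positions, ranges, or helper functions such as starts_with(). select() returns an object of the same class as .data.


# Syntax of select()
select(.data, ...)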
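The class-preservation claim can be checked with a short sketch. It assumes two small hypothetical inputs, one data.frame and one tibble (tibble() is re-exported by dplyr), and also shows that selecting a single variable still returns a one-column data frame, whereas base R indexing with df[, 'name'] drops to a plain vector.

```r
library(dplyr)

# Hypothetical two-row inputs for illustration
df_small <- data.frame(id = c(10, 11), name = c('sai', 'ram'))
tb_small <- tibble(id = c(10, 11), name = c('sai', 'ram'))

# The result has the same class as the input
res_df <- select(df_small, name)
res_tb <- select(tb_small, name)
class(res_df)  # "data.frame"
class(res_tb)  # "tbl_df" "tbl" "data.frame"

# One selected variable is still a data frame, not a vector
is.data.frame(res_df)              # TRUE
is.data.frame(df_small[, 'name'])  # FALSE: base [ drops to a vector
```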

Let’s create an R DataFrame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R DataFrame. Also, refer to Import Excel File into R.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

This yields the following output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14    DE
r4 13 sahithi      F 1985-08-16  <NA>

2. Select Variables by Index Position

The select() function of the dplyr package selects variables from an R data frame. Use it to select data frame variables by index or position, either as separate positions, as a vector of positions, or as a range.


# Load dplyr 
library('dplyr')

# Select columns
df %>% select(2,3)

# Select columns by list of index or position
df %>% select(c(2,3))

# Select columns by index range
df %>% select(2:3)

The select() verb takes a data frame as its first argument. With dplyr, we usually use the infix pipe operator %>% from the magrittr package, which passes the left-hand side of the operator as the first argument to the function on the right-hand side. For example, x %>% f(y) is converted into f(x, y).

All three statements yield the following output.


# Output
      name gender
r1     sai      M
r2     ram      M
r3 deepika      F
r4 sahithi      F
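To make the pipe rewriting concrete, the sketch below (on a hypothetical three-column subset of the data) checks that the piped and direct calls return identical results. It also shows that a negative position drops a variable instead of keeping it.

```r
library(dplyr)

# Hypothetical subset of the article's data
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F')
)

# x %>% f(y) is evaluated as f(x, y), so these two calls are equivalent
piped  <- df %>% select(2, 3)
direct <- select(df, 2, 3)
identical(piped, direct)  # TRUE

# A negative position drops that variable instead of keeping it
without_id <- df %>% select(-1)
names(without_id)  # "name" "gender"
```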

3. Select Variables by Name

You can also select variables by name, either by passing several names separated by commas or by passing a vector of names. The first example below passes the variables to select() as separate arguments; the second passes them as a character vector built with c(). Both return the name and gender variables.


# Select the name and gender columns by name
df %>% select('name','gender')
df %>% select(c('name','gender'))
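When the names live in a separate character vector, tidyselect's all_of() and any_of() helpers (re-exported by dplyr) make the intent explicit. A minimal sketch on a hypothetical subset of the data; the column salary is a hypothetical name that does not exist, used only to show how any_of() behaves.

```r
library(dplyr)

# Hypothetical subset of the article's data
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  state = c('CA', 'NY', 'DE', NA)
)

# Bare (unquoted) names work the same as quoted strings
by_bare <- df %>% select(name, gender)

# Names held in a character vector: all_of() errors if any name is missing
cols <- c('name', 'gender')
by_vector <- df %>% select(all_of(cols))
identical(by_bare, by_vector)  # TRUE

# any_of() silently skips names that are not in the data
partial <- df %>% select(any_of(c('name', 'salary')))
names(partial)  # "name"
```

all_of() is the safer choice when every name must exist; any_of() suits optional variables.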

4. Drop Variables

By using select(), you can also drop variables from the data frame by name. To drop variables, prefix them with a minus sign (-). Note that this returns a new data frame without the specified variables; the original data frame is not modified.


# Select columns except name & gender
df %>% select(-c('name','gender'))
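The sketch below, on a hypothetical subset of the data, confirms that dropping leaves the original data frame untouched. It also shows the complement operator !, which recent tidyselect versions (used by dplyr 1.0.0 and later) accept as an alternative to the minus sign.

```r
library(dplyr)

# Hypothetical subset of the article's data
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  state = c('CA', 'NY', 'DE', NA)
)

dropped <- df %>% select(-c(name, gender))
names(dropped)  # "id" "state"

# The original data frame is not modified
names(df)       # "id" "name" "gender" "state"

# ! (complement) gives the same result as -
dropped_bang <- df %>% select(!c(name, gender))
identical(dropped, dropped_bang)  # TRUE
```

To keep the result, assign it back, for example df <- df %>% select(-c(name, gender)).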

5. Select All Variables Between 2 Variables

You can also select all variables between two variables by using the range operator (:). The left-hand side of the operator is the starting variable and the right-hand side is the ending variable; both endpoints are included. The following example selects all variables from name through state.


# Select columns between name and state
df %>% select('name':'state')
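A short sketch, assuming the same five variables as the article's data frame, shows that the range is inclusive, matches the equivalent positional range, and can be negated to drop a contiguous block of variables. Unquoted names are used here, which is the more common style.

```r
library(dplyr)

# Same variables as the article's data frame
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-03-24', '1987-06-14', '1985-08-16')),
  state = c('CA', 'NY', 'DE', NA)
)

# Both endpoints are included in the range
between_names <- df %>% select(name:state)
names(between_names)  # "name" "gender" "dob" "state"

# The same range expressed by position (name is 2nd, state is 5th)
between_index <- df %>% select(2:5)
identical(between_names, between_index)  # TRUE

# A negated range drops a contiguous block of variables
outside <- df %>% select(-(name:dob))
names(outside)  # "id" "state"
```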

6. Select All Variables That Start with a String

Use starts_with() inside select() to get all variables whose names start with a given string. The following example selects all variables that start with the string gen.


# Select columns that start with a string
df %>% select(starts_with('gen'))
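starts_with() also accepts a vector of prefixes and ignores case by default (ignore.case = TRUE). A minimal sketch on a hypothetical subset of the data:

```r
library(dplyr)

# Hypothetical subset of the article's data
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  state = c('CA', 'NY', 'DE', NA)
)

# Several prefixes at once: a variable matching any of them is kept
multi <- df %>% select(starts_with(c('gen', 'st')))
names(multi)  # "gender" "state"

# Matching ignores case by default
upper <- df %>% select(starts_with('GEN'))
names(upper)  # "gender"

# With ignore.case = FALSE, 'GEN' no longer matches, leaving zero variables
strict <- df %>% select(starts_with('GEN', ignore.case = FALSE))
ncol(strict)  # 0
```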

7. Select All Variables That End with a String

Use ends_with() inside select() to get all variables whose names end with a given string. The following example selects all variables that end with the string e.


# Select columns that end with a string
df %>% select(ends_with('e'))
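With the article's data, ends_with('e') matches both name and state. Selection helpers can also be combined: in recent tidyselect versions, & keeps the intersection and | keeps the union. A sketch assuming the same five variables:

```r
library(dplyr)

# Same variables as the article's data frame
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-03-24', '1987-06-14', '1985-08-16')),
  state = c('CA', 'NY', 'DE', NA)
)

# ends_with('e') matches both 'name' and 'state'
ends_e <- df %>% select(ends_with('e'))
names(ends_e)  # "name" "state"

# & keeps variables matched by both helpers
e_not_s <- df %>% select(ends_with('e') & !starts_with('s'))
names(e_not_s)  # "name"

# | keeps variables matched by either helper
id_or_e <- df %>% select(starts_with('id') | ends_with('e'))
names(id_or_e)  # "id" "name" "state"
```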

8. Select Variables Containing a String

If you want to select all variables whose names contain a character or string, use contains(). The following example selects all variables that contain the character a.


# Select columns whose names contain a string
df %>% select(contains('a'))
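contains() matches a literal substring. When you need a pattern, tidyselect's matches() takes a regular expression instead. A minimal sketch assuming the same five variables; the regular expression ^.{2,3}$ (names two or three characters long) is an illustrative choice.

```r
library(dplyr)

# Same variables as the article's data frame
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  gender = c('M', 'M', 'F', 'F'),
  dob = as.Date(c('1990-10-02', '1981-03-24', '1987-06-14', '1985-08-16')),
  state = c('CA', 'NY', 'DE', NA)
)

# contains() looks for a literal substring anywhere in the name
with_a <- df %>% select(contains('a'))
names(with_a)  # "name" "state"

# matches() takes a regular expression: names 2 or 3 characters long
short <- df %>% select(matches('^.{2,3}$'))
names(short)  # "id" "dob"
```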

9. Select All Numeric Variables

Selecting all numeric variables is one of the most common operations. If a data frame mixes character and numeric variables, applying certain statistical operations to the entire data frame results in an error. Hence, first select all numeric variables and then perform the operation on the result.


# Select all numeric columns
df %>% select_if(is.numeric)
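In dplyr 1.0.0 and later, select_if() is superseded by select(where(is.numeric)), which uses the same predicate inside select(). The sketch below uses hypothetical data: score is an illustrative numeric column that is not part of the article's data frame. It shows the workflow described above, selecting the numeric variables before computing column means.

```r
library(dplyr)

# Hypothetical data: 'score' is an illustrative numeric column
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c('sai', 'ram', 'deepika', 'sahithi'),
  score = c(80, 90, 70, 60)
)

# colMeans(df) would fail because 'name' is character,
# so keep only the numeric variables first
num_cols <- df %>% select(where(is.numeric))
names(num_cols)     # "id" "score"
colMeans(num_cols)  # id = 11.5, score = 75

# select_if() picks the same variables
identical(names(num_cols), names(df %>% select_if(is.numeric)))  # TRUE
```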

10. Complete Example


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  gender = c('M','M','F','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  state = c('CA','NY','DE',NA),
  row.names=c('r1','r2','r3','r4')
)
df

# Load dplyr 
library('dplyr')

# Select columns by list of index or position
df %>% select(c(2,3))
# Select columns by index range
df %>% select(2:3)


# Select the name and gender columns by name
df %>% select(c('name','gender'))
df %>% select('name','gender')

# Select columns except name & gender
df %>% select(-c('name','gender'))

# Select columns between name and state
df %>% select('name':'state')

# Select columns that start with a string
df %>% select(starts_with('gen'))

# Select columns that do not start with a string
df %>% select(-starts_with('gen'))

# Select columns that end with a string
df %>% select(ends_with('e'))

# Select columns whose names contain a string
df %>% select(contains('a'))

# Select all numeric columns
df %>% select_if(is.numeric)

11. Conclusion

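In this article, you have learned the syntax of the select() function from the dplyr package and how to select variables by index position and by name, drop variables, select a range of variables, and select variables that start with, end with, or contain a string, among other things.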


NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment.
