How to Select Columns in R?


There are several ways to select data frame columns in R, using either base R or the dplyr package. In this article, I will explain how to select columns with the select() function from the dplyr package and with the base R bracket notation df[]. The examples cover selecting a single column by name, several columns from a vector of names, columns by position, and many more.

If you also need to change column names, see the article on renaming data frame columns in R.

1. Quick Examples of Select Columns from Data Frame

Following are quick examples of how to select data frame columns in R.


# Quick Examples

# R base - Select columns by name
df[,"name"]

# R base - Select columns from list
df[,c("name","gender")]

# R base - Select columns by index position
df[,c(2,3)]

# Load dplyr 
library('dplyr')

# dplyr - Select columns by list of index or position
df %>% select(c(2,3))
# Select columns by index range
df %>% select(2:3)


# dplyr - Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))


# dplyr - Select columns except name & gender
df %>% select(-c('name','gender'))

# dplyr - Select columns between name and state
df %>% select('name':'state')

# dplyr - Select columns that start with a string
df %>% select(starts_with('gen'))

# dplyr - Select columns that do not start with a string
df %>% select(-starts_with('gen'))

# dplyr - Select columns that end with a string
df %>% select(ends_with('e'))

# dplyr - Select columns that contain a string
df %>% select(contains('a'))

# dplyr - Select all numeric columns
df %>% select_if(is.numeric)

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can import it into an R data frame instead. Also, refer to Import Excel File into R.


# Create a data frame with two rows
# (every column vector must have the same length)
df <- data.frame(
  id = c(10,11),
  name = c('sai','ram'),
  gender = c('M','M'),
  dob = as.Date(c('1990-10-02','1981-3-24')),
  state = c('CA','NY'),
  row.names=c('r1','r2')
)
df

Yields below output.


# Output
#   id name gender        dob state
#r1 10  sai      M 1990-10-02    CA
#r2 11  ram      M 1981-03-24    NY

2. Select Columns using R base

First, let’s use the base R bracket notation df[] to select columns from a data frame. In base R, you typically refer to a single column with $ after the data frame name (df$id) and use the [] notation to select one or more columns. This syntax is not always easy to read, and code that relies on it heavily can become confusing.

2.1 Select by Column Number

The df[] notation takes the form df[rows, columns], so to select columns, put the selection after the comma and leave the rows position empty. To select columns by index, use a single column index, a range of indexes written as starting_position:end_position, or a vector of index positions.


# R base - by list of positions
df[,c(2,3)]

# R base - by range
df[,2:3]

# Output
#   name gender
#r1  sai      M
#r2  ram      M
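
Positions do not have to be hard-coded. The following is a minimal sketch, assuming the same sample data frame as above, that locates the last column with ncol() and uses negative indexes to leave positions out. The variable names (last_col, no_id, subset_df) are mine and only for illustration.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# Last column, located with ncol() instead of a fixed position
last_col <- df[, ncol(df)]

# A negative index drops that position: everything except column 1
no_id <- df[, -1]

# Drop several positions at once: everything except columns 1 and 4
subset_df <- df[, -c(1, 4)]

print(names(no_id))
print(names(subset_df))
```

Negative and positive indexes cannot be mixed in the same vector, so pick one style per selection.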

2.2 Select by Name

Similarly, you can use this notation to select columns by name. Just pass the column name as a string to df[]. The following example returns the name column; because only one column is selected, the result is a vector rather than a data frame.


# R base - Select columns by name
df[,"name"]

#Output
#[1] "sai" "ram"
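
If you need the result to stay a data frame when selecting a single column, base R offers a couple of options. The sketch below, which assumes the same sample data frame, compares them; the variable names are illustrative only.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# Matrix-style indexing drops a single column to a plain vector
name_vec <- df[, "name"]

# drop = FALSE keeps the one-column data frame (and its row names)
name_df <- df[, "name", drop = FALSE]

# List-style indexing with a single bracket also returns a data frame
name_df2 <- df["name"]

# Double brackets extract the column itself, like df$name
name_vec2 <- df[["name"]]

print(class(name_vec))
print(class(name_df))
```

Using drop = FALSE is a safe default in functions where the number of selected columns is not known in advance.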

2.3 Select Columns from List

Most of the time you will want to select several columns at once. To do so, create a character vector with the names of the columns you want and pass it in the columns position of df[]. The following example returns the name and gender columns from the data frame.


# R base - Select columns from list
df[,c("name","gender")]

# Output
#   name gender
#r1  sai      M
#r2  ram      M
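
When the column names come from a variable, base R raises an "undefined columns selected" error if any of them is missing. Here is a small sketch of one way to guard against that, assuming the same sample data frame; the salary column is hypothetical and deliberately absent.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# Requested columns; "salary" is a hypothetical name that df does not have
wanted <- c("name", "gender", "salary")

# Keep only the requested names that actually exist in df
present <- intersect(wanted, names(df))
result <- df[, present, drop = FALSE]

# Report which requested names were not found
missing_cols <- setdiff(wanted, names(df))

print(names(result))
print(missing_cols)
```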

3. Select Columns using dplyr Package

The dplyr select() function selects columns (variables) from a data frame. It takes the data frame as its first argument, followed by one or more column names, positions, or selection helpers.

When using dplyr, we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is equivalent to f(x, y). For more examples on this package, refer to the R dplyr package tutorial with examples.
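
To make the x %>% f(y) rule concrete, the following sketch runs the same selection with and without the pipe and checks that both give the same result. It assumes dplyr is installed and uses the same sample data frame; the variable names piped and direct are illustrative.

```r
library(dplyr)

# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select(name, gender)

# Direct form: the same call written without the pipe
direct <- select(df, name, gender)

# Both forms return the same data frame
print(identical(piped, direct))
```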

3.1 Select by Column Number

The select() function of the dplyr package also supports selecting columns by index. Use it when you want to select data frame columns by index or position. The following example returns columns 2 and 3 from the data frame.


# Load dplyr 
library('dplyr')

# Select columns
df %>% select(2,3)

# Select columns by list of index or position
df %>% select(c(2,3))

# Select columns by index range
df %>% select(2:3)

Yields below output.


# Output
#   name gender
#r1  sai      M
#r2  ram      M

3.2 Select by Name using dplyr

You can also select data frame columns by name with dplyr, either by passing the names as separate arguments or by passing a vector of names. The first example below passes the column names to select() separated by commas. The second passes a single vector and selects every column named in it.


# Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))

# Output
#   name gender
#r1  sai      M
#r2  ram      M
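
When the names are stored in a variable, the tidyselect helpers all_of() and any_of() make the intent explicit. This is a minimal sketch, assuming dplyr 1.0.0 or later and the same sample data frame; the salary column is hypothetical and not present in df.

```r
library(dplyr)

# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# all_of() is strict: it raises an error if any name is missing
cols <- c("name", "gender")
by_vector <- df %>% select(all_of(cols))

# any_of() is lenient: names that do not exist are silently skipped
maybe_cols <- c("name", "salary")  # "salary" is a hypothetical column
lenient <- df %>% select(any_of(maybe_cols))

print(names(by_vector))
print(names(lenient))
```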

3.3 Select Columns Except List of Columns

By using select() from dplyr, you can also drop columns from the data frame by name. To drop columns, put - in front of the columns. Note that this does not modify df; it returns a new data frame without the specified columns.


# Select columns except name & gender
df %>% select(-c('name','gender'))

# Output
#   id        dob state
#r1 10 1990-10-02    CA
#r2 11 1981-03-24    NY
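
For comparison, base R cannot negate column names directly (a minus sign in front of a character vector is an error), so the exclusion has to be built from names(df). The sketch below shows two equivalent ways, assuming the same sample data frame; the variable names are illustrative.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

drop_cols <- c("name", "gender")

# Option 1: keep the names that are not in drop_cols
kept <- df[, setdiff(names(df), drop_cols), drop = FALSE]

# Option 2: logical mask that is TRUE for the columns to keep
kept2 <- df[, !(names(df) %in% drop_cols), drop = FALSE]

print(names(kept))
```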

3.4 Select All Columns Between Two Columns

You can also select all columns between two columns by using the range operator (:). The left-hand side of the operator is the first column and the right-hand side is the last column, and both are included in the result. The following example selects all columns from name to state.


# Select columns between name and state
df %>% select('name':'state')

# Output
#   name gender        dob state
#r1  sai      M 1990-10-02    CA
#r2  ram      M 1981-03-24    NY
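
The same name-based range can be sketched in base R, either by converting the names to positions with match() or by using subset(), whose select argument accepts a name range. This assumes the same sample data frame; start_pos, end_pos, between, and between2 are illustrative names.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# Translate the boundary names into column positions
start_pos <- match("name", names(df))
end_pos <- match("state", names(df))

# Select the positional range, boundaries included
between <- df[, start_pos:end_pos]

# subset() accepts unquoted column names and ranges in select
between2 <- subset(df, select = name:state)

print(names(between))
```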

3.5 Get All Columns That Start With a String

Use starts_with() inside select() to get all columns whose names start with a given string. The following example selects all columns that start with gen.


# Select columns that start with a string
df %>% select(starts_with('gen'))

# Output
#   gender
#r1      M
#r2      M

3.6 Get All Columns That End With a String

Use ends_with() inside select() to get all columns whose names end with a given string. The following example selects all columns that end with e.


# Select columns that end with a string
df %>% select(ends_with('e'))

# Output
#   name state
#r1  sai    CA
#r2  ram    NY

3.7 Get Columns Containing a Character

If you want to select all columns whose names contain a character or string, use contains(). The following example selects all columns that contain the character a.


# Select columns that contain a string
df %>% select(contains('a'))

# Output
#   name state
#r1  sai    CA
#r2  ram    NY
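
The three name-matching helpers above have rough base R counterparts that work on names(df). The sketch below assumes the same sample data frame and R 3.3.0 or later (for startsWith() and endsWith()). One difference to keep in mind: the dplyr helpers ignore case by default, while these base functions are case-sensitive.

```r
# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

cols <- names(df)

# Column names that start with "gen"
starts_gen <- df[, startsWith(cols, "gen"), drop = FALSE]

# Column names that end with "e"
ends_e <- df[, endsWith(cols, "e"), drop = FALSE]

# Column names that contain "a" (fixed = TRUE treats it as plain text)
has_a <- df[, grepl("a", cols, fixed = TRUE), drop = FALSE]

print(names(starts_gen))
print(names(ends_e))
print(names(has_a))
```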

3.8 Select All Numeric Columns

Selecting all numeric columns is a common operation. If a data frame mixes string and numeric columns, some statistical operations on the entire data frame fail with an error, so you first need to select the numeric columns by passing is.numeric to select_if() and then perform the operation on the result. Use is.character instead to select columns of character type. Note that select_if() is superseded as of dplyr 1.0.0; select(where(is.numeric)) is the current equivalent.


# Select all numeric columns
df %>% select_if(is.numeric)

# Output
#   id
#r1 10
#r2 11
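
As a sketch of the "select numeric columns, then operate on them" pattern, the following uses select(where(is.numeric)), a base R equivalent built with sapply(), and then computes column means on the result. It assumes dplyr 1.0.0 or later and the same sample data frame; the variable names are illustrative.

```r
library(dplyr)

# Sample data frame mirroring the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# dplyr: where() applies the predicate to every column
num_where <- df %>% select(where(is.numeric))

# Base R: logical mask from sapply(), one TRUE/FALSE per column
num_base <- df[, sapply(df, is.numeric), drop = FALSE]

# A numeric-only operation now works on the selected columns
col_means <- colMeans(num_where)

print(names(num_where))
print(col_means)
```

Note that the dob column is not selected, because is.numeric() returns FALSE for Date values.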

4. Conclusion

In this article, you have learned how to select columns by using the base R bracket notation df[] and the select() function from the dplyr package, including how to select columns by index position and by name, and how to select columns whose names start with, end with, or contain a string.

Naveen (NNK)
