There are several ways to select data frame columns in R using base R and the dplyr package. In this article, I will explain how to select columns using the select() function from the dplyr package and the base R bracket notation df[]. The examples cover selecting a single column by name, selecting multiple columns from a vector of names, and more.
If you also need to change column names, see rename data frame columns in R.
1. Quick Examples of Select Columns from Data Frame
Following are quick examples of how to select data frame columns in R.
# Quick Examples
# R base - Select columns by name
df[,"name"]
# R base - Select columns from list
df[,c("name","gender")]
# R base - Select columns by index position
df[,c(2,3)]
# Load dplyr
library('dplyr')
# dplyr - Select columns by list of index or position
df %>% select(c(2,3))
# Select columns by index range
df %>% select(2:3)
# dplyr - Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))
# dplyr - Select columns except name & gender
df %>% select(-c('name','gender'))
# dplyr - Select columns between name and state
df %>% select('name':'state')
# dplyr - Select columns that start with a string
df %>% select(starts_with('gen'))
# dplyr - Select columns that do not start with a string
df %>% select(-starts_with('gen'))
# dplyr - Select columns that end with a string
df %>% select(ends_with('e'))
# dplyr - Select columns that contain a string
df %>% select(contains('a'))
# dplyr - Select all numeric columns
df %>% select_if(is.numeric)
Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can easily import the CSV file into an R data frame. Also, refer to Import Excel File into R.
# Create DataFrame
df <- data.frame(
id = c(10,11),
name = c('sai','ram'),
gender = c('M','M'),
dob = as.Date(c('1990-10-02','1981-3-24')),
state = c('CA','NY'),
row.names=c('r1','r2')
)
df
This yields the output below.
# Output
id name gender dob state
r1 10 sai M 1990-10-02 CA
r2 11 ram M 1981-03-24 NY
2. Select Columns using R base
First, let’s use the base R bracket notation df[] to select columns from a data frame. When working with a data.frame in base R, you usually refer to a column either with $ together with the data frame object (df$id) or with the [] notation. This syntax is not always easy to read, and longer R code can become confusing.
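As a minimal sketch of the difference between these base R accessors, the snippet below rebuilds the article's sample data frame and compares $, [[ ]] and [ ]. The variable names ids_dollar, ids_double and ids_frame are only illustrative.

```r
# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# $ and [[ ]] both return the column itself (a plain vector)
ids_dollar <- df$id
ids_double <- df[["id"]]

# Single brackets with a name return a one-column data frame
ids_frame <- df["id"]

print(class(ids_dollar))
print(class(ids_frame))
```

Use $ or [[ ]] when you need the values, and [ ] when the result should stay a data frame.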
2.1 Select by Column Number
The df[] notation takes the form df[rows,columns], so to select columns, use the columns argument to the right of the comma. To select columns by index, pass a single column index, a range of indexes written as starting_position:end_position, or a vector of index positions.
# R base - by list of positions
df[,c(2,3)]
# R base - by range
df[,2:3]
# Output
# name gender
#r1 sai M
#r2 ram M
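Index positions can also be negative or computed. The sketch below, which assumes the same sample data frame, drops columns 2 and 3 with a negative index and picks the last column with ncol(). The names without_2_3 and last_column are illustrative only.

```r
# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Negative positions exclude columns instead of selecting them
without_2_3 <- df[, -c(2, 3)]

# ncol() gives the position of the last column;
# drop = FALSE keeps the result as a data frame
last_column <- df[, ncol(df), drop = FALSE]

print(names(without_2_3))
print(names(last_column))
```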
2.2 Select by Name
Similarly, you can use this notation to select columns by name. Pass the column name as a string in the column position of df[]. The following example returns the values of the name column. Note that selecting a single column this way returns a vector, not a data frame.
# R base - Select columns by name
df[,"name"]
#Output
#[1] "sai" "ram"
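If you want a single-column selection to stay a data frame, base R provides the drop argument. A minimal sketch, assuming the same sample data frame (name_vec and name_df are illustrative names):

```r
# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Default behaviour: a single column is simplified to a vector
name_vec <- df[, "name"]

# drop = FALSE keeps the data frame structure, including row names
name_df <- df[, "name", drop = FALSE]

print(is.data.frame(name_vec))
print(is.data.frame(name_df))
```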
2.3 Select Columns from List
Most of the time you will want to select multiple columns. To do so, create a vector with the names of all the columns you want and pass it to the column position of df[]. The following example returns the name and gender columns from the data frame.
# R base - Select columns from list
df[,c("name","gender")]
# Output
# name gender
#r1 sai M
#r2 ram M
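Base R stops with an "undefined columns selected" error if the vector contains a name that is not in the data frame. One possible guard, sketched here under the assumption that silently skipping missing names is acceptable, is to intersect the wanted names with names(df) first. The column "zip" is hypothetical and deliberately absent from the sample data.

```r
# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# "zip" is a hypothetical column that does not exist in df
wanted <- c("name", "gender", "zip")

# Keep only the names that are actually present
present <- intersect(wanted, names(df))

subset_df <- df[, present, drop = FALSE]
print(names(subset_df))
```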
3. Select Columns using dplyr Package
The dplyr select() function selects columns (variables) from a data frame. It takes the data frame as the first argument, followed by one or more column names or a vector of column names.
When using the dplyr package, we mostly use the infix operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is converted into f(x, y). For more examples on this package, refer to the R dplyr package tutorial with examples.
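To make the pipe rewrite concrete, the following sketch (assuming dplyr is installed and using the article's sample data) checks that the piped and the direct call give the same result. The names piped and direct are illustrative.

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Piped form: df becomes the first argument of select()
piped <- df %>% select(name, gender)

# Direct form: the same call written without the pipe
direct <- select(df, name, gender)

print(identical(piped, direct))
```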
3.1 Select by Column Number
The select() function of the dplyr package also supports selecting columns by index. Use it when you want to select data frame columns by position. The following example returns columns 2 and 3 from the data frame.
# Load dplyr
library('dplyr')
# Select columns
df %>% select(2,3)
# Select columns by list of index or position
df %>% select(c(2,3))
# Select columns by index range
df %>% select(2:3)
This yields the output below.
# Output
name gender
r1 sai M
r2 ram M
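Positions in select() can also be negative, and the last_col() helper refers to the final column without hard-coding its index. A short sketch, assuming dplyr is installed (first_and_last and without_first are illustrative names):

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# First column plus whatever the last column happens to be
first_and_last <- df %>% select(1, last_col())

# A negative position drops that column and keeps the rest
without_first <- df %>% select(-1)

print(names(first_and_last))
print(names(without_first))
```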
3.2 Select by Name using dplyr
You can also select data frame columns by name with dplyr, either by passing the names to select() as comma-separated arguments or by passing a single vector of names. The first example below uses comma-separated names, and the second passes the same names as a vector.
# Select columns by label name & gender
df %>% select('name','gender')
df %>% select(c('name','gender'))
# Output
# name gender
#r1 sai M
#r2 ram M
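When the column names live in a variable, the tidyselect helpers all_of() and any_of() make the intent explicit: all_of() errors on a missing name, while any_of() skips it. The sketch below assumes dplyr is installed. The column "zip" is hypothetical and not part of the sample data.

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Column names stored in a character vector
cols <- c("name", "gender")

# all_of() requires every name to exist in the data frame
strict <- df %>% select(all_of(cols))

# any_of() ignores names that are missing ("zip" is hypothetical)
lenient <- df %>% select(any_of(c("name", "zip")))

print(names(strict))
print(names(lenient))
```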
3.3 Select Columns Except List of Columns
By using select() from dplyr, you can also drop columns from the data frame by name. To drop columns, prefix them with -. Note that this returns a new data frame without the specified columns, and the original data frame is not modified.
# Select columns except name & gender
df %>% select(-c('name','gender'))
# Output
# id dob state
#r1 10 1990-10-02 CA
#r2 11 1981-03-24 NY
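Because select() returns a new data frame, the result has to be assigned if you want to keep it. A minimal sketch, assuming dplyr is installed (trimmed is an illustrative name):

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Assign the result; df itself is left untouched
trimmed <- df %>% select(-c(name, gender))

print(ncol(df))        # original column count
print(names(trimmed))  # remaining columns
```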
3.4 Select All Columns Between 2 Columns
You can also select all columns between two columns by using the range operator (:). The left-hand side of the operator is the starting column and the right-hand side is the ending column, and both are included in the result. The following example selects all columns from name to state.
# Select columns between name and state
df %>% select('name':'state')
# Output
# name gender dob state
#r1 sai M 1990-10-02 CA
#r2 ram M 1981-03-24 NY
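The range also works with unquoted column names, and for this sample data it is equivalent to the position range 2:5. A small sketch, assuming dplyr is installed (by_name and by_pos are illustrative names):

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Range written with bare column names
by_name <- df %>% select(name:state)

# The same range written with positions (name is column 2, state is column 5)
by_pos <- df %>% select(2:5)

print(identical(by_name, by_pos))
```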
3.5 Get All Columns that Start with
Use starts_with() along with select() to get all columns whose names start with a given string. The following example selects all columns that start with the string gen.
# Select columns that start with a string
df %>% select(starts_with('gen'))
# Output
# gender
#r1 M
#r2 M
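Two details of starts_with() are easy to miss: it accepts several prefixes at once, and it ignores case unless ignore.case = FALSE is set. The sketch below assumes dplyr is installed. The names multi, upper_default and upper_strict are illustrative.

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# Several prefixes: columns matching any of them are selected
multi <- df %>% select(starts_with(c('gen', 'st')))

# Matching is case-insensitive by default, so 'GEN' still finds gender
upper_default <- df %>% select(starts_with('GEN'))

# With ignore.case = FALSE, 'GEN' matches no column
upper_strict <- df %>% select(starts_with('GEN', ignore.case = FALSE))

print(names(multi))
print(names(upper_default))
print(ncol(upper_strict))
```

The same ignore.case argument is available on ends_with() and contains().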
3.6 Get All Columns that End with
Use ends_with() along with select() to get all columns whose names end with a given string. The following example selects all columns that end with the string e.
# Select columns that end with a string
df %>% select(ends_with('e'))
# Output
# name state
#r1 sai CA
#r2 ram NY
3.7 Get Columns Containing a Character
If you want to select all columns whose names contain a character or string, use contains(). The following example selects all columns that contain the character a.
# Select columns that contain a string
df %>% select(contains('a'))
# Output
# name state
#r1 sai CA
#r2 ram NY
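For patterns that the literal helpers cannot express, matches() takes a regular expression, and several helpers can be combined in one select() call. A minimal sketch, assuming dplyr is installed (regex_cols and combined are illustrative names):

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2')
)

# matches() selects columns whose names match a regular expression
regex_cols <- df %>% select(matches('^(name|state)$'))

# Helpers passed as separate arguments are combined into one selection
combined <- df %>% select(starts_with('gen'), ends_with('e'))

print(names(regex_cols))
print(names(combined))
```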
3.8 Select All Numeric Columns
Selecting all numeric columns is a common operation. If a data frame mixes string and numeric columns, applying certain statistical operations to the entire data frame results in an error. In that case, first select all numeric columns by passing is.numeric to select_if(), then perform the operation on the result. Use is.character to select columns of character type.
# Select all numeric columns
df %>% select_if(is.numeric)
# Output
id
r1 10
r2 11
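In dplyr 1.0.0 and later, select_if() is superseded, and the same selection can be written with where() inside select(). The sketch below assumes a recent dplyr version and follows the selection with colMeans() as one example of a numeric-only operation. stringsAsFactors = FALSE is set explicitly so the character columns behave the same on older R versions.

```r
library(dplyr)

# Sample data frame matching the one used in this article
df <- data.frame(
  id = c(10, 11),
  name = c('sai', 'ram'),
  gender = c('M', 'M'),
  dob = as.Date(c('1990-10-02', '1981-03-24')),
  state = c('CA', 'NY'),
  row.names = c('r1', 'r2'),
  stringsAsFactors = FALSE
)

# where() applies a predicate function to every column
numeric_cols <- df %>% select(where(is.numeric))
character_cols <- df %>% select(where(is.character))

# A statistical operation that only makes sense on numeric columns
id_mean <- colMeans(numeric_cols)

print(names(numeric_cols))
print(names(character_cols))
print(id_mean)
```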
4. Conclusion
In this article, you have learned how to select columns using the base R bracket notation df[] and the select() function from the dplyr package, including how to select columns by index position and by name, and how to select columns that start with, end with, or contain a string.