R – Extract Columns from DataFrame

This article shows how to extract columns from an R dataframe. Extracting means selecting one or more columns. We can extract columns from a dataframe in R in 7 ways.

The most basic method extracts a single column from the dataframe using the $ operator. The next two methods use square brackets, selecting columns either by column name or by column index. After that, we use subset() to extract columns by column name and by column index, and the same is done with select() from the dplyr package.

1. Quick Examples

If you are in a hurry, let’s look at all the scenarios.


#Create dataframe with 5 rows and 3 columns
my_dataframe=data.frame(id=c(2,1,3,4,5),name=c('sravan','jau','chrisa','shivgami','ram'),gender=c('f','m','m','f','m'))

#Display dataframe
print(my_dataframe)

#Extract name column using $
print(my_dataframe$name)

#Extract id  column using $
print(my_dataframe$id)

#Extract id and gender columns by specifying column names
print(my_dataframe[ , c("id", "gender")]) 

#Extract id and gender columns by specifying column indices
print(my_dataframe[ , c(1,3)]) 

#Extract id and gender columns by specifying column names
print(subset(my_dataframe, select = c("id", "gender")))

#Extract id and gender columns by specifying column indices
print(subset(my_dataframe, select = c(1,3)))

#Load the dplyr package
library("dplyr")

#Select id and gender by specifying column names
print(my_dataframe %>% select("id", "gender"))

#Select id and gender by specifying column indices
print(my_dataframe %>% select(1,3))

Let’s create an R dataframe with 5 rows and 3 columns.


#Create dataframe with 5 rows and 3 columns
my_dataframe=data.frame(id=c(2,1,3,4,5),name=c('sravan','jau','chrisa','shivgami','ram'),
gender=c('f','m','m','f','m'))

#Display dataframe
print(my_dataframe)

Result:


#Output
  id     name gender
1  2   sravan      f
2  1      jau      m
3  3   chrisa      m
4  4 shivgami      f
5  5      ram      m

Now, we will see different ways to extract columns from the dataframe.

2. Extract Column from R DataFrame using ‘$’ operator

In this scenario, we select a single column from the dataframe using the $ operator. It returns the values of that column as a vector.

Syntax:


#Syntax to extract column using '$' operator
my_dataframe$column

Where my_dataframe is the input dataframe and column is the column name.

Example:

In this example, we will extract the column using the $ operator.


#Extract name column using $
print(my_dataframe$name)

#Extract id  column using $
print(my_dataframe$id)

Output:


# Output
[1] "sravan"   "jau"      "chrisa"   "shivgami" "ram"     
[1] 2 1 3 4 5

We can see that the name and id columns were extracted separately, each as a vector.
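As a minimal sketch of what "returns a vector" means here, the following recreates the sample dataframe and checks the type of the result. The [[ ]] form is not covered in this article and is shown only as an equivalent base R spelling; stringsAsFactors = FALSE is set explicitly as an assumption, so that the name column stays a character vector on older R versions too.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm'),
  stringsAsFactors = FALSE  # assumption: keep strings as character vectors
)

# $ returns the column values as a plain vector, not a dataframe
name_values <- my_dataframe$name
print(is.data.frame(name_values))  # FALSE
print(is.character(name_values))   # TRUE

# [[ ]] with a column name extracts the same vector
same_values <- my_dataframe[["name"]]
print(identical(name_values, same_values))  # TRUE
```

The [[ ]] form can be handy when the column name is stored in a variable, because $ expects the name to be typed literally.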

3. Extract Columns from DataFrame using column names

In this scenario, we select single or multiple columns from the dataframe by specifying column names inside the c() function. With two or more columns, it returns a dataframe with the specified columns. With a single column, it returns a vector unless drop = FALSE is passed.

Syntax:


#Syntax
my_dataframe[ , c("column",.........)]

Where my_dataframe is the input dataframe and column is the column name.

Example:

In this example, we will extract the id and gender columns.


#Extract id and gender columns by specifying column names
print(my_dataframe[ , c("id", "gender")]) 

Output:


# Output
  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
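One detail worth checking when selecting by name with square brackets is what happens with a single column. The sketch below, using the same sample dataframe, suggests that a single column name gives back a vector by default, while the drop = FALSE argument of base R keeps the result as a dataframe. The variable names id_vector and id_frame are only illustrative.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# A single column name inside [ , ] is simplified to a vector by default
id_vector <- my_dataframe[ , "id"]
print(is.data.frame(id_vector))  # FALSE

# drop = FALSE keeps the result as a one-column dataframe
id_frame <- my_dataframe[ , "id", drop = FALSE]
print(is.data.frame(id_frame))   # TRUE
print(dim(id_frame))             # 5 rows, 1 column
```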

4. Extract Columns from DataFrame using column indices

In this scenario, we select single or multiple columns from the R dataframe by specifying column indices inside the c() function. It returns a dataframe with the specified columns. Indexing starts at 1.

Syntax:


#Syntax
my_dataframe[ , c(column_index,.........)]

Where my_dataframe is the input dataframe and column_index is the position of the column, counted from 1.

Example:

In this example, we will extract the id and gender columns.


#Extract id and gender columns by specifying column indices
print(my_dataframe[ , c(1,3)]) 

Output:


# Output
  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
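If we are not sure which index belongs to which column, the indices can be looked up from the column names. The following is a small sketch using base R's match() and names(); the variable names col_idx and by_index are only illustrative.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# Look up the positions of the id and gender columns
col_idx <- match(c("id", "gender"), names(my_dataframe))
print(col_idx)  # 1 3

# Use the looked-up positions to extract the columns
by_index <- my_dataframe[ , col_idx]
print(names(by_index))
```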

5. Extract Columns from R DataFrame using column names with subset()

In this scenario, we select single or multiple columns from the dataframe by passing column names to the select argument of the subset() function. Here it takes two arguments and returns a dataframe with the specified columns.

Syntax:


#Syntax
subset(my_dataframe, select = c("column",..........))

Parameters:

  1. my_dataframe is the input dataframe
  2. select is the argument that takes the column names to be extracted

Example:

In this example, we will extract id and gender columns.


#Extract id and gender columns by specifying column names
print(subset(my_dataframe, select = c("id", "gender")))
 

Output:


# Output
  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
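The select argument of subset() also accepts column names without quotes, and a range of adjacent columns written with a colon. The following is a short sketch of both forms on the same sample dataframe; the variable names are only illustrative.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# Unquoted column names inside select
by_bare_names <- subset(my_dataframe, select = c(id, gender))
print(names(by_bare_names))

# A range of adjacent columns, from id up to and including name
by_range <- subset(my_dataframe, select = id:name)
print(names(by_range))
```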

6. Extract Columns from DataFrame using column indices with subset()

In this scenario, we select single or multiple columns from the dataframe by passing column indices to the select argument of the subset() function. Here it takes two arguments and returns a dataframe with the specified columns.

Syntax:


# Syntax
subset(my_dataframe, select = c(column_index,..........))

Parameters:

  1. my_dataframe is the input dataframe
  2. select is the argument that takes the column indices to be extracted

Example:

In this example, we will extract the id and gender columns.


#Extract id and gender columns by specifying column indices
print(subset(my_dataframe, select = c(1,3)))

Output:


# Output
  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
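Unlike square brackets, subset() keeps a dataframe even when only one column is selected, because its drop argument defaults to FALSE. The following sketch checks this with a single column index; first_col is only an illustrative name.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# Select only the first column by its index
first_col <- subset(my_dataframe, select = 1)
print(is.data.frame(first_col))  # TRUE
print(names(first_col))          # "id"
```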

7. Extract Columns from DataFrame using column names with select()

In this scenario, we select single or multiple columns from the dataframe by specifying column names inside the select() function directly. The “%>%” operator passes the dataframe to select(), which returns a dataframe with the specified columns. We have to load the dplyr package first, because select() is provided by that package.

Syntax:


my_dataframe %>% select("column",...........)

Parameters:

  1. my_dataframe is the input dataframe
  2. select() takes the column names to be extracted

Example:

In this example, we will extract the id and gender columns.


#Load the dplyr package
library("dplyr") 

#Select id and gender
print(my_dataframe %>% select("id", "gender"))
 

Output:


  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
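select() also accepts unquoted column names, and a character vector of names can be passed through all_of(). The sketch below assumes the dplyr package (version 1.0.0 or later, which provides all_of()) is installed; wanted_columns is a hypothetical variable name used for illustration.

```r
# Load the dplyr package (assumed to be installed)
library("dplyr")

# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# Unquoted column names
by_bare_names <- my_dataframe %>% select(id, gender)
print(names(by_bare_names))

# Column names stored in a character vector, passed through all_of()
wanted_columns <- c("id", "gender")
by_vector <- my_dataframe %>% select(all_of(wanted_columns))
print(names(by_vector))
```

Storing the names in a vector can help when the list of columns is built elsewhere in a script.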

8. Extract Columns from DataFrame using column indices with select()

In this scenario, we select single or multiple columns from the dataframe by specifying column indices inside the select() function directly. The “%>%” operator passes the dataframe to select(), which returns a dataframe with the columns at the specified indices. We have to load the dplyr package first, because select() is provided by that package.

Syntax:


# Syntax
my_dataframe %>% select(column_index,...........)

Parameters:

  1. my_dataframe is the input dataframe
  2. select() takes the column indices to be extracted

Example:

In this example, we will extract the id and gender columns.


#Load the dplyr package
library("dplyr") 

#Select id and gender
print(my_dataframe %>% select(1,3))
 

Output:


  id gender
1  2      f
2  1      m
3  3      m
4  4      f
5  5      m

We can see that the id and gender columns were selected together.
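The pipe is optional: select() can also be called with the dataframe as its first argument. The following sketch uses the dplyr:: prefix, which assumes dplyr is installed and avoids picking up a select() function from another loaded package; it also shows an index range. The variable names are only illustrative.

```r
# Recreate the sample dataframe from this article
my_dataframe <- data.frame(
  id = c(2, 1, 3, 4, 5),
  name = c('sravan', 'jau', 'chrisa', 'shivgami', 'ram'),
  gender = c('f', 'm', 'm', 'f', 'm')
)

# Call select() directly, with the dataframe as the first argument
by_index <- dplyr::select(my_dataframe, 1, 3)
print(names(by_index))

# A range of adjacent column indices
by_range <- dplyr::select(my_dataframe, 1:2)
print(names(by_range))
```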

Conclusion

In this article, we discussed seven ways to extract columns from a dataframe. If we want only one column, returned as a vector, the ‘$’ operator is enough; otherwise, we can use any of the other six methods.

