R Subset Data Frame by Column Value & Name

How do you subset a data frame by column value and by column name in R? Using base R's square bracket notation df[] or the subset() function, you can filter the rows of a data frame (data.frame) by column value and select its columns by name.


1. Quick Examples of Subsetting DataFrame by Column Value & Name

The following are quick examples of subsetting a data frame by column value and by column name.


# Quick Examples

# Subset Rows by column value
df[df$gender == 'M',]
subset(df, gender == 'M')

# Subset Rows by list of column Values
df[df$state %in% c('CA','AZ','PH'),]
subset(df, state %in% c('CA','AZ','PH'))

# Subset Rows by Checking values on Multiple Columns
df[df$gender == 'M' & df$id > 15,]
subset(df, gender == 'M' & id >15)

# Subset rows by column value and keep only the columns id and name
df[df$gender == 'M', c('id','name')]
subset(df, gender == 'M', select = c('id','name'))

Before subsetting, we need a data frame to work with. Let's create one using the data.frame() function.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names=c('r1','r2','r3','r4','r5','r6','r7','r8')
)
df

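Yields below output.


# Output
   id    name gender        dob state
r1 10     sai      M 1990-10-02    CA
r2 11     ram      M 1981-03-24    NY
r3 12 deepika      F 1987-06-14  <NA>
r4 13 sahithi      F 1985-08-16  <NA>
r5 14   kumar      M 1995-03-02    DC
r6 15   scott      M 1991-06-21    DW
r7 16     Don      M 1986-03-24    AZ
r8 17     Lin      F 1990-08-26    PH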

2. Subset Data Frame using Column Value

You can use the base R square bracket notation df[] or the subset() function to subset a data frame by column value. Both approaches take a logical condition on one or more columns and return a data frame containing only the rows that meet it. In df[] the condition goes before the comma and columns must be referenced as df$column; in subset() the condition is the second argument and columns can be referenced by name alone.


# Subset the data frame by column value
# Using df[]
df[df$gender == 'M',]

# Using subset()
subset(df, gender == 'M')

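Both statements return the subset of the data frame containing all rows where the column gender is equal to 'M'.

Yields below output.


# Output
   id  name gender        dob state
r1 10   sai      M 1990-10-02    CA
r2 11   ram      M 1981-03-24    NY
r5 14 kumar      M 1995-03-02    DC
r6 15 scott      M 1991-06-21    DW
r7 16   Don      M 1986-03-24    AZ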
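The two approaches differ when the filter column contains missing values. The sketch below recreates the article's df and uses state == 'CA' as an illustrative condition (not one used elsewhere in this article). It suggests that df[] returns an extra all-NA row for every NA in the condition, while subset() drops those rows.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# state is NA for r3 and r4, so the comparison yields NA for those rows
by_bracket <- df[df$state == 'CA', ]
by_subset  <- subset(df, state == 'CA')

nrow(by_bracket)  # 3: r1 plus two all-NA rows
nrow(by_subset)   # 1: only r1

# Wrapping the condition in which() makes df[] skip the NA entries too
by_which <- df[which(df$state == 'CA'), ]
nrow(by_which)    # 1
```

If the column you filter on may contain NA, subset() or which() is usually the safer choice.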

2.1. Subset Data Frame using List of Column Values

To keep the rows whose column value matches any one of several values, use the %in% operator with either df[] or subset(). The %in% operator checks each value of the column against a vector of candidate values and returns TRUE where it finds a match, so the result contains every row whose value for that column appears in the vector.


# Subset a data frame by list of column values
# using df[]
df[df$state %in% c('CA','AZ','PH'),]

# Using subset()
subset(df, state %in% c('CA','AZ','PH'))

Yields below output.


# Output
   id name gender        dob state
r1 10  sai      M 1990-10-02    CA
r7 16  Don      M 1986-03-24    AZ
r8 17  Lin      F 1990-08-26    PH
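To see why this works, it can help to look at the logical vector that %in% produces before it is used as a row index. The following is a minimal sketch using the article's df; the variable name mask is only illustrative.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# One TRUE/FALSE per row; NA states give FALSE rather than NA
mask <- df$state %in% c('CA','AZ','PH')
mask
# TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE

# Row names of the rows that will be kept
rownames(df)[mask]
# "r1" "r7" "r8"

# Negate the mask to keep the rows that do NOT match (this includes NA states)
not_matched <- df[!mask, ]
```

Because %in% never returns NA, df[] and subset() give the same rows here even though state has missing values.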

2.2. Subset Data Frame using More than One Column

To filter rows on conditions that involve more than one column, combine the conditions with the logical AND (&) operator. With df[] notation, each condition references its column as df$column, and a row is kept only when every condition is TRUE for it.

The subset() function accepts the same combined condition as its second argument, with the columns referenced by name alone. It also returns only the rows that meet both conditions.


# Subset rows by conditions on multiple columns
# Using df[]
df[df$gender == 'M' & df$id > 15,]

# Using subset()
subset(df, gender == 'M' & id > 15)

The above code has returned rows where gender is equal to M and id is greater than 15.

Yields below output.


# Output
   id name gender        dob state
r7 16  Don      M 1986-03-24    AZ
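Conditions can also be combined with the logical OR (|) operator, which keeps a row when at least one condition is TRUE. The sketch below uses the article's df with an illustrative pair of conditions.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Keep rows where gender is 'F' OR id is greater than 15
either_bracket <- df[df$gender == 'F' | df$id > 15, ]
either_subset  <- subset(df, gender == 'F' | id > 15)

rownames(either_bracket)
# "r3" "r4" "r7" "r8"
```

Use the vectorized & and | inside df[] and subset(); the && and || operators are meant for single TRUE/FALSE values and are not suitable for row filtering.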

3. Subset Data Frame by Column Name

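Let's use the same df[] notation and subset() function to subset the data frame by column name in R. In df[], pass the column name after the comma; in subset(), pass it to the select argument. Note that df[] returns a plain vector when a single column is selected unless you add drop = FALSE, whereas subset() returns a data frame.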


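# Using df[] (returns a vector for a single column;
# add drop = FALSE to keep a data frame)
df[df$gender == 'M', 'id']

# Using subset() (returns a one-column data frame)
subset(df, gender == 'M', select = 'id')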
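The return type matters when later code expects a data frame. The following minimal sketch, using the article's df and illustrative variable names, compares the three forms.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Single column with df[]: the result is simplified to a numeric vector
as_vector <- df[df$gender == 'M', 'id']
is.data.frame(as_vector)   # FALSE

# drop = FALSE keeps the data frame structure and the row names
as_frame <- df[df$gender == 'M', 'id', drop = FALSE]
is.data.frame(as_frame)    # TRUE

# subset() keeps the data frame structure by default
from_subset <- subset(df, gender == 'M', select = 'id')
is.data.frame(from_subset) # TRUE
```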

3.1. Subset Data Frame by List of Column Names

Similarly, you can subset the data frame by several column names. Create a character vector of the column names you want with c(), then pass it as the column index in df[] or as the select argument of subset(). The following examples return the columns id and name for the rows where gender is 'M'.


# Using df[]
df[df$gender == 'M', c('id','name')]

# Using subset()
subset(df, gender == 'M', select = c('id','name'))

Yields below output.


# Output
   id  name
r1 10   sai
r2 11   ram
r5 14 kumar
r6 15 scott
r7 16   Don
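Columns can also be selected without filtering any rows. As a brief sketch on the article's df, the examples below leave the row index empty in df[] and omit the condition in subset(). The select argument additionally accepts unquoted column names and a minus sign to exclude columns; the variable names are illustrative.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# All rows, only the columns id and name
cols_bracket <- df[, c('id','name')]

# Same selection with subset(), using unquoted column names
cols_subset <- subset(df, select = c(id, name))

# Exclude a column instead of listing the ones to keep
without_dob <- subset(df, select = -dob)
names(without_dob)
# "id" "name" "gender" "state"
```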

Frequently Asked Questions on Subset a Data Frame

How do I subset a data frame in R based on a specific column value?

You can use the subset() function or R base df[] notation to filter rows based on a specific column value.

How can I subset a data frame based on multiple conditions?

To subset a data frame based on multiple conditions you can use logical operators (e.g., & for AND, | for OR).

What if I want to subset a data frame in R based on a range of values in a column?

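For a numeric or date range, combine comparison operators with &, for example id >= 12 & id <= 15. The %in% operator matches a set of discrete values rather than a continuous range, so it only suits ranges that can be listed exactly, such as id %in% 12:15.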
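As an illustration on the article's df, the sketch below filters on a numeric range and on a date lower bound. The bounds are arbitrary values chosen for the example.

```r
# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Rows with id between 12 and 15, inclusive
id_range <- df[df$id >= 12 & df$id <= 15, ]
rownames(id_range)
# "r3" "r4" "r5" "r6"

# Rows with a date of birth on or after 1990-01-01
born_1990_on <- subset(df, dob >= as.Date('1990-01-01'))
rownames(born_1990_on)
# "r1" "r5" "r6" "r8"
```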

How can I subset a data frame using the dplyr package in R?

The filter() function in the dplyr package is used for subsetting data frames based on column values.
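A minimal sketch of the dplyr equivalent follows. It assumes the dplyr package is installed, and it pairs filter() with dplyr's select() to pick columns. The variable names are illustrative.

```r
# Assumes dplyr is installed: install.packages('dplyr')
library(dplyr)

# Recreate the example data frame from this article
df <- data.frame(
  id = c(10,11,12,13,14,15,16,17),
  name = c('sai','ram','deepika','sahithi','kumar','scott','Don','Lin'),
  gender = c('M','M','F','F','M','M','M','F'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16',
                  '1995-03-02','1991-6-21','1986-3-24','1990-8-26')),
  state = c('CA','NY',NA,NA,'DC','DW','AZ','PH'),
  row.names = c('r1','r2','r3','r4','r5','r6','r7','r8')
)

# Rows where gender is 'M'
males <- filter(df, gender == 'M')

# Conditions separated by commas are combined with AND;
# select() then keeps only the named columns
picked <- df %>%
  filter(gender == 'M', id > 15) %>%
  select(id, name)
```

Like subset(), filter() drops rows for which the condition evaluates to NA.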

4. Conclusion

In this article, you have learned how to subset a data frame by column value and by column name in R. You can do this with the base R subset() function or with the square bracket notation df[].


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen on LinkedIn and Medium.