  • Post category: R Programming
  • Post last modified: March 27, 2024
R Create a New Column Based on Condition

To create a new column in an R data frame based on one or more conditions, you can use the mutate() function from the dplyr package. Creating new columns from conditions on existing columns is a common data manipulation task in data analysis with R.

In this article, I will explain various methods to add new columns to a data frame based on single or multiple conditions, string conditions, row indices, and custom functions.

Key points:

  • Create a logical condition using comparison operators (e.g., ==, !=, <, >, <=, >=) to evaluate against the existing column.
  • Assign the result of the logical condition to a new column with the $ operator, e.g., data_frame$new_column <- data_frame$existing_column == value.
  • To implement more complex conditions, you can use ifelse() or nested ifelse() calls.
  • You can use functions from the dplyr package, such as mutate() and case_when(), to keep the code concise and readable.
  • Apply a custom function to each element of the existing column with sapply(), or to each row with apply().
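
The first three key points can be sketched in base R without any packages. This is a minimal illustration, assuming the same small data frame used throughout this article; the column names is_long, size, and flag are hypothetical and chosen only for this example.

```r
# Small example data frame (same shape as the one used in this article)
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))

# Logical column: TRUE where pages is greater than 40
df$is_long <- df$pages > 40

# Labelled column built from the same condition with ifelse()
df$size <- ifelse(df$pages > 40, "High Pages", "Low Pages")

# Square brackets assign a value only to the rows that match a condition
df$flag <- "no"
df$flag[df$name == "spark"] <- "yes"

df
```

The $ assignment evaluates the condition for every row at once, so no explicit loop is needed.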

1. What is a Data Frame in R

A data frame in R is a fundamental data structure for storing and manipulating structured data in rows and columns, similar to an RDBMS table or a spreadsheet. It is two-dimensional: one dimension refers to the rows and the other to the columns. Each column in a data frame is a vector, and all columns must have the same length.

Let’s create an R data frame, run these examples, and explore the output. If you already have data in a CSV file, you can import it into an R data frame. Also, refer to Import Excel File into R.


# Create a data frame
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))
df

This yields the output below.

# Output:
#   id pages   name
# 1 11    32  spark
# 2 22    45 python
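
To make the "same length" rule concrete, here is a short sketch that inspects the data frame above and then tries to build one from columns of lengths 3 and 2. The tryCatch() wrapper and the variable name result are only there so the example runs to completion; they are not part of the original article.

```r
df <- data.frame(id = c(11, 22),
                 pages = c(32, 45),
                 name = c("spark", "python"))

dim(df)            # number of rows, then number of columns
sapply(df, class)  # the type of each column vector
lengths(df)        # every column has the same length

# Columns of length 3 and 2 cannot be combined into one data frame,
# so data.frame() raises an error, which is caught here
result <- tryCatch(
  data.frame(id = c(1, 2, 3), pages = c(10, 20)),
  error = function(e) "error"
)
result
```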

2. R dplyr Create New Column Based on Condition

If you want to add a new column to an existing data frame based on a specified condition, you can use the mutate() function from the dplyr package in R. Before using dplyr functions, install the package once with install.packages("dplyr") and then load it with library(dplyr).


# Create a new column using the dplyr package
# install.packages("dplyr")  # run once if dplyr is not installed
library(dplyr)

# Create the dataframe
df <- data.frame(id=c(11, 22),
                 pages=c(32, 45),
                 name=c("spark", "python"))

# Create a new column based on condition
df <- df %>% mutate(new_column = ifelse(pages > 40, "High Pages", "Low Pages"))

# Print the updated dataframe
print(df)

In this example, a new column named new_column is added based on a condition: if the value in the pages column is greater than 40, the row is labelled High Pages; otherwise, it is labelled Low Pages. You can customize the condition and the labels according to your requirements. This example yields the output below.

# Output:
#   id pages   name new_column
# 1 11    32  spark  Low Pages
# 2 22    45 python High Pages
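
As a variation on the example above, dplyr also provides if_else(), a stricter version of ifelse() with a missing argument for NA values. The sketch below assumes a hypothetical third row with a missing pages value, added only to show how NA is handled.

```r
library(dplyr)

# Same data as above, plus a hypothetical row whose pages value is missing
df <- data.frame(id = c(11, 22, 33),
                 pages = c(32, 45, NA),
                 name = c("spark", "python", "r"))

# if_else() labels TRUE and FALSE rows; `missing` supplies the label for NA
df <- df %>%
  mutate(new_column = if_else(pages > 40, "High Pages", "Low Pages",
                              missing = "Unknown"))
df
```

With base ifelse() the third row would get NA in new_column; if_else() lets you name that case explicitly.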

3. Create a New Column Based on a String Condition

To create a new column in a data frame based on a string (character) condition, you can use the same method as described above. Use the pipe operator (%>%) to pass the data frame into the mutate() function, and compare the character column with a string inside ifelse(). The result is the data frame with the new column.


# Create new column based on string condition 
df <- df %>% mutate(new_column = ifelse(name == "spark", "Spark Language", "Other Language"))
df

# Output:
#   id pages   name     new_column
# 1 11    32  spark Spark Language
# 2 22    45 python Other Language

This example creates a column named new_column from a string comparison, using the mutate() function from the dplyr package. The column indicates whether the language is Spark Language or Other Language based on the name column. Because df already contains a column named new_column from the previous example, mutate() overwrites it instead of adding a second column.
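
String conditions are not limited to ==. As a sketch, the example below matches against several values with %in% and against a pattern with grepl(). The third row ("pyspark"), the vector spark_family, and the column starts_with_py are hypothetical names introduced only for this illustration.

```r
library(dplyr)

# Example data with a hypothetical third row
df <- data.frame(id = c(11, 22, 33),
                 pages = c(32, 45, 50),
                 name = c("spark", "python", "pyspark"))

# Values that should all receive the same label
spark_family <- c("spark", "pyspark")

df <- df %>%
  mutate(
    # %in% is TRUE when name equals any value in spark_family
    new_column = ifelse(name %in% spark_family, "Spark Family", "Other Language"),
    # grepl() is TRUE when name matches the regular expression "^py"
    starts_with_py = grepl("^py", name)
  )
df
```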

4. Create a Column Based on Multiple Conditions

Similarly, you can use the mutate() function from dplyr to create a new column based on multiple conditions. Pipe the data frame into mutate() with the pipe operator (%>%) and express the conditions with the case_when() function. case_when() checks the conditions in order and uses the first one that is TRUE for each row; the final TRUE ~ "Other" line acts as a default for rows that match none of them.


# Create a new column based on multiple conditions
df <- df %>% mutate(new_column = case_when(
    pages > 40 & name == "python" ~ "High Python Pages",
    pages <= 40 & name == "spark" ~ "Low Spark Pages",
    TRUE ~ "Other"
  ))
df 

# Output:
#   id pages   name        new_column
# 1 11    32  spark   Low Spark Pages
# 2 22    45 python High Python Pages

In the above code, new_column is created from two conditions that each combine the pages and name columns with the & (and) operator. Row 1 has pages <= 40 and name "spark", so it is labelled Low Spark Pages; row 2 has pages > 40 and name "python", so it is labelled High Python Pages. Any row matching neither condition would be labelled Other.
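
For comparison, the same logic can be sketched in base R with nested ifelse() calls, which is the approach mentioned in the key points. The third row below is hypothetical and is included only so that the "Other" branch is reached.

```r
# Example data with a hypothetical third row that matches neither condition
df <- data.frame(id = c(11, 22, 33),
                 pages = c(32, 45, 50),
                 name = c("spark", "python", "spark"))

# The inner ifelse() is evaluated only for rows that fail the outer condition
df$new_column <- ifelse(
  df$pages > 40 & df$name == "python", "High Python Pages",
  ifelse(df$pages <= 40 & df$name == "spark", "Low Spark Pages", "Other")
)
df
```

Nested ifelse() becomes hard to read beyond two or three conditions, which is the main reason to prefer case_when() for longer rule lists.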

5. Create a New Column Based on Row Index

You can also create a new column based on the row index using the dplyr mutate() function. Inside mutate(), row_number() returns the position of each row, and ifelse() with the modulo operator (%%) determines whether that position is even or odd.


# Create a new column based on row index
df <- df %>%
  mutate(new_column = ifelse(row_number() %% 2 == 0, "Even Row", "Odd Row"))
df

# Output:
#   id pages   name new_column
# 1 11    32  spark    Odd Row
# 2 22    45 python   Even Row

Here, a new column is created indicating whether each row is an even or odd row.
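
The same idea works without dplyr. As a minimal sketch, seq_len(nrow(df)) produces the row positions that row_number() supplies inside mutate(); the variable row_index, the column is_first, and the third row are hypothetical names and data used only here.

```r
# Example data with a hypothetical third row
df <- data.frame(id = c(11, 22, 33),
                 pages = c(32, 45, 50),
                 name = c("spark", "python", "r"))

# Row positions 1, 2, 3, ... for the data frame
row_index <- seq_len(nrow(df))

# Label each row by whether its position is divisible by 2
df$new_column <- ifelse(row_index %% 2 == 0, "Even Row", "Odd Row")

# Any other condition on the position works the same way
df$is_first <- row_index == 1
df
```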

6. Create a New Column with a Custom Function

So far, we have created new columns using conditions written directly inside dplyr functions. Now let’s combine a custom function with mutate() to create a new column based on certain conditions. First, define a custom function that takes a single value from the specified column as its argument and returns the category for that value.

Then use the pipe operator (%>%) to pipe the data frame into the mutate() function. Because the custom function uses if/else, which handles only one value at a time, sapply() is used to call it once for each element of the specified column and collect the results into the new column.


# Create a New Column with a custom function
# Define a custom function
get_category <- function(pages) {
  if (is.na(pages)) {
    # A missing value cannot be compared, so label it separately
    return("Other")
  } else if (pages > 40) {
    return("High Pages")
  } else {
    return("Low Pages")
  }
}

# Use the custom function to create the new column
df <- df %>%
  mutate(new_column = sapply(pages, get_category))
df

# Output:
#   id pages   name new_column
# 1 11    32  spark  Low Pages
# 2 22    45 python High Pages

In this example, the get_category function is used to determine the category for each row based on the pages column.
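
As an alternative sketch, the custom function can be written so that it accepts the whole column at once, which removes the need for sapply(). The names get_category_vec and via_vapply, and the third row with a missing pages value, are hypothetical and used only for this illustration.

```r
# Example data with a hypothetical row whose pages value is missing
df <- data.frame(id = c(11, 22, 33),
                 pages = c(32, 45, NA),
                 name = c("spark", "python", "r"))

# Vectorized version: ifelse() handles every element of `pages` in one call
get_category_vec <- function(pages) {
  ifelse(is.na(pages), "Other",
         ifelse(pages > 40, "High Pages", "Low Pages"))
}

df$new_column <- get_category_vec(df$pages)

# vapply() is a stricter sapply(): character(1) declares that each call
# must return exactly one string
df$via_vapply <- vapply(
  df$pages,
  function(p) if (is.na(p)) "Other" else if (p > 40) "High Pages" else "Low Pages",
  character(1)
)
df
```

Both columns contain the same labels; the vectorized function is usually the simpler choice when the rule can be expressed with ifelse().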

Conclusion

In this article, I have explained how to create a new column in an R data frame based on single or multiple conditions by using the mutate() function from the dplyr package together with ifelse() and case_when(). I have also explained how to create a new column based on the row index and with a custom function applied through sapply().

Happy Learning!!