R Sort DataFrame Rows by Column Value

How do you sort a DataFrame (data.frame) in R? To sort a data frame by column values, use the order() function. By default, it sorts in ASCENDING order and provides an option to sort in descending order. Also by default, rows with NA in the sorting column are placed last; you can change this behavior with optional parameters.

Key Points –

  • By default, sorts in ascending order.
  • To sort by descending order use decreasing=TRUE.
  • You can also prefix a numeric sorting variable with a minus sign to sort in DESCENDING order.
  • By default, sorting keeps all NA values on the sorting column at the end.
  • To keep NA values first, use na.last=FALSE.

In this article, I will explain how to sort a data frame using the above key points along with the following methods.

  • order() function from R base
  • arrange() function from dplyr
  • setorder() function from data.table

1. Quick Examples of Sorting DataFrame

Following are quick examples of how to sort DataFrame by column value in ascending and descending order in R programming.


# Sort Data Frame
df2 <- df[order(df$price),]

# Sort by multiple columns
df2 <- df[order(df$price,df$name ),]

# Sort descending order
df2 <- df[order(df$price,decreasing=TRUE),]

# Sort by putting NA top
df2 <- df[order(df$price,decreasing=TRUE, na.last=FALSE),]

# Load dplyr library
library(dplyr)
df2 <- df %>% arrange(price)
df2 <- df %>% arrange(desc(price), desc(name))

# Load data.table library
library("data.table")
df2 <- setorder(df,price)

Let’s create an R DataFrame.


# Create Data Frame
df=data.frame(id=c(11,22,33,44,55),
          name=c("spark","python","R","jsp","java"),
          price=c(144,NA,321,567,567),
          publish_date= as.Date(
            c("2007-06-22", "2004-02-13", "2006-05-18",
              "2010-09-02","2007-07-20"))
          )

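This creates a data frame with 5 rows and 4 columns (id, name, price, publish_date). The price column contains one NA and one tie (567 appears twice), which is useful for the sorting examples that follow.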

2. Sort DataFrame in R using order() Function

order() is a base R function used to sort a DataFrame by column value; it can also be used to sort vectors. It does not return the sorted values themselves. Instead, it returns the row positions (indices) that would put the column in sorted order, so we pass its result as the row index inside [] to get the rows of the DataFrame back in sorted order.
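To make the indexing step concrete, here is a minimal sketch on a small sample vector. The names price, items, idx, and sorted_items are hypothetical and used only for illustration. It shows that order() returns positions rather than sorted values.

```r
# Hypothetical sample values, chosen for illustration
price <- c(144, NA, 321, 567, 567)

# order() returns the positions that would sort the vector
idx <- order(price)
idx          # 1 3 4 5 2  (the NA at position 2 goes last)

# Indexing with those positions gives the sorted values
price[idx]   # 144 321 567 567 NA

# The same positions, used as a row index, reorder a data frame
items <- data.frame(id = c(11, 22, 33, 44, 55), price = price)
sorted_items <- items[idx, ]
sorted_items$id   # 11 33 44 55 22
```

The comma in items[idx, ] matters: the positions go in the row slot, and the empty column slot keeps all columns.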

2.1 Syntax of order() Function

Following is the syntax of the order() function.


# order() syntax
order(..., na.last = TRUE, decreasing = FALSE)

  • ... – a vector, or comma-separated vectors to sort on multiple columns.
  • na.last – Default set to TRUE. Use FALSE to have all NA values put at the top. Use NA to remove them.
  • decreasing – Default set to FALSE. Use TRUE to order in descending order.
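As a quick sketch of the three na.last settings described above, the following compares them on a hypothetical price vector (the variable names are assumptions for illustration).

```r
# Hypothetical vector with one missing value at position 2
price <- c(144, NA, 321, 567, 567)

na_at_end   <- order(price)                   # default na.last = TRUE
na_at_start <- order(price, na.last = FALSE)  # NA position comes first
na_removed  <- order(price, na.last = NA)     # NA position is dropped

na_at_end     # 1 3 4 5 2
na_at_start   # 2 1 3 4 5
na_removed    # 1 3 4 5
```

Note that with na.last = NA the result is shorter than the input, so indexing a data frame with it also drops the rows that have NA in the sorting column.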

2.2 Example of Sort DataFrame by Column Value

Let’s use the above-created data.frame and order() function to sort the R dataframe by column value in ascending order. The following example sorts the data by column price. Since the order() function takes the vector as an argument, use df$price as an argument. Note that in R, every column in DataFrame is a vector.


# Sort DataFrame
df2 <- df[order(df$price),]
df2

This returns the rows in ascending order of price (144, 321, 567, 567); the row with NA in price (python) comes last.

4. By Multiple Columns

If you want to sort by multiple columns, pass all the columns/variables you want to sort by as comma-separated arguments. When you sort on multiple columns, R orders by the first column and, when it encounters tied values, breaks the ties using the next arguments in order.


# Sort by multiple columns
df2 <- df[order(df$price,df$name ),]
df2

Here the two rows with price 567 are tied, so the tie is broken by name: java comes before jsp.

6. Sort DataFrame by Descending order

By default, sorting happens in ascending (increasing) order; for string columns, the sort happens in alphabetical order (A-Z). You can sort the data.frame in descending order by using decreasing=TRUE.


# Sort descending order
df2 <- df[order(df$price,decreasing=TRUE),]
df2

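The rows now appear with the highest price first (567, 567, 321, 144). The row with NA in price is still placed last, because na.last defaults to TRUE.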

When ordering the rows, you can also prefix a numeric column with a minus sign to sort it in decreasing order. This allows you to sort one column in ascending and another column in descending order.
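A minimal sketch of mixing directions, assuming a cut-down version of the sample data (only name and price). The minus sign works for numeric columns only; as an alternative, order() accepts a vector for decreasing when method = "radix" is used.

```r
# Cut-down sample data (assumed for illustration)
df <- data.frame(
  name  = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567),
  stringsAsFactors = FALSE
)

# price descending via the minus sign, name ascending to break ties
by_minus <- df[order(-df$price, df$name), ]

# Same idea with one decreasing flag per column (requires method = "radix")
by_radix <- df[order(df$price, df$name,
                     decreasing = c(TRUE, FALSE),
                     method = "radix"), ]

by_minus$name   # "java" "jsp" "R" "spark" "python"
by_radix$name   # "java" "jsp" "R" "spark" "python"
```

The radix form is the one to reach for when the column to be reversed is a character column, since -df$name would raise an error.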

7. Sort by having NA First

By default, rows with NA in the sorting column are put at the end in R. However, you can change this behavior and put rows with NA at the top of the dataframe by using na.last=FALSE.


# Sort by putting NA at first
df2 <- df[order(df$price,decreasing=TRUE, na.last=FALSE),]
df2

The row with NA in price (python) now comes first, followed by the remaining rows in descending order of price.

8. Sort DataFrame using dplyr Package

The arrange() function from the dplyr package is also used to arrange rows in ascending or descending order. To use arrange(), you have to install dplyr first using install.packages("dplyr") and load it using library(dplyr).

The main dplyr verbs, including arrange(), take a data.frame as the first argument. When we use the dplyr package, we mostly use the pipe operator %>% from magrittr, which passes the left-hand side of the operator as the first argument of the function on the right-hand side. For example, x %>% f(y) is equivalent to f(x, y), so the result from the left-hand side is “piped” into the right-hand side.
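The pipe equivalence can be checked directly. This is a small sketch that assumes dplyr is installed; the data frame and the names piped and direct are hypothetical.

```r
library(dplyr)

# Small hypothetical data frame
df <- data.frame(
  name  = c("spark", "R", "jsp"),
  price = c(144, 321, 99),
  stringsAsFactors = FALSE
)

# Piped form: df becomes the first argument of arrange()
piped <- df %>% arrange(price)

# Direct form: the same call written without the pipe
direct <- arrange(df, price)

identical(piped, direct)   # TRUE
```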


# Load dplyr library
library(dplyr)
df2 <- df %>% arrange(price)
df2

To sort by a column in descending order, wrap it in desc().


# Sort descending order
df2 <- df %>% arrange(desc(price))
df2
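arrange() also accepts several columns, each with its own direction. The sketch below assumes dplyr is installed and uses a cut-down version of the sample data. One difference from order() worth knowing: arrange() always places NA values at the end, even inside desc().

```r
library(dplyr)

# Cut-down sample data (assumed for illustration)
df <- data.frame(
  name  = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567),
  stringsAsFactors = FALSE
)

# price descending, then name ascending to break the 567 tie
sorted <- df %>% arrange(desc(price), name)

sorted$name   # "java" "jsp" "R" "spark" "python"
```

The python row (price NA) ends up last here, whereas order() would let you move it to the top with na.last=FALSE.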

9. Sort DataFrame using data.table Package

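You can also use the setorder() function from data.table to sort a DataFrame by its columns. This function takes the data.frame (or data.table) object and one or more columns as input. Unlike order() and arrange(), it reorders the rows in place (by reference) and returns the same object invisibly, so the original df is modified as well. Also note that setorder() places NA values first by default (na.last=FALSE).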


# Load data.table library
library("data.table")
df2 <- setorder(df,price)
df2
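Because setorder() works by reference, a common pattern is to sort a separate data.table rather than the original data frame. The following is a sketch under that assumption, with data.table installed and a cut-down version of the sample data; dt_default and dt_desc are hypothetical names.

```r
library(data.table)

# Cut-down sample data (assumed for illustration)
df <- data.frame(
  name  = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567),
  stringsAsFactors = FALSE
)

# Default behavior: ascending, with NA placed first
dt_default <- as.data.table(df)
setorder(dt_default, price)
dt_default$name[1]   # "python" (the NA row)

# Descending price (minus prefix), name ascending, NA placed last
dt_desc <- as.data.table(df)
setorder(dt_desc, -price, name, na.last = TRUE)
dt_desc$name         # "java" "jsp" "R" "spark" "python"
```

No assignment is needed after setorder(); the table itself is already reordered.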

Conclusion

In this article, I have explained how to use the order() function to sort a DataFrame by column values in R, how to sort in descending order, and how to keep all NA values first. Finally, I showed how to use the arrange() function from the dplyr package and the setorder() function from data.table to order a data.frame by specified columns.

Similarly, you can also sort by a date column. When ordering on a date, make sure the column is of Date type.
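To illustrate why the column type matters, here is a small sketch. It assumes the dates arrive as day/month/year strings, which is a hypothetical input format and not the one used in the sample data above. Sorting the strings gives a misleading order, while converting with as.Date() first gives chronological order.

```r
# Hypothetical input: dates stored as day/month/year text
df <- data.frame(
  name = c("spark", "python", "R", "jsp", "java"),
  publish_date = c("22/06/2007", "13/02/2004", "18/05/2006",
                   "02/09/2010", "20/07/2007"),
  stringsAsFactors = FALSE
)

# Sorting the raw strings compares characters, not dates
by_text <- df[order(df$publish_date), ]
by_text$name   # "jsp" "python" "R" "java" "spark"

# Convert to Date type first, then sort
df$publish_date <- as.Date(df$publish_date, format = "%d/%m/%Y")
by_date <- df[order(df$publish_date), ]
by_date$name   # "python" "R" "spark" "java" "jsp"
```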

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.