Calculate the Median in R

How do you calculate the median of a DataFrame column or a vector in R? You can use the median() function, which ships with R's built-in stats package, to compute the median of a vector or of a single DataFrame column. The function takes the vector as its argument and returns its median value. The median of a dataset is the middle value when the values are arranged in ascending order. If the dataset has an even number of values, the median is the average of the two middle values.
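To make the definition concrete, here is a minimal sketch that computes the median by hand: sort the values, then take the middle one (odd count) or average the two middle ones (even count). The helper manual_median() is hypothetical and only for illustration. It assumes a non-empty vector without NA values. In practice you would simply call median().

```r
# Hypothetical helper that follows the definition of the median.
# Assumes x is non-empty and contains no NA values
# (sort() silently drops NA values by default).
manual_median <- function(x) {
  x <- sort(x)               # arrange values in ascending order
  n <- length(x)
  mid <- (n + 1) %/% 2       # position of the (first) middle value
  if (n %% 2 == 1) {
    x[mid]                   # odd count: the single middle value
  } else {
    (x[mid] + x[mid + 1]) / 2  # even count: average of the two middle values
  }
}

manual_median(c(7, 6, 8))     # returns 7
manual_median(c(9, 7, 6, 8))  # returns 7.5, same as median(c(9, 7, 6, 8))
```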


Key points:

  • The median() function calculates the median of a vector or of a single DataFrame column.
  • Use the na.rm parameter to handle NA values. When it is set to TRUE, median() ignores NA values while calculating the median.
  • The median is the middle value of a sorted dataset. If the dataset has an even number of values, the median is the average of the two middle values.
  • To pass a DataFrame column to median(), reference it with $ notation, for example median(df$price).
  • median() works on vectors and columns with either an odd or an even number of values.

1. Syntax of median()

The following is the syntax of the median() function that calculates the median value.


# Syntax of median
median(x, na.rm = FALSE, ...)

Parameters:

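  • x – A numeric vector (or a single DataFrame column) whose median you want to calculate.
  • na.rm – A logical value, FALSE by default. When set to TRUE, NA values are removed before the median is calculated.
  • ... – Potentially further arguments for methods; not used by the default method.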
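One detail the parameter list does not show is the type of the result. With an odd number of values, median() returns the middle element itself, so an integer vector gives an integer result; with an even number it averages the two middle values, which gives a double. The sketch below illustrates this with two small integer vectors (the names odd_int and even_int are only for illustration).

```r
# Integer vectors with an odd and an even number of values
odd_int  <- c(3L, 1L, 2L)
even_int <- c(4L, 1L, 3L, 2L)

# Odd count: the middle value is returned unchanged
median(odd_int)           # 2
class(median(odd_int))    # "integer"

# Even count: the two middle values (2 and 3) are averaged
median(even_int)          # 2.5
class(median(even_int))   # "numeric"
```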

2. Calculate the Median in R

To calculate the median of a DataFrame column, pass the column to the median() function. The function takes the column values as its argument and returns their median. The following examples compute the median of a column with and without NA values.


# Create Data Frame
df <- data.frame(id=c(11,25,50,42,55),
              price=c(144,NA,321,567,567))
print("Create a data frame:")
df

# Calculate median of DataFrame column
res <- median(df$id)
print("Get the median of a data frame column:")
res

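Yields below output.

# Output:
# [1] "Create a data frame:"
#   id price
# 1 11   144
# 2 25    NA
# 3 50   321
# 4 42   567
# 5 55   567
# [1] "Get the median of a data frame column:"
# [1] 42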
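median() works on one vector at a time, so it cannot be applied to a whole DataFrame: R's default median method stops with a "need numeric data" error for DataFrame input. A minimal sketch, assuming you want the median of every column, is to apply median() column by column with sapply(), which forwards na.rm = TRUE to each call. The variable name col_medians is illustrative.

```r
# Same DataFrame as in the example above
df <- data.frame(id = c(11, 25, 50, 42, 55),
                 price = c(144, NA, 321, 567, 567))

# Passing the whole DataFrame to median() raises an error,
# so catch it and print the error message instead
tryCatch(median(df), error = function(e) print(conditionMessage(e)))

# Apply median() to each column; na.rm = TRUE is passed on to median()
col_medians <- sapply(df, median, na.rm = TRUE)
col_medians   # id = 42, price = 444
```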

Calculate the Median with NA Values

If a DataFrame column contains NA values, median() returns NA by default. To get the median of the remaining values instead, set the na.rm parameter to TRUE. median() then ignores the NA values and returns the median as a numeric value.
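To see what na.rm = TRUE does, the sketch below compares it with removing the NA values yourself using is.na(). The variable names with_na_rm and filtered are only for illustration; for the price column both approaches give the same result.

```r
df <- data.frame(id = c(11, 25, 50, 42, 55),
                 price = c(144, NA, 321, 567, 567))

# Default (na.rm = FALSE): a single NA makes the result NA
median(df$price)

# na.rm = TRUE drops the NA values before computing the median...
with_na_rm <- median(df$price, na.rm = TRUE)

# ...which matches filtering them out manually with is.na()
filtered <- median(df$price[!is.na(df$price)])

identical(with_na_rm, filtered)   # TRUE, both are 444
```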


# Calculate the median of data frame column with na.rm param
res <- median(df$price, na.rm=TRUE)
print("Get the median of a data frame column:")
res

# Output:
# [1] "Get the median of a data frame column:"
# [1] 444

3. R Median of Vector

You can also apply median() directly to a vector. The following examples calculate the median of a vector with an odd number of values and of a vector with an even number of values.


# Calculate median of a vector with an odd number of values
vec <- c(7, 6, 8)
median(vec)

# Output:
# [1] 7

# Calculate median of a vector with an even number of values
vec <- c(9, 7, 6, 8)
median(vec)

# Output:
# [1] 7.5

Median of Vector with NA Values

Finally, you can use median() on a vector that contains NA values, either without or with the na.rm parameter.


# Calculate median of Vector having NA value
# Using median() without na.rm param
vec <- c(6, 7, 8, 9, NA)
median(vec)

# Output:
# [1] NA

# Calculate median of Vector ignoring NA value
# Using median() with na.rm param
vec <- c(6, 7, 8, 9, NA)
median(vec, na.rm=TRUE)

# Output:
# [1] 7.5

As the output shows, when a vector contains NA values, median() returns NA. If you set the na.rm parameter to TRUE, the function ignores the NA values and returns the median of the remaining values.

4. Conclusion

In this article, I have explained what the median is and how to calculate it with the R median() function on a DataFrame column and on a vector. I also showed how to calculate the median of a DataFrame column or vector that contains NA values.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium