Calculate the Median in R

How do you calculate the median of a DataFrame column or a vector in R? You can use the base R median() function. It takes a vector as a parameter and returns the median as a numeric value. The median of a dataset is the value that falls in the middle when the dataset is ordered from smallest to largest. If the dataset has an even number of values, the median is the average of the two middle values.
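To make the definition concrete, here is a minimal sketch that computes the median by hand. The helper name manual_median is hypothetical and exists only for illustration; in practice you would call the built-in median() directly.

```r
# Hypothetical helper (not part of base R): computes the median by hand
manual_median <- function(x) {
  x <- sort(x)                      # order the values; sort() also drops any NA values
  n <- length(x)
  if (n == 0) return(NA_real_)      # nothing to take a median of
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                  # odd count: the single middle value
  } else {
    mean(x[c(n / 2, n / 2 + 1)])    # even count: average of the two middle values
  }
}

manual_median(c(8, 6, 7))      # 7
manual_median(c(9, 6, 8, 7))   # 7.5
```

For vectors without NA values, this should agree with what median() returns.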

Key points:

  • The median() function calculates the median of a vector or column in a DataFrame.
  • You can handle NA values using the na.rm parameter. If it is set to TRUE, median() ignores NA values while calculating the median.
  • The median value is the middle value of a sorted dataset. If the dataset has an even number of values, the median is the average of the two middle values.
  • To pass a DataFrame column to the median() function, refer to it with $ notation, for example df$price.
  • The median() function handles both odd and even numbers of values.

1. Syntax of median()

The following is the syntax of the median() function that calculates the median value.


# Syntax of median
median(x, na.rm = FALSE, ...)

Parameters:

  • x – The vector whose values are used to calculate the median.
  • na.rm – Default value is FALSE. When you set it to TRUE, the function ignores NA values.
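As a small illustration of the na.rm parameter, the sketch below calls median() on the same vector with and without it. The vector x is made up for this example.

```r
# Example vector (made up for illustration) containing one NA value
x <- c(4, NA, 10, 6)

# With the default na.rm = FALSE, a single NA makes the result NA
median(x)                  # NA

# With na.rm = TRUE, the NA is dropped and the median of 4, 6, 10 is returned
median(x, na.rm = TRUE)    # 6
```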

2. Calculate the Median in R

To calculate the median of a DataFrame column, pass the column to the median() function, which computes the median of that column's values. The following example creates a data frame and gets the median of a column that has no NA values; a column with NA values is handled further below.


# Create Data Frame
df <- data.frame(id=c(11,25,50,42,55),
              price=c(144,NA,321,567,567))
print("Create a data frame:")
df

# Calculate median of DataFrame column
res <- median(df$id)
print("Get the median of a data frame column:")
res

This returns 42 as the median of the id column, which is the middle value of the sorted ids 11, 25, 42, 50, 55.

If a data frame column contains NA values, median() returns NA by default. To get a numeric median instead, set the na.rm parameter to TRUE so that the function ignores NA values. The following example passes na.rm = TRUE along with the price column, which contains an NA value.


# Calculate the median of a data frame column, ignoring NA values
res <- median(df$price, na.rm=TRUE)
print("Get the median of a data frame column:")
res

# Output
# [1] "Get the median of a data frame column:"
# [1] 444
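If you need the median of every column at once, one option is to apply median() to each column with sapply(). This is only a sketch of one approach, assuming all columns are numeric; the variable name col_medians is chosen for illustration, and the data frame is re-created so the snippet runs on its own.

```r
# Re-create the example data frame so this snippet is self-contained
df <- data.frame(id = c(11, 25, 50, 42, 55),
                 price = c(144, NA, 321, 567, 567))

# Apply median() to each column; na.rm = TRUE is passed through to median()
col_medians <- sapply(df, median, na.rm = TRUE)
col_medians
# id is 42 and price is 444
```

The result is a named numeric vector, so a single value can be read with col_medians[["price"]].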

3. R Median of Vector

Alternatively, you can apply this function to a vector and calculate the median of its values. The following examples calculate the median of a vector with an odd count of values, with an even count of values, and with NA values.


# Calculate the median of a vector with an odd count of values
vec <- c(6, 7, 8)
median(vec)

# Output
# [1] 7

# Calculate the median of a vector with an even count of values
vec <- c(6, 7, 8, 9)
median(vec)

# Output
# [1] 7.5

# Calculate the median of a vector that contains NA values
vec <- c(10, 11, 6, 7, 8, 9, NA)
median(vec, na.rm=TRUE)

# Output
# [1] 8.5

4. Conclusion

In this article, I have explained what the median value is and how to obtain it by applying the median() function to a data frame column or vector in R. I also showed how to calculate the median of a DataFrame column or vector that contains NA values.

Naveen Nelamali
