How to create a contingency table in R? You can use the base R table() function to create a contingency table in multiple ways. A contingency table is a type of table used in statistics to display the cross-tabulation of two or more categorical variables, showing the frequency distribution of their combinations, while a frequency table displays the count of occurrences for each unique value within a single variable. With a single variable, table() returns a frequency table; with two or more, it returns a contingency table.
In this article, I will explain how to create and manipulate a contingency table from R objects using the table() function and explore the relationships between two or more categorical variables.
Related: You can use the table() function to create the tables from various R objects such as vectors, columns of data frames, and factors.
Key points-
- A contingency table displays the cross-tabulation of two or more categorical variables to show how often each combination of categories occurs. A frequency table is the single-variable case.
- The table() function in R is used to create a contingency table from vectors, factors, or data frame columns.
- In a contingency table, rows typically represent categories of one variable, and columns represent categories of another variable.
- Each cell in the table shows the count of observations for each combination of categories.
- Row totals (marginals) show the total frequency for each row category, while column totals show the total frequency for each column category.
- The addmargins() function adds row and column sums to a contingency table, providing additional summary information.
- Use prop.table() with margin = 1 to compute row proportions, indicating the share of each row’s total count that falls into each column category. Multiply by 100 to get percentages.
- Use prop.table() without additional arguments to calculate relative frequencies, showing the proportion of the total dataset represented by each cell.
- Create a contingency table from specific columns of a data frame by passing those columns to the table() function.
- Convert a matrix to a table object using the as.table() function. Calling table() on a matrix pools all of its values into a single frequency count instead of cross-tabulating its columns.
Create a Contingency Table from the R Vector
Use the table() function in R to create a frequency table from a vector. This function counts the occurrences of each unique value in the vector and presents them in a tabular format.
# Create a contingency table from vector
# Create vector
vec <- c("A", "B", "B", "C", "A", "A")
# Create a contingency table
table_data <- table(vec)
print("Create a contingency table:")
print(table_data)
Yields below output.
# Output:
# [1] "Create a contingency table:"
# vec
# A B C
# 3 2 1
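The object returned by table() for a single vector behaves much like a named vector, so individual counts can be looked up by category name. The following is a minimal sketch of that idea; the variable names freq, count_a, categories, and sorted_freq are chosen here for illustration and are not part of the original example.

```r
# Same vector as in the example above
vec <- c("A", "B", "B", "C", "A", "A")
freq <- table(vec)

# Look up a single count by its category name
count_a <- freq[["A"]]

# List the categories that were found in the vector
categories <- names(freq)

# Sort the categories from most to least frequent
sorted_freq <- sort(freq, decreasing = TRUE)

print(count_a)
print(categories)
print(sorted_freq)
```

Sorting is handy when a vector has many categories and you only care about the most common ones.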
Create a Contingency Table From an R Data Frame
To create a contingency table from a data frame, pass the data frame into the table() function. This returns a contingency table where the values represent the frequency of each combination of the column values.
# Create a contingency table from data frame
df <- data.frame(
gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
product = c("A", "B", "A", "B", "A", "B")
)
print("Given data frame:")
print(df)
# Create a contingency table
table_data <- table(df)
print("Create a contingency table:")
print(table_data)
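The above code creates a contingency table from the df data frame, showing the cross-tabulation of gender and product.
Yields below output.
# Output:
# [1] "Given data frame:"
#   gender product
# 1   Male       A
# 2   Male       B
# 3 Female       A
# 4 Female       B
# 5   Male       A
# 6 Female       B
# [1] "Create a contingency table:"
#         product
# gender   A B
#   Female 1 2
#   Male   2 1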
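As an alternative sketch, base R also provides xtabs(), which builds the same kind of cross-tabulation from a formula. This is not used elsewhere in this article; it is shown here only as another option, and the variable name xt is illustrative.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)

# The formula lists the variables to cross-tabulate on the right-hand side
xt <- xtabs(~ gender + product, data = df)
print(xt)
```

The formula interface can be convenient when the data frame has many columns and you want to name only the ones to cross-tabulate.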
Contingency Table from Columns
Alternatively, you can create the contingency table from data frame columns. To do that, pass two or more columns to the table() function to generate a contingency table that shows the frequency of each combination of the given column values.
# Create a contingency table from columns
df <- data.frame(
gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
product = c("A", "B", "A", "B", "A", "B")
)
print("Given data frame:")
print(df)
# Create a contingency table
table_data <- table(df$gender, df$product)
print("Create a contingency table:")
print(table_data)
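Yields below output. Because the columns are passed as df$gender and df$product, the table has no dimension labels.
# Output:
# [1] "Given data frame:"
#   gender product
# 1   Male       A
# 2   Male       B
# 3 Female       A
# 4 Female       B
# 5   Male       A
# 6 Female       B
# [1] "Create a contingency table:"
#
#          A B
#   Female 1 2
#   Male   2 1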
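When columns are passed individually, the resulting table does not carry the variable names as dimension labels. A minimal sketch of two ways to label the dimensions follows, using either named arguments or the dnn argument of table(). The variable names by_name and by_dnn are illustrative.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)

# Option 1: name the arguments to label the dimensions
by_name <- table(gender = df$gender, product = df$product)

# Option 2: supply the labels through the dnn argument
by_dnn <- table(df$gender, df$product, dnn = c("gender", "product"))

print(by_name)
print(by_dnn)
```

Both calls hold the same counts; only the way the labels are supplied differs.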
Contingency Table by Specific Column
Similarly, you can pass a single column of a data frame to the table() function. This generates a one-way frequency table that counts the occurrences of each category in the selected column.
# Create contingency table by specific column
df <- data.frame(
gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
product = c("A", "B", "A", "B", "A", "B")
)
# Create a contingency table
table_data <- table(df$gender)
print("Create a contingency table:")
print(table_data)
# Output:
# [1] "Create a contingency table:"
# Female Male
# 3 3
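A one-way table only lists categories that actually occur in the column. If you also want categories with zero observations to appear, one approach is to convert the column to a factor with explicit levels first. In this sketch the level "C" is a hypothetical product that is not present in the data.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)

# "C" is a hypothetical category that never occurs in df$product
product_f <- factor(df$product, levels = c("A", "B", "C"))

# table() reports every factor level, including those with a count of 0
product_counts <- table(product_f)
print(product_counts)
```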
Contingency Tables by Rows
To create a table from a range of data frame rows, select the rows first and then call the table() function. In the example below, the as.matrix() function converts the selected rows to a character matrix, and table() then counts every value in that matrix, pooling the values of both columns into a single frequency table.
# Create contingency table from rows
df <- data.frame(
gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
product = c("A", "B", "A", "B", "A", "B")
)
df
# Create a contingency table
table_data <- table(as.matrix(df[1:6, ]))
print("Create a contingency table:")
print(table_data)
# Output:
# [1] "Create a contingency table:"
# A B Female Male
# 3 3 3 3
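If you want a cross-tabulation of gender by product for a subset of rows, rather than pooled counts, a simple sketch is to pass the row subset to table() without converting it to a matrix. The variable names first_four and table_rows are illustrative.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)

# Select the first four rows and keep them as a data frame
first_four <- df[1:4, ]

# table() cross-tabulates the columns of the subset
table_rows <- table(first_four)
print(table_rows)
```

In these four rows each gender and product combination occurs exactly once, so every cell of the resulting table is 1.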
Add Margins to Contingency Table
The addmargins() function in R adds row and column sums to the contingency table, providing additional details on your data’s distribution. It improves the contingency table by displaying the totals for each category in both rows and columns.
# Add margins to contingency table
df <- data.frame(
gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
product = c("A", "B", "A", "B", "A", "B")
)
# Create a contingency table
table_data <- table(df)
add_mar <- addmargins(table_data)
print("Add margins to contingency table:")
print(add_mar)
# Output:
# [1] "Add margins to contingency table:"
# product
# gender A B Sum
# Female 1 2 3
# Male 2 1 3
# Sum 3 3 6
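The totals can also be obtained on their own, or added along one dimension only. The sketch below uses rowSums() and colSums() for plain totals and the margin argument of addmargins() to append only a "Sum" row. The variable names are illustrative.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)
table_data <- table(df)

# Totals as plain named vectors
row_totals <- rowSums(table_data)
col_totals <- colSums(table_data)

# margin = 1 sums over the first dimension (gender),
# which appends a "Sum" row holding the column totals
col_sum_row <- addmargins(table_data, margin = 1)

print(row_totals)
print(col_totals)
print(col_sum_row)
```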
Contingency Tables Row Percentages
To calculate the row percentages of a contingency table, you can use the prop.table() function. Row percentages indicate what portion of each row’s total count is represented by each category in the columns.
The margin = 1 argument of the prop.table() function computes the proportion of each cell within its respective row. Multiplying by 100 converts these proportions into percentages, so the result is the contingency table expressed as row percentages.
# Calculate row percentages
# Create a contingency table
table_data <- table(df)
row_percent <- prop.table(table_data, margin = 1) * 100
print("Contingency table with row percentages:")
print(row_percent)
# Output:
# [1] "Contingency table with row percentages:"
#         product
# gender          A        B
#   Female 33.33333 66.66667
#   Male   66.66667 33.33333
This shows that, for example, 33.33% of females preferred product A, and 66.67% of females preferred product B.
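Column percentages work the same way with margin = 2. The following sketch also rounds the result to one decimal place for readability; the rounding step and the variable names are additions for illustration.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)
table_data <- table(df)

# margin = 2 computes each cell as a share of its column total
col_prop <- prop.table(table_data, margin = 2)

# Convert to percentages and round to one decimal place
col_percent <- round(col_prop * 100, 1)
print(col_percent)
```

Here each column sums to 100, so the table reads as the gender split within each product.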
Relative Frequency of Contingency Tables in R
To calculate relative frequencies, use the prop.table() function without a margin argument. It returns the proportion of the total dataset represented by each cell in the contingency table, so the cells sum to 1. By passing the table data to this function, you can compute and display the overall distribution of the data across all categories.
# Calculate relative frequencies
# Create a contingency table
table_data <- table(df)
relative_freq <- prop.table(table_data)
print("Contingency table with relative frequencies:")
print(relative_freq)
# Output:
# [1] "Contingency table with relative frequencies:"
#         product
# gender           A         B
#   Female 0.1666667 0.3333333
#   Male   0.3333333 0.1666667
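A quick way to sanity-check relative frequencies is to confirm that they sum to 1. The sketch below does that and also combines prop.table() with addmargins() to show the marginal proportions. Combining the two functions is a suggestion made here, not a step from the example above.

```r
# Same data frame as in the example above
df <- data.frame(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  product = c("A", "B", "A", "B", "A", "B")
)
table_data <- table(df)
relative_freq <- prop.table(table_data)

# All cells together account for the whole dataset
total <- sum(relative_freq)

# Append row and column sums to see the marginal proportions
with_totals <- addmargins(relative_freq)

print(total)
print(with_totals)
```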
Create a Contingency Table from Matrix
Finally, you can create a table object from a matrix. Calling the table() function on a matrix counts all of its values together, so to keep the matrix layout you can use the as.table() function to convert the matrix into an object of class table. First, create the matrix and then pass it to this function. Note that as.table() keeps the matrix values as they are and labels unnamed rows and columns with letters; it does not count combinations of values.
# Create a matrix (named mat to avoid masking the matrix() function)
mat <- matrix(c("Male", "Female", "Male", "Female", "Male", "Female",
                "A", "A", "B", "B", "A", "B"),
              nrow = 6,
              byrow = FALSE)
# Convert the matrix to a table object
table_data <- as.table(mat)
colnames(table_data) <- c("gender", "product")
print("Create a contingency table:")
table_data
# Output:
# [1] "Create a contingency table:"
# gender product
# A Male A
# B Female A
# C Male B
# D Female B
# E Male A
# F Female B
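The result above still holds the raw values. To get counts of each gender and product combination from such a matrix, one option is to pass its columns to table(). Conversely, as.table() is a natural fit when a numeric matrix already contains the counts. The sketch below shows both; the variable names and the hand-built count_matrix are illustrative.

```r
# Character matrix with one observation per row
mat <- matrix(c("Male", "Female", "Male", "Female", "Male", "Female",
                "A", "A", "B", "B", "A", "B"),
              nrow = 6,
              byrow = FALSE)

# Cross-tabulate the two matrix columns to get counts
counts <- table(gender = mat[, 1], product = mat[, 2])

# A numeric matrix that already holds the counts (filled column by column)
count_matrix <- matrix(c(1, 2, 2, 1),
                       nrow = 2,
                       dimnames = list(gender = c("Female", "Male"),
                                       product = c("A", "B")))

# as.table() turns the count matrix into a table object
count_table <- as.table(count_matrix)

print(counts)
print(count_table)
```

Once the counts are in a table object, functions such as addmargins() and prop.table() can be applied to it in the same way as to the output of table().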
Conclusion
In this article, I have explained that contingency tables are a powerful tool for analyzing categorical data and summarizing the relationship between categorical variables. In R, the table() function is used to create these tables from vectors and data frames, and as.table() converts a matrix to a table object. I have also explained how to use addmargins() and prop.table() to add margins and to compute row percentages and relative frequencies.
Happy Learning!!