In this R data frame tutorial, you will learn what a data frame is, what its main features are, and how to use it through simple examples.
All examples in this tutorial are basic and easy to practice, aimed at beginners who want to learn R data frames and advance their careers.
1. What is a Data Frame in R?
A data frame in R represents data in rows and columns, similar to a pandas DataFrame or a SQL table. Each column is a vector, and all columns must have the same length.
A data frame stores data much like an RDBMS table. It is therefore a two-dimensional data structure: one dimension refers to rows and the other to columns. I will cover more on the data frame in the following sections.
In an R data frame, columns are referred to as variables and rows as observations. If you are new to R programming, I would highly recommend reading the R Programming Tutorial, where I have explained R concepts with examples.
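To make the two-dimensional structure concrete, here is a minimal sketch that builds a small data frame and inspects its dimensions with base R functions. The object name `people` and its values are made up for illustration.

```r
# A small illustrative data frame; names and values are hypothetical
people <- data.frame(
  id   = c(1, 2, 3),
  name = c("a", "b", "c")
)

dim(people)    # number of rows (observations) and columns (variables)
nrow(people)   # rows only
ncol(people)   # columns only
names(people)  # variable names

# Columns whose lengths do not fit together raise an error
bad <- try(data.frame(id = c(1, 2, 3), name = c("a", "b")), silent = TRUE)
inherits(bad, "try-error")
```

The `try()` call is only there so the failing call does not stop the script; in interactive use you would simply see the error message.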
The third-party package dplyr provides a grammar for data manipulation that works closely with data.frame. To use it, you first need to install the package in R.
2. Create a DataFrame in R using data.frame()
The first step in exploring a data frame is creating one. The function data.frame() is the simplest way to do this. A data frame is a list of variables with the same number of rows and unique row names. Besides this, there are other ways to create a data frame in R.
2.1 Syntax of data.frame()
The following is the syntax of the data.frame() function.
# data.frame() Syntax
# Note: since R 4.0.0 the default value of stringsAsFactors is FALSE
data.frame(..., row.names = NULL, check.rows = FALSE,
           check.names = TRUE, fix.empty.names = TRUE,
           stringsAsFactors = default.stringsAsFactors())
Follow the guidelines below when creating a DataFrame with the data.frame() function.
- The input objects passed to data.frame() should have the same number of rows.
- The column names should be non-empty.
- Duplicate column names are allowed, but only if you set check.names = FALSE.
- You can assign names to rows using the row.names param.
- Character variables passed to data.frame() are converted to factor columns in R versions before 4.0.0. From R 4.0.0 onward they stay character unless you set stringsAsFactors = TRUE.
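The check.names guideline can be illustrated with a short sketch. The object names `df_checked` and `df_raw` are hypothetical.

```r
# Default (check.names = TRUE): duplicate names are made unique
df_checked <- data.frame(id = c(1, 2), id = c(3, 4))
names(df_checked)   # "id" "id.1"

# check.names = FALSE keeps the duplicate names exactly as given
df_raw <- data.frame(id = c(1, 2), id = c(3, 4), check.names = FALSE)
names(df_raw)       # "id" "id"
```

Duplicate names make selection by name ambiguous, so keeping the default is usually the safer choice.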
2.2 Create R DataFrame Example
Now, let's create a DataFrame by using the data.frame() function. Its arguments are vectors or lists that become the columns. In R, a vector contains elements of the same type; the type can be logical, integer, double, character, complex, or raw. You can create a vector using c().
# Create Vectors
id <- c(10,11,12,13)
name <- c('sai','ram','deepika','sahithi')
dob <- as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))
# Create DataFrame
df <- data.frame(id,name,dob)
# Print DataFrame
df
In the above example, I passed the following vectors, separated by commas, as arguments to the data.frame() function to create a DataFrame.
- id: Numeric vector that stores the numeric values.
- name: Character vector that stores the character values.
- dob: Date vector that stores the date values.
The above example yields the output below. R creates the data frame with column names (variables) taken from the vector names. You can also use print(df) to print the DataFrame to the console.
# Output
id name dob
1 10 sai 1990-10-02
2 11 ram 1981-03-24
3 12 deepika 1987-06-14
4 13 sahithi 1985-08-16
Notice that R adds sequential row numbers (1, 2, 3, ...) as the default row names.
Alternatively, you can pass the vectors directly to the function as named arguments. Both approaches create the same DataFrame.
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13),
name = c('sai','ram','deepika','sahithi'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))
)
# Print DataFrame
df
3. Check the DataFrame Data types
Let's check the data types of the DataFrame columns by using print(sapply(df, class)). Note that I did not specify the column data types when creating it; hence, R infers them from the data.
# Display datatypes
print(sapply(df, class))
# Output (R versions before 4.0.0; newer versions report "character" for name)
#        id      name       dob
# "numeric"  "factor"    "Date"
You can also use str(df) to check the data types.
# Display datatypes
str(df)
# Output
'data.frame': 4 obs. of 3 variables:
$ id : num 10 11 12 13
$ name: Factor w/ 4 levels "deepika","ram",..: 4 2 1 3
$ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
4. Using stringsAsFactors Param for Character Data Types
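Notice above that the name column holds characters but its data type is factor. In R versions before 4.0.0, data.frame() creates factor columns from character vectors by default; from R 4.0.0 onward the default is to keep them as character.
You can control this behavior explicitly by passing the stringsAsFactors = FALSE param while creating a DataFrame. Note that R is case-sensitive, so the value must be written as FALSE.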
# Create DataFrame
df <- data.frame(
id = c(10,11,12,13),
name = c('sai','ram','deepika','sahithi'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
stringsAsFactors=FALSE
)
# Display the structure of the DataFrame
str(df)
# Output
'data.frame': 4 obs. of 3 variables:
$ id : num 10 11 12 13
$ name: chr "sai" "ram" "deepika" "sahithi"
$ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" ...
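As a small sketch of both settings side by side, the snippet below sets stringsAsFactors explicitly so the result does not depend on the R version, and then converts a factor column back to character. The object names are hypothetical.

```r
names_vec <- c("sai", "ram", "deepika", "sahithi")

# Same input, two explicit settings
as_factor <- data.frame(name = names_vec, stringsAsFactors = TRUE)
as_char   <- data.frame(name = names_vec, stringsAsFactors = FALSE)

class(as_factor$name)   # "factor"
class(as_char$name)     # "character"

# Convert an existing factor column back to character
as_factor$name <- as.character(as_factor$name)
class(as_factor$name)   # "character"
```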
5. Assign Row Names to DataFrame
You can assign custom names to the rows of an R DataFrame while creating it. Use the row.names param and assign a vector with the row names. Note that the length of the vector you pass to row.names must match the number of rows, and the names must be unique.
# Create DataFrame with Row Names
df <- data.frame(
id = c(10,11,12,13),
name = c('sai','ram','deepika','sahithi'),
dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
row.names = c('row1','row2','row3','row4')
)
df
This yields the output below.
# Output
id name dob
row1 10 sai 1990-10-02
row2 11 ram 1981-03-24
row3 12 deepika 1987-06-14
row4 13 sahithi 1985-08-16
If you already have a DataFrame, you can use the below approach to assign or change the row names.
# Assign row names to existing DataFrame
row.names(df) <- c('row1','row2','row3','row4')
df
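If you later want the default numbering back, one option is to set the row names to NULL. This is a minimal sketch with a hypothetical two-row data frame `df_rows`.

```r
df_rows <- data.frame(
  id   = c(10, 11),
  name = c("sai", "ram"),
  row.names = c("row1", "row2")
)
rownames(df_rows)            # "row1" "row2"

# Setting row names to NULL restores the default 1, 2, ... numbering
rownames(df_rows) <- NULL
rownames(df_rows)            # "1" "2"
```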
6. Select Rows and Columns
By using R base bracket notation, we can select rows (observations) by column value, by index, by name, by condition, etc. You can also use the R base function subset() to get the same results. Besides these, the dplyr package provides the filter() function to get rows from the DataFrame.
# Select Rows by index
df[3,]
# Select Rows by list of index values (the DataFrame has 4 rows)
df[c(1,3,4),]
# Select Rows by index range
df[2:4,]
# Select Rows by name
df['row3',]
# Select Rows by list of names
df[c('row1','row3'),]
# Using subset
subset(df, name %in% c("sai", "ram"))
# Load dplyr
library('dplyr')
# Using dplyr::filter
filter(df, name %in% c("sai", "ram"))
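Selecting rows by condition with bracket notation is mentioned above but not shown. A minimal sketch, assuming a reduced copy of the example data named `df_sel`:

```r
df_sel <- data.frame(
  id   = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  stringsAsFactors = FALSE
)

# Rows where id is greater than 11
df_sel[df_sel$id > 11, ]

# Combine conditions with & (and) or | (or)
picked <- df_sel[df_sel$id > 10 & df_sel$name != "deepika", ]
picked
```

The expression before the comma is a logical vector with one entry per row; rows where it is TRUE are kept.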
Similarly, you can select columns (variables) in R. You can also use the dplyr select() function or the dollar operator ($) to select columns.
# R base - Select columns by name
df[,"name"]
# R base - Select columns from a list of names
df[,c("name","dob")]
# R base - Select columns by index position
df[,c(2,3)]
# Load dplyr
library('dplyr')
# dplyr - Select columns by list of index or position
df %>% select(c(2,3))
# Select columns by index range
df %>% select(2:3)
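The dollar operator mentioned above can be sketched as follows, together with two related base R forms. The data frame `df_cols` is a hypothetical stand-in for the example data.

```r
df_cols <- data.frame(
  id   = c(10, 11),
  name = c("sai", "ram"),
  stringsAsFactors = FALSE
)

df_cols$name        # returns the column as a vector
df_cols[["name"]]   # same vector, with the name given as a string

# drop = FALSE keeps a single selected column as a data frame
one_col <- df_cols[, "name", drop = FALSE]
class(one_col)      # "data.frame"
```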
7. Rename Column Names
To rename a column in R, use either the R base functions colnames() and names(), or third-party packages like dplyr or data.table.
# Change second column to c2
colnames(df)[2] <- "c2"
# Change the column name by name
colnames(df)[colnames(df) == "id"] <- "c1"
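Alternatively, use the dplyr rename() function to rename columns. The syntax is rename(new_name = old_name).
# Load dplyr (provides rename() and the %>% pipe)
library(dplyr)
# Change the column name - c1 to id
df <- df %>%
  rename("id" = "c1")
# Rename multiple columns by name
# (an alternative to the call above; both c1 and c2 must still exist)
df <- df %>% rename("id" = "c1",
                    "name" = "c2")
# Rename multiple columns by index
df <- df %>%
  rename(col1 = 1, col2 = 2)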
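The base function names() is mentioned above but not shown. A minimal sketch, assuming a hypothetical data frame `df_nm` with columns c1 and c2:

```r
df_nm <- data.frame(c1 = c(10, 11), c2 = c("sai", "ram"))

# Rename one column by its current name
names(df_nm)[names(df_nm) == "c1"] <- "id"
names(df_nm)   # "id" "c2"

# Replace all column names at once (length must equal ncol(df_nm))
names(df_nm) <- c("id", "name")
names(df_nm)   # "id" "name"
```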
8. Update Values
Cleaning is usually the first step of data processing, and it often requires replacing column values with other values.
# Replace a string with another string on a single column
df$name[df$name == 'ram'] <- 'ram krishna'
df
# Replace on all columns
df[df=="ram"] <- "ram krishna"
df
# Replace a substring with another string
# (str_replace() changes only the first match in each value)
library(stringr)
df$name <- str_replace(df$name, "r", "R")
print(df)
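If you prefer not to load stringr, base R offers sub() and gsub() for the same kind of replacement. A minimal sketch with a hypothetical data frame `df_upd`:

```r
df_upd <- data.frame(name = c("ram", "rr", "sai"), stringsAsFactors = FALSE)

# sub() replaces the first match in each value
first_only <- sub("r", "R", df_upd$name)
first_only   # "Ram" "Rr" "sai"

# gsub() replaces every match in each value
every_match <- gsub("r", "R", df_upd$name)
every_match  # "Ram" "RR" "sai"
```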
9. Drop Rows and Columns
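A minimal sketch of common base R ways to drop rows and columns, using a hypothetical copy of the example data named `df_drop`:

```r
df_drop <- data.frame(
  id   = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  dob  = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14", "1985-08-16"))
)

# Negative indexes drop rows by position
no_row2 <- df_drop[-2, ]
no_rows <- df_drop[-c(1, 4), ]

# A condition keeps only the rows where it is TRUE
by_cond <- df_drop[df_drop$id != 12, ]

# Drop a column by name, returning a new data frame
no_dob <- df_drop[, names(df_drop) != "dob"]

# Drop a column in place by assigning NULL
df_drop$dob <- NULL
```

The bracket forms return a new data frame and leave the original untouched, while assigning NULL modifies the data frame itself.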
10. Handling Missing Values
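As a minimal sketch, base R can detect, remove, and replace missing values (NA) as shown below. The data frame `df_na` is hypothetical, and replacing NA with 0 is an arbitrary choice made for illustration.

```r
df_na <- data.frame(
  id    = c(11, 22, 33),
  price = c(144, NA, 321)
)

# Detect missing values
is.na(df_na$price)                 # FALSE TRUE FALSE
na_counts <- colSums(is.na(df_na)) # number of NA values per column

# Remove rows that contain any NA
complete <- na.omit(df_na)

# Replace NA with a chosen value (0 here, purely as an example)
filled <- df_na
filled$price[is.na(filled$price)] <- 0
```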
11. Join Data Frames
The base function merge() joins data frames in R and supports inner, left, right, outer, and cross joins. The dplyr package (part of the tidyverse) supports all these basic joins and additionally anti joins and semi joins.
# Inner join
df2 <- merge(x=emp_df,y=dept_df,
by="dept_id")
# Inner join on multiple columns
df2 <- merge(x=emp_df,y=dept_df,
by=c("dept_id","dept_branch_id"))
# Inner join when the key columns may be named differently
# (by.x names the keys in x, by.y the matching keys in y)
df2 <- merge(x=emp_df,y=dept_df,
by.x=c("dept_id","dept_branch_id"),
by.y=c("dept_id","dept_branch_id"))
# Load dplyr package
library(dplyr)
# Using dplyr - inner join multiple columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id','dept_branch_id'))
# Using dplyr - inner join on different columns
df2 <- emp_df %>% inner_join( dept_df,
by=c('dept_id'='dept_id',
'dept_branch_id'='dept_branch_id'))
# Load tidyverse package
library(tidyverse)
# Inner Join data.frames
list_df = list(emp_df,dept_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')
df2
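The examples above assume that emp_df and dept_df already exist. The sketch below defines two small data frames with made-up values, purely as an assumption for illustration, and shows how merge() produces inner, left, and outer joins.

```r
# Hypothetical sample data; column names follow the examples above
emp_df <- data.frame(
  emp_id  = c(1, 2, 3),
  name    = c("Smith", "Rose", "Jones"),
  dept_id = c(10, 20, 50)
)
dept_df <- data.frame(
  dept_id   = c(10, 20, 30),
  dept_name = c("Finance", "Marketing", "Sales")
)

# Inner join: only dept_id values present in both data frames
inner <- merge(emp_df, dept_df, by = "dept_id")

# Left join: keep every row of emp_df, fill unmatched columns with NA
left <- merge(emp_df, dept_df, by = "dept_id", all.x = TRUE)

# Outer join: keep every row from both data frames
outer <- merge(emp_df, dept_df, by = "dept_id", all = TRUE)
```

Use all.y = TRUE for a right join, which keeps every row of the second data frame instead.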
12. Sorting & Ordering DataFrame
By using the order() function, you can sort data.frame rows by column value in either ascending or descending order. By default, NA values are placed last, and the na.last param lets you put them first.
# Create Data Frame
df=data.frame(id=c(11,22,33,44,55),
name=c("spark","python","R","jsp","java"),
price=c(144,NA,321,567,567),
publish_date= as.Date(
c("2007-06-22", "2004-02-13", "2006-05-18",
"2010-09-02","2007-07-20"))
)
# Sort Data Frame by price (ascending)
df2 <- df[order(df$price),]
# Sort by multiple columns
df2 <- df[order(df$price,df$name ),]
# Sort descending order
df2 <- df[order(df$price,decreasing=TRUE),]
# Sort by putting NA top
df2 <- df[order(df$price,decreasing=TRUE, na.last=FALSE),]
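To sort by one column in descending order and break ties with another column in ascending order, one common approach is to negate the numeric column inside order(). A minimal sketch, using a reduced hypothetical copy of the data named `df_sort`:

```r
df_sort <- data.frame(
  name  = c("spark", "python", "R", "jsp", "java"),
  price = c(144, NA, 321, 567, 567),
  stringsAsFactors = FALSE
)

# price descending, ties broken by name ascending;
# rows with NA price are placed last by default
sorted <- df_sort[order(-df_sort$price, df_sort$name), ]
sorted$name
```

Negation only works for numeric columns; for character columns you would need a different approach.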
13. Import CSV File into Data Frame
If you have a CSV file with columns separated by a delimiter such as a comma or pipe, you can easily import it into an R DataFrame by using the read.csv() function. This function reads the CSV file and converts it into a DataFrame.
Let's read the CSV file and create a DataFrame. Note that read.csv() by default assumes a comma-delimited file.
# Create DataFrame from CSV file
df = read.csv('/Users/admin/file.csv')
df
# Check the Datatypes
str(df)
This yields a DataFrame similar to the one above, but certain columns are read as character. For example, the dob column is assigned the character type. I will cover how to change the data type in a separate article.
# Output
'data.frame': 4 obs. of 3 variables:
$ id : int 10 11 12 13
$ name: chr "sai" "ram" "deepika" "sahithi"
$ dob : chr "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
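As a brief sketch of the conversion, the snippet below reads CSV content from an inline string (a stand-in for the file, with made-up contents) and converts dob to the Date type in two ways.

```r
# Inline CSV text used in place of a real file; contents are hypothetical
csv_text <- "id,name,dob
10,sai,1990-10-02
11,ram,1981-03-24"

df_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(df_csv$dob)                   # "character"

# Convert after reading; ISO dates parse with the default format
df_csv$dob <- as.Date(df_csv$dob)
class(df_csv$dob)                   # "Date"

# Or declare the column type while reading
df_typed <- read.csv(text = csv_text, colClasses = c(dob = "Date"))
class(df_typed$dob)                 # "Date"
```

For dates that are not in year-month-day order, pass a format string to as.Date().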
14. Other Data Frame Examples
15. Conclusion
In this R data frame tutorial, you have learned what a data frame is, its usage, how to create it, select rows and columns, rename columns, drop rows and columns, and many more examples.
Happy Learning !!