R Data Frame Tutorial | Learn with Examples

In this R data frame tutorial with examples, you will learn what a data frame is, its features and advantages, the related packages, and how to use data frames in practice with sample examples.

All examples provided in this R data frame tutorial are basic, simple, and easy to practice for beginners who are enthusiastic to learn R data frames and advance their careers.

1. What is Data Frame in R?

A data frame in R represents data in rows and columns, similar to a pandas DataFrame or a SQL table. Each column in the data frame is a vector, and all columns must have the same length.

Because it stores data in rows and columns like an RDBMS table, a data frame is a two-dimensional data structure: one dimension refers to the rows and the other to the columns. I will cover more on the data frame in the following sections.

In an R data frame, columns are referred to as variables and rows are referred to as observations. If you are new to R Programming, I would highly recommend reading the R Programming Tutorial where I have explained R concepts with examples.

R also has a third-party package, dplyr, which provides a grammar for data manipulation that works closely with data.frame. To use it, you first need to install the package in R.

2. Create a DataFrame in R using data.frame()

The first step in exploring a data frame is creating one. The function data.frame() is the easiest way to create a DataFrame. A data frame is a list of variables with the same number of rows and unique row names. Besides this, there are other ways to create a data frame in R.

2.1 Syntax of data.frame()

The following is the syntax of data.frame() function.

# data.frame() Syntax
# (stringsAsFactors defaults to FALSE in R 4.0.0 and later; earlier versions defaulted to TRUE)
data.frame(..., row.names = NULL, check.rows = FALSE,
           check.names = TRUE, fix.empty.names = TRUE,
           stringsAsFactors = FALSE)

You need to follow the below guidelines when creating a DataFrame in R using data.frame() function.

  • The input objects passed to data.frame() should have the same number of rows.
  • The column names should be non-empty.
  • Duplicate column names are allowed, but you need to use check.names = FALSE.
  • You can assign names to rows using row.names param.
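  • Character variables passed to data.frame() are converted to factor columns in R versions before 4.0.0. Since R 4.0.0 they stay as character columns unless you pass stringsAsFactors = TRUE.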
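
The guidelines above can be checked with a short sketch. The data frames scores, fixed and dup and their values are hypothetical and exist only for illustration.

```r
# Two vectors of equal length become two columns with three rows
scores <- data.frame(id = c(1, 2, 3), score = c(90, 85, 77))

# Vectors of length 3 and 2 cannot be combined, so data.frame() raises an error
mismatch <- tryCatch(
  data.frame(id = c(1, 2, 3), score = c(90, 85)),
  error = function(e) "error"
)

# With check.names = TRUE (the default) duplicate names are made unique
fixed <- data.frame(x = 1:2, x = 3:4)

# With check.names = FALSE the duplicate names are kept as given
dup <- data.frame(x = 1:2, x = 3:4, check.names = FALSE)

print(names(fixed))
print(names(dup))
```

Duplicate column names make selection by name ambiguous, so keeping the default check.names = TRUE is usually the safer choice.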

2.2 Create R DataFrame Example

Now, let’s create a DataFrame by using the data.frame() function. It takes lists or vectors as its leading arguments. In R, a vector contains elements of the same type, and the type can be logical, integer, double, character, complex or raw. You can create a vector using c().

# Create Vectors
id <- c(10,11,12,13)
name <- c('sai','ram','deepika','sahithi')
dob <- as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))

# Create DataFrame
df <- data.frame(id,name,dob)

# Print DataFrame
df

In the above example, I have used the following Vectors as arguments to the data.frame() function, separated by commas to create a DataFrame.

  • id – Numeric Vector which stores the numeric values.
  • name – Character Vector which stores the character values.
  • dob – Date Vector which stores the date values.

The above example yields the below output. R will create a data frame with the column names/variables with the same names we used for Vector. You can also use print(df) to print the DataFrame to the console.

# Output
  id    name        dob
1 10     sai 1990-10-02
2 11     ram 1981-03-24
3 12 deepika 1987-06-14
4 13 sahithi 1985-08-16

Notice that, by default, R labels each row with an incremental sequence number (the row names).

Alternatively, you can create a data frame by passing the vectors directly to the function. Both approaches create the same DataFrame.

# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))
)

# Print DataFrame
df

3. Check the DataFrame Data types

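Let’s check the data types of the created DataFrame by using print(sapply(df, class)). Note that I have not specified the data types of the columns while creating the data frame, hence R infers each column's type from the data. The output below comes from an R version earlier than 4.0.0, where character columns were converted to factors by default; on R 4.0.0 and later, name is reported as "character".

# Display datatypes
print(sapply(df, class))

# Output
#         id        name         dob 
#  "numeric"    "factor"      "Date"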

You can also use str(df) to check the data types.

# Display datatypes
str(df)

# Output
'data.frame':	4 obs. of  3 variables:
 $ id  : num  10 11 12 13
 $ name: Factor w/ 4 levels "deepika","ram",..: 4 2 1 3
 $ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"

4. Using stringsAsFactors Param for Character Data Types

As you can see above, the name column holds characters but its data type is factor. In R versions before 4.0.0, data.frame() converts character columns to factors by default; since R 4.0.0 the default is to keep them as characters.

You can control this behavior explicitly by passing stringsAsFactors=FALSE (or TRUE) while creating a DataFrame.

# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  stringsAsFactors = FALSE
)

# Print DataFrame structure
str(df)

# Output
'data.frame':	4 obs. of  3 variables:
 $ id  : num  10 11 12 13
 $ name: chr  "sai" "ram" "deepika" "sahithi"
 $ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" ...
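
As a minimal sketch of the two settings side by side, the following builds the same column twice. Passing stringsAsFactors explicitly makes the result independent of the R version; the names names_vec, df_chr and df_fct are illustrative.

```r
names_vec <- c("sai", "ram", "deepika", "sahithi")

# Keep strings as a character column
df_chr <- data.frame(name = names_vec, stringsAsFactors = FALSE)

# Convert strings to a factor column
df_fct <- data.frame(name = names_vec, stringsAsFactors = TRUE)

print(class(df_chr$name))
print(class(df_fct$name))

# Factor levels are the sorted unique values of the column
print(levels(df_fct$name))
```

A factor stores each value as an integer code pointing into its levels, which is why str() shows numbers such as 4 2 1 3 for a factor column.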

5. Assign Row Names to DataFrame

You can assign custom names to the rows of an R DataFrame while creating it. Use the row.names param and assign a vector with the row names. Note that the length of the vector you pass to row.names should exactly match the number of rows in the data frame.

# Create DataFrame with Row Names
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  row.names = c('row1','row2','row3','row4')
)

# Print DataFrame
df

This yields the below output.

# Output
     id    name        dob
row1 10     sai 1990-10-02
row2 11     ram 1981-03-24
row3 12 deepika 1987-06-14
row4 13 sahithi 1985-08-16

If you already have a DataFrame, you can use the below approach to assign or change the row names.

# Assign row names to existing DataFrame
row.names(df) <- c('row1','row2','row3','row4')
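
Once row names are set, they can be used for lookups, and assigning NULL restores the default numbering. The sketch below uses a two-row copy of the data so it runs on its own.

```r
df <- data.frame(
  id = c(10, 11),
  name = c("sai", "ram"),
  stringsAsFactors = FALSE
)

# Assign row names to the existing data frame
row.names(df) <- c("row1", "row2")
named <- row.names(df)

# Look up a single cell by row name and column name
cell <- df["row2", "name"]

# Assigning NULL resets the row names to the default sequence
row.names(df) <- NULL
reset <- row.names(df)

print(named)
print(cell)
print(reset)
```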

6. Select Rows and Columns

By using R base bracket notation, we can select rows/observations in R by column value, by index, by name, by condition, etc. You can also use the R base function subset() to get the same results. Besides these, the dplyr package provides the filter() function to get rows from the DataFrame.

# Select Rows by index (here, the third row)
df[3,]

# Select Rows by list of index values
df[c(3,4),]

# Select Rows by index range
df[2:4,]

# Select Rows by name
df['row3',]

# Select Rows by list of names
df[c('row1','row3'),]

# Using subset
subset(df, name %in% c("sai", "ram"))

# Load dplyr
library(dplyr)

# Using dplyr::filter
filter(df, name %in% c("sai", "ram"))
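
Selecting rows by condition, which the paragraph above mentions, can be sketched in base R as follows. This minimal example rebuilds a small version of df so it runs on its own; the threshold values are only illustrative.

```r
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  stringsAsFactors = FALSE
)

# Rows where id is greater than 11
by_condition <- df[df$id > 11, ]

# Rows matching two conditions combined with & (logical AND)
both <- df[df$id > 10 & df$name != "deepika", ]

# subset() expresses the same filter without repeating df$
same <- subset(df, id > 11)

print(by_condition)
print(both)
```

The expression inside the brackets is a logical vector with one entry per row, and only rows where it is TRUE are kept.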

Similarly, you can select columns or variables in R. Additionally, use the dplyr select() function or the dollar operator ($) to select columns.

# R base - Select columns by name
df[,"name"]

# R base - Select columns from list
df[,c("name","dob")]

# R base - Select columns by index position
df[,c(2,3)]

# Load dplyr
library(dplyr)

# dplyr - Select columns by list of index or position
df %>% select(c(2,3))

# Select columns by index range
df %>% select(2:3)
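
The dollar operator mentioned above returns a single column as a vector. The sketch below, which uses a small stand-alone copy of the data, also shows that bracket selection of one column returns a vector unless drop = FALSE is passed.

```r
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  stringsAsFactors = FALSE
)

# Dollar operator and double brackets both return the column as a vector
names_vec <- df$name
same_vec <- df[["name"]]

# Single-column bracket selection also drops to a vector by default
one_col_vec <- df[, "name"]

# drop = FALSE keeps the result as a one-column data frame
one_col_df <- df[, "name", drop = FALSE]

print(names_vec)
print(one_col_df)
```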

7. Rename Column Names

To rename a column in R, use either the R base functions colnames() and names() or third-party packages like dplyr or data.table.

# Change second column to c2
colnames(df)[2] <- "c2"

# Change the column name by name
colnames(df)[colnames(df) == "id"] <- "c1"

You can also use the dplyr rename() function to rename columns. It takes pairs in the form new name = old name.

#Change the column name - c1 to id
df <- df %>% 
    rename("id" = "c1")

# Rename multiple columns by name
df <- df %>% rename("id" = "c1",
                          "name" = "c2")

# Rename multiple columns by index
df <- df %>% 
       rename(col1 = 1, col2 = 2)
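
The base function names() works the same way as colnames() on a data frame. Here is a minimal stand-alone sketch; the target names emp_id and emp_name are hypothetical.

```r
df <- data.frame(
  c1 = c(10, 11),
  c2 = c("sai", "ram"),
  stringsAsFactors = FALSE
)

# Rename one column by matching its current name
names(df)[names(df) == "c1"] <- "id"

# Rename one column by position
names(df)[2] <- "name"
after_single <- names(df)

# Replace all column names at once
names(df) <- c("emp_id", "emp_name")
after_all <- names(df)

print(after_single)
print(after_all)
```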

8. Update Values

Cleaning the data is usually the first step of data processing, and as part of cleaning you often need to replace column values with another value.

# Replace String with Another String on a single column
df$name[df$name == 'ram'] <- 'ram krishna'

# Replaces on all columns
df[df=="ram"] <- "ram krishna"

# Load stringr, which provides str_replace()
library(stringr)

# Replace the first matching sub string with another String
df$name <- str_replace(df$name, "r", "R")
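
If you prefer not to load a package, base R offers sub() and gsub() for the same kind of substring replacement. The sketch below uses a small hypothetical data frame: sub() replaces only the first match in each value (like str_replace()), while gsub() replaces every match.

```r
df <- data.frame(
  id = c(10, 11, 12),
  name = c("sai", "ram", "ram kumar"),
  stringsAsFactors = FALSE
)

# Exact-match replacement only touches values equal to "ram"
df$name[df$name == "ram"] <- "ram krishna"

# Replace the first "r" in each value
first_only <- sub("r", "R", df$name)

# Replace every "r" in each value
all_matches <- gsub("r", "R", df$name)

# Update a numeric column by condition
df$id[df$id > 11] <- 0

print(first_only)
print(all_matches)
print(df)
```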

9. Drop Rows and Columns

Dropping rows and dropping columns are two other common data frame operations.
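
A minimal sketch of dropping rows and columns with base R, using negative indexes, conditions, and NULL assignment. The data frame is a stand-alone copy of the tutorial's example data.

```r
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  dob = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14", "1985-08-16")),
  stringsAsFactors = FALSE
)

# Drop the first row with a negative index
no_first <- df[-1, ]

# Drop several rows at once
no_1_3 <- df[-c(1, 3), ]

# Drop rows by condition (keep everything except id 12)
kept <- df[df$id != 12, ]

# Drop a column by name
no_name <- df[, !(names(df) %in% "name")]

# Drop a column by position
no_second <- df[, -2]

# Remove a column in place by assigning NULL
df$dob <- NULL

print(no_1_3)
print(names(df))
```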

10. Handling Missing Values

11. Join Data Frames

The base function merge() is used to join data frames in R; it supports inner, left, right, outer and cross joins. The dplyr and tidyverse packages both support all these basic joins and additionally anti join and semi-join.

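# Inner join
df2 <- merge(x=emp_df,y=dept_df, by="dept_id")

# Inner join on multiple columns
# (dept_branch_id is a hypothetical second key column)
df2 <- merge(x=emp_df,y=dept_df, 
             by=c("dept_id","dept_branch_id"))

# Inner join on different columns
# (emp_dept_id is a hypothetical column name in emp_df)
df2 <- merge(x=emp_df,y=dept_df, 
             by.x="emp_dept_id", by.y="dept_id")

# Load dplyr package
library(dplyr)

# Using dplyr - inner join multiple columns
df2 <- emp_df %>% inner_join( dept_df, 
             by=c("dept_id","dept_branch_id"))

# Using dplyr - inner join on different columns
df2 <- emp_df %>% inner_join( dept_df, 
             by=c("emp_dept_id"="dept_id"))

# Load tidyverse package
library(tidyverse)

# Inner Join  data.frames
list_df <- list(emp_df,dept_df)
df2 <- list_df %>% reduce(inner_join, by='dept_id')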
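
The join types that merge() supports can be sketched with two small data frames. The contents of emp_df and dept_df below are hypothetical sample data invented for this illustration; only the dept_id key column name comes from the examples above.

```r
# Hypothetical sample data
emp_df <- data.frame(
  emp_id = c(1, 2, 3, 4),
  name = c("Smith", "Rose", "Williams", "Jones"),
  dept_id = c(10, 20, 10, 50),
  stringsAsFactors = FALSE
)
dept_df <- data.frame(
  dept_id = c(10, 20, 30),
  dept_name = c("Finance", "Marketing", "Sales"),
  stringsAsFactors = FALSE
)

# Inner join: only rows whose dept_id exists in both data frames
inner <- merge(emp_df, dept_df, by = "dept_id")

# Left join: all rows of emp_df, NA where no department matches
left <- merge(emp_df, dept_df, by = "dept_id", all.x = TRUE)

# Right join: all rows of dept_df
right <- merge(emp_df, dept_df, by = "dept_id", all.y = TRUE)

# Full outer join: all rows from both sides
full <- merge(emp_df, dept_df, by = "dept_id", all = TRUE)

# Cross join: every combination of rows (no key columns)
cross <- merge(emp_df, dept_df, by = NULL)

print(inner)
print(nrow(cross))
```

The all.x, all.y and all arguments are what turn the default inner join into a left, right or full outer join.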

12. Sorting & Ordering DataFrame

By using the order() function, you can sort data.frame rows by column value in either ascending or descending order. By default, this function puts all NA values last, and it provides an option (na.last = FALSE) to put them first.

# Create Data Frame
          publish_date= as.Date(
            c("2007-06-22", "2004-02-13", "2006-05-18",

# Sort Data Frame
df2 <- df[order(df$price),]

# Sort by multiple columns
df2 <- df[order(df$price,df$name ),]

# Sort descending order
df2 <- df[order(df$price,decreasing=TRUE),]

# Sort by putting NA top
df2 <- df[order(df$price,decreasing=TRUE, na.last=FALSE),]
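
Here is a self-contained sketch of the sorting options above, including how NA values are placed. The data frame and its values are hypothetical; only the name and price column names follow the snippets above.

```r
# Hypothetical sample data with one missing price
df <- data.frame(
  name = c("a", "b", "c", "d"),
  price = c(30, NA, 10, 20),
  stringsAsFactors = FALSE
)

# Ascending order, NA rows go last by default
asc <- df[order(df$price), ]

# Descending order, NA rows still go last
desc <- df[order(df$price, decreasing = TRUE), ]

# Ascending order with NA rows placed first
na_first <- df[order(df$price, na.last = FALSE), ]

# Descending price, then ascending name, by negating the numeric column
mixed <- df[order(-df$price, df$name), ]

print(asc)
print(na_first)
```

order() returns the row positions in sorted order, and indexing the data frame with those positions rearranges the rows.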

13. Import CSV File into Data Frame

If you have a CSV file with columns separated by a delimiter like a comma, pipe, etc., you can easily import it into an R DataFrame by using the read.csv() function. This function reads the CSV file and converts it into a DataFrame.

Figure: Read CSV file to create a DataFrame

Let’s read the CSV file and create a DataFrame. Note that read.csv() by default considers you have a comma-delimited CSV file.

# Create DataFrame from CSV file
df <- read.csv('/Users/admin/file.csv')

# Check the Datatypes
str(df)

This yields a DataFrame similar to the one above, but certain columns are assigned a different data type. For example, the dob column is read as character rather than Date. I will cover how to change the data type in a separate article.

# Output
'data.frame':	4 obs. of  3 variables:
 $ id  : int  10 11 12 13
 $ name: chr  "sai" "ram" "deepika" "sahithi"
 $ dob : chr  "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
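
To try this without an existing file, the sketch below writes a small CSV to a temporary path and reads it back. The file contents are sample data for illustration, and converting dob with as.Date() afterwards is one possible approach, not necessarily the one the separate article uses.

```r
# Write a small sample CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,dob",
             "10,sai,1990-10-02",
             "11,ram,1981-03-24"), csv_path)

# Read the comma-delimited file
df <- read.csv(csv_path)

# Convert the dob column to Date after reading
df$dob <- as.Date(df$dob)
str(df)

# For a pipe-delimited file, pass the delimiter with sep
psv_path <- tempfile(fileext = ".txt")
writeLines(c("id|name", "10|sai"), psv_path)
df_pipe <- read.csv(psv_path, sep = "|")
print(df_pipe)
```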

14. Other Data Frame Examples

15. Conclusion

In this R Data Frame tutorial, you have learned what a data frame is, its usage and advantages, how to create it, select rows and columns, rename columns, drop rows and columns, and many more examples.

Happy Learning !!


Naveen (NNK)

I am Naveen (NNK), working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and manage Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passionate about sharing my knowledge of Apache Spark, Hive, PySpark, R, etc.
