Different Ways to Create a DataFrame in R

You can create a DataFrame in R in many ways, for instance by using the data.frame() and as.data.frame() functions, or by using third-party packages like data.table, tibble, dplyr, etc.

Besides these, you can also create a DataFrame in R from a list, from JSON, by reading a CSV file, etc. In this article, I will explain several of these methods with examples. You can also use them to create an empty data frame.

1. Create a DataFrame in R using data.frame()

The data.frame() function is the most direct way to create a DataFrame. A data frame is a list of variables (columns) that all have the same number of rows, with unique row names. To learn more about data frames, refer to the R Data Frame Tutorial.

1.1 Syntax of data.frame()

The following is the syntax of the data.frame() function.


# data.frame() syntax (R 4.0.0 and later; earlier versions
# used stringsAsFactors = default.stringsAsFactors())
data.frame(..., row.names = NULL, check.rows = FALSE,
           check.names = TRUE, fix.empty.names = TRUE,
           stringsAsFactors = FALSE)

Follow these guidelines when creating a DataFrame in R using the data.frame() function.

  • The input objects passed to data.frame() should have the same number of rows.
  • The column names should be non-empty.
  • Duplicate column names are allowed, but you need to use check.names = FALSE.
  • You can assign names to rows using row.names param.
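  • Character variables passed to data.frame() are converted to factor columns in R versions before 4.0.0; from R 4.0.0 onward they stay character unless you set stringsAsFactors = TRUE.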
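
The guidelines above can be sketched in a few lines. This is a minimal illustration with made-up vectors; the object names (ok, mismatch, dedup, dup) are hypothetical and not part of the original examples.

```r
# Inputs with the same number of rows combine cleanly.
ok <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))

# Inputs with incompatible lengths (3 and 2) raise an error;
# tryCatch() captures the message instead of stopping the script.
mismatch <- tryCatch(
  data.frame(id = c(1, 2, 3), name = c("a", "b")),
  error = function(e) conditionMessage(e)
)

# With the default check.names = TRUE, duplicate names are made unique.
dedup <- data.frame(x = 1:2, x = 3:4)
names(dedup)   # "x" "x.1"

# With check.names = FALSE, the duplicate names are kept as given.
dup <- data.frame(x = 1:2, x = 3:4, check.names = FALSE)
names(dup)     # "x" "x"
```

Keeping duplicate column names is rarely a good idea in practice, because df$x then only reaches the first matching column.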

1.2 Create R DataFrame Example

Now, let’s create a DataFrame by using the data.frame() function. This function takes lists or vectors as its arguments. In R, a vector contains elements of the same type, and the data types can be logical, integer, double, character, complex or raw. You can create a vector using c().


# Create Vectors
id <- c(10,11,12,13)
name <- c('sai','ram','deepika','sahithi')
dob <- as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))

# Create DataFrame
df <- data.frame(id,name,dob)

# Print DataFrame
df 

In the above example, I have used the following Vectors as arguments to the data.frame() function, separated by commas to create a DataFrame.

  • id – Numeric Vector which stores the numeric values.
  • name – Character Vector which stores the character values.
  • dob – Date Vector which stores the date values.

The above example yields the below output. R creates a data frame whose column names (variables) match the names of the vectors. You can also use print(df) to print the DataFrame to the console.


# Output
  id    name        dob
1 10     sai 1990-10-02
2 11     ram 1981-03-24
3 12 deepika 1987-06-14
4 13 sahithi 1985-08-16

Notice that R by default assigns an incremental sequence number to each row of the DataFrame as its row name.

Alternatively, you can create a data frame by passing the vectors directly to the function. Both approaches create the same DataFrame.


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))
)

# Print DataFrame
df

1.3 Check the DataFrame Data types

Let’s check the data types of the created DataFrame by using print(sapply(df, class)). Note that I have not specified the data types of the columns while creating the DataFrame; hence, R automatically infers the data types from the data.


# Display datatypes
print(sapply(df, class))

# Output (R versions before 4.0.0; newer versions report "character" for name)
#         id        name         dob 
#  "numeric"    "factor"      "Date"

You can also use str(df) to check the data types.


# Display datatypes
str(df)

# Output (R versions before 4.0.0; newer versions show name as chr)
'data.frame':	4 obs. of  3 variables:
 $ id  : num  10 11 12 13
 $ name: Factor w/ 4 levels "deepika","ram",..: 4 2 1 3
 $ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"

If you want to select rows or columns in R, use the df[] notation.

1.4 Using stringsAsFactors Param for Character Data Types

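Notice above that the name column holds characters but its data type is factor. This is because R versions before 4.0.0 create character columns as factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE and such columns stay character.

On older versions, you can change this behavior by passing the additional param stringsAsFactors = FALSE while creating a DataFrame. R is case-sensitive, so the value must be written FALSE, not False.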


# Create DataFrame
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  stringsAsFactors=FALSE
)

# Display the structure of the DataFrame
str(df)

Yields the below output. Note that the data type of the name column/variable is chr, which is character. In R, you often need to convert a column from factor to character before performing some operations or transformations.


# Output
'data.frame':	4 obs. of  3 variables:
 $ id  : num  10 11 12 13
 $ name: chr  "sai" "ram" "deepika" "sahithi"
 $ dob : Date, format: "1990-10-02" "1981-03-24" "1987-06-14" ...
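
When a DataFrame already contains a factor column, one common way to convert it is as.character(). The sketch below forces stringsAsFactors = TRUE so that it behaves the same on any R version; the smaller two-column DataFrame is only for illustration.

```r
# Force a factor column so the example is independent of the R version.
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  stringsAsFactors = TRUE
)
class(df$name)   # "factor"

# Convert the factor column to character in place.
df$name <- as.character(df$name)
class(df$name)   # "character"
```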

1.5 Assign Row Names while Creating DataFrame

You can assign custom names to the rows of an R DataFrame while creating it. Use the row.names param and assign a vector with the row names. Note that the length of the vector you pass to row.names must exactly match the number of rows in the DataFrame.


# Create DataFrame with Row Names
df <- data.frame(
  id = c(10,11,12,13),
  name = c('sai','ram','deepika','sahithi'),
  dob = as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16')),
  row.names = c('row1','row2','row3','row4')
)
df

Yields below output.


# Output
     id    name        dob
row1 10     sai 1990-10-02
row2 11     ram 1981-03-24
row3 12 deepika 1987-06-14
row4 13 sahithi 1985-08-16

If you already have a DataFrame, you can use the below approach to assign or change the row names.


# Assign row names to existing DataFrame
row.names(df) <- c('row9','row8','row7','row6')
df

2. Using as.data.frame() to Create DataFrame

Using as.data.frame() is another approach. I use it here to create an R DataFrame from a nested list by first combining the inner lists with do.call(cbind, ...) and passing the result to as.data.frame(). Each list inside the nested list becomes a column in the DataFrame.

2.1 Syntax of as.data.frame()


# Syntax as.data.frame()
as.data.frame(x, row.names = NULL, optional = FALSE, ...)

2.2 Example of as.data.frame()

Let’s create an R DataFrame using as.data.frame().


# Create nested list (3 lists inside)
my_nested_list <- list(
        id=list(1,2,3,4,5),
        name=list('sai','ram','hari','deepika','sahithi'),
        gender=list('m','m','m','f','f')
     )

# Convert nested list to the dataframe by columns
df <- as.data.frame(do.call(cbind, my_nested_list))
df

Yields below output.


# Output
  id    name gender
1  1     sai      m
2  2     ram      m
3  3    hari      m
4  4 deepika      f
5  5 sahithi      f
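
One thing to be aware of with this approach: because the inner elements are lists, cbind() builds a list-matrix and every resulting column is a list column rather than an atomic vector. The sketch below shows this and one possible alternative that flattens each inner list with unlist() first; the names df_list and df_flat are hypothetical.

```r
my_nested_list <- list(
  id = list(1, 2, 3, 4, 5),
  name = list("sai", "ram", "hari", "deepika", "sahithi"),
  gender = list("m", "m", "m", "f", "f")
)

# cbind() on lists yields a list-matrix, so each column is a list column.
df_list <- as.data.frame(do.call(cbind, my_nested_list))
sapply(df_list, class)

# Alternative: flatten each inner list to an atomic vector first,
# so that id becomes an ordinary numeric column.
df_flat <- as.data.frame(lapply(my_nested_list, unlist))
sapply(df_flat, class)
```

Atomic columns are usually easier to work with, since functions such as mean() or sorting operations expect vectors rather than lists.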

3. Create R DataFrame from CSV File

If you have a CSV file with columns separated by a delimiter such as a comma or pipe, you can easily load it into an R DataFrame by using the read.csv() function. This function reads the data from the CSV file and converts it into a DataFrame.

[Figure: Read CSV file to create a DataFrame]

Let’s import the CSV file into a DataFrame in R. Note that read.csv() assumes a comma-delimited file by default; use the sep param for other delimiters.


# Create DataFrame from CSV file
df = read.csv('/Users/admin/file.csv')
df
# Check the Datatypes
str(df)

Yields a DataFrame similar to the above, but certain columns are read with different data types. For example, id is read as an integer and the dob column is read as a character. I will cover how to change the data type in a separate article.


# Output
'data.frame':	4 obs. of  3 variables:
 $ id  : int  10 11 12 13
 $ name: chr  "sai" "ram" "deepika" "sahithi"
 $ dob : chr  "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
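
As a rough sketch of the two points above (a non-comma delimiter and the dob column arriving as character), the example below writes a small pipe-delimited file to a temporary path and reads it back. The file contents and the csv_path variable are made up for illustration.

```r
# Write a small pipe-delimited sample file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id|name|dob",
  "10|sai|1990-10-02",
  "11|ram|1981-03-24"
), csv_path)

# sep = "|" overrides the default comma delimiter.
df <- read.csv(csv_path, sep = "|", stringsAsFactors = FALSE)

# dob is read as character; convert it to Date explicitly.
df$dob <- as.Date(df$dob)
str(df)
```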

4. From Vector

A vector is a one-dimensional structure that contains elements of the same type; the types can be logical, integer, double, character, complex or raw. An R data frame, by contrast, is a 2-dimensional structure that holds values in rows and columns. There are multiple ways to create an R data frame from vectors; below is one example.


# Create Vectors
id <- c(10,11,12,13)
name <- c('sai','ram','deepika','sahithi')
dob <- as.Date(c('1990-10-02','1981-3-24','1987-6-14','1985-8-16'))

# Create DataFrame
df <- data.frame(id,name,dob)
df 

Yields below output.


# Output
  id    name        dob
1 10     sai 1990-10-02
2 11     ram 1981-03-24
3 12 deepika 1987-06-14
4 13 sahithi 1985-08-16
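
Two other ways to go from a vector to a data frame are sketched here, using a made-up scores vector: wrapping a single vector as one column, and reshaping it through matrix() first. The column names test1 and test2 are hypothetical.

```r
scores <- c(90, 85, 70, 65, 80, 75)

# A single vector becomes a one-column data frame.
df_one <- data.frame(score = scores)

# The same vector reshaped into 3 rows and 2 columns, filled row by row.
df_wide <- as.data.frame(matrix(scores, ncol = 2, byrow = TRUE))
names(df_wide) <- c("test1", "test2")
df_wide
```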

5. From the existing Data Frame

If you want to create a data frame by slicing an existing data frame, use the square bracket notation [] to select the columns you want and assign the result to a new data frame object. Using the same [] notation, you can also select rows from the R data frame.


# Select rows and columns
df2 <- df[c(1,3,4),c(2,3)]
df2

Yields the below output. It creates a new df2 object with rows 1,3 and 4 and columns 2 and 3 from an existing data frame.


# Output
     name        dob
1     sai 1990-10-02
3 deepika 1987-06-14
4 sahithi 1985-08-16
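
The [] notation also accepts column names and logical conditions, which is often more readable than positions. A minimal sketch, rebuilding the same df so the block runs on its own (df3, df4 and df5 are hypothetical names):

```r
df <- data.frame(
  id = c(10, 11, 12, 13),
  name = c("sai", "ram", "deepika", "sahithi"),
  dob = as.Date(c("1990-10-02", "1981-03-24", "1987-06-14", "1985-08-16"))
)

# Select columns by name, keeping all rows.
df3 <- df[, c("name", "dob")]

# Select rows with a logical condition, keeping all columns.
df4 <- df[df$id > 11, ]

# drop = FALSE keeps a single selected column as a data frame, not a vector.
df5 <- df[, "name", drop = FALSE]
```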

6. Empty Data Frame

An empty DataFrame in R usually refers to one with 0 rows and 0 columns. Sometimes, however, you need column names and a data type for each column but no rows. The following example creates an empty DataFrame with no rows and no columns.


# Create an Empty DataFrame
df = data.frame()
df

# Output
#data frame with 0 columns and 0 rows
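
For the other case mentioned above, column names and types but no rows, one possible approach is to pass zero-length vectors of the desired types. The column names here simply mirror the earlier examples.

```r
# Zero-length vectors define the column names and types without any rows.
df_empty <- data.frame(
  id = integer(),
  name = character(),
  dob = as.Date(character()),
  stringsAsFactors = FALSE
)
str(df_empty)
```

Rows added later, for example with rbind(), then need to match these column names.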

7. From Excel File

You can also create a data frame by importing an Excel file in R. Base R doesn’t provide a function to read Excel files, hence I will use read_excel() from the third-party package readxl.


# Load readxl package
library("readxl")

# Read xlsx files
df = read_excel("file.xlsx")

8. From Text File

Use the read.table() function to import a text file into a data frame in R. The example below passes two arguments: the first is the name of the file you want to read and the second, sep, is the delimiter that separates the fields in the file.


# Read text file
df = read.table('file.txt',sep='\t')
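
If the text file has a header row, you would typically also pass header = TRUE. The sketch below uses a small tab-delimited file written to a temporary path; its contents and the txt_path variable are made up for illustration.

```r
# Write a small tab-delimited sample file with a header row.
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tname",
  "10\tsai",
  "11\tram"
), txt_path)

# Without header = TRUE the first line is read as data here,
# and the columns get the default names V1 and V2.
df_raw <- read.table(txt_path, sep = "\t")
names(df_raw)

# header = TRUE uses the first line as column names.
df <- read.table(txt_path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
names(df)
```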

9. Conclusion

In this article, you have learned multiple ways to create a DataFrame in R: using data.frame() and as.data.frame(), from vectors and an existing data frame, and by reading CSV, Excel, and text files. You have also learned how to create an empty DataFrame.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.