R – Read CSV File with Examples

How do I read data from a CSV file into an R DataFrame? Use the read.csv() function in R to import a CSV file into a DataFrame. CSV is one of the simplest formats for storing scientific, analytical, or any other structured data (two-dimensional, with rows and columns). Values in a CSV file are separated by a delimiter, most commonly a comma (,), but any character such as a pipe or a tab can be used.

In this article, I will explain how to read a CSV file into a DataFrame and describe the options you can use to read a CSV without errors. To write or export a CSV file, use write.csv().

1. Quick Examples of Read CSV File in R

The following are quick examples of how to import a CSV in R by using the read.csv() function and its optional arguments.


# Quick examples

# Read CSV into DataFrame
read_csv = read.csv('/Users/admin/file.csv')

# Read with custom delimiter
# (',' is the default; use sep='|' or sep='\t' for pipe or tab)
read_csv = read.csv('/Users/admin/file.csv',sep=',')

# Read without header
read_csv = read.csv('/Users/admin/file_noheader.csv',header=FALSE)

# Set Column Names
colnames(read_csv) = c('id','name','dob','gender')
str(read_csv)

# Replaces all -1 and empty string as <NA>
read_csv = read.csv('/Users/admin/file.csv',na.strings=c('-1',''))

# Keep String as Character.
read_csv = read.csv('/Users/admin/file_noheader.csv', stringsAsFactors=FALSE)

# Use UTF-8 encoding
read_csv = read.csv('/Users/admin/file_noheader.csv', encoding='UTF-8')

2. Read CSV File in R

To read a CSV file in R, use the base function read.csv(), which loads the data from the CSV file into a DataFrame. Once the data frame is created, refer to the R data frame tutorial for examples of the operations you can perform on it.

Following is the syntax of the read.csv() function in R. Note that CSV and Excel files are different formats, so to load an Excel file in R use packages like readxl, xlsx, and openxlsx.


# Syntax of read.csv()
read.csv(file, header = TRUE, sep = ",", quote = "\"",
         dec = ".", fill = TRUE, comment.char = "", ...)

Let’s read a comma-separated CSV file into a DataFrame.


# Read CSV into DataFrame
read_csv = read.csv('/Users/admin/file.csv')
print(read_csv)

This yields the output below.


# Output
  id name        dob gender
1 10  sai 1990-10-02      M
2 NA  ram 1981-03-24       
3 -1 <NA> 1987-06-14      F
4 13      1985-08-16   <NA>
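
If you don't have a CSV file at hand, the sketch below writes a small one to a temporary location and reads it back. The file contents are an assumption: I reconstructed them so that they should reproduce the output shown above. The names csv_lines, csv_path, and df are only illustrative.

```r
# Assumed contents of file.csv, reconstructed from the printed output
csv_lines <- c(
  "id,name,dob,gender",
  "10,sai,1990-10-02,M",
  "NA,ram,1981-03-24,",
  "-1,NA,1987-06-14,F",
  "13,,1985-08-16,NA"
)

# Write the lines to a temporary file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Read it back with the defaults (header = TRUE, sep = ",")
df <- read.csv(csv_path)
print(dim(df))    # number of rows and columns
print(names(df))  # column names taken from the header row
```

Using tempfile() keeps the example independent of a path such as /Users/admin, which exists only on my machine.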

3. Read CSV with Custom Delimiter using sep Argument

By default, the read.csv() function uses a comma as the delimiter; however, you can specify any other delimiter with the sep argument. For example, use sep='|' to read a file whose values are separated by a pipe, and sep='\t' for a tab.


# Usage of sep param
read_csv = read.csv('/Users/admin/file.csv',sep=',')
print(read_csv)
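
To see sep with delimiters other than a comma, here is a minimal sketch that passes the data inline through the text argument of read.csv() instead of a file. The sample values and the names pipe_df and tab_df are made up for illustration.

```r
# Pipe-delimited data supplied inline via the text argument
pipe_df <- read.csv(text = "id|name\n10|sai\n13|ram", sep = "|")

# Tab-delimited data; "\t" is the tab character
tab_df <- read.csv(text = "id\tname\n10\tsai\n13\tram", sep = "\t")

print(pipe_df)
print(tab_df)
```

Both calls should produce the same two-column DataFrame, since only the delimiter differs.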

4. Read CSV without Header using header Argument

Sometimes you may receive a CSV file without a header row (column names). For such a file, set the header argument to FALSE so that the first record is not treated as a header. By default, header is TRUE, so read.csv() automatically treats the first record in a CSV file as the header.

Let’s take another CSV file, file_noheader.csv, which has no header row (column names), and load it into a DataFrame.


# Use header=FALSE
read_csv = read.csv('/Users/admin/file_noheader.csv', header=FALSE)
print(read_csv)

This yields the output below.


# Output
  V1   V2         V3     V4
1 10  sai 1990-10-02      M
2 NA  ram 1981-03-24       
3 -1 <NA> 1987-06-14      F
4 13      1985-08-16   <NA>

Note that R assigns the default column names V1, V2, V3, and V4. To rename the columns of the DataFrame, use colnames().


# Set column names
colnames(read_csv) = c('id','name','dob','gender')
print(read_csv)
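
As an alternative to renaming afterwards, read.csv() also accepts a col.names argument, so the names can be assigned while reading. The following is a small sketch with inline sample data; the values and the names csv_text and df are illustrative.

```r
# Headerless data supplied inline; the values are illustrative
csv_text <- "10,sai,1990-10-02,M\n13,ram,1985-08-16,F"

# Assign names while reading instead of calling colnames() afterwards
df <- read.csv(text = csv_text, header = FALSE,
               col.names = c("id", "name", "dob", "gender"))
print(names(df))
```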

5. Usage of na.strings Argument

Whether you are working with large or small files, you often find missing or unexpected data in some cells. Usually, missing data is represented as an empty value. If you look at the DataFrame outputs above, you will see missing values such as empty strings in the name and gender columns, and an unexpected value of -1 in the id column.

By using the na.strings argument, you can specify a character vector of values you would like to treat as NA. In the below example, I have used c('-1','') to instruct the read.csv() function to treat all -1 values and empty strings as NA. You can also replace an empty string with NA on the DataFrame after reading it.


# Replaces all -1 and empty string as <NA>
read_csv = read.csv('/Users/admin/file.csv',na.strings=c('-1',''))
print(read_csv)

This yields the output below.


# Output
  id name        dob gender
1 10  sai 1990-10-02      M
2 NA  ram 1981-03-24   <NA>
3 NA <NA> 1987-06-14      F
4 13 <NA> 1985-08-16   <NA>
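
The effect of na.strings is easiest to see by reading the same data twice. The sketch below uses made-up inline data and lists "NA" explicitly alongside the extra markers, so the usual NA marker stays in the vector; raw_df and clean_df are illustrative names.

```r
csv_text <- "id,name\n10,sai\n-1,\n13,ram"

# Default na.strings is "NA", so -1 and the empty name are kept as values
raw_df <- read.csv(text = csv_text)

# Treat "NA", "-1" and the empty string as missing
clean_df <- read.csv(text = csv_text, na.strings = c("NA", "-1", ""))

print(is.na(clean_df$id))
print(is.na(clean_df$name))
```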

Sometimes you may also need to replace NA values with 0 in the numeric columns of a DataFrame.
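
One simple way to do this, shown here as a sketch on a made-up score column, is to index the missing entries with is.na() and assign 0 to them.

```r
df <- read.csv(text = "id,score\n1,90\n2,NA\n3,75")

# Logical index of missing scores, then overwrite them with 0
df$score[is.na(df$score)] <- 0
print(df$score)
```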

6. Usage of stringsAsFactors Argument

If you are using a version of R prior to 4.0, all columns that contain character string data are converted to the factor type by default. When a column is a factor, many string operations don't work on it directly, so to keep string columns as the character type use stringsAsFactors=FALSE while reading a CSV file in R.

With R 4.0 and later, you don’t have to use this argument, as R by default keeps character data as strings. I am using R version 4.0, hence all my string columns are read as character (chr) type. Use str() to display the structure of the DataFrame.


# Keep String as Character.
read_csv = read.csv('/Users/admin/file.csv', stringsAsFactors=FALSE)
str(read_csv)

This yields the output below.


# Output
# I am using R new version hence string is in chr type
'data.frame':	4 obs. of  4 variables:
 $ id    : int  10 11 12 13
 $ name  : chr  "sai" "ram" NA "sahithi"
 $ dob   : chr  "1990-10-02" "1981-03-24" "1987-06-14" "1985-08-16"
 $ gender: chr  "M" "" "F" "F"
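
If you want specific columns to have a particular type regardless of the R version, the colClasses argument of read.csv() is another option. The sketch below, using illustrative inline data, reads dob as a Date and gender as a factor and leaves the remaining columns to be detected automatically.

```r
csv_text <- "id,name,dob,gender\n10,sai,1990-10-02,M\n13,ram,1985-08-16,F"

# Named colClasses: columns that are not named are still auto-detected
df <- read.csv(text = csv_text,
               colClasses = c(dob = "Date", gender = "factor"))
str(df)
```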

7. CSV encoding

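If you receive a CSV file that contains non-ASCII text, for example Spanish characters, use the encoding param with the appropriate value. To mark the strings as UTF-8, pass encoding='UTF-8' while importing the file into a DataFrame. Note that encoding only declares how the strings are encoded and does not convert the file; to re-encode a file that is stored in another encoding, use the fileEncoding argument instead.


# Use UTF-8 encoding
read_csv = read.csv('/Users/admin/file_noheader.csv', encoding='UTF-8')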
print(read_csv)
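
For a file that is stored in a different encoding, here is a hedged sketch of the fileEncoding argument. It builds a small Latin-1 file byte by byte and assumes the R session runs in a UTF-8 locale; the file contents and the names csv_path and df are made up for illustration.

```r
# Build a Latin-1 encoded file byte by byte: 0xE9 is "é" in Latin-1
csv_path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,name\n1,Jos"), as.raw(0xE9), charToRaw("\n")),
         csv_path)

# fileEncoding re-encodes the input from Latin-1 while reading
df <- read.csv(csv_path, fileEncoding = "latin1")
print(df$name)
```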

8. read.csv2()

read.csv2() is another R function to import a CSV file into a DataFrame. By default, this function uses a comma as the decimal point and a semicolon as the field separator.


# Using read.csv2()
read_csv = read.csv2('/Users/admin/file_noheader.csv')
print(read_csv)
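
To make the difference concrete, the following sketch feeds read.csv2() some made-up inline data that uses semicolons between fields and commas inside the numbers.

```r
# Semicolon-separated fields with decimal commas
csv_text <- "id;price\n1;3,14\n2;2,5"

df <- read.csv2(text = csv_text)
print(df$price)              # decimal commas are parsed as numbers
print(is.numeric(df$price))
```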

9. Import CSV using read.table()

You can also import a CSV file with read.table(), the general-purpose function for delimited files. Its default separator is whitespace (sep="") and its default for header is FALSE, so for a CSV file you need to specify the delimiter and the header handling explicitly. The functions read.csv() and read.csv2() are wrappers that call read.table() internally with CSV-friendly defaults.


# Using read.table()
read_csv = read.table('/Users/admin/file.csv',sep=',',header=TRUE)
print(read_csv)
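
The relationship between the two functions can be checked with a small comparison. This sketch uses simple inline data (no quotes or comment characters, where the two functions' defaults differ) and the illustrative names via_table and via_csv.

```r
csv_text <- "id,name\n10,sai\n13,ram"

# read.table() needs the header and separator spelled out
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# read.csv() supplies those as defaults
via_csv <- read.csv(text = csv_text)

print(all.equal(via_table, via_csv))
```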

10. Use read_csv()

If you are working with larger files, consider the read_csv() function from the readr package, which is typically faster than read.csv(). readr is a third-party package, so you first need to install it by using install.packages('readr'). Once the installation completes, load the package with library("readr") in order to use read_csv().


# Load readr
library("readr")

# Read CSV into a tibble (readr's data frame type)
# A separate variable name avoids shadowing the read_csv() function
df = read_csv('/Users/admin/file.csv')
print(df)
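
Here is a self-contained sketch of the same idea that does not depend on a file on my machine. It assumes readr may or may not be installed, so the call is guarded with requireNamespace(); the file contents and the names csv_path and tbl are illustrative.

```r
# Write a tiny CSV to a temporary file (contents are illustrative)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "10,sai", "13,ram"), csv_path)

if (requireNamespace("readr", quietly = TRUE)) {
  # col_types = "ic": first column integer, second column character
  tbl <- readr::read_csv(csv_path, col_types = "ic")
  print(class(tbl))
} else {
  message("readr is not installed; run install.packages('readr') first")
}
```

Passing col_types explicitly skips readr's type guessing and keeps the column types predictable.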

Conclusion

In this article, you have learned how to import a CSV file into an R DataFrame using read.csv(), read.csv2(), read.table(), and finally read_csv() from the readr package.


NNK
