Do pandas read/import CSV from the string? We are often required to read a CSV file but in some cases, you might want to import from a String variable into DataFrame. In this article, I will explain how to read a CSV from a String with examples.
1. Quick Examples of Read CSV from Stirng
The following are quick examples of how to read a CSV from a string variable.
# Below are quick examples
# Convert String into StringIO
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
# Assign column names
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(StringIO(csvString), sep=",", header=None,names=columns)
# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')],
columns=columns)
2. Using StringIO to Read CSV from String
In order to read a CSV from a String into pandas DataFrame first you need to convert the string into StringIO. so import StringIO from the io library before use. If you are using Python version 2 or earlier use from StringIO import StringIO.
from io import StringIO
# CSV String with out headers
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""
# Convert String into StringIO
csvStringIO = StringIO(csvString)
Now pass the StringIO object as a param to pandas read_csv() function, this converts the String into DataFrame.
# Load CSV String into DataFrame
import pandas as pd
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)
Yields below output. Note that our CSV in a string doesn’t have a header hence I use header=None
param to read a CSV without a header.
# Output:
0 1 2 3
0 Spark 25000 50 Days 2000
1 Pandas 20000 35 Days 1000
2 Java 15000 NaN 800
3 Python 15000 30 Days 500
4 PHP 18000 30 Days 800
If you have a header just remove header=None
param. To set the column names on DataFrame use names
param with value as a list of column names.
# Assign column names
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)
Yields below output.
# Output:
Course Fee Duration Discount
0 Spark 25000 50 Days 2000
1 Pandas 20000 35 Days 1000
2 Java 15000 NaN 800
3 Python 15000 30 Days 500
4 PHP 18000 30 Days 800
3. Read CSV from String using Split
Alternatively, you can read CSV from a string by splitting the string by a new line and then split the record by column separator to convert it into a nested list of rows. and then create a pandas DataFrame from the list.
# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')],
columns=columns)
print(df)
Yields same output as above.
4. Interactive Way
Note that this solution can’t be used in your project as this works when you wanted to quickly test something interactively from the clipboard.
First copy the CSV from the string to the clipboard, you can find the CSV referenced here at the GitHub project.
# From reading csv string from clipboard
df = pd.read_clipboard(sep=',')
print(df)
5. Complete Example
import pandas as pd
from io import StringIO
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""
# Read from CSV String
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)
# With columns
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)
# Ignore Header and assign column names
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], columns=columns)
print(df)
# Read from clipboard
df = pd.read_clipboard(sep=',')
print(df)
You can find this example at python-pandas-examples GitHub project.
Conclusion
In this article, you have learned to load a CSV from a String with and without a header and assign custom column names. Loading can be done by using the StringIO package or by just splitting the CSV into a list of lists and creating a dataframe from it.