How to Read CSV from String in Pandas

Do pandas read/import CSV from the string? We are often required to read a CSV file but in some cases, you might want to import from a String variable into DataFrame. In this article, I will explain how to read a CSV from a String with examples.

1. Quick Examples of Read CSV from Stirng

The following are quick examples of how to read a CSV from a string variable.


# Below are quick examples

# Convert String into StringIO
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)

# Assign column names
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(StringIO(csvString), sep=",", header=None,names=columns)

# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], 
                   columns=columns)

2. Using StringIO to Read CSV from String

In order to read a CSV from a String into pandas DataFrame first you need to convert the string into StringIO. so import StringIO from the io library before use. If you are using Python version 2 or earlier use from StringIO import StringIO.


from io import StringIO

# CSV String with out headers
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""

# Convert String into StringIO
csvStringIO = StringIO(csvString)

Now pass the StringIO object as a param to pandas read_csv() function, this converts the String into DataFrame.


# Load CSV String into DataFrame

import pandas as pd
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)

Yields below output. Note that our CSV in a string doesn’t have a header hence I use header=None param to read a CSV without a header.


# Output:
        0      1        2     3
0   Spark  25000  50 Days  2000
1  Pandas  20000  35 Days  1000
2    Java  15000      NaN   800
3  Python  15000  30 Days   500
4     PHP  18000  30 Days   800

If you have a header just remove header=None param. To set the column names on DataFrame use names param with value as a list of column names.


# Assign column names
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)

Yields below output.


# Output:
   Course    Fee Duration  Discount
0   Spark  25000  50 Days      2000
1  Pandas  20000  35 Days      1000
2    Java  15000      NaN       800
3  Python  15000  30 Days       500
4     PHP  18000  30 Days       800

3. Read CSV from String using Split

Alternatively, you can read CSV from a string by splitting the string by a new line and then split the record by column separator to convert it into a nested list of rows. and then create a pandas DataFrame from the list.


# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], 
                   columns=columns)
print(df)

Yields same output as above.

4. Interactive Way

Note that this solution can’t be used in your project as this works when you wanted to quickly test something interactively from the clipboard.

First copy the CSV from the string to the clipboard, you can find the CSV referenced here at the GitHub project.


# From reading csv string from clipboard
df = pd.read_clipboard(sep=',')
print(df)

5. Complete Example


import pandas as pd
from io import StringIO

csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""

# Read from CSV String
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)

# With columns
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)

# Ignore Header and assign column names
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], columns=columns)
print(df)

# Read from clipboard
df = pd.read_clipboard(sep=',')
print(df)

You can find this example at python-pandas-examples GitHub project.

Conclusion

In this article, you have learned to load a CSV from a String with and without a header and assign custom column names. Loading can be done by using the StringIO package or by just splitting the CSV into a list of lists and creating a dataframe from it.

Naveen (NNK)

I am Naveen (NNK) working as a Principal Engineer. I am a seasoned Apache Spark Engineer with a passion for harnessing the power of big data and distributed computing to drive innovation and deliver data-driven insights. I love to design, optimize, and managing Apache Spark-based solutions that transform raw data into actionable intelligence. I am also passion about sharing my knowledge in Apache Spark, Hive, PySpark, R etc.

Leave a Reply

You are currently viewing How to Read CSV from String in Pandas