• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:14 mins read
You are currently viewing How to Read CSV from String in Pandas

Do Pandas read/import CSV from the string? We are often required to read a CSV file but in some cases, you might want to import from a String variable into DataFrame. In this article, I will explain how to read a CSV from a String with examples.

1. Quick Examples of Read CSV from String

The following are quick examples of how to read a CSV from a string variable.


# Below are quick examples

# Example 1: Convert String into StringIO
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)

# Example 2: Assign column names
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(StringIO(csvString), sep=",", header=None,names=columns)

# Example 3: Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], 
                   columns=columns)

2. Using StringIO to Read CSV from String

To read a CSV from a String into pandas DataFrame first you need to convert the string into StringIO. So import StringIO from the io library before use. If you are using Python version 2 or earlier use from StringIO import StringIO.


from io import StringIO

# CSV String with out headers
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""

# Convert String into StringIO
csvStringIO = StringIO(csvString)

Now pass the StringIO object as a param to Pandas read_csv() function, this converts the String into DataFrame.


# Load CSV String into DataFrame

import pandas as pd
df = pd.read_csv(csvStringIO, sep=",", header=None)
print("After reading CSV string into DataFrame:\n", df)

Yields below output. Note that our CSV in a string doesn’t have a header hence I use header=None a param to read a CSV without a header.

pandas read csv from string

If you have a header just remove the header=None param. To set the column names on DataFrame use the names param with value as a list of column names.


# Assign column names
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print("After assigning column names to Dataframe:\n", df)

Yields below output.

pandas read csv from string

3. Read CSV from String using Split

Alternatively, you can read CSV from a string by splitting the string by a new line and then split the record by column separator to convert it into a nested list of rows. and then create a pandas DataFrame from the list.


# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], 
                   columns=columns)
print(df)

Yields the same output as above.

4. Interactive Way

Note that this solution can’t be used in your project as this works when you want to quickly test something interactively from the clipboard.

First, copy the CSV from the string to the clipboard, you can find the CSV referenced here at the GitHub project.


# From reading csv string from clipboard
df = pd.read_clipboard(sep=',')
print(df)

5. Complete Example


import pandas as pd
from io import StringIO

csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""

# Read from CSV String
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)

# With columns
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)

# Ignore Header and assign column names
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], columns=columns)
print(df)

# Read from clipboard
df = pd.read_clipboard(sep=',')
print(df)

You can find this example at python-pandas-examples GitHub project.

Frequently Asked Questions on Read CSV from String in Pandas

How can I read a CSV from a string using Pandas?

To read a CSV from a string, you can use the pd.read_csv function along with the io.StringIO class from the io module. For example,
csvstring = "Spark,25000,50 Days,2000"
df = pd.read_csv(StringIO(csvstring))

How can I read a CSV with a different delimiter from a string?

You can specify the delimiter using the sep parameter in the pd.read_csv() function. For example, if our CSV contains tab-separated, csv_string = "col1\tcol2\n1\t2\n3\t4"
df = pd.read_csv(StringIO(csvstring), sep='\t')

How to handle CSVs with a header in the string?

If your CSV string has a header, you can use the header parameter in pd.read_csv. By default, it is set to ‘infer’, which means Pandas will try to infer the header. For example, df = pd.read_csv(StringIO(csvstring))

Is it possible to skip rows when reading from a CSV string?

Yes, you can use the skiprows parameter to skip a specific number of rows from the beginning of the file. For example,
df = pd.read_csv(StringIO(csvstring), skiprows=1)

How can I handle missing values when reading from a CSV string?

The na_values parameter allows you to specify a list of values to be treated as NaN. For example, df = pd.read_csv(StringIO(csvstring), na_values=[''])

Conclusion

In this article, you have learned to load a CSV from a String with and without a header and assign custom column names. Loading can be done by using the StringIO package or by just splitting the CSV into a list of lists and creating a DataFrame from it.

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium