Do Pandas read/import CSV from the string? We are often required to read a CSV file but in some cases, you might want to import from a String variable into DataFrame. In this article, I will explain how to read a CSV from a String with examples.
1. Quick Examples of Read CSV from String
The following are quick examples of how to read a CSV from a string variable.
# Below are quick examples
# Example 1: Convert String into StringIO
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
# Example 2: Assign column names
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(StringIO(csvString), sep=",", header=None,names=columns)
# Example 3: Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')],
columns=columns)
2. Using StringIO to Read CSV from String
To read a CSV from a String into pandas DataFrame first you need to convert the string into StringIO. So import StringIO from the io library before use. If you are using Python version 2 or earlier use from StringIO import StringIO.
from io import StringIO
# CSV String with out headers
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""
# Convert String into StringIO
csvStringIO = StringIO(csvString)
Now pass the StringIO object as a param to Pandas read_csv() function, this converts the String into DataFrame.
# Load CSV String into DataFrame
import pandas as pd
df = pd.read_csv(csvStringIO, sep=",", header=None)
print("After reading CSV string into DataFrame:\n", df)
Yields below output. Note that our CSV in a string doesn’t have a header hence I use header=None
a param to read a CSV without a header.
If you have a header just remove the header=None
param. To set the column names on DataFrame use the names
param with value as a list of column names.
# Assign column names
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print("After assigning column names to Dataframe:\n", df)
Yields below output.
3. Read CSV from String using Split
Alternatively, you can read CSV from a string by splitting the string by a new line and then split the record by column separator to convert it into a nested list of rows. and then create a pandas DataFrame from the list.
# Using split by line
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')],
columns=columns)
print(df)
Yields the same output as above.
4. Interactive Way
Note that this solution can’t be used in your project as this works when you want to quickly test something interactively from the clipboard.
First, copy the CSV from the string to the clipboard, you can find the CSV referenced here at the GitHub project.
# From reading csv string from clipboard
df = pd.read_clipboard(sep=',')
print(df)
5. Complete Example
import pandas as pd
from io import StringIO
csvString = """Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800"""
# Read from CSV String
csvStringIO = StringIO(csvString)
df = pd.read_csv(csvStringIO, sep=",", header=None)
print(df)
# With columns
csvStringIO = StringIO(csvString)
columns=['Course','Fee','Duration','Discount']
df = pd.read_csv(csvStringIO, sep=",", header=None,names=columns)
print(df)
# Ignore Header and assign column names
df = pd.DataFrame([row.split(',') for row in csvString.split('\n')], columns=columns)
print(df)
# Read from clipboard
df = pd.read_clipboard(sep=',')
print(df)
You can find this example at python-pandas-examples GitHub project.
Frequently Asked Questions on Read CSV from String in Pandas
To read a CSV from a string, you can use the pd.read_csv
function along with the io.StringIO
class from the io
module. For example, csvstring = "Spark,25000,50 Days,2000"
df = pd.read_csv(StringIO(csvstring))
You can specify the delimiter using the sep
parameter in the pd.read_csv()
function. For example, if our CSV contains tab-separated, csv_string = "col1\tcol2\n1\t2\n3\t4"
df = pd.read_csv(StringIO(csvstring), sep='\t')
If your CSV string has a header, you can use the header
parameter in pd.read_csv
. By default, it is set to ‘infer’, which means Pandas will try to infer the header. For example, df = pd.read_csv(StringIO(csvstring))
Yes, you can use the skiprows
parameter to skip a specific number of rows from the beginning of the file. For example,
df = pd.read_csv(StringIO(csvstring), skiprows=1)
The na_values
parameter allows you to specify a list of values to be treated as NaN. For example, df = pd.read_csv(StringIO(csvstring), na_values=[''])
Conclusion
In this article, you have learned to load a CSV from a String with and without a header and assign custom column names. Loading can be done by using the StringIO package or by just splitting the CSV into a list of lists and creating a DataFrame from it.
Related Articles
- Convert Pandas DatetimeIndex to String
- How to Replace String in Pandas DataFrame
- Pandas Replace substring in DataFrame
- Pandas – Convert DataFrame to JSON String
- Pandas Change String Object to Date in DataFrame
- Pandas Convert Date (datetime) to String Format
- Pandas Get Row Number of DataFrame
- Pandas – Convert JSON to CSV
- Pandas Read Multiple CSV Files into DataFrame
- Pandas Write DataFrame to CSV
- Export Pandas to CSV without Index & Header
- Pandas Read JSON File with Examples
- Pandas Convert JSON to DataFrame
- How to Convert Pandas Uppercase Column
- Convert Pandas Column to Lowercase
- Pandas DataFrame quantile() Function
- Pandas Read Text with Examples
- Export Pandas to CSV without Index & Header