Pandas Write DataFrame to CSV

By using pandas.DataFrame.to_csv() method you can write/save/export a pandas DataFrame to CSV File. By default to_csv() method export DataFrame to a CSV file with comma delimiter and row index as the first column. In this article, I will cover how to export to CSV file by a custom delimiter, with or without column header, ignoring index, encoding, quotes, and many more.

In order to explain, first let’s create a DataFrame with a few rows and columns.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30day',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)

1. Write Pandas DataFrame to CSV File

Pandas DataFrame provides to_csv() method to write/export DataFrame to CSV comma-separated delimiter file along with header and index.


# Write DataFrame to CSV File with Default params.
df.to_csv("c:/tmp/courses.csv")

This creates a courses.csv file at the specified location with the below contents in a file.

pandas write to csv
Write pandas DataFrame to CSV File

As you see by default CSV file was created with a comma-separated delimiter file, with column header and row index. You can change this behavior by supplying param to the method. to_csv() takes multiple optional params as shown in the below syntax.


# to_csv() Syntax
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, 
columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, 
compression='infer', quoting=None, quotechar='"', line_terminator=None, 
chunksize=None, date_format=None, doublequote=True, escapechar=None, 
decimal='.', errors='strict', storage_options=None)

2. Write DataFrame to CSV without Header

You can use header=False param to write DataFrame without a header (column names). By default to_csv() method exports DataFrame to CSV file with header hence you need to use this param to ignore the header.


# Write DataFrame to CSV without Header
df.to_csv("c:/tmp/courses.csv", header=False)

# Writes Below Content to CSV File
#0,Spark,22000.0,30day,1000.0
#1,PySpark,25000.0,,2300.0
#2,Hadoop,,55days,1000.0
#3,Python,24000.0,,

3. Writing Using Custom Delimiter

By default CSV file is created with a comma delimiter, you can change this behavior by using sep param (separator) and chose other delimiters like tab (\t), pipe (|) e.t.c.


# Using Custom Delimiter
df.to_csv("c:/tmp/courses.csv", header=False, sep='|')

# Writes Below Content to CSV File
#0|Spark|22000.0|30day|1000.0
#1|PySpark|25000.0||2300.0
#2|Hadoop||55days|1000.0
#3|Python|24000.0||
#

4. Writing to CSV ignoring Index

As I said earlier, by default the DataFrame would be exported to CSV with row index, you can ignore this by using param index=False.


# Write DataFrame to CSV without Index
df.to_csv("c:/tmp/courses.csv", index=False)

# Writes Below Content to CSV File
#Courses,Fee,Duration,Discount
#Spark,22000.0,30day,1000.0
#PySpark,25000.0,,2300.0
#Hadoop,,55days,1000.0
#Python,24000.0,,

5. Export Selected Columns to CSV File

Sometimes you would be required to export selected columns from DataFrame to CSV File, In order to select specific columns use columns param. In this example, I have created a list column_names with the required columns and used it on to_csv() method. You can also select columns from pandas DataFrame before writing to a file.


# Export Selected Columns to CSV File
column_names = ['Courses', 'Fee','Discount']
df.to_csv("c:/tmp/courses.csv",index=False, columns=column_names)

# Writes Below Content to CSV File
#Courses,Fee,Discount
#Spark,22000.0,1000.0
#PySpark,25000.0,2300.0
#Hadoop,,1000.0
#Python,24000.0,

6. Change Header Column Names While Writing

Use header param to change the column names on the header while writing pandas DataFrame to CSV File. You can also rename pandas DataFrame columns before writing to a file.


# Change Header Column Names While Writing
column_names = ['Courses', 'Course_Fee','Course_Duration','Course_Discount']
df.to_csv("c:/tmp/courses.csv",index=False, header=column_names)

# Writes Below Content to CSV File
#Courses,Course_Fee,Course_Duration,Course_Discount
#Spark,22000.0,30day,1000.0
#PySpark,25000.0,,2300.0
#Hadoop,,55days,1000.0
#Python,24000.0,,

7. Handling Missing Values (None/NaN)

If you notice all the above examples, None/NaN values are written as an empty string. You can control this behavior and assign custom values using na_rep param.


# Handling Missing Values (None/NaN)
df.to_csv("c:/tmp/courses.csv",index=False, na_rep='Unknown')

# Writes Below Content to CSV File
#Courses,Fee,Duration,Discount
#Spark,22000.0,30day,1000.0
#PySpark,25000.0,Unknown,2300.0
#Hadoop,Unknown,55days,1000.0
#Python,24000.0,Unknown,Unknown

8. Write DataFrame to CSV by Encoding

Sometimes you face problems with encoding, I recommend you specify encoding while writing DataFrame to a CSV file. To use a specific encoding use the encoding argument. For example, encoding='utf-8' exports pandas DataFrame in utf-8 encoding to CSV file.


# Write CSV by Encoding
df.to_csv(file_name, sep='\t', encoding='utf-8')

9. Append DataFrame to existing CSV File

When you write pandas DataFrame to an existing CSV file, it overwrites the file with the new contents. To append a DataFrame to an existing CSV file, you need to specify the append write mode using mode='a'.


# Append DataFrame to existing CSV File
df.to_csv("c:/tmp/courses.csv", header=False, sep='|', index=False, mode='a')

10. Other Frequenty Used Params

float_format : Format string for floating-point numbers.

path_or_buf : Path to a file, if not specified it returns as a String.

compression : Used to compress. supported compression ‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’

quotechar : String of length 1. The character used to quote fields. Default double quotes “

line_terminator : Character to use for the terminate line. You can use os.linesep

date_format : Format string for datetime objects.

errors : Specifies how encoding and decoding errors are to be handled. 

Conclusion

By default pandas.DataFrame.to_csv() writes DataFrame with header, index, and comma separator delimiter. you can change this behavior by using optional params. For example, header=False to ignore header, index=False to ignore row index, sep='|' to change delimiter e.t.c.

You may Also Like

References

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

You are currently viewing Pandas Write DataFrame to CSV