• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:19 mins read
You are currently viewing Pandas Write DataFrame to CSV

By using pandas.DataFrame.to_csv() method you can write/save/export a pandas DataFrame to CSV File. By default to_csv() method export DataFrame to a CSV file with comma delimiter and row index as the first column. In this article, I will cover how to export to CSV file by a custom delimiter, with or without column header, ignoring index, encoding, quotes, and many more.

In order to explain, first let’s create a DataFrame with a few rows and columns.


# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python"],
    'Fee' :[22000,25000,np.nan,24000],
    'Duration':['30day',None,'55days',np.nan],
    'Discount':[1000,2300,1000,np.nan]
          }
df = pd.DataFrame(technologies)

1. Write Pandas DataFrame to CSV File

Pandas DataFrame provides to_csv() method to write/export DataFrame to CSV comma-separated delimiter file along with header and index.


# Write DataFrame to CSV File with Default params.
df.to_csv("c:/tmp/courses.csv")

This creates a courses.csv file at the specified location with the below contents in a file.

pandas write to csv
Write pandas DataFrame to CSV File

As you see by default CSV file was created with a comma-separated delimiter file, with column header and row index. You can change this behavior by supplying param to the method. to_csv() takes multiple optional params as shown in the below syntax.


# To_csv() Syntax
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, 
columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, 
compression='infer', quoting=None, quotechar='"', line_terminator=None, 
chunksize=None, date_format=None, doublequote=True, escapechar=None, 
decimal='.', errors='strict', storage_options=None)

2. Write DataFrame to CSV without Header

You can use header=False param to write DataFrame without a header (column names). By default to_csv() method exports DataFrame to CSV file with header hence you need to use this param to ignore the header.


# Write DataFrame to CSV without Header
df.to_csv("c:/tmp/courses.csv", header=False)

# Output:
# Writes Below Content to CSV File
# 0,Spark,22000.0,30day,1000.0
# 1,PySpark,25000.0,,2300.0
# 2,Hadoop,,55days,1000.0
# 3,Python,24000.0,,

3. Writing Using Custom Delimiter

By default CSV file is created with a comma delimiter, you can change this behavior by using sep param (separator) and chose other delimiters like tab (\t), pipe (|) e.t.c.


# Using Custom Delimiter
df.to_csv("c:/tmp/courses.csv", header=False, sep='|')

# Output:
# Writes Below Content to CSV File
# 0|Spark|22000.0|30day|1000.0
# 1|PySpark|25000.0||2300.0
# 2|Hadoop||55days|1000.0
# 3|Python|24000.0||

4. Writing to CSV ignoring Index

As I said earlier, by default the DataFrame would be exported to CSV with row index, you can ignore this by using param index=False.


# Write DataFrame to CSV without Index
df.to_csv("c:/tmp/courses.csv", index=False)

# Output:
# Writes Below Content to CSV File
# Courses,Fee,Duration,Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,,2300.0
# Hadoop,,55days,1000.0
# Python,24000.0,,

5. Export Selected Columns to CSV File

Sometimes you would be required to export selected columns from DataFrame to CSV File, In order to select specific columns use columns param. In this example, I have created a list column_names with the required columns and used it on to_csv() method. You can also select columns from pandas DataFrame before writing to a file.


# Export Selected Columns to CSV File
column_names = ['Courses', 'Fee','Discount']
df.to_csv("c:/tmp/courses.csv",index=False, columns=column_names)

# Output:
# Writes Below Content to CSV File
# Courses,Fee,Discount
# Spark,22000.0,1000.0
# PySpark,25000.0,2300.0
# Hadoop,,1000.0
# Python,24000.0,

6. Change Header Column Names While Writing

Use header param to change the column names on the header while writing pandas DataFrame to CSV File. You can also rename pandas DataFrame columns before writing to a file.


# Change Header Column Names While Writing
column_names = ['Courses', 'Course_Fee','Course_Duration','Course_Discount']
df.to_csv("c:/tmp/courses.csv",index=False, header=column_names)

# Output:
# Writes Below Content to CSV File
# Courses,Course_Fee,Course_Duration,Course_Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,,2300.0
# Hadoop,,55days,1000.0
# Python,24000.0,,

7. Handling Missing Values (None/NaN)

If you notice all the above examples, None/NaN values are written as an empty string. You can control this behavior and assign custom values using na_rep param.


# Handling Missing Values (None/NaN)
df.to_csv("c:/tmp/courses.csv",index=False, na_rep='Unknown')

# Output:
# Writes Below Content to CSV File
# Courses,Fee,Duration,Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,Unknown,2300.0
# Hadoop,Unknown,55days,1000.0
# Python,24000.0,Unknown,Unknown

8. Write DataFrame to CSV by Encoding

Sometimes you face problems with encoding, I recommend you specify encoding while writing DataFrame to a CSV file. To use a specific encoding use the encoding argument. For example, encoding='utf-8' exports pandas DataFrame in utf-8 encoding to CSV file.


# Write CSV by Encoding
df.to_csv(file_name, sep='\t', encoding='utf-8')

9. Append DataFrame to existing CSV File

When you write pandas DataFrame to an existing CSV file, it overwrites the file with the new contents. To append a DataFrame to an existing CSV file, you need to specify the append write mode using mode='a'.


# Append DataFrame to existing CSV File
df.to_csv("c:/tmp/courses.csv", header=False, sep='|', index=False, mode='a')

10. Other Frequenty Used Params

float_format: Format string for floating-point numbers.

path_or_buf: Path to a file, if not specified it returns as a String.

compression: Used to compress. supported compression ‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’

quotechar: String of length 1. The character used to quote fields. Default double quotes “

line_terminator: Character to use for the terminate line. You can use os.linesep

date_format : Format string for datetime objects.

errors: Specifies how encoding and decoding errors are to be handled. 

Frequently Asked Questions on Pandas Write DataFrame to CSV

How can I write a Pandas DataFrame to a CSV file?

To write a Pandas DataFrame to a CSV file, you can use the to_csv method. The index=False parameter is optional but is often used to exclude the row indices from being written to the CSV file.

How do I specify a different file path for the CSV output?

To specify a different file path for the CSV output, you can provide the full or relative file path as the first argument in the to_csv method. For example, replace '/path/to/your/directory/output.csv' with the desired file path and name for your CSV file. You can use a relative path or an absolute path based on your needs.

Can I include the index in the CSV file?

You can include the index in the CSV file when writing a Pandas DataFrame using the to_csv method. By default, the index is included. If you want to include the index explicitly, or if you want to exclude it, you can use the index parameter.

How do I handle missing values (NaN) when writing to a CSV file?

When writing a Pandas DataFrame to a CSV file, you may want to specify how missing values (NaN) should be handled. The to_csv method provides the na_rep parameter for this purpose. You can use na_rep to specify a string representation for missing values in the CSV file.

How can I write the CSV file with a specific encoding?

You can specify the encoding of the CSV file when writing a Pandas DataFrame using the to_csv method. The default encoding is usually UTF-8, but you can choose a different encoding if needed. The encoding parameter is used for this purpose.

What if I want to use a different delimiter in the CSV file?

If you want to use a different delimiter in the CSV file, you can specify it using the sep parameter in the to_csv method. By default, the delimiter is a comma (,), but you can choose another character such as a tab (\t) for tab-separated values (TSV) or a semicolon (;) for semicolon-separated values.

Conclusion

By default pandas.DataFrame.to_csv() writes DataFrame with header, index, and comma separator delimiter. you can change this behavior by using optional params. For example, header=False to ignore header, index=False to ignore row index, sep='|' to change delimiter etc.

Related Articles

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply