By using pandas.DataFrame.to_csv()
method you can write/save/export a pandas DataFrame to CSV File. By default to_csv()
method export DataFrame to a CSV file with comma delimiter and row index as the first column. In this article, I will cover how to export to CSV file by a custom delimiter, with or without column header, ignoring index, encoding, quotes, and many more.
- Write Pandas DataFrame to CSV File
- Write DataFrame to CSV without Header
- Writing Using Custom Delimiter
- Writing to CSV without Index
- Export Selected Columns to CSV File
- Change Header Column Names While Writing
- Handling Missing Values (None/NaN)
- Write DataFrame to CSV by Encoding
- Append DataFrame to existing CSV File
- Other Frequently Used Params
Key Points –
- Use
DataFrame.to_csv()
to write a DataFrame to a CSV file. - The first argument specifies the file path or file object where the CSV will be saved.
- The
sep
parameter allows setting a custom delimiter, with the default being a comma. - The
header
parameter controls whether or not column headers are written to the CSV file. - The
index
parameter controls whether the DataFrame’s index is written as the first column. - The
encoding
parameter specifies the character encoding of the output file. - The
mode
parameter defines how the file is opened, with default being'w'
(write mode). - The
na_rep
parameter allows specifying a string to replace missing values (NaNs).
In order to explain, first let’s create a DataFrame with a few rows and columns.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python"],
'Fee' :[22000,25000,np.nan,24000],
'Duration':['30day',None,'55days',np.nan],
'Discount':[1000,2300,1000,np.nan]
}
df = pd.DataFrame(technologies)
1. Write Pandas DataFrame to CSV File
Pandas DataFrame provides to_csv()
method to write/export DataFrame to CSV comma-separated delimiter file along with header and index.
# Write DataFrame to CSV File with Default params.
df.to_csv("c:/tmp/courses.csv")
This creates a courses.csv
file at the specified location with the below contents in a file.
As you see by default CSV file was created with a comma-separated delimiter file, with column header and row index. You can change this behavior by supplying param to the method. to_csv()
takes multiple optional params as shown in the below syntax.
# To_csv() Syntax
DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None,
columns=None, header=True, index=True, index_label=None, mode='w', encoding=None,
compression='infer', quoting=None, quotechar='"', line_terminator=None,
chunksize=None, date_format=None, doublequote=True, escapechar=None,
decimal='.', errors='strict', storage_options=None)
2. Write DataFrame to CSV without Header
You can use header=False
param to write DataFrame without a header (column names). By default to_csv() method exports DataFrame to CSV file with header hence you need to use this param to ignore the header.
# Write DataFrame to CSV without Header
df.to_csv("c:/tmp/courses.csv", header=False)
# Output:
# Writes Below Content to CSV File
# 0,Spark,22000.0,30day,1000.0
# 1,PySpark,25000.0,,2300.0
# 2,Hadoop,,55days,1000.0
# 3,Python,24000.0,,
3. Writing Using Custom Delimiter
By default CSV file is created with a comma delimiter, you can change this behavior by using sep
param (separator) and chose other delimiters like tab (\t), pipe (|) e.t.c.
# Using Custom Delimiter
df.to_csv("c:/tmp/courses.csv", header=False, sep='|')
# Output:
# Writes Below Content to CSV File
# 0|Spark|22000.0|30day|1000.0
# 1|PySpark|25000.0||2300.0
# 2|Hadoop||55days|1000.0
# 3|Python|24000.0||
4. Writing to CSV ignoring Index
As I said earlier, by default the DataFrame would be exported to CSV with row index, you can ignore this by using param index=False
.
# Write DataFrame to CSV without Index
df.to_csv("c:/tmp/courses.csv", index=False)
# Output:
# Writes Below Content to CSV File
# Courses,Fee,Duration,Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,,2300.0
# Hadoop,,55days,1000.0
# Python,24000.0,,
5. Export Selected Columns to CSV File
Sometimes you would be required to export selected columns from DataFrame to CSV File, In order to select specific columns use columns
param. In this example, I have created a list column_names
with the required columns and used it on to_csv()
method. You can also select columns from pandas DataFrame before writing to a file.
# Export Selected Columns to CSV File
column_names = ['Courses', 'Fee','Discount']
df.to_csv("c:/tmp/courses.csv",index=False, columns=column_names)
# Output:
# Writes Below Content to CSV File
# Courses,Fee,Discount
# Spark,22000.0,1000.0
# PySpark,25000.0,2300.0
# Hadoop,,1000.0
# Python,24000.0,
6. Change Header Column Names While Writing
Use header param
to change the column names on the header while writing pandas DataFrame to CSV File. You can also rename pandas DataFrame columns before writing to a file.
# Change Header Column Names While Writing
column_names = ['Courses', 'Course_Fee','Course_Duration','Course_Discount']
df.to_csv("c:/tmp/courses.csv",index=False, header=column_names)
# Output:
# Writes Below Content to CSV File
# Courses,Course_Fee,Course_Duration,Course_Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,,2300.0
# Hadoop,,55days,1000.0
# Python,24000.0,,
7. Handling Missing Values (None/NaN)
If you notice all the above examples, None
/NaN
values are written as an empty string. You can control this behavior and assign custom values using na_rep
param.
# Handling Missing Values (None/NaN)
df.to_csv("c:/tmp/courses.csv",index=False, na_rep='Unknown')
# Output:
# Writes Below Content to CSV File
# Courses,Fee,Duration,Discount
# Spark,22000.0,30day,1000.0
# PySpark,25000.0,Unknown,2300.0
# Hadoop,Unknown,55days,1000.0
# Python,24000.0,Unknown,Unknown
8. Write DataFrame to CSV by Encoding
Sometimes you face problems with encoding, I recommend you specify encoding while writing DataFrame to a CSV file. To use a specific encoding use the encoding
argument. For example, encoding='utf-8'
exports pandas DataFrame in utf-8
encoding to CSV file.
# Write CSV by Encoding
df.to_csv(file_name, sep='\t', encoding='utf-8')
9. Append DataFrame to existing CSV File
When you write pandas DataFrame to an existing CSV file, it overwrites the file with the new contents. To append a DataFrame to an existing CSV file, you need to specify the append write mode using mode='a'
.
# Append DataFrame to existing CSV File
df.to_csv("c:/tmp/courses.csv", header=False, sep='|', index=False, mode='a')
10. Other Frequenty Used Params
float_format: Format string for floating-point numbers.
path_or_buf: Path to a file, if not specified it returns as a String.
compression: Used to compress. supported compression ‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’
quotechar: String of length 1. The character used to quote fields. Default double quotes “
line_terminator: Character to use for the terminate line. You can use os.linesep
date_format : Format string for datetime objects.
errors: Specifies how encoding and decoding errors are to be handled.
Frequently Asked Questions on Pandas Write DataFrame to CSV
To write a Pandas DataFrame to a CSV file, you can use the to_csv
method. The index=False
parameter is optional but is often used to exclude the row indices from being written to the CSV file.
To specify a different file path for the CSV output, you can provide the full or relative file path as the first argument in the to_csv
method. For example, replace '/path/to/your/directory/output.csv'
with the desired file path and name for your CSV file. You can use a relative path or an absolute path based on your needs.
You can include the index in the CSV file when writing a Pandas DataFrame using the to_csv
method. By default, the index is included. If you want to include the index explicitly, or if you want to exclude it, you can use the index
parameter.
When writing a Pandas DataFrame to a CSV file, you may want to specify how missing values (NaN) should be handled. The to_csv
method provides the na_rep
parameter for this purpose. You can use na_rep
to specify a string representation for missing values in the CSV file.
You can specify the encoding of the CSV file when writing a Pandas DataFrame using the to_csv
method. The default encoding is usually UTF-8, but you can choose a different encoding if needed. The encoding
parameter is used for this purpose.
If you want to use a different delimiter in the CSV file, you can specify it using the sep
parameter in the to_csv
method. By default, the delimiter is a comma (,
), but you can choose another character such as a tab (\t
) for tab-separated values (TSV) or a semicolon (;
) for semicolon-separated values.
Conclusion
By default pandas.DataFrame.to_csv()
writes DataFrame with header, index, and comma separator delimiter. you can change this behavior by using optional params. For example, header=False
to ignore header, index=False
to ignore row index, sep='|'
to change delimiter etc.
Related Articles
- Append a List as a Row to Pandas DataFrame
- Pandas Difference Between loc[] vs iloc[]
- Pandas – Convert JSON to CSV
- Pandas read_csv() with Examples
- How to Read CSV from String in Pandas
- Install Python Pandas on Windows, Linux & Mac OS
- Install Anaconda & Run pandas on Jupyter Notebook
- How to Shuffle Pandas DataFrame Rows Examples
- Pandas Read Multiple CSV Files into DataFrame
- How to read CSV without headers in pandas
- Export Pandas to CSV without Index & Header