Use the pandas read_csv() function to read a CSV (comma-separated values) file into a pandas DataFrame. It also supports options for reading files with any delimiter. In this pandas article, I will explain how to read a CSV file with or without a header, skip rows, skip columns, set columns as the index, and more, with examples.
CSV files are plain-text files that store two-dimensional data in a simple, human-readable format. This format is widely used in industry to exchange large batch files between organizations. In some cases, these files are also used to store metadata.
Related: Pandas Write to CSV File
1. read_csv() Syntax
The following is the syntax of the read_csv() function.
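# Syntax of read_csv() (pandas 1.x signature; pandas 2.0 removed squeeze, prefix,
# mangle_dupe_cols, error_bad_lines and warn_bad_lines)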
pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, prefix=NoDefault.no_default, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=None, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, cache_dates=True, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, doublequote=True, escapechar=None, comment=None, encoding=None, encoding_errors='strict', dialect=None, error_bad_lines=None, warn_bad_lines=None, on_bad_lines=None, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None, storage_options=None)
As you can see above, read_csv() takes many optional parameters for reading CSV files in different ways. When you are dealing with huge files, some of these params help you load the file faster or with less memory. In this article, I will explain the usage of some of these options with examples.
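For large files, chunksize and usecols are two of the params worth knowing. The sketch below assumes a small in-memory CSV (built with io.StringIO) standing in for a large file. With chunksize set, read_csv() returns an iterator of DataFrames, so you can aggregate a column without holding the whole file in memory at once.

```python
import io
import pandas as pd

# Hypothetical data standing in for a large file: 10 rows of id,value
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

total = 0
row_count = 0
# chunksize=4 yields DataFrames of 4, 4, and then 2 rows;
# usecols loads only the column we need
with pd.read_csv(io.StringIO(csv_text), usecols=["value"], chunksize=4) as reader:
    for chunk in reader:
        total += chunk["value"].sum()
        row_count += len(chunk)

print(row_count, total)  # 10 450
```

For a real file, pass its path instead of the StringIO object. The chunk size is a trade-off between memory use and per-chunk overhead.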
2. Pandas Read CSV into DataFrame
To read a comma-delimited CSV file, use pandas.read_csv(). To read a tab-delimited (\t) file, use read_table(). You can also read pipe-delimited files or files with any other custom separator.
The examples below read a file named courses.csv. You can find the data file at GitHub.
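If you don't have the file at hand, the following sketch writes a courses.csv locally so the examples can run. Its contents are reconstructed from the outputs shown in this article. The exact contents of the GitHub file are an assumption.

```python
import pandas as pd

# Reconstructed from the outputs in this article; the empty Duration
# field for Java is read back as NaN
csv_text = """Courses,Fee,Duration,Discount
Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800
"""

with open("courses.csv", "w", encoding="utf-8") as f:
    f.write(csv_text)

df = pd.read_csv("courses.csv")
print(df.shape)  # (5, 4)
```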
# Import pandas
import pandas as pd
# Read CSV file into DataFrame
df = pd.read_csv('courses.csv')
print(df)
# Output:
# Courses Fee Duration Discount
# 0 Spark 25000 50 Days 2000
# 1 Pandas 20000 35 Days 1000
# 2 Java 15000 NaN 800
# 3 Python 15000 30 Days 500
# 4 PHP 18000 30 Days 800
By default, read_csv() uses the first row of the CSV file as the column names (header) and creates a default integer index starting from zero.
Use sep, or its alias delimiter, to specify the column separator. By default, it uses a comma.
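A minimal sketch of custom separators, using small in-memory samples (io.StringIO) instead of files. A pipe-delimited sample is read with sep="|". The same tab-delimited data is read once with read_table(), which defaults to a tab separator, and once with read_csv(delimiter="\t"). All three produce the same DataFrame.

```python
import io
import pandas as pd

pipe_text = "Courses|Fee\nSpark|25000\nPandas|20000\n"
tab_text = "Courses\tFee\nSpark\t25000\nPandas\t20000\n"

# sep tells read_csv() which character splits the columns
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

# read_table() is like read_csv() but uses a tab separator by default
df_tab = pd.read_table(io.StringIO(tab_text))

# Equivalent: pass the tab separator to read_csv() via the delimiter alias
df_tab2 = pd.read_csv(io.StringIO(tab_text), delimiter="\t")

print(df_pipe.equals(df_tab) and df_tab.equals(df_tab2))  # True
```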
3. Set Column as Index
You can set a column as the index using the index_col param. It accepts an int, a str, a sequence of int/str, or False (optional, default None).
# Set column as Index
df = pd.read_csv('courses.csv', index_col='Courses')
print(df)
# Output:
# Fee Duration Discount
# Courses
# Spark 25000 50 Days 2000
# Pandas 20000 35 Days 1000
# Java 15000 NaN 800
# Python 15000 30 Days 500
# PHP 18000 30 Days 800
Alternatively, you can specify the index column by its position, for example index_col=0. When you pass a list of columns, it creates a MultiIndex.
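The following sketch uses an in-memory subset of the course data. It selects the index column by position, then builds a two-level MultiIndex from a list of column names. The Java row is left out so the Duration level contains no missing values.

```python
import io
import pandas as pd

csv_text = """Courses,Fee,Duration,Discount
Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Python,15000,30 Days,500
"""

# index_col=0 selects the first column (Courses) by position
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.index.tolist())  # ['Spark', 'Pandas', 'Python']

# A list of columns builds a two-level MultiIndex
df_multi = pd.read_csv(io.StringIO(csv_text), index_col=["Courses", "Duration"])
print(df_multi.index.nlevels)     # 2
print(df_multi.columns.tolist())  # ['Fee', 'Discount']
```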
4. Skip Rows
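Sometimes you may need to skip rows at the beginning or at the end (footer) of a file. Use the skiprows and skipfooter params, respectively. In the example below, header=None together with skiprows=2 skips the header line and the first data row.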
# Skip first few rows
df = pd.read_csv('courses.csv', header=None, skiprows=2)
print(df)
# Output:
# 0 1 2 3
# 0 Pandas 20000 35 Days 1000
# 1 Java 15000 NaN 800
# 2 Python 15000 30 Days 500
# 3 PHP 18000 30 Days 800
The skiprows param also accepts a list of line numbers to skip. These are 0-indexed, with the header line counted as line 0.
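This sketch uses the same course data in memory. It skips specific lines with a list, then drops lines from the end with skipfooter. skipfooter is not supported by the default C parser, so the example passes engine="python" explicitly rather than relying on pandas' fallback.

```python
import io
import pandas as pd

csv_text = """Courses,Fee,Duration,Discount
Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
Java,15000,,800
Python,15000,30 Days,500
PHP,18000,30 Days,800
"""

# Skip file lines 1 and 3 (line 0 is the header), i.e. Spark and Java
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3])
print(df["Courses"].tolist())  # ['Pandas', 'Python', 'PHP']

# Drop the last two lines; skipfooter requires the Python parser engine
df_tail = pd.read_csv(io.StringIO(csv_text), skipfooter=2, engine="python")
print(df_tail["Courses"].tolist())  # ['Spark', 'Pandas', 'Java']
```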
5. Read CSV by Ignoring Column Names
By default, read_csv() treats the first row of the CSV file as the header and uses it for the DataFrame column names. To treat the first row as a data record instead, use the header=None param, and use the names param to specify the column names. If you don't specify names, the columns are labeled with integers (0, 1, 2, and so on). In the example below, skiprows=1 drops the file's original header row so it isn't read as data.
# Ignore header and assign new columns
columns = ['courses','course_fee','course_duration','course_discount']
df = pd.read_csv('courses.csv', header=None, names=columns, skiprows=1)
print(df)
# Output:
# courses course_fee course_duration course_discount
# 0 Spark 25000 50 Days 2000
# 1 Pandas 20000 35 Days 1000
# 2 Java 15000 NaN 800
# 3 Python 15000 30 Days 500
# 4 PHP 18000 30 Days 800
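This sketch uses a two-row in-memory sample to contrast the header options. With header=None and no names, the header line becomes an ordinary data row and the columns are numbered. With header=0 plus names, the file's header row is consumed and replaced by your names. This is an alternative to the header=None plus skiprows=1 combination used above.

```python
import io
import pandas as pd

csv_text = """Courses,Fee,Duration,Discount
Spark,25000,50 Days,2000
Pandas,20000,35 Days,1000
"""

# header=None without names: columns are 0..3 and the header line is data
df_raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_raw.columns.tolist())  # [0, 1, 2, 3]
print(len(df_raw))              # 3

# header=0 with names: the existing header row is replaced by names
columns = ["courses", "course_fee", "course_duration", "course_discount"]
df_named = pd.read_csv(io.StringIO(csv_text), header=0, names=columns)
print(df_named.columns.tolist())
print(len(df_named))            # 2
```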
6. Load only Selected Columns
Using the usecols param, you can select which columns to load from the CSV file. It accepts a list of column names (strings) or a list of column positions (ints).
# Load only selected columns
df = pd.read_csv('courses.csv', usecols=['Courses', 'Fee', 'Discount'])
print(df)
# Output:
# Courses Fee Discount
# 0 Spark 25000 2000
# 1 Pandas 20000 1000
# 2 Java 15000 800
# 3 Python 15000 500
# 4 PHP 18000 800
7. Set DataTypes to Columns
By default, read_csv() infers the data type that best fits each column. For example, Fee and Discount are given int64, while Courses and Duration, which hold text, are given the object dtype.
Let’s read Fee as float and Courses as the pandas string dtype using the dtype param.
# Set column data types
df = pd.read_csv('courses.csv', dtype={'Courses':'string','Fee':'float'})
print(df.dtypes)
# Output:
# Courses string
# Fee float64
# Duration object
# Discount int64
# dtype: object
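An integer column with a missing value cannot be a plain int64 column. By default, read_csv() falls back to float64, and requesting 'int64' for such a column raises an error. The following sketch uses a hypothetical two-row sample where Discount is missing for Java. It shows how pandas' nullable 'Int64' dtype keeps the integers and stores the gap as <NA>.

```python
import io
import pandas as pd

# Hypothetical sample: Discount is missing for Java
csv_text = """Courses,Fee,Discount
Spark,25000,2000
Java,15000,
"""

# Default inference: the missing value forces Discount to float64
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["Discount"].dtype)      # float64

# Nullable integer dtype: values stay integers, the gap becomes <NA>
df_nullable = pd.read_csv(io.StringIO(csv_text), dtype={"Discount": "Int64"})
print(df_nullable["Discount"].dtype)     # Int64
print(df_nullable["Discount"].tolist())  # [2000, <NA>]
```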
8. Other Params of pandas read_csv()
- nrows: Number of rows to read from the file.
- true_values: Values to treat as True.
- false_values: Values to treat as False.
- mangle_dupe_cols: Rename duplicate columns as 'X', 'X.1', ..., 'X.N' rather than 'X'...'X'. This param was removed in pandas 2.0, where duplicate names are always renamed this way.
- converters: Dict of functions for converting values in specific columns. Keys can be column labels or positions.
- skipinitialspace: Skip spaces right after the delimiter, similar to a left trim of each value.
- na_values: Additional strings to recognize as NaN/NA.
- keep_default_na: Whether to include pandas' default NaN markers (such as '', 'NA', and 'NaN') in addition to na_values.
- na_filter: Detect missing-value markers. Setting this to False can improve performance on large files without missing values.
- skip_blank_lines: Skip empty lines instead of reading them as NaN rows.
- parse_dates: Columns to parse as dates.
- thousands: Thousands separator character.
- decimal: Character to recognize as the decimal point, for example ',' for European data.
- lineterminator: Character used to break the file into lines (C parser only).
- quotechar: Character used to quote a value, so that a delimiter inside the quotes is treated as part of the value.
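The following sketch combines several of these params in one call. It uses a hypothetical semicolon-delimited sample with European number formatting, where "1.200,50" means 1200.50, "yes"/"no" flags, and "-" as a missing-value marker. All of the data and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data with European number formatting
csv_text = (
    "id;amount;active;joined;note\n"
    "1;1.200,50;yes;2023-01-15;-\n"
    "2;3.450,00;no;2023-02-01;ok\n"
    "3;980,25;yes;2023-03-10;ok\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",                 # columns are separated by semicolons
    thousands=".",           # "." groups thousands
    decimal=",",             # "," is the decimal point
    true_values=["yes"],     # "yes" -> True
    false_values=["no"],     # "no" -> False
    na_values=["-"],         # treat "-" as missing (NaN)
    parse_dates=["joined"],  # parse this column as datetime
    nrows=2,                 # read only the first two data rows
)
print(df)
print(df.dtypes)
```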
Besides these, read_csv() has many more optional params. Refer to the pandas documentation for details.
Frequently Asked Questions on Pandas read_csv() with Examples
What is pandas read_csv()?
read_csv() is a function in the pandas library that reads data from a CSV (comma-separated values) file into a DataFrame, a two-dimensional tabular data structure.
Can read_csv() read a CSV file from a URL?
Yes. The read_csv() function can read data directly from a URL; pass the URL as the file path. For example, df = pd.read_csv('https://example.com/data.csv') reads the CSV data at that URL into a DataFrame (df). Make sure the URL is accessible and returns CSV data in the expected format.
How do I skip rows at the beginning of a CSV file?
Use the skiprows parameter of read_csv() to skip a specific number of rows (lines) at the beginning of the file. This is useful, for example, when the file starts with header information that you want to skip.
How do I set a column as the index?
Use the index_col parameter of read_csv() to specify which column to use as the index. For example, index_col='id' uses the 'id' column from the CSV file as the index of the resulting DataFrame. Adjust the column name to match the column you want to use as the index.
How do I specify which values are treated as missing?
Use the na_values parameter of read_csv() to specify which values should be treated as missing values when reading a CSV file.
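The following sketch uses a hypothetical three-row sample to show how na_values interacts with pandas' default missing-value markers. By default, the strings you list are added to the built-in markers, which include "NA". With keep_default_na=False, only your listed strings count as missing.

```python
import io
import pandas as pd

# Hypothetical sample: "TBD" is a custom marker, "NA" is a pandas default marker
csv_text = "name,score\nA,TBD\nB,NA\nC,90\n"

# "TBD" is added to the default markers, so both TBD and NA become NaN
df = pd.read_csv(io.StringIO(csv_text), na_values=["TBD"])
print(df["score"].isna().tolist())         # [True, True, False]

# Only the listed markers count, so "NA" stays a literal string
df_strict = pd.read_csv(io.StringIO(csv_text), na_values=["TBD"], keep_default_na=False)
print(df_strict["score"].isna().tolist())  # [True, False, False]
print(df_strict.loc[1, "score"])           # NA
```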
How do I read only specific columns?
Use the usecols parameter of read_csv() to specify which columns to read from the CSV file. For example, usecols=['column1', 'column2'] includes only those two columns in the resulting DataFrame. Adjust the list to the columns you want to read.
Conclusion
In this Python article, you have learned what a CSV file is and how to load it into a pandas DataFrame. You also learned how to skip rows, select columns, ignore the header, and more, with examples.
Related Articles
- pandas ExcelWriter Usage with Examples
- Read Excel file into pandas DataFrame
- Pandas Read Excel with Examples
- Pandas Read JSON File with Examples
- Pandas Write DataFrame to CSV
- Pandas DataFrame isna() Function
- Pandas Read Text with Examples
- Pandas – Convert JSON to CSV
- How to Read CSV from String in Pandas
- Pandas Read Multiple CSV Files into DataFrame
- How to read CSV without headers in pandas
- Export Pandas to CSV without Index & Header