Pandas Read Multiple CSV Files into DataFrame

  • Post category: Pandas
  • Post last modified: December 6, 2023

Sometimes you may need to read or import multiple CSV files from a folder or from a list of files and convert them into a Pandas DataFrame. You can do this by reading each CSV file into its own DataFrame and then appending or concatenating the DataFrames to create a single DataFrame with data from all files.

Here, I will use read_csv() to read the CSV files and the concat() function to concatenate the resulting DataFrames into one big DataFrame.

1. Read Multiple CSV Files from the List

When you want to read multiple CSV files that exist in different folders, first create a list of strings with the absolute paths, then use it as shown below to load all the CSV files and create one big Pandas DataFrame.


# Read CSV files from a list of paths
import pandas as pd

df = pd.concat(map(pd.read_csv, ['d1.csv', 'd2.csv', 'd3.csv']))

Note that by default, concat() performs an append operation: it stacks each DataFrame below the previous one and returns a single DataFrame, similar to a SQL UNION ALL.
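A small sketch of this append behavior with in-memory DataFrames (d1 and d2 here are stand-ins for the contents of the CSV files):

```python
import pandas as pd

# Two DataFrames standing in for the contents of d1.csv and d2.csv
d1 = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
d2 = pd.DataFrame({"id": [3, 4], "value": ["c", "d"]})

# Default concat keeps each frame's original index (0, 1, 0, 1)
stacked = pd.concat([d1, d2])
print(list(stacked.index))  # [0, 1, 0, 1]

# Pass ignore_index=True to get a clean 0..n-1 index instead
clean = pd.concat([d1, d2], ignore_index=True)
print(list(clean.index))    # [0, 1, 2, 3]
```

Because the default keeps the original row labels, positional lookups like df.loc[0] can return multiple rows after concatenation, which is why ignore_index=True is used in the later examples.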

2. Read Multiple CSV Files from a Folder

Unfortunately, read_csv() doesn’t support reading multiple CSV files from a folder into a DataFrame. A future pandas version might add such support, but until then we have to use workarounds: read each CSV file from the folder separately and merge the results into one DataFrame.


# Import libraries
import glob
import pandas as pd

# Get the list of CSV files in a folder
path = '/apps/data_csv_files'
csv_files = glob.glob(path + "/*.csv")

# Read each CSV file into a DataFrame
# (a generator expression, so the files are read lazily)
df_list = (pd.read_csv(file) for file in csv_files)

# Concatenate all DataFrames
big_df = pd.concat(df_list, ignore_index=True)
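Here is a self-contained version of the same steps that you can run end to end; it writes two small CSV files to a temporary folder first (the file names and columns are made up for illustration):

```python
import glob
import os
import tempfile

import pandas as pd

# Create a temporary folder with two sample CSV files
tmp = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "val": [10, 20]}).to_csv(
    os.path.join(tmp, "d1.csv"), index=False)
pd.DataFrame({"id": [3, 4], "val": [30, 40]}).to_csv(
    os.path.join(tmp, "d2.csv"), index=False)

# Same pattern as above: glob the folder, read each file, concatenate
csv_files = sorted(glob.glob(tmp + "/*.csv"))
df_list = (pd.read_csv(f) for f in csv_files)
big_df = pd.concat(df_list, ignore_index=True)

print(big_df.shape)  # (4, 2)
```

Note the sorted() call: glob() does not guarantee file order, so sorting keeps the row order reproducible across runs.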

An alternative approach uses the map() function:


# Approach using map() function.
df = pd.concat(map(pd.read_csv, glob.glob(path + "/*.csv")))

If you want to pass optional parameters to read_csv(), define a small wrapper function and map that instead.


# Wrap read_csv() to pass extra parameters
def readcsv(path):
    return pd.read_csv(path, header=None)

df = pd.concat(map(readcsv, csv_files))
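If you only need to bind a few keyword arguments, functools.partial achieves the same result without defining a wrapper. This is an alternative sketch, not from the original article; the StringIO buffers stand in for file paths purely for illustration:

```python
from functools import partial
from io import StringIO

import pandas as pd

# partial binds header=None to read_csv, like the readcsv() wrapper above
read_headerless = partial(pd.read_csv, header=None)

# In-memory buffers standing in for headerless CSV files
files = [StringIO("1,a\n2,b"), StringIO("3,c\n4,d")]
df = pd.concat(map(read_headerless, files), ignore_index=True)
print(df.shape)  # (4, 2)
```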

3. Using Dask DataFrames

Dask DataFrames implement a subset of the pandas DataFrame API. If all the data fits into memory, you can call df.compute() to convert the Dask DataFrame into a regular pandas DataFrame.

The Dask library can read a DataFrame from multiple files in a single call. Before you use it, first install it using pip or any other approach.


# Using the Dask library
import dask.dataframe as dd

df = dd.read_csv(path + "/*.csv")
# df is a lazy Dask DataFrame; call df.compute() to get a pandas DataFrame

Frequently Asked Questions on Pandas Read Multiple CSV Files into DataFrame

How can I read multiple CSV files into a single DataFrame in Pandas?

To read multiple CSV files into a single DataFrame, you can use a loop or list comprehension along with the pd.concat() function. For example, df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

How can I handle different column names or orders in the CSV files?

You can specify a common set of columns or handle the columns dynamically. For example:

common_columns = ['column1', 'column2', 'column3']
dfs = [pd.read_csv(f)[common_columns] for f in files]
df = pd.concat(dfs, ignore_index=True)
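A quick sketch showing that selecting a common column list also normalizes a differing column order across files (the column names and data here are made up; StringIO buffers stand in for the files):

```python
from io import StringIO

import pandas as pd

common_columns = ["column1", "column2"]

# Two "files" with the same columns in a different order
f1 = StringIO("column1,column2\n1,a\n2,b")
f2 = StringIO("column2,column1\nc,3\nd,4")

# Indexing with the common column list reorders f2's columns to match
dfs = [pd.read_csv(f)[common_columns] for f in [f1, f2]]
df = pd.concat(dfs, ignore_index=True)
print(list(df["column1"]))  # [1, 2, 3, 4]
```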

What if the CSV files have different delimiters or separators?

Use the sep parameter in pd.read_csv() to specify the delimiter. For example, df = pd.concat((pd.read_csv(f, sep=';') for f in files), ignore_index=True)

How can I read only specific columns from the CSV files?

You can use the usecols parameter in pd.read_csv() to specify the columns you want to include. For example, df = pd.concat((pd.read_csv(f, usecols=['column1', 'column2']) for f in files), ignore_index=True)

Conclusion

In this article, you have learned multiple ways of reading CSV files from a folder and creating one big DataFrame. Since the read_csv() function doesn’t support reading multiple files at once, you have to load each CSV into a separate DataFrame and combine them using the concat() function.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
