Sometimes you may need to read or import multiple CSV files from a folder or from a list of files and convert them into Pandas DataFrame. You can do this by reading each CSV file into DataFrame and appending or concatenating the DataFrames to create a single DataFrame with data from all files.
Here, I will use read_csv() to read CSV files and the concat() function to concatenate DataFrams together to create one big DataFrame.
1. Read Multiple CSV Files from the List
When you want to read multiple CSV files that exist in different folders, first create a list of strings with absolute paths and use it as shown below to load all CSV files and create one big Pandas DataFrame.
# Read CSV files from List
df = pd.concat(map(pd.read_csv, ['d1.csv', 'd2.csv','d3.csv']))
Note that by default concat() method performs an append operation meaning, it appends each DataFrame at the end of another DataFrame and creates a single DataFrame. Similar to SQL union.
2. Read Multiple CSV Files from a Folder
Unfortunately, read_csv() doesn’t support reading multiple CSV files from a folder into DataFrame, maybe in future pandas versions, it might support it, until then we have to use workarounds to read multiple CSV files from a folder and merge them into DataFrame.
# Import libraries
import glob
import pandas as pd
# Get CSV files list from a folder
path = '/apps/data_csv_files
csv_files = glob.glob(path + "/*.csv")
# Read each CSV file into DataFrame
# This creates a list of dataframes
df_list = (pd.read_csv(file) for file in csv_files)
# Concatenate all DataFrames
big_df = pd.concat(df_list, ignore_index=True)
An alternate approach using the map() function.
# Approach using map() function.
df = pd.concat(map(pd.read_csv, glob.glob(path + "/*.csv")))
In case you want to use optional params of the read_csv() function use it by defining the function.
# By using function
def readcsv(args):
return pd.read_csv(args, header=None)
df = pd.concat(map(readcsv, filepaths))
3. Using Dask DataFrames
The Dask Dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute()
to convert the DataFrame into a Pandas DataFrame.
The Dask library can be used to read a data frame from multiple files. Before you use the Dask library, first you need to install it using the pip command or any other approach.
# Using data library
import dask.dataframe as dd
df = dd.read_csv(path + "/*.csv")
Frequently Asked Questions on Pandas Read Multiple CSV Files into DataFrame
To read multiple CSV files into a single DataFrame, you can use a loop or list comprehension along with the pd.concat()
function. For example, df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
You can specify a common set of columns or handle the columns dynamically. For example, common_columns = ['column1', 'column2', 'column3']
dfs = [pd.read_csv(f)[common_columns] for f in files]
df = pd.concat(dfs, ignore_index=True)
Use the sep
parameter in pd.read_csv()
to specify the delimiter. For example, df = pd.concat((pd.read_csv(f, sep=';') for f in files), ignore_index=True)
You can use the usecols
parameter in pd.read_csv()
to specify the columns you want to include. For example, df = pd.concat((pd.read_csv(f, usecols=['column1', 'column2']) for f in files), ignore_index=True)
Conclusion
In this article, you have learned multiple ways of reading CSV files from a folder and creating one big DataFrame. Since the read_csv() function doesn’t support reading you have to load each CSV into a separate DataFrame and combine them using the concat() function.
Happy Learning !!
Related Articles
- How to Read Excel Multiple Sheets in Pandas
- Pandas Drop Index Column Explained
- Pandas Groupby Aggregate Explained
- Convert Pandas DataFrame to Series
- Pandas Read Text with Examples
- Pandas Read JSON File with Examples
- Pandas ExcelWriter Explained with Examples
- Pandas Read Excel with Examples
- Pandas Set Column as Index in DataFrame
- Pandas – Convert JSON to CSV
- How to Read CSV from String in Pandas
- How to read CSV without headers in pandas
- Pandas Write DataFrame to CSV
- Export Pandas to CSV without Index & Header