Sometimes you may need to read or import multiple CSV files from a folder or from a list of files and convert them into a single pandas DataFrame. You can do this by reading each CSV file into its own DataFrame and concatenating those DataFrames into one DataFrame that holds the data from all the files.
1. Read Multiple CSV Files from List
When the CSV files you want to read live in different folders, first create a list of their paths (absolute paths, or paths relative to the current working directory) and use it as shown below to load all the files into one big pandas DataFrame.
```python
# Read CSV files from a list
import pandas as pd

df = pd.concat(map(pd.read_csv, ['d1.csv', 'd2.csv', 'd3.csv']))
```
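Note that by default concat() stacks the DataFrames vertically (axis=0): the rows of each DataFrame are appended after the rows of the previous one, producing a single DataFrame. This is similar to SQL UNION ALL, because duplicate rows are kept. The original index labels are kept too, so pass ignore_index=True if you want a fresh 0 to n-1 index.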
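To see the append behavior and the index issue concretely, here is a small sketch that uses in-memory strings (io.StringIO) in place of real files. The data and column names are made up for illustration.

```python
import io

import pandas as pd

# Two small in-memory "CSV files" standing in for d1.csv and d2.csv (made-up data)
csv_a = io.StringIO("id,name\n1,Alice\n2,Bob\n")
csv_b = io.StringIO("id,name\n2,Bob\n3,Carol\n")

frames = [pd.read_csv(csv_a), pd.read_csv(csv_b)]

# Default concat: rows are stacked and each frame keeps its own index labels
stacked = pd.concat(frames)
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the combined rows from 0
renumbered = pd.concat(frames, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Like SQL UNION ALL, the duplicate row (2, Bob) is kept;
# drop_duplicates() gives UNION-like behavior instead
print(len(renumbered))                    # 4
print(len(renumbered.drop_duplicates()))  # 3
```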
2. Read Multiple CSV Files from a Folder
Unfortunately, read_csv() can't read all the CSV files in a folder in one call; it reads a single file (or buffer) at a time. The workaround is to list the files in the folder yourself, read each one into a DataFrame, and concatenate the results into one DataFrame.
```python
# Import libraries
import glob
import pandas as pd

# Get the list of CSV files in a folder
path = '/apps/data_csv_files'
csv_files = glob.glob(path + "/*.csv")

# Read each CSV file into a DataFrame
# (a generator expression: files are read lazily as concat() consumes them)
df_list = (pd.read_csv(file) for file in csv_files)

# Concatenate all DataFrames
big_df = pd.concat(df_list, ignore_index=True)
```
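A common extension of the folder approach is recording which file each row came from. The sketch below shows one way to do it under a few assumptions: it creates a temporary folder with two made-up files so it runs anywhere, and the column name source_file is hypothetical. It also sorts the file list, because glob.glob() returns files in an order that depends on the file system.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway folder with two CSV files (stands in for '/apps/data_csv_files')
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
    os.path.join(tmp_dir, "part1.csv"), index=False)
pd.DataFrame({"id": [3], "value": [30]}).to_csv(
    os.path.join(tmp_dir, "part2.csv"), index=False)

# sorted() makes the file order, and therefore the row order, deterministic
csv_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# Tag every row with the name of the file it came from (hypothetical column name)
df_list = (
    pd.read_csv(file).assign(source_file=os.path.basename(file))
    for file in csv_files
)
big_df = pd.concat(df_list, ignore_index=True)
print(big_df)
```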
An alternative approach uses the map() function:

```python
df = pd.concat(map(pd.read_csv, glob.glob(path + "/*.csv")))
```
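One edge case applies to both folder approaches: if the folder contains no CSV files, pd.concat() receives nothing and raises ValueError ("No objects to concatenate"). A minimal guard, written here as a hypothetical helper named read_csv_folder(), might look like this:

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_folder(folder):
    """Hypothetical helper: concatenate every *.csv in `folder`, or return an empty DataFrame."""
    files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not files:
        # pd.concat() would raise ValueError on an empty sequence of frames
        return pd.DataFrame()
    return pd.concat(map(pd.read_csv, files), ignore_index=True)


# Usage: an empty temporary folder yields an empty DataFrame instead of an error
empty_dir = tempfile.mkdtemp()
print(read_csv_folder(empty_dir).empty)  # True
```

Returning an empty DataFrame is just one choice; raising a clearer error of your own may suit pipelines where a missing file should stop the run.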
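To pass optional parameters to read_csv(), such as header=None, wrap the call in your own function and map that function instead:

```python
# Wrap read_csv() to pass optional parameters
def readcsv(filepath):
    return pd.read_csv(filepath, header=None)

# filepaths: a list of CSV paths, e.g. the csv_files list built above
df = pd.concat(map(readcsv, filepaths))
```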
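Instead of writing a named wrapper, you can also pre-fill the keyword arguments with functools.partial from the standard library. The sketch below assumes header-less files and uses in-memory buffers and made-up column names so that it is self-contained.

```python
import io
from functools import partial

import pandas as pd

# Two header-less in-memory CSV "files" (made-up data)
sources = [io.StringIO("1,Alice\n2,Bob\n"), io.StringIO("3,Carol\n")]

# partial() fixes read_csv keyword arguments, so no named wrapper is needed
read_no_header = partial(pd.read_csv, header=None, names=["id", "name"])

df = pd.concat(map(read_no_header, sources), ignore_index=True)
print(df)
```

A lambda such as `lambda f: pd.read_csv(f, header=None)` works the same way; partial simply avoids writing a function body.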
3. Using Dask DataFrames
Dask DataFrames implement a subset of the pandas DataFrame API, and Dask's read_csv() accepts a glob pattern, so Dask can read all the CSV files in a folder into a single DataFrame in one call. Dask is not bundled with pandas, so install it first, for example with pip install "dask[dataframe]". Dask evaluates lazily; if all the data fits into memory, you can call df.compute() to convert the Dask DataFrame into a pandas DataFrame.
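```python
# Using the Dask library
import dask.dataframe as dd

path = '/apps/data_csv_files'
df = dd.read_csv(path + "/*.csv")

# Convert to a pandas DataFrame (only if the data fits in memory)
pandas_df = df.compute()
```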
In this article, you have learned multiple ways of reading CSV files from a list or a folder and creating one big DataFrame. Since read_csv() reads only one file per call, you load each CSV into a separate DataFrame and combine them with concat(), or let Dask read all the files at once with a glob pattern.
Happy Learning !!