Sometimes you may need to read or import multiple CSV files from a folder or from a list of files and convert them into Pandas DataFrame. You can do this by reading each CSV file into DataFrame and appending or concatenating the DataFrames to create a single DataFrame with data from all files.
Here, I will use read_csv()
to read CSV files and the concat() function to concatenate DataFrams together to create one big DataFrame.
Key Points –
- Use
glob
to locate all CSV files in a directory and load them into a list for easy batch processing. - Load each file into individual DataFrames, then concatenate them using
pd.concat()
to combine into a single DataFrame. - Use a loop to read each file with
pd.read_csv()
and store them in a list, allowing flexibility with processing each file individually if needed. - With
glob
, specify patterns like*.csv
to match files with a specific extension or naming convention. - Control whether files are appended as additional rows or merged by columns, depending on your analysis needs.
1. Read Multiple CSV Files from the List
When you want to read multiple CSV files that exist in different folders, first create a list of strings with absolute paths and use it as shown below to load all CSV files and create one big Pandas DataFrame.
# Read CSV files from List
df = pd.concat(map(pd.read_csv, ['d1.csv', 'd2.csv','d3.csv']))
Note that by default concat()
method performs an append operation meaning, it appends each DataFrame at the end of another DataFrame and creates a single DataFrame. Similar to SQL union.
2. Read Multiple CSV Files from a Folder
Unfortunately, read_csv()
doesn’t support reading multiple CSV files from a folder into DataFrame, maybe in future pandas versions, it might support it, until then we have to use workarounds to read multiple CSV files from a folder and merge them into DataFrame.
# Import libraries
import glob
import pandas as pd
# Get CSV files list from a folder
path = '/apps/data_csv_files
csv_files = glob.glob(path + "/*.csv")
# Read each CSV file into DataFrame
# This creates a list of dataframes
df_list = (pd.read_csv(file) for file in csv_files)
# Concatenate all DataFrames
big_df = pd.concat(df_list, ignore_index=True)
An alternate approach using the map() function.
# Approach using map() function.
df = pd.concat(map(pd.read_csv, glob.glob(path + "/*.csv")))
In case you want to use optional params of the read_csv() function use it by defining the function.
# By using function
def readcsv(args):
return pd.read_csv(args, header=None)
df = pd.concat(map(readcsv, filepaths))
3. Using Dask DataFrames
The Dask Dataframes implement a subset of the Pandas dataframe API. If all the data fits into memory, you can call df.compute()
to convert the DataFrame into a Pandas DataFrame.
The Dask library can be used to read a data frame from multiple files. Before you use the Dask library, first you need to install it using the pip command or any other approach.
# Using data library
import dask.dataframe as dd
df = dd.read_csv(path + "/*.csv")
Frequently Asked Questions on Pandas Read Multiple CSV Files into DataFrame
To read multiple CSV files into a single DataFrame, you can use a loop or list comprehension along with the pd.concat()
function. For example, df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
You can specify a common set of columns or handle the columns dynamically. For example, common_columns = ['column1', 'column2', 'column3']
dfs = [pd.read_csv(f)[common_columns] for f in files]
df = pd.concat(dfs, ignore_index=True)
Use the sep
parameter in pd.read_csv()
to specify the delimiter. For example, df = pd.concat((pd.read_csv(f, sep=';') for f in files), ignore_index=True)
You can use the usecols
parameter in pd.read_csv()
to specify the columns you want to include. For example, df = pd.concat((pd.read_csv(f, usecols=['column1', 'column2']) for f in files), ignore_index=True)
Conclusion
In this article, you have learned multiple ways of reading CSV files from a folder and creating one big DataFrame. Since the read_csv()
function doesn’t support reading you have to load each CSV into a separate DataFrame and combine them using the concat()
function.
Happy Learning !!
Related Articles
- Pandas Read Text with Examples
- Pandas Write DataFrame to CSV
- Pandas Read Excel with Examples
- Pandas – Convert JSON to CSV
- Pandas Read TSV with Examples
- Pandas Drop Index Column Explained
- Pandas Groupby Aggregate Explained
- Convert Pandas DataFrame to Series
- Pandas Read JSON File with Examples
- Pandas ExcelWriter Explained with Examples
- Pandas Set Column as Index in DataFrame
- How to Read CSV from String in Pandas
- How to read CSV without headers in pandas
- How to Read Excel Multiple Sheets in Pandas
- Export Pandas to CSV without Index & Header
- Pandas Difference Between map, applymap and apply Methods