Pandas Create DataFrame From List

  • Post category:Pandas
  • Post last modified:February 14, 2024

How do you create a pandas DataFrame from a list? Most of the time a DataFrame is created by reading a CSV file or another data source, but sometimes you need to build one from a list, from multiple lists, or from a list of lists. This article covers each of these approaches with examples.

Table of contents

  1. Create DataFrame from list
  2. Create from multiple lists
  3. Create from list of lists
  4. Create from Dict of lists
  5. Complete Example

Key Points –

  • Use the DataFrame() constructor from the Pandas library to create a DataFrame from a list.
  • Pass the list as an argument to the data parameter within the DataFrame() constructor.
  • Keep the data dimensions consistent: the number of column names must match the number of columns in the data, and the number of index labels must match the number of rows, otherwise pandas raises an error.
  • In a list of lists, each inner list becomes one row, so every inner list should have the same number of elements.
  • Optionally, provide additional parameters such as column names using the columns parameter to customize the DataFrame’s structure.
  • A flat list produces a single-column DataFrame; for multiple columns, the list should contain nested lists or other iterable objects, each representing one row of data.

1. Create Pandas DataFrame from List

One simple way to create a pandas DataFrame from a list is to use the DataFrame constructor. The constructor takes several optional parameters that control the characteristics of the DataFrame, such as its column names and index.

First, create a list with some values and pass it to the DataFrame constructor as the data argument. Because data is the first positional parameter, you don’t have to name it explicitly.


import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']

# Create DataFrame from list
df = pd.DataFrame(technologies)
print(df)

This yields the output below.

         0
0    Spark
1  PySpark
2     Java
3      PHP
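Because a flat list produces a single column labeled 0, it can help to name that column at creation time. The following is a minimal sketch; the column name Courses is an illustrative choice, not something the example above requires.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']

# Name the single column through the columns parameter.
# 'Courses' is an illustrative name chosen for this sketch.
df = pd.DataFrame(technologies, columns=['Courses'])
print(df)

# A flat list of 4 items gives 4 rows and 1 column.
print(df.shape)
```

The columns list must contain exactly one name here, since a flat list always maps to one column.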

2. Create Pandas DataFrame from Multiple Lists

Now let’s see how to create a pandas DataFrame from multiple lists. Here zip() combines the lists element by element, so each resulting tuple becomes one row. Since we are not giving labels to the columns or rows (index), the DataFrame by default assigns incremental sequence numbers as labels to both.


# Create DataFrame from multiple lists
technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']
df = pd.DataFrame(list(zip(technologies, fee, duration)))
print(df)

This yields the output below.

         0      1       2
0    Spark  20000  35days
1  PySpark  20000  35days
2     Java  15000  40days
3      PHP  10000  30days

Sequence numbers make poor column names because they say nothing about the data each column holds, so it is best practice to provide descriptive column names. Use the columns and index parameters to provide column and row labels, respectively.

Alternatively, you can label an existing DataFrame after creation by assigning to its columns and index attributes, or use the pandas.DataFrame.set_index() method to make one of the existing columns the index.


# Create from multiple lists with column names and index labels
columns = ['Courses', 'Fee', 'Duration']
index = ['r0', 'r1', 'r2', 'r3']
df = pd.DataFrame(list(zip(technologies, fee, duration)),
                  columns=columns, index=index)
print(df)

This yields the output below.

    Courses    Fee Duration
r0    Spark  20000   35days
r1  PySpark  20000   35days
r2     Java  15000   40days
r3      PHP  10000   30days
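The alternative of labeling a DataFrame after it has been created can be sketched as follows. The variable name df_by_course is introduced here only for illustration.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']

# Start with default integer labels on both axes.
df = pd.DataFrame(list(zip(technologies, fee, duration)))

# Assign column names and row labels after creation.
df.columns = ['Courses', 'Fee', 'Duration']
df.index = ['r0', 'r1', 'r2', 'r3']
print(df)

# set_index() promotes an existing column to the index
# and returns a new DataFrame (df itself is unchanged).
df_by_course = df.set_index('Courses')
print(df_by_course)
```

Note that set_index() does not accept arbitrary new labels; it moves a column that already exists into the index, which is why the row labels above are assigned through df.index instead.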

3. Create DataFrame from List of Lists

When each record is already stored as its own list, you can collect these lists into a list of lists (a two-dimensional list) and create a DataFrame from it, as shown in the example below. Each inner list becomes one row.


# Create from a list of lists (each inner list is one row)
# Fee values are integers so the result matches the previous example
courses = [['Spark', 20000, '35days'], ['PySpark', 20000, '35days'],
           ['Java', 15000, '40days'], ['PHP', 10000, '30days']]
df = pd.DataFrame(courses, columns=columns, index=index)
print(df)

This results in the same output as above.
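The element types inside the inner lists matter even when the printed table looks the same. The sketch below, which uses two shortened illustrative row sets, compares fees stored as strings with fees stored as integers and shows one way to convert the former with pd.to_numeric().

```python
import pandas as pd

columns = ['Courses', 'Fee', 'Duration']

# Two illustrative row sets: fees as text versus fees as integers.
rows_with_strings = [['Spark', '20000', '35days'], ['Java', '15000', '40days']]
rows_with_ints = [['Spark', 20000, '35days'], ['Java', 15000, '40days']]

df_str = pd.DataFrame(rows_with_strings, columns=columns)
df_int = pd.DataFrame(rows_with_ints, columns=columns)

# Check whether pandas treats the Fee column as numeric.
str_is_numeric = pd.api.types.is_numeric_dtype(df_str['Fee'])
int_is_numeric = pd.api.types.is_numeric_dtype(df_int['Fee'])
print(str_is_numeric, int_is_numeric)

# Convert text to numbers when the source data arrives as strings.
df_str['Fee'] = pd.to_numeric(df_str['Fee'])
print(df_str['Fee'].sum())
```

Keeping numeric values as numbers in the source lists avoids the conversion step and lets arithmetic such as sum() work directly.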

4. Create from Dict of Lists

The example below demonstrates how to create a DataFrame from a dictionary whose values are lists. Each key becomes a column name and each list supplies the values of that column.


# Create from a dict of lists (keys become column names)
courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}
df = pd.DataFrame(courses, index=index)
print(df)

Yields the same output as above.
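One practical difference between the dict-of-lists form and the zip() approach from section 2 is how each handles lists of unequal length. The sketch below deliberately uses a fee list that is one value short; the names df_zip and dict_error are introduced only for illustration.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000]  # deliberately one value short

# zip() stops at the shortest input, so the last course is silently dropped.
df_zip = pd.DataFrame(list(zip(technologies, fee)), columns=['Courses', 'Fee'])
print(len(df_zip))

# A dict of lists with unequal lengths is rejected with a ValueError instead.
try:
    pd.DataFrame({'Courses': technologies, 'Fee': fee})
    dict_error = None
except ValueError as exc:
    dict_error = exc
    print('ValueError:', exc)
```

If silent truncation is a concern, comparing the list lengths before calling zip() is a simple safeguard.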

5. Complete Example of Creating a DataFrame from a List

Below is a complete example of creating a DataFrame from a single list, multiple lists, and a list of lists.


import pandas as pd

# Create DataFrame from list
technologies = ['Spark', 'PySpark', 'Java', 'PHP']
df = pd.DataFrame(technologies)
print(df)

# Create DataFrame from multiple lists
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']
df = pd.DataFrame(list(zip(technologies, fee, duration)))
print(df)

# Add column names and index labels
columns = ['Courses', 'Fee', 'Duration']
index = ['r0', 'r1', 'r2', 'r3']
df = pd.DataFrame(list(zip(technologies, fee, duration)),
                  columns=columns, index=index)
print(df)

# Create from a list of lists (each inner list is one row)
courses = [['Spark', 20000, '35days'], ['PySpark', 20000, '35days'],
           ['Java', 15000, '40days'], ['PHP', 10000, '30days']]
df = pd.DataFrame(courses, columns=columns, index=index)
print(df)

Frequently Asked Questions on Pandas Create DataFrame From List

How should the list be structured to create a DataFrame using Pandas?

It depends on the shape you want. A flat list produces a single-column DataFrame. For multiple columns, the list should contain nested lists or other iterable objects, each representing one row of data.

Can you customize the column names when creating a DataFrame from a list?

Yes. Pass a list of names to the columns parameter of the DataFrame() constructor. Row labels can be set the same way with the index parameter.

What should one ensure when constructing DataFrames from lists to avoid errors?

Keep the data dimensions consistent. The number of column names must match the number of columns in the data, and the number of index labels must match the number of rows; a mismatch raises an error during creation. In a list of lists, each inner list should have the same number of elements.
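To make these failure modes concrete, the sketch below shows two cases as they behave in recent pandas versions: inner lists of unequal length, which are padded with a missing value instead of raising an error, and a columns list that is too short for the data, which raises a ValueError. The names ragged and mismatch_error are illustrative.

```python
import pandas as pd

# Ragged rows: the shorter row is padded with a missing value, no error is raised.
ragged = pd.DataFrame([['Spark', 20000, '35days'], ['Java', 15000]])
print(ragged)
print(ragged.isna().sum().sum())  # number of missing cells

# Too few column names for the data: pandas raises ValueError.
try:
    pd.DataFrame([['Spark', 20000, '35days']], columns=['Courses', 'Fee'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)
```

Since ragged rows pass silently, checking for missing values after construction is a reasonable sanity step when the input lists come from an untrusted source.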

Are there any alternative methods for creating DataFrames from lists in Pandas?

Besides the DataFrame() constructor, you can use the class methods DataFrame.from_records() and DataFrame.from_dict() to create DataFrames from lists of records or from dictionaries, respectively.
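A minimal sketch of both alternatives follows, using shortened illustrative data. The from_dict() call uses orient='index', which treats each dictionary key as a row label and each list as one row.

```python
import pandas as pd

columns = ['Courses', 'Fee', 'Duration']

# from_records(): each inner list (or tuple) is one row.
records = [['Spark', 20000, '35days'], ['Java', 15000, '40days']]
df_records = pd.DataFrame.from_records(records, columns=columns)
print(df_records)

# from_dict() with orient='index': keys become row labels, each list is one row.
by_row = {'r0': ['Spark', 20000, '35days'], 'r1': ['Java', 15000, '40days']}
df_dict = pd.DataFrame.from_dict(by_row, orient='index', columns=columns)
print(df_dict)
```

With the default orient='columns', from_dict() behaves like passing the dictionary to the constructor, where keys become column names.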

Conclusion

In this article, you have learned how to create a pandas DataFrame from a single list, multiple lists, a list of lists, and a dictionary of lists by using the DataFrame constructor. You have also learned how to assign column names and index labels while creating a DataFrame.


Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen’s journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
