pandas Create DataFrame From List

Most of the time we create a pandas DataFrame by reading a CSV file or loading data from another source. Sometimes, however, you need to create one from a list, multiple lists, or a list of lists. In this article, I will cover each of these approaches with examples.

Table of contents

  1. Create DataFrame from list
  2. Create from multiple lists
  3. Create from list of lists
  4. Create from Dict of lists
  5. Complete Example

1. Create pandas DataFrame from List

One simple way to create a pandas DataFrame from a list is to use the DataFrame constructor. The constructor takes several optional parameters (data, index, columns, dtype, and copy) that specify the characteristics of the DataFrame.

First, let's create a list with some values and pass it to the DataFrame constructor as the data argument. Because data is the first positional parameter, you don't have to name it explicitly.


import pandas as pd
technologies = ['Spark', 'PySpark', 'Java', 'PHP']

# Create DataFrame from list
df = pd.DataFrame(technologies)
print(df)

This yields a DataFrame with a single column labeled 0 and rows labeled 0 through 3.
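If the default column label 0 is not descriptive enough, the columns parameter can name the single column. The sketch below is only an illustration; the name Courses is borrowed from the later examples in this article, and df_named is a variable name chosen here.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']

# Pass a one-element list to `columns` to name the single column
df_named = pd.DataFrame(technologies, columns=['Courses'])
print(df_named)
```

The row labels are still the default 0 through 3, because no index was supplied.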

2. Create pandas DataFrame from Multiple Lists

Now let's see how to create a pandas DataFrame from multiple lists. Since we are not giving labels to the columns and rows (index), the DataFrame by default assigns incremental sequence numbers as labels to both.


# Create DataFrame from multiple lists
technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']
df = pd.DataFrame(list(zip(technologies, fee, duration)))
print(df)

This yields a DataFrame with three columns labeled 0, 1, and 2 and rows labeled 0 through 3.
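One thing to watch for with this approach: zip() stops at the shortest input, so lists of unequal length lose rows without any error. The following sketch shows the effect with a deliberately shortened fee list; the names rows, df_short, and lengths are made up for this illustration.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000]  # one value short, on purpose

# zip() stops at the shortest input, so the 'PHP' row is silently dropped
rows = list(zip(technologies, fee))
df_short = pd.DataFrame(rows)
print(df_short.shape)

# A simple guard: compare the list lengths before zipping
lengths = {len(technologies), len(fee)}
if len(lengths) != 1:
    print("Lists differ in length; some rows would be lost")
```

On Python 3.10 or later, zip(technologies, fee, strict=True) raises a ValueError instead of truncating, which may be a simpler guard if your environment supports it.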

Numeric column labels make it hard to tell what each column holds, so it is best practice to provide descriptive column names. Use the columns and index parameters of the constructor to supply column and row labels, respectively.

Alternatively, you can assign column names after creating the DataFrame through its columns attribute, and use the pandas.DataFrame.set_index() method to make an existing column the index.


# Create from multiple lists with column names and row labels
columns = ['Courses', 'Fee', 'Duration']
index = ['r0', 'r1', 'r2', 'r3']
df = pd.DataFrame(list(zip(technologies, fee, duration)),
                  columns=columns, index=index)
print(df)

This yields a DataFrame with the columns Courses, Fee, and Duration and the row labels r0 through r3.
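Labels can also be attached after the DataFrame has been created instead of being passed to the constructor. The sketch below assigns the columns and index attributes and then uses set_index() to promote an existing column to the index; df_by_course is a name chosen for this illustration.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']

df = pd.DataFrame(list(zip(technologies, fee, duration)))

# Assign column names after construction
df.columns = ['Courses', 'Fee', 'Duration']

# Assign row labels directly ...
df.index = ['r0', 'r1', 'r2', 'r3']
print(df)

# ... or promote an existing column to the index with set_index()
df_by_course = df.set_index('Courses')
print(df_by_course)
```

Note that set_index() returns a new DataFrame by default and moves the chosen column out of the regular columns, so df_by_course has only Fee and Duration left as columns.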

3. Create DataFrame from List of Lists

When your records are already organized as rows, with each row represented as a list, you can combine them into a two-dimensional list (a list of lists) and create a DataFrame from it, as shown in the example below.


# Creating from a list of lists (each inner list is one row)
# Fee values are integers so the result matches the earlier examples
courses = [['Spark', 20000, '35days'], ['PySpark', 20000, '35days'],
           ['Java', 15000, '40days'], ['PHP', 10000, '30days']]
df = pd.DataFrame(courses, columns=columns, index=index)
print(df)

This results in the same output as above.
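When the inner lists hold numbers as strings (for example, rows read from a text file), the printed DataFrame looks the same but the column is not numeric. The sketch below, using hypothetical string fees, shows one way to check the column types and convert with astype().

```python
import pandas as pd

# Fees written as strings, as can happen when rows come from a text file
courses = [['Spark', '20000', '35days'], ['PySpark', '20000', '35days'],
           ['Java', '15000', '40days'], ['PHP', '10000', '30days']]
df = pd.DataFrame(courses, columns=['Courses', 'Fee', 'Duration'])
print(df.dtypes)

# Convert the Fee column to integers so arithmetic works as expected
df['Fee'] = df['Fee'].astype(int)
print(df.dtypes)
print(df['Fee'].sum())
```

Without the conversion, summing the Fee column would concatenate the strings rather than add the numbers.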

4. Create from Dict of Lists

The example below shows how to create a DataFrame from a dictionary whose values are lists. Each key becomes a column name, and each list supplies that column's values.


# Creating from a dict of lists (keys become column names)
courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}
df = pd.DataFrame(courses, index=index)
print(df)

This yields the same output as above.
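To confirm that the dict-of-lists and zipped-lists approaches really build the same DataFrame, the two results can be compared directly. This is a small self-contained check; df_from_dict and df_from_rows are names chosen for this illustration.

```python
import pandas as pd

technologies = ['Spark', 'PySpark', 'Java', 'PHP']
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']
index = ['r0', 'r1', 'r2', 'r3']

# Dict keys become the column labels
df_from_dict = pd.DataFrame(
    {'Courses': technologies, 'Fee': fee, 'Duration': duration},
    index=index)

# Same data built row by row from zipped lists
df_from_rows = pd.DataFrame(list(zip(technologies, fee, duration)),
                            columns=['Courses', 'Fee', 'Duration'],
                            index=index)

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(df_from_dict, df_from_rows)
print("Both constructions give the same DataFrame")
```

The dict form is column-oriented and the zipped form is row-oriented, so the choice mostly depends on how your data is already laid out.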

5. Complete Example of pandas Create DataFrame from List

Below is a complete example of creating a pandas DataFrame from a list, multiple lists, and a list of lists.


import pandas as pd

# Create DataFrame from list
technologies = ['Spark', 'PySpark', 'Java', 'PHP']
df = pd.DataFrame(technologies)
print(df)

# Create DataFrame from multiple lists
fee = [20000, 20000, 15000, 10000]
duration = ['35days', '35days', '40days', '30days']
df = pd.DataFrame(list(zip(technologies, fee, duration)))
print(df)

# Add column names and index labels
columns = ['Courses', 'Fee', 'Duration']
index = ['r0', 'r1', 'r2', 'r3']
df = pd.DataFrame(list(zip(technologies, fee, duration)),
                  columns=columns, index=index)
print(df)

# Creating from a list of lists (each inner list is one row)
courses = [['Spark', 20000, '35days'], ['PySpark', 20000, '35days'],
           ['Java', 15000, '40days'], ['PHP', 10000, '30days']]
df = pd.DataFrame(courses, columns=columns, index=index)
print(df)

Conclusion

In this article, you have learned how to create a DataFrame from a list, multiple lists, a list of lists, and a dict of lists by using the DataFrame constructor. You have also learned how to add column names and index labels while creating a DataFrame.

NNK
