Pandas Create DataFrame From Dict (Dictionary)

  • Post last modified:January 17, 2022

A Python dict (dictionary), which holds key-value pairs, can be used to create a pandas DataFrame. In practice, you mostly create a pandas DataFrame by reading a CSV file or from other sources; however, sometimes you may need to create it from a dict (dictionary) object.

Python pandas is widely used for data science, data analysis, and machine learning applications. It is built on top of another popular package named NumPy, which provides scientific computing in Python. A pandas DataFrame is a 2-dimensional labeled data structure with rows and columns, where the columns can hold different types such as integers, strings, floats, None, and Python objects. You can think of it as an Excel spreadsheet or a SQL table.

In my last article, I explained how easy it is to create a DataFrame from a list object. Similarly, in this article I will explain how to create a pandas DataFrame from different types of dict (dictionary) objects.

1. Create pandas DataFrame from Dict (Dictionary)

By using the pandas DataFrame constructor, you can create a DataFrame from a dict (dictionary) object. For each key-value pair in the dict, the key is used as the column name and the value (a list) supplies the values for that column.


import pandas as pd

# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using the DataFrame constructor
df = pd.DataFrame(courses)
print(df)

This yields a DataFrame with the columns Courses, Fee, and Duration and a default integer index from 0 to 3.
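To make the key-to-column mapping concrete, here is a small sketch that inspects the DataFrame after creating it from the same courses dict. It only uses standard DataFrame attributes (shape, columns, index, dtypes). Keep in mind that all lists in the dict need to have the same length, otherwise pandas raises a ValueError.

```python
import pandas as pd

# Same dict of lists as in the example above
courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

df = pd.DataFrame(courses)

# Each dict key becomes a column; each list element becomes a row
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the dict keys
print(list(df.index))    # default integer row labels
print(df.dtypes)         # dtype inferred per column
```

Checking dtypes right after creation is a quick way to confirm that numeric columns such as Fee were not read as strings.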

You can also set a custom index on the DataFrame, either while creating it or afterwards on an existing DataFrame.


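index = ['r0','r1','r2','r3']

# Create DataFrame with a custom index
# (from_dict() has no index param, so use the constructor)
df = pd.DataFrame(courses, index=index)

# Or set the index on an existing DataFrame
# (set_index() expects column names, so assign the labels directly)
df.index = index
print(df)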

This yields the same DataFrame with the rows labeled r0 through r3 instead of 0 through 3.
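As a quick check, the sketch below builds the DataFrame both ways (index passed to the constructor, and index assigned afterwards) and compares the results. The variable names df_at_creation and df_afterwards are just illustrative.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}
index = ['r0', 'r1', 'r2', 'r3']

# Option 1: pass the row labels to the constructor
df_at_creation = pd.DataFrame(courses, index=index)

# Option 2: create first, then assign the row labels
df_afterwards = pd.DataFrame(courses)
df_afterwards.index = index

# Both approaches should produce the same DataFrame
print(df_at_creation.equals(df_afterwards))

# Rows can now be selected by label
print(df_at_creation.loc['r2', 'Courses'])
```

Note that the number of labels has to match the number of rows in both cases.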

3. Create from Dict with Selected Columns

In case you want to use only selected columns from the dict to create the DataFrame, use the columns param and specify the names as a list.


# Create for selected columns
df = pd.DataFrame(courses, columns = ['Courses', 'Fee'])
print(df)

This creates a DataFrame with only the Courses and Fee columns.
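The columns param also controls the column order, and a name that is not a key in the dict produces a column filled with NaN rather than an error. The sketch below illustrates this; the Discount column is a made-up name used only for the demonstration.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# 'Discount' is a hypothetical name that is not a key in the dict
df = pd.DataFrame(courses, columns=['Fee', 'Courses', 'Discount'])
print(df)

# Columns follow the requested order; 'Duration' is left out
print(list(df.columns))

# The column with no matching key contains only missing values
print(df['Discount'].isna().all())
```

This is worth remembering, because a typo in a column name silently gives you an empty column.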

4. From Nested Dict Object

Finally, you can also create a DataFrame from a nested dictionary (for example, one parsed from JSON). Passing it directly to the constructor uses the outer keys as columns and the inner keys as the index. That is the reverse of what we want here, because each outer key represents a row. Use transpose() to convert rows into columns and columns into rows.


# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)
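As an alternative to transpose(), a nested dict like this can also be passed to from_dict() with orient='index', which treats each outer key as a row label. The sketch below compares both approaches and then converts Fee to integers, since the values in this nested dict are strings. The names df_transposed and df_oriented are illustrative.

```python
import pandas as pd

courses = {'r0': {'Courses': 'Spark', 'Fee': '20000', 'Duration': '35days'},
           'r1': {'Courses': 'PySpark', 'Fee': '20000', 'Duration': '35days'},
           'r2': {'Courses': 'Java', 'Fee': '15000', 'Duration': '40days'},
           'r3': {'Courses': 'PHP', 'Fee': '10000', 'Duration': '30days'}}

# Approach 1: constructor, then swap rows and columns
df_transposed = pd.DataFrame(courses).transpose()

# Approach 2: let from_dict() use the outer keys as row labels
df_oriented = pd.DataFrame.from_dict(courses, orient='index')

# Both hold the same values under the same labels
same_values = df_transposed.values.tolist() == df_oriented.values.tolist()
print(same_values)

# Fee was given as strings, so convert it before doing arithmetic
df_oriented['Fee'] = df_oriented['Fee'].astype(int)
print(df_oriented['Fee'].sum())
```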

5. Create using pandas.DataFrame.from_dict()

pandas.DataFrame.from_dict() can be used to create a pandas DataFrame from a dict (dictionary) object. This method takes the parameters data, orient, dtype, and columns and returns a DataFrame. Note that this is a class method, which means you can call it on the DataFrame class without creating an instance.


# Syntax of from_dict()
DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)

Now pass the dict object to the from_dict() method to create the DataFrame. By default, it uses orient='columns', which treats each dict key as a column name.


# Create DataFrame from dict using from_dict()
# (courses is the dict of lists from section 1)
df = pd.DataFrame.from_dict(courses)

# Set the index on the existing DataFrame
# (index is the list ['r0','r1','r2','r3'] defined earlier)
df.index = index
print(df)

This yields the same output as the custom-index example above.
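To see how from_dict() relates to the constructor, the sketch below checks that the default orient='columns' gives the same result as pd.DataFrame(). It also shows that the columns param of from_dict() is meant for orient='index'; per the pandas documentation, combining it with orient='columns' raises a ValueError.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# With the default orient='columns', from_dict() matches the constructor
df_ctor = pd.DataFrame(courses)
df_from_dict = pd.DataFrame.from_dict(courses, orient='columns')
print(df_ctor.equals(df_from_dict))

# The columns param cannot be combined with orient='columns'
raised = False
try:
    pd.DataFrame.from_dict(courses, columns=['Courses', 'Fee'])
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```

If you need to select columns from a dict of lists, use the columns param of the DataFrame constructor instead, as shown in section 3.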

6. Create DataFrame from Dict by Using Values as Rows

In case you have a dict where each value is a list and you want each list to become a row in the DataFrame, use orient='index'. The dict keys are then used as the row labels.

Note that when using the 'index' orientation, you need to specify the column names yourself through the columns param. If you don't, pandas assigns the default names 0, 1, 2, etc.


# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

#Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)
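The sketch below shows what happens when the columns param is left out with orient='index', and one way to name the columns afterwards by assigning to df.columns. The variable default_names is only there to capture the column labels before renaming.

```python
import pandas as pd

courses = {'r0': ['Spark', 20000, '35days'],
           'r1': ['PySpark', 20000, '35days'],
           'r2': ['Java', 15000, '40days'],
           'r3': ['PHP', 10000, '30days']}

# Without the columns param, pandas numbers the columns
df = pd.DataFrame.from_dict(courses, orient='index')
default_names = list(df.columns)
print(default_names)

# Assign meaningful names afterwards
df.columns = ['Courses', 'Fee', 'Duration']
print(df)
```

Passing columns directly to from_dict() is shorter, but assigning df.columns can be handy when the names are only known later.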

7. Complete Example of pandas Create DataFrame from Dict

Below is a complete example of how to create a DataFrame from a dictionary.


import pandas as pd

# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using the constructor
df = pd.DataFrame(courses)
print(df)

# Create for selected columns
df = pd.DataFrame(courses, columns = ['Courses', 'Fee'])
print(df)

# Create from from_dict()
df = pd.DataFrame.from_dict(courses)
print(df)

# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

# Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)

# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)

Conclusion

In this article, you have learned how to create a DataFrame from a dict by using the DataFrame constructor and the from_dict() method. You have also learned how to select columns and set an index while creating a DataFrame, and how to set an index on an existing one.

NNK
