  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Create DataFrame From Dict (Dictionary)

A Python dict (dictionary), which stores key-value pairs, can be used to create a pandas DataFrame. In practice, most DataFrames are created by reading a CSV file or another data source, but sometimes you need to build one from a dict (dictionary) object.

pandas is widely used for data science, data analysis, and machine learning applications. It is built on top of NumPy, a popular package for scientific computing in Python. A pandas DataFrame is a 2-dimensional labeled data structure with rows and columns, where the columns can hold different types (integers, strings, floats, None, arbitrary Python objects, etc.). You can think of it as an Excel spreadsheet or a SQL table.

In my last article, I explained how easy it is to create a DataFrame from a list object. Similarly, this article explains how to create a pandas DataFrame from different types of dict (dictionary) objects.

Key Points –

  • Pandas DataFrame can be created from a Python dictionary, where keys represent column names, and values are lists or arrays containing column data.
  • Each key is used as a column label, and in recent pandas versions the columns appear in the dictionary's insertion order.
  • Values in the dictionary's lists or arrays are aligned by position, so all lists must have the same length.
  • pandas infers a data type for each column from its values. The dtype parameter of the constructor forces a single type on every column; use astype() afterwards to set types per column.
  • Creating a DataFrame from a dictionary is a concise way to initialize tabular data that is already in memory, for example records parsed from a file, a database query, or an API response.


Create pandas DataFrame from Dict (Dictionary)

You can create a DataFrame from a dict (dictionary) object by passing it to the pandas DataFrame constructor. Each key becomes a column name, and the corresponding value supplies that column's data.


import pandas as pd
# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using the constructor
df = pd.DataFrame(courses)
print(df)

This prints a DataFrame with three columns (Courses, Fee, Duration) and four rows labeled with the default integer index 0 through 3.

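You can also set a custom index on the DataFrame, either when creating it or afterwards.


index = ['r0', 'r1', 'r2', 'r3']
# Create DataFrame with a custom index
# (from_dict() has no index parameter, so use the constructor)
df = pd.DataFrame(courses, index=index)

# Set row labels on an existing DataFrame
# (set_index() would treat a list of strings as column names)
df.index = index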

The resulting DataFrame has the same three columns, with rows labeled r0 through r3 instead of 0 through 3.
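As a small sketch of the two common ways to get meaningful row labels, the snippet below either passes labels to the constructor or promotes an existing column with set_index(). The variable names df_labeled and df_by_course are illustrative only.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# Row labels supplied when the DataFrame is constructed
df_labeled = pd.DataFrame(courses, index=['r0', 'r1', 'r2', 'r3'])

# Row labels taken from an existing column; set_index() expects column names
df_by_course = pd.DataFrame(courses).set_index('Courses')

print(df_labeled.index.tolist())
print(df_by_course.index.tolist())

# Label-based lookup now uses the course name
print(df_by_course.loc['Java', 'Fee'])
```

Note that set_index() moves the Courses column into the index, so it no longer appears among the regular columns.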

Create from Dict with Selected Columns

If you want to use only selected columns from the dict to create the DataFrame, use the columns parameter and specify the column names as a list.


# Create DataFrame for selected columns
selected_columns = ['Courses', 'Fee']
df_selected = pd.DataFrame(courses, columns=selected_columns)
print(df_selected)

# Equivalent, passing the column list inline
df = pd.DataFrame(courses, columns = ['Courses', 'Fee'])
print(df)

In the above example, the df_selected DataFrame is created with only the ‘Courses’ and ‘Fee’ columns from the courses dictionary. Adjust the selected_columns list based on your requirements. The resulting DataFrame will contain only the specified columns.
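One detail worth knowing, sketched below: if the columns list contains a name that is not a key in the dict, pandas does not raise an error but adds that column filled with missing values. The name 'Discount' is a hypothetical column chosen for illustration.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# 'Discount' is not a key in the dict, so the column is created empty
df_extra = pd.DataFrame(courses, columns=['Courses', 'Fee', 'Discount'])
print(df_extra)

# Every value in the unknown column is missing
print(df_extra['Discount'].isna().all())
```

The columns list also controls column order, so it can be used to reorder columns as well as to select them.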

Create DataFrame From Nested Dict Object

Finally, you can also create a DataFrame from a nested dictionary, such as one parsed from JSON. By default, the outer keys become the columns and the inner keys become the index. When the outer keys are meant to be row labels, as in the example below, call transpose() on the result to swap rows and columns.


# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)
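As an alternative sketch to transposing, from_dict() with orient='index' treats the outer keys as row labels directly. Because the Fee values in this nested dict are strings, the example also converts them with pd.to_numeric(). The variable names nested and df_rows are illustrative.

```python
import pandas as pd

nested = {'r0': {'Courses': 'Spark', 'Fee': '20000', 'Duration': '35days'},
          'r1': {'Courses': 'PySpark', 'Fee': '20000', 'Duration': '35days'},
          'r2': {'Courses': 'Java', 'Fee': '15000', 'Duration': '40days'},
          'r3': {'Courses': 'PHP', 'Fee': '10000', 'Duration': '30days'}}

# Outer keys become row labels, inner keys become columns
df_rows = pd.DataFrame.from_dict(nested, orient='index')

# Fee was stored as text; convert it so arithmetic works
df_rows['Fee'] = pd.to_numeric(df_rows['Fee'])

print(df_rows)
print(df_rows['Fee'].sum())
```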

Create using pandas.DataFrame.from_dict()

pandas.DataFrame.from_dict() can be used to create a pandas DataFrame from a dict (dictionary) object. This method takes the parameters data, orient, dtype, and columns, and returns a DataFrame. Note that it is a class method, which means you call it on the DataFrame class itself without creating an instance first.


# Syntax of from_dict()
DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)

Now pass the dict object to the from_dict() method to create the DataFrame. By default, it uses orient='columns'.


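# Column-oriented dict (keys are column names)
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using from_dict()
df = pd.DataFrame.from_dict(courses)

# Set row labels on the existing DataFrame
df.index = ['r0','r1','r2','r3']
print(df)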

Yields the same output as above.
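To illustrate that the default orientation matches the constructor, the short check below builds the same DataFrame both ways and compares them. The names df_ctor and df_from_dict are illustrative.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# Constructor and from_dict() with its default orient='columns'
df_ctor = pd.DataFrame(courses)
df_from_dict = pd.DataFrame.from_dict(courses)

# equals() compares shape, labels, and values
print(df_ctor.equals(df_from_dict))
```

For a column-oriented dict, the choice between the two is mostly a matter of style; from_dict() becomes more useful once you need the orient parameter.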

Create DataFrame from Dict by Using Values as Rows

If you have a dict whose values are lists and you want each list to become a row in the DataFrame, use orient='index'.

Note that with orient='index' the dict keys become row labels, so the column names must be supplied separately through the columns parameter. If you omit it, pandas uses the default column names 0, 1, 2, and so on.


# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

# Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)
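The default naming mentioned above can be seen in this small sketch, which omits the columns parameter and assigns names afterwards. The variable names rows, df_default, and default_cols are illustrative.

```python
import pandas as pd

rows = {'r0': ['Spark', 20000, '35days'],
        'r1': ['PySpark', 20000, '35days'],
        'r2': ['Java', 15000, '40days'],
        'r3': ['PHP', 10000, '30days']}

# Without the columns parameter, columns are numbered from 0
df_default = pd.DataFrame.from_dict(rows, orient='index')
default_cols = df_default.columns.tolist()
print(default_cols)

# Column names can still be assigned after creation
df_default.columns = ['Courses', 'Fee', 'Duration']
print(df_default)
```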

Complete Example of Pandas Create DataFrame from Dict

Below is a complete example of how to create a DataFrame from a dictionary.


import pandas as pd

# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using the constructor
df = pd.DataFrame(courses)
print(df)

# Create for selected columns
df = pd.DataFrame(courses, columns = ['Courses', 'Fee'])
print(df)

# Create from from_dict()
df = pd.DataFrame.from_dict(courses)
print(df)

# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

# Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)

# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)

Frequently Asked Questions on Create DataFrame From Dict (Dictionary)

How do I create a DataFrame from a dictionary in Pandas?

You can create a DataFrame from a dictionary by passing it to the pd.DataFrame() constructor. The keys of the dictionary become the column names of the DataFrame, and the values become the data in those columns.

What should the structure of the dictionary be for creating a DataFrame?

Keys of the dictionary become column names, and values can be lists, arrays, or other iterables representing column data.
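The sketch below mixes a list, a NumPy array, and a Series as column values, and also shows one edge case: a dict that contains only scalar values needs an explicit index. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = {'Courses': ['Spark', 'PySpark', 'Java'],                 # list
        'Fee': np.array([20000, 20000, 15000]),                  # NumPy array
        'Duration': pd.Series(['35days', '35days', '40days'])}   # Series
df_mixed = pd.DataFrame(data)
print(df_mixed)

# A dict of scalars alone gives pandas no way to know the row count
scalar_error = None
try:
    pd.DataFrame({'Courses': 'Spark', 'Fee': 20000})
except ValueError as err:
    scalar_error = str(err)
print(scalar_error)

# Passing an index resolves it and produces a single-row DataFrame
df_one_row = pd.DataFrame({'Courses': 'Spark', 'Fee': 20000}, index=[0])
print(df_one_row)
```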

Can I create a DataFrame with selected columns from the dictionary?

You can create a DataFrame with selected columns from a dictionary in Pandas. When using the pd.DataFrame() constructor, you can specify the columns parameter to include only the columns you are interested in.

Can I create an empty DataFrame and add data later?

You can create an empty DataFrame in Pandas and add data to it later. To create an empty DataFrame, you can use the pd.DataFrame() constructor without providing any data.
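A minimal sketch of both steps follows: start from an empty DataFrame and assign columns later, or collect records in a plain list and build the DataFrame once at the end. The names df_later, records, and df_records are illustrative.

```python
import pandas as pd

# Start empty, then add columns by assignment
df_later = pd.DataFrame()
print(df_later.empty)

df_later['Courses'] = ['Spark', 'Java']
df_later['Fee'] = [20000, 15000]
print(df_later)

# Alternative: gather row dicts first, then create the DataFrame in one step
records = []
records.append({'Courses': 'Spark', 'Fee': 20000})
records.append({'Courses': 'Java', 'Fee': 15000})
df_records = pd.DataFrame(records)
print(df_records)
```

When rows arrive one at a time, collecting them in a list and constructing the DataFrame once is usually simpler than growing a DataFrame row by row.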

How can I handle missing values in the dictionary?

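Missing entries in the dictionary, such as None, are kept as missing values in the DataFrame. You can detect them with isna() and then either replace them with fillna() or remove the affected rows with dropna(). Keep in mind that all lists in the dictionary must still have the same length.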
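As an illustration of these approaches, the following sketch creates a DataFrame from a dict containing None entries, counts the missing values, and then fills or drops them. The fill values 0 and 'unknown' are arbitrary choices for the example.

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java'],
           'Fee': [20000, None, 15000],
           'Duration': ['35days', '35days', None]}
df = pd.DataFrame(courses)

# Count missing values per column
print(df.isna().sum())

# Replace missing values with a per-column default
df_filled = df.fillna({'Fee': 0, 'Duration': 'unknown'})
print(df_filled)

# Or drop every row that has at least one missing value
df_dropped = df.dropna()
print(df_dropped)
```

Because of the missing entry, the Fee column is stored as floating point rather than integer.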

How does Pandas handle data types when creating a DataFrame from a dictionary?

Pandas infers the data type of each column from the data provided. The dtype parameter forces one data type on all columns; to set different types for individual columns, call astype() on the resulting DataFrame.
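The difference between inferred types, a single forced type, and per-column types can be sketched as follows. The dict fees and the 'Discount' column are hypothetical data made up for this example.

```python
import pandas as pd

fees = {'Fee': [20000, 15000], 'Discount': [1000, 2300]}

# Types inferred from the values (integers here)
df_inferred = pd.DataFrame(fees)
print(df_inferred.dtypes)

# One dtype applied to every column
df_float = pd.DataFrame(fees, dtype='float64')
print(df_float.dtypes)

# Different dtypes per column via astype() with a mapping
df_per_column = pd.DataFrame(fees).astype({'Fee': 'float64', 'Discount': 'int32'})
print(df_per_column.dtypes)
```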

Conclusion

In this article, you have learned how to create a DataFrame from a dict by using the DataFrame constructor and the from_dict() method. You have also learned how to select columns and set an index, both while creating a DataFrame and on an existing one.


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive, and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.