Spark By {Examples}

Pandas Create DataFrame From Dict (Dictionary)

A Python dict (dictionary), which stores key-value pairs, can be used to create a pandas DataFrame. In practice, you usually create a DataFrame by reading a CSV file or another data source, but sometimes you need to build one from a dict object.

Python pandas is widely used for data science, data analysis, and machine learning applications. It is built on top of another popular package, NumPy, which provides scientific computing in Python. A pandas DataFrame is a 2-dimensional labeled data structure with rows and columns, where columns can hold different types such as integers, strings, floats, None, and other Python objects. You can think of it as an Excel spreadsheet or a SQL table.

In my last article, I explained how easy it is to create a DataFrame from a list object. Similarly, in this article I will explain how to create a pandas DataFrame from different types of dict (dictionary) objects.

Create pandas DataFrame from Dict (Dictionary)

By using the pandas DataFrame constructor, you can create a DataFrame from a dict (dictionary) object. Each key becomes a column name, and the corresponding value (here, a list) supplies that column's values.


import pandas as pd
# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using the DataFrame constructor
df = pd.DataFrame(courses)
print(df)

This prints a DataFrame with the columns Courses, Fee, and Duration and a default integer index from 0 to 3.
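Because each list becomes one column, all lists in the dict are expected to have the same length. A minimal sketch of what happens otherwise; `bad_courses` and `good_courses` are hypothetical dicts made up for this illustration:

```python
import pandas as pd

# Hypothetical dict: 'Fee' has only 3 values while 'Courses' has 4
bad_courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
               'Fee': [20000, 20000, 15000]}

raised = False
try:
    pd.DataFrame(bad_courses)
except ValueError as err:
    # pandas rejects columns of unequal length
    raised = True
    print('ValueError:', err)

# With equal-length lists: one column per key, one row per list position
good_courses = {'Courses': ['Spark', 'PySpark'], 'Fee': [20000, 20000]}
df = pd.DataFrame(good_courses)
print(df.shape)  # (2, 2)
```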

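You can also set a custom index on the DataFrame, either while creating it or on an existing DataFrame.


index=['r0','r1','r2','r3']
# Create DataFrame with a custom index (from_dict() has no index parameter,
# so pass it to the DataFrame constructor)
df = pd.DataFrame(courses, index=index)

# Set the index on an existing DataFrame by assigning to df.index
df = pd.DataFrame(courses)
df.index = index
print(df)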

Either way, the rows are now labeled r0, r1, r2, and r3 instead of 0 to 3.
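Note that `set_index()` expects column labels (or array-like objects), so a plain list of strings is read as a list of column names rather than as new row labels. Its typical use is promoting an existing column to the index. A short sketch using the same `courses` data; `df_by_course` is just an illustrative name:

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}
df = pd.DataFrame(courses)

# Promote the existing 'Courses' column to the row index
df_by_course = df.set_index('Courses')
print(df_by_course)

# Rows can now be looked up by course name
print(df_by_course.loc['Java', 'Fee'])  # 15000
```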

Create from Dict with Selected Columns

If you want to use only selected columns from the dict to create a DataFrame, pass the column names as a list to the columns parameter.


# Create DataFrame for selected columns
selected_columns = ['Courses', 'Fee']
df_selected = pd.DataFrame(courses, columns=selected_columns)
print(df_selected)

In the above example, the df_selected DataFrame is created with only the ‘Courses’ and ‘Fee’ columns from the courses dictionary. Adjust the selected_columns list based on your requirements. The resulting DataFrame will contain only the specified columns.
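When the data is a dict, `columns` acts as a selector and also fixes the column order. A label in `columns` that is not a key of the dict does not raise an error; pandas adds it as a column filled with NaN. A small sketch, where 'Discount' is a hypothetical column name that does not exist in `courses`:

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# 'Discount' (hypothetical) is not a key in courses
df = pd.DataFrame(courses, columns=['Fee', 'Courses', 'Discount'])
print(df)

# Columns follow the order given in `columns`, not the dict order
print(list(df.columns))  # ['Fee', 'Courses', 'Discount']
# The unknown column is filled entirely with NaN
print(df['Discount'].isna().all())  # True
```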

Create DataFrame From Nested Dict Object

You can also create a DataFrame from a nested dictionary, such as one loaded from JSON. When passed to the constructor, the outer keys (r0, r1, ...) become the column names and the inner keys (Courses, Fee, Duration) become the row index, which is usually not the layout you want. Call transpose() to swap rows and columns.


# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)
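The same layout can be produced without transposing by calling `from_dict()` with `orient='index'`, which treats the outer keys as row labels. A sketch comparing both approaches on the nested dict above (`df_index` and `df_transposed` are illustrative names):

```python
import pandas as pd

courses = {'r0': {'Courses': 'Spark', 'Fee': '20000', 'Duration': '35days'},
           'r1': {'Courses': 'PySpark', 'Fee': '20000', 'Duration': '35days'},
           'r2': {'Courses': 'Java', 'Fee': '15000', 'Duration': '40days'},
           'r3': {'Courses': 'PHP', 'Fee': '10000', 'Duration': '30days'}}

# Outer keys -> row labels, inner keys -> column names (no transpose needed)
df_index = pd.DataFrame.from_dict(courses, orient='index')

# The constructor + transpose() approach from above
df_transposed = pd.DataFrame(courses).transpose()

print(df_index)
# Both hold the same labels and values
print(df_index.to_dict() == df_transposed.to_dict())  # True
```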

Create using pandas.DataFrame.from_dict()

pandas.DataFrame.from_dict() can also be used to create a pandas DataFrame from a dict (dictionary) object. This method takes the parameters data, orient, dtype, and columns and returns a DataFrame. Note that this is a class method, which means you call it on the DataFrame class itself (pd.DataFrame.from_dict()) without creating a DataFrame first.


# Syntax of from_dict()
DataFrame.from_dict(data, orient='columns', dtype=None, columns=None)
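The `columns` parameter only makes sense when the dict keys are not already column names. As a sketch: with the default `orient='columns'`, passing `columns` raises a ValueError, while with `orient='index'` it names the columns. The dicts below are made up for illustration:

```python
import pandas as pd

data = {'Courses': ['Spark', 'PySpark'], 'Fee': [20000, 20000]}

# With the default orient='columns', dict keys already name the columns,
# so passing `columns` as well is rejected
raised = False
try:
    pd.DataFrame.from_dict(data, columns=['Courses', 'Fee'])
except ValueError as err:
    raised = True
    print('ValueError:', err)

# `columns` is meant for orient='index', where keys become row labels
df = pd.DataFrame.from_dict({'r0': ['Spark', 20000]}, orient='index',
                            columns=['Courses', 'Fee'])
print(df)
```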

Now pass the dict object to the from_dict() method. By default it uses orient='columns', so each key becomes a column.


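# Dict object (column-oriented: each key is a column)
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict using from_dict()
df = pd.DataFrame.from_dict(courses)

# Assign custom row labels to the existing DataFrame
df.index = ['r0','r1','r2','r3']
print(df)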

This yields the same DataFrame as the custom-index example above, with rows labeled r0 to r3.
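To see what `orient` changes, the sketch below passes the same column-oriented dict with both settings. With `orient='index'` the keys turn into row labels and the list positions become integer column labels (`df_cols` and `df_rows` are illustrative names):

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark', 'Java', 'PHP'],
           'Fee': [20000, 20000, 15000, 10000],
           'Duration': ['35days', '35days', '40days', '30days']}

# Default orient='columns': each key becomes a column
df_cols = pd.DataFrame.from_dict(courses)
# orient='index': each key becomes a row, list positions become columns 0..3
df_rows = pd.DataFrame.from_dict(courses, orient='index')

print(df_cols.shape)  # (4, 3)
print(df_rows.shape)  # (3, 4)
print(list(df_rows.index))  # ['Courses', 'Fee', 'Duration']
```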

Create DataFrame from Dict by Using Values as Rows

If you have a dict whose values are lists and you want each list to become a row in the DataFrame, use orient='index'.

Note that with orient='index', you need to specify the column names yourself through the columns parameter. If you don't, pandas uses the default column names 0, 1, 2, and so on.


# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

# Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)
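A quick sketch of the default column names mentioned above, followed by attaching names after the fact by assigning to `df.columns` (the two-row dict is a shortened copy of the one above):

```python
import pandas as pd

courses = {'r0': ['Spark', 20000, '35days'],
           'r1': ['PySpark', 20000, '35days']}

# Without `columns`, orient='index' falls back to integer column labels
df = pd.DataFrame.from_dict(courses, orient='index')
default_columns = list(df.columns)
print(default_columns)  # [0, 1, 2]

# Names can still be attached afterwards by assigning to df.columns
df.columns = ['Courses', 'Fee', 'Duration']
print(df)
```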

Complete Example of Pandas Create DataFrame from Dict

Below is the complete example of creating a DataFrame from a dictionary.


import pandas as pd

# Dict object
courses = {'Courses':['Spark','PySpark','Java','PHP'],
           'Fee':[20000,20000,15000,10000],
           'Duration':['35days','35days','40days','30days']}

# Create DataFrame from dict
df = pd.DataFrame(courses)
print(df)

# Create for selected columns
df = pd.DataFrame(courses, columns = ['Courses', 'Fee'])
print(df)

# Create from from_dict()
df = pd.DataFrame.from_dict(courses)
print(df)

# Dict object
courses = {'r0':['Spark',20000,'35days'],
           'r1':['PySpark',20000,'35days'],
           'r2':['Java',15000,'40days'],
           'r3':['PHP',10000,'30days'],}
columns=['Courses','Fee','Duration']

# Create from from_dict() using orient=index
df = pd.DataFrame.from_dict(courses, orient='index', columns=columns)
print(df)

# Creating from nested dictionary
courses = {'r0':{'Courses':'Spark','Fee':'20000','Duration':'35days'},
           'r1':{'Courses':'PySpark','Fee':'20000','Duration':'35days'},
           'r2':{'Courses':'Java','Fee':'15000','Duration':'40days'},
           'r3':{'Courses':'PHP','Fee':'10000','Duration':'30days'}}

df=pd.DataFrame(courses).transpose()
print(df)

Frequently Asked Questions on Create DataFrame From Dict (Dictionary)

How do I create a DataFrame from a dictionary in Pandas?

You can create a DataFrame from a dictionary in pandas using the pd.DataFrame() constructor. The keys of the dictionary become the column names of the DataFrame, and the values become the data in those columns.

What should the structure of the dictionary be for creating a DataFrame?

Keys of the dictionary become column names, and values can be lists, arrays, or other iterables representing column data; all of them should have the same length.
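As a sketch, the dict below mixes a list, a NumPy array, a `range`, and a pandas Series as values; since all have length 3, each becomes one column. The 'Rank' column is made up for illustration:

```python
import numpy as np
import pandas as pd

# Values can be lists, NumPy arrays, ranges or Series of equal length
data = {'Courses': ['Spark', 'PySpark', 'Java'],
        'Fee': np.array([20000, 20000, 15000]),
        'Rank': range(1, 4),
        'Duration': pd.Series(['35days', '35days', '40days'])}
df = pd.DataFrame(data)
print(df)
```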

Can I create a DataFrame with selected columns from the dictionary?

You can create a DataFrame with selected columns from a dictionary in Pandas. When using the pd.DataFrame() constructor, you can specify the columns parameter to include only the columns you are interested in.

Can I create an empty DataFrame and add data later?

You can create an empty DataFrame in Pandas and add data to it later. To create an empty DataFrame, you can use the pd.DataFrame() constructor without providing any data.
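One possible sketch: start from an empty DataFrame, add whole columns, then append a row with `.loc` using the next integer label. This is one of several ways to grow a DataFrame; for many rows, collecting the data in a dict first and constructing the DataFrame once is usually simpler:

```python
import pandas as pd

# Constructor with no data: an empty DataFrame
df = pd.DataFrame()
was_empty = df.empty
print(was_empty)  # True

# Add data later by assigning whole columns; the first assignment
# also gives the frame a default integer index of matching length
df['Courses'] = ['Spark', 'PySpark']
df['Fee'] = [20000, 20000]

# Append one more row with .loc, using the next integer label
df.loc[len(df)] = ['Java', 15000]
print(df)
```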

How can I handle missing values in the dictionary?

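If some values are missing (for example, None in a list, or a key absent from one of the inner dicts of a nested dict), pandas stores them as NaN or None in the DataFrame. You can then replace them with fillna(), remove the affected rows with dropna(), or clean the dictionary before creating the DataFrame.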
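A minimal sketch, assuming a hypothetical nested dict in which one record has no 'Fee' key; the gap becomes NaN and is then either filled or dropped:

```python
import pandas as pd

# Hypothetical records: 'r1' has no 'Fee' key
records = {'r0': {'Courses': 'Spark', 'Fee': 20000},
           'r1': {'Courses': 'PySpark'},
           'r2': {'Courses': 'Java', 'Fee': 15000}}

df = pd.DataFrame.from_dict(records, orient='index')
print(df)  # the missing Fee for r1 shows up as NaN

# Option 1: fill missing values with a default per column
df_filled = df.fillna({'Fee': 0})

# Option 2: drop rows that contain any missing value
df_dropped = df.dropna()

print(df_filled)
print(df_dropped)
```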

How does Pandas handle data types when creating a DataFrame from a dictionary?

Pandas infers data types for each column based on the data provided. You can also explicitly specify data types using the dtype parameter.
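A short sketch of both options. Since `dtype=` accepts a single dtype applied to every column, the second dict is kept numeric; for per-column conversion after creation, `pd.to_numeric()` is one option (the 'Discount' column and the string fees are made up for illustration):

```python
import pandas as pd

courses = {'Courses': ['Spark', 'PySpark'],
           'Fee': [20000, 20000]}

# Inferred: integer dtype for Fee, a string/object dtype for Courses
df = pd.DataFrame(courses)
print(df.dtypes)

# dtype= applies one dtype to every column, so keep the dict numeric here
fees = {'Fee': [20000, 15000], 'Discount': [1000, 500]}
df_float = pd.DataFrame(fees, dtype='float64')
print(df_float.dtypes)

# Per-column conversion after creation, e.g. numeric strings to numbers
df_str = pd.DataFrame({'Fee': ['20000', '15000']})
df_str['Fee'] = pd.to_numeric(df_str['Fee'])
print(df_str.dtypes)
```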

Conclusion

In this article, you have learned how to create a DataFrame from a dict by using the DataFrame constructor and the from_dict() method. You also learned how to select columns and set a custom index, both while creating a DataFrame and on an existing one.
