Create Pandas DataFrame With Examples

The simplest way to create a pandas DataFrame is to use its constructor. There are many other ways as well, for example creating a DataFrame from a list, from a CSV file, from a Series, or creating an empty DataFrame.

1. Create Pandas DataFrame

One of the easiest ways to create a pandas DataFrame is by using its constructor. The DataFrame constructor takes several optional parameters that specify the data, the row and column labels, and the data type of the DataFrame.

Below is the syntax of the DataFrame constructor.


# DataFrame constructor syntax
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Now, let’s create a DataFrame from a list of lists (with a few rows and columns).


# Create pandas DataFrame from List
import pandas as pd
technologies = [["Spark", 20000, "30days"],
                ["Pandas", 25000, "40days"]]
df = pd.DataFrame(technologies)
print(df)

Since we have not provided index or column labels, the DataFrame assigns sequential integers starting from 0 as labels for both rows and columns.


# Output:
        0      1       2
0   Spark  20000  30days
1  Pandas  25000  40days

Numeric column names make it hard to tell what data each column holds, so it is best practice to provide descriptive column names. Use the columns and index parameters to supply column names and custom row labels, respectively.


# Add Column & Row Labels to the DataFrame
column_names=["Courses","Fee","Duration"]
row_label=["a","b"]
df=pd.DataFrame(technologies,columns=column_names,index=row_label)
print(df)

This yields the output below. You can also assign column labels to an existing DataFrame.


# Output:
  Courses    Fee Duration
a   Spark  20000   30days
b  Pandas  25000   40days
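
As a minimal sketch of labeling an existing DataFrame, the columns and index attributes can be assigned directly. It reuses the sample data from this section; the new labels must match the number of columns and rows.

```python
import pandas as pd

# Same sample data as above, created without any labels
technologies = [["Spark", 20000, "30days"],
                ["Pandas", 25000, "40days"]]
df = pd.DataFrame(technologies)

# Assign column and row labels after the DataFrame has been created
df.columns = ["Courses", "Fee", "Duration"]
df.index = ["a", "b"]
print(df)
```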

By default, pandas infers the data type of each column from the data. df.dtypes returns the data type of each column.


# Output of print(df.dtypes):
Courses     object
Fee          int64
Duration    object
dtype: object

You can also convert columns to specific data types with astype().


# Set custom types to DataFrame
types={'Courses': str,'Fee':float,'Duration':str}
df=df.astype(types)
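
To confirm that a conversion took effect, you can inspect dtypes again afterwards. The sketch below converts only the Fee column and leaves the other columns as inferred; it rebuilds the sample DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame([["Spark", 20000, "30days"],
                   ["Pandas", 25000, "40days"]],
                  columns=["Courses", "Fee", "Duration"])

# Convert only the Fee column; columns not listed keep their types
df = df.astype({"Fee": float})

# Fee is now a floating-point column
print(df.dtypes)
```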

2. Create DataFrame from a Dict (Dictionary)

Another commonly used way to create a pandas DataFrame is from a Python dict (dictionary) object. This comes in handy when you want to convert a dictionary into a DataFrame. Each key of the dict becomes a column name, and the list of values under that key becomes the rows of that column.


# Create DataFrame from Dict
technologies = {
    'Courses':["Spark","Pandas"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days']
}
df = pd.DataFrame(technologies)
print(df)
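
If the dict is organized by row instead of by column, DataFrame.from_dict() with orient="index" may be a better fit. The sketch below assumes a hypothetical row-oriented dict in which each key is a row label and each value is one row.

```python
import pandas as pd

# Hypothetical row-oriented dict: key = row label, value = one row of data
rows = {
    "r1": ["Spark", 20000, "30days"],
    "r2": ["Pandas", 25000, "40days"],
}

# orient="index" treats the keys as row labels instead of column names
df = pd.DataFrame.from_dict(rows, orient="index",
                            columns=["Courses", "Fee", "Duration"])
print(df)
```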

3. Create DataFrame with Index

By default, a DataFrame gets a numeric index starting from zero. You can replace it with a custom index by passing the index parameter when creating the DataFrame.


# Create DataFrame with Index.
technologies = {
    'Courses':["Spark","Pandas"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days']
}
index_label=["r1","r2"]
df = pd.DataFrame(technologies, index=index_label)
print(df)
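
One benefit of a custom index is that rows can be selected by label with loc. A minimal sketch using the same sample data:

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "Pandas"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
}
df = pd.DataFrame(technologies, index=["r1", "r2"])

# Select a whole row by its label
print(df.loc["r2"])

# Select a single value by row label and column name
fee = df.loc["r1", "Fee"]
print(fee)
```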

4. Create DataFrame from a List of Dicts

Sometimes data arrives as a JSON string, which parses into a list of dicts. You can pass such a list directly to the DataFrame constructor: each dict becomes a row, and the keys become column names.


# Creates DataFrame from list of dict
technologies = [{'Courses':'Spark', 'Fee': 20000, 'Duration':'30days'},
        {'Courses':'Pandas', 'Fee': 25000, 'Duration': '40days'}]

df = pd.DataFrame(technologies)
print(df)
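
When the records really do start out as a JSON string, one option is to parse it with the standard json module first. The payload below is a hypothetical example; its second record deliberately omits the Duration key to show that missing keys become NaN.

```python
import json
import pandas as pd

# Hypothetical JSON payload; the second record has no "Duration" key
json_text = """
[
  {"Courses": "Spark", "Fee": 20000, "Duration": "30days"},
  {"Courses": "Pandas", "Fee": 25000}
]
"""

records = json.loads(json_text)   # parses into a list of dicts
df = pd.DataFrame(records)

# The missing "Duration" value in the second row is filled with NaN
print(df)
```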

5. Create DataFrame From Series

You can create a DataFrame from multiple Series with the pandas.concat() function. It takes several parameters; here we pass a list of the Series to combine and axis=1, so that each Series becomes a column instead of being stacked as rows.


# Create pandas Series
courses = pd.Series(["Spark","Pandas"])
fees = pd.Series([20000,25000])
duration = pd.Series(['30days','40days'])

# Create DataFrame from series objects.
df=pd.concat([courses,fees,duration],axis=1)
print(df)

# Outputs
#        0      1       2
# 0   Spark  20000  30days
# 1  Pandas  25000  40days
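
For a single Series, concat() is not needed: Series.to_frame() converts it into a one-column DataFrame. In this sketch the Series is given a name, which becomes the column label.

```python
import pandas as pd

# A named Series; the name is used as the column label
courses = pd.Series(["Spark", "Pandas"], name="Courses")

# Convert the Series into a one-column DataFrame
df = courses.to_frame()
print(df)
```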

6. Add Column Labels

As shown above, concat() falls back to numeric column labels when the Series have no names. To set the labels, pass a dict whose keys are the column names, as below.


# Assign Index to Series
index_labels=['r1','r2']
courses.index = index_labels
fees.index = index_labels
duration.index = index_labels

# Concat Series by Changing Names
df=pd.concat({'Courses': courses,
              'Course_Fee': fees,
              'Course_Duration': duration},axis=1)
print(df)

# Outputs:
#    Courses  Course_Fee Course_Duration
# r1   Spark       20000          30days
# r2  Pandas       25000          40days
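
Another way to get column labels is to name each Series when it is created. With axis=1, concat() uses the Series names as column labels. A minimal sketch:

```python
import pandas as pd

# Give each Series a name up front
courses = pd.Series(["Spark", "Pandas"], name="Courses")
fees = pd.Series([20000, 25000], name="Fee")
duration = pd.Series(["30days", "40days"], name="Duration")

# With axis=1, the Series names become the column labels
df = pd.concat([courses, fees, duration], axis=1)
print(df)
```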

7. Create DataFrame Using the zip() Function

Multiple lists can be combined element by element with Python's built-in zip() function, and the resulting list of tuples can be used to create a DataFrame, with each tuple becoming a row.


# Create Lists
Courses = ['Spark', 'Pandas']
Fee = [20000,25000]
Duration = ['30days','40days']
   
# Merge lists by using zip().
tuples_list = list(zip(Courses, Fee, Duration))
df = pd.DataFrame(tuples_list, columns = ['Courses', 'Fee', 'Duration'])
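
One caveat: zip() stops at the shortest list, so rows can be dropped silently when the lists differ in length. The sketch below uses hypothetical lists of unequal length, and shows itertools.zip_longest() as one way to keep every row.

```python
import pandas as pd
from itertools import zip_longest

courses = ["Spark", "Pandas", "Hadoop"]   # three items
fees = [20000, 25000]                     # only two items

# zip() stops at the shortest list, so "Hadoop" is dropped
df_short = pd.DataFrame(list(zip(courses, fees)), columns=["Courses", "Fee"])
print(len(df_short))

# zip_longest() keeps every row and fills the gaps with None,
# which pandas treats as a missing value
df_full = pd.DataFrame(list(zip_longest(courses, fees)),
                       columns=["Courses", "Fee"])
print(len(df_full))
```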

8. Create an Empty DataFrame in Pandas

Sometimes you need to create an empty pandas DataFrame, with or without columns. One common case is described below.

When working with files, there are times when a file may not be available for processing. However, we may still need to manually create a DataFrame with the expected column names. Failing to use the correct column names can cause operations or transformations, such as unions, to fail, as they rely on columns that may not exist.

To handle situations like these, always create the DataFrame with the expected columns, so that the column names and data types are consistent whether the file exists, is missing, or is empty.


# Create Empty DataFrame
df = pd.DataFrame()
print(df)

# Outputs:
# Empty DataFrame
# Columns: []
# Index: []

To create an empty DataFrame with column names but no data, pass only the columns parameter.


# Create Empty DataFrame with Column Labels
df = pd.DataFrame(columns = ["Courses","Fee","Duration"])
print(df)

# Outputs:
# Empty DataFrame
# Columns: [Courses, Fee, Duration]
# Index: []
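
Passing only columns leaves every column with the object dtype. If the data types should also be consistent, one approach is to build the empty DataFrame from zero-length Series. The sketch below ties this to the missing-file scenario; the SCHEMA dict, the read_or_empty() helper, and the file name are all hypothetical.

```python
import os
import pandas as pd

# Hypothetical expected schema: column name -> dtype
SCHEMA = {"Courses": "object", "Fee": "float64", "Duration": "object"}

def read_or_empty(path):
    """Read the CSV at `path`, or return an empty DataFrame with the expected schema."""
    if os.path.exists(path):
        return pd.read_csv(path, dtype=SCHEMA)
    # Zero-length Series give each column its dtype even with no rows
    return pd.DataFrame({name: pd.Series(dtype=dtype)
                         for name, dtype in SCHEMA.items()})

# Hypothetical path that is assumed not to exist
df = read_or_empty("no_such_courses_file.csv")
print(df.dtypes)
```

Because the empty result has the same columns and types as a real file would, later steps such as concatenation can treat both cases the same way.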

9. Create DataFrame From CSV File

In practice, you often need to read the contents of a CSV file into a DataFrame. In pandas, this is done with the pandas.read_csv() function, which returns a DataFrame with the contents of the CSV file.


# Create DataFrame from CSV file
df = pd.read_csv('data_file.csv')
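
To try read_csv() without a file on disk, the CSV text can be wrapped in io.StringIO, which read_csv() accepts like a file. The CSV content below is made up for illustration. The dtype parameter sets the type of a column while reading.

```python
import io
import pandas as pd

# Hypothetical CSV content, held in memory instead of a file
csv_text = """Courses,Fee,Duration
Spark,20000,30days
Pandas,25000,40days
"""

# read_csv() accepts any file-like object; dtype sets column types on read
df = pd.read_csv(io.StringIO(csv_text), dtype={"Fee": "float64"})
print(df)
```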

10. Create From Another DataFrame

Finally, you can create a DataFrame by copying another DataFrame with the copy() method. By default, copy() makes a deep copy, so changes to the copy do not affect the original.


# Copy DataFrame to another
df2=df.copy()
print(df2)
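
A short sketch of what the copy gives you: after copy(), changing a value in the new DataFrame leaves the original untouched. The sample data is rebuilt here so that the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [20000, 25000]})
df2 = df.copy()            # deep copy by default (deep=True)

df2.loc[0, "Fee"] = 0      # change a value in the copy only

print(df.loc[0, "Fee"])    # the original still holds 20000
print(df2.loc[0, "Fee"])   # the copy holds 0
```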

FAQ on Create Pandas DataFrame

What is a Pandas DataFrame?

A Pandas DataFrame is a 2-dimensional labeled data structure, similar to a table in a database or an Excel spreadsheet. It contains rows and columns, and each column can have different data types (e.g., integers, strings, floats, etc.).

How can I create a Pandas DataFrame from a dictionary?

You can create a DataFrame from a dictionary where keys represent column names and values are lists or arrays representing the column data.

How do I create a Pandas DataFrame from a list of lists?

To create a Pandas DataFrame from a list of lists, you can pass the list to the pd.DataFrame() constructor. Each inner list will represent a row in the DataFrame, and you can optionally provide column names using the columns parameter.

How can I create an empty DataFrame?

To create an empty DataFrame in Pandas, you can simply call pd.DataFrame() without passing any data or index.

How do I create a DataFrame with a specific index?

To create a DataFrame with a specific index in Pandas, you can pass a list or array to the index parameter when creating the DataFrame.

How do I create a DataFrame from a JSON file?

To create a DataFrame from a JSON file in Pandas, you can use the pd.read_json() function. This function allows you to read the contents of a JSON file and convert it into a DataFrame.
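
As a minimal sketch, the example below reads JSON from an in-memory buffer instead of a file, so that it runs on its own; with a real file you would pass its path to pd.read_json(). The JSON content is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical JSON content: a list of records
json_text = '[{"Courses": "Spark", "Fee": 20000}, {"Courses": "Pandas", "Fee": 25000}]'

# read_json() accepts a path or a file-like object
df = pd.read_json(io.StringIO(json_text))
print(df)
```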

Conclusion

In this article, you have learned different ways to create a pandas DataFrame with examples. It can be created from a constructor, list, dictionary, series, CSV file, and many more.

Happy Learning !!
