Pandas Create Empty DataFrame

This article explains how to create an empty DataFrame in pandas, with or without column names and an index. The scenario below is one of many in which you would need an empty DataFrame.

When working with files, we sometimes do not receive a file for processing, yet we still need to create a DataFrame manually with the column names we expect. If the DataFrame does not have those column names, later operations and transformations (such as unions) fail because they refer to columns that may not be present.

To handle situations like these, always create the DataFrame with the same schema, meaning the same column names and data types, whether the file exists or is empty.
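
The scenario above can be sketched as follows. This is a minimal illustration, not a prescribed method: the helper name load_or_empty, the SCHEMA mapping, and the file name are assumptions made for this example, and the column names follow the rest of this article.

```python
import os
import tempfile

import pandas as pd

# Expected schema (assumed for this sketch): column name -> dtype.
SCHEMA = {"Courses": "object", "Fee": "int64", "Duration": "object", "Discount": "float64"}


def load_or_empty(path):
    """Return the CSV at `path`, or a zero-row DataFrame with the expected schema."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        return pd.read_csv(path, dtype=SCHEMA)
    # No file (or a zero-byte file): build empty, typed columns instead.
    return pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in SCHEMA.items()})


# Point at a file that does not exist inside a fresh temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    missing = load_or_empty(os.path.join(tmp, "courses.csv"))

print(missing.columns.tolist())
print(missing.empty)
```

Because the fallback has the same column names and dtypes as a real file would, downstream code can treat both cases the same way.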

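Note: A DataFrame whose rows contain only NaN values is not considered empty. DataFrame.empty returns True only when at least one axis has length 0, that is, when the shape is (0, n) or (n, 0). A DataFrame of shape (n, 0) has n index labels but no columns, so it holds no values and is also reported as empty.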
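
A quick way to see how DataFrame.empty treats different shapes is to build one small frame of each kind. The variable names here are illustrative only; the pandas documentation describes empty as True when any axis has length 0.

```python
import numpy as np
import pandas as pd

cols = ["Courses", "Fee"]

zero_rows = pd.DataFrame(columns=cols)                       # shape (0, 2)
zero_cols = pd.DataFrame(index=["r1", "r2"])                 # shape (2, 0)
nan_rows = pd.DataFrame(np.nan, index=["r1"], columns=cols)  # shape (1, 2), all NaN

# Print the shape and the value of .empty for each frame.
for name, frame in [("zero_rows", zero_rows), ("zero_cols", zero_cols), ("nan_rows", nan_rows)]:
    print(name, frame.shape, frame.empty)
```

Only the all-NaN frame reports False, because it is the only one that actually holds cells.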

1. Quick Examples of Creating Empty DataFrame in pandas

If you are in a hurry, below are some quick examples of how to create an empty DataFrame in pandas.


# Quick examples of creating an empty DataFrame
import pandas as pd

# Create an empty DataFrame using the constructor
df = pd.DataFrame()

# Create an empty DataFrame with column names
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])

# Add a row to the empty DataFrame
# (DataFrame.append() was removed in pandas 2.0; use pd.concat())
row = pd.DataFrame([{"Courses":"Spark","Fee":20000,"Duration":'30days',"Discount":1000}])
df2 = pd.concat([df, row], ignore_index = True)

# Check if the DataFrame is empty
print("Empty DataFrame :"+ str(df.empty))

# Create a DataFrame with index and columns
# Note: this is not considered an empty DataFrame
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"],index=['index1'])

To understand each example in detail, read on.

2. Create Empty DataFrame Using Constructor

One simple way to create an empty pandas DataFrame is to call its constructor without arguments. The example below creates a DataFrame with zero rows and zero columns (empty).


# Create an empty DataFrame using the constructor
import pandas as pd
df = pd.DataFrame()
print(df)
print("Empty DataFrame : "+str(df.empty))

Yields the output below. Notice that the columns and index have no values.


Empty DataFrame
Columns: []
Index: []
Empty DataFrame : True

3. Creating Empty DataFrame with Column Names

Column labels can also be added while creating an empty DataFrame. In this case, the DataFrame contains only columns and no rows or index entries. To do this, use the DataFrame constructor with the columns param, which accepts a list of column labels.


# Creating Empty DataFrame with Column Names
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
print(df)
print("Empty DataFrame : "+str(df.empty))

Yields below output.


Empty DataFrame
Columns: [Courses, Fee, Duration, Discount]
Index: []
Empty DataFrame : True

All columns in the above DataFrame have type object. You can change this by assigning a specific data type to each column.


#Create empty DataFrame with specific column types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})
print(df.dtypes)

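Yields the output below. The exact dtype names depend on the platform and pandas version; for example, the integer column is int32 on some platforms and int64 on others.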


Courses      object
Fee           int32
Duration     object
Discount    float64
dtype: object
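
If the empty DataFrame already exists with only column names, another option is to convert selected columns afterwards with astype(). This is a small sketch under that assumption; the chosen dtypes are examples, not requirements.

```python
import pandas as pd

# Start from an empty DataFrame whose columns all default to object.
df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"])

# Convert only the numeric columns; the others keep their default dtype.
df = df.astype({"Fee": "int64", "Discount": "float64"})
print(df.dtypes)
```

Naming the width explicitly (int64 rather than int) keeps the result the same across platforms.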

4. Add Columns and Index While Creating DataFrame

Let’s see how to create a DataFrame with columns and a row of NaN values. Note that this is not considered an empty DataFrame because it has a row, even though that row holds only NaN; you can check this with the df.empty attribute, which returns False. Use DataFrame.dropna() to drop rows that contain NaN values. To add an index (rows), use the index param along with the columns param for column labels.


#Add columns and index while creating empty DataFrame
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"],index=['index1'])
print(df)
print("Empty DataFrame : "+str(df.empty))

Yields the output below. Note that this is not an empty DataFrame because it has a row of NaN values.


       Courses  Fee Duration Discount
index1     NaN  NaN      NaN      NaN
Empty DataFrame : False
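
To treat such an all-NaN frame as empty, one option is to drop the rows in which every value is NaN and check again. A minimal sketch; the variable name cleaned is illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"], index=["index1"])
print(df.empty)  # the frame has one row, even though every value is NaN

# how="all" drops only the rows in which every value is NaN.
cleaned = df.dropna(how="all")
print(cleaned.shape)
print(cleaned.empty)
```

The column labels survive the drop, so the result is a zero-row frame with the same columns.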

5. Check if DataFrame is Empty

The DataFrame.empty property is used to check whether a DataFrame is empty. It returns True when the DataFrame is empty and False otherwise. A DataFrame is considered non-empty if it contains at least one row and at least one column. A DataFrame whose rows are all NaN is still considered non-empty.


if df.empty:
  print("Empty DataFrame")
else:
  print("Non Empty DataFrame")

6. Create Empty DataFrame From Another DataFrame

You can also create a zero-record DataFrame from another existing DataFrame. This gives a blank DataFrame with the same columns as the existing one but without rows.


# create empty DataFrame from another DataFrame
columns_list = df.columns
df2 = pd.DataFrame(columns = columns_list)
print(df2)

Yields below output.


Empty DataFrame
Columns: [Courses, Fee, Duration, Discount]
Index: []
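
Passing only the column labels copies the names but not the data types. If the dtypes should carry over too, one alternative is to take a zero-row slice of the existing DataFrame. The sketch below compares both; the frame named source and its values are made up for illustration.

```python
import pandas as pd

source = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

by_columns = pd.DataFrame(columns=source.columns)  # keeps the labels only
by_slice = source.iloc[0:0].copy()                 # keeps the labels and the dtypes

print(by_columns.dtypes)
print(by_slice.dtypes)
```

Both results have zero rows; they differ only in the dtype of the Fee column.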

7. Add Rows to Empty DataFrame

Older pandas versions used the DataFrame.append() method to add rows to an empty DataFrame. That method was deprecated in pandas 1.4 and removed in pandas 2.0, so use pandas.concat() instead. Add rows this way only when you have a few of them, because each call copies the data and performs poorly. To add hundreds or thousands of rows to a DataFrame, use the constructor with the data collected in a list.


# Add rows to empty DataFrame
import pandas as pd
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
row = pd.DataFrame([{"Courses":"Spark","Fee":20000,"Duration":'30days',"Discount":1000}])
df2 = pd.concat([df, row], ignore_index = True)
print(df2)

Yields below output.


  Courses    Fee Duration Discount
0   Spark  20000   30days     1000

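To add many rows, collect them in a list and pass the list to the constructor.


import pandas as pd

# get_data() stands in for your own data source; this stub is hypothetical
def get_data():
    return [("Spark", 20000, "30days", 1000), ("PySpark", 25000, "40days", 2300)]

# Collect rows into a list.
data = []
db_data = get_data()
for Courses, Fee, Duration, Discount in db_data:
    data.append([Courses, Fee, Duration, Discount])

# Fill the DataFrame with the rows.
df = pd.DataFrame(data, columns=["Courses", "Fee", "Duration","Discount"])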
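
The same pattern also works with a list of dictionaries, which can be convenient when some records lack a field. A minimal sketch with made-up records; passing columns fixes the column order and fills missing fields with NaN.

```python
import pandas as pd

# Collect each record as a dict; these two records have no Duration or Discount.
rows = []
for course, fee in [("Spark", 20000), ("PySpark", 25000)]:
    rows.append({"Courses": course, "Fee": fee})

# Build the DataFrame once, after all rows are collected.
df = pd.DataFrame(rows, columns=["Courses", "Fee", "Duration", "Discount"])
print(df)
```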

8. Add Rows From Another DataFrame

If you have an empty DataFrame and want to fill it with data from one or more DataFrames, you can do so as shown below.


# Create a new empty DataFrame
df = pd.DataFrame()
# df2 and df3 are existing DataFrames
# (DataFrame.append() was removed in pandas 2.0; use pd.concat())
df = pd.concat([df, df2, df3], ignore_index = True)
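
When the number of source DataFrames is not known in advance, one common approach is to gather them in a list and combine them in a single call. The helper below is a sketch, and its name combine is an assumption for this example. It falls back to an empty DataFrame with the expected columns when the list is empty, since pd.concat() raises a ValueError when given no objects.

```python
import pandas as pd


def combine(frames, columns):
    """Concatenate `frames`, or return an empty DataFrame with `columns` if there are none."""
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


columns = ["Courses", "Fee"]
batches = [
    pd.DataFrame([["Spark", 20000]], columns=columns),
    pd.DataFrame([["PySpark", 25000], ["pandas", 30000]], columns=columns),
]

combined = combine(batches, columns)
nothing = combine([], columns)
print(combined)
print(nothing.empty)
```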

9. Complete Example of Creating an Empty DataFrame in pandas


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
index_labels=['r1','r2','r3','r4']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Create an empty DataFrame using the constructor
df2  = pd.DataFrame()
print(df2)

# Add column names/labels to an empty DataFrame
df2 = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
print(df2)

# Add columns and index while creating the DataFrame (not empty: one NaN row)
index_labels=['index1']
df2 = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"],index=index_labels)
print(df2)

# Create an empty DataFrame from another DataFrame
columns_list = df.columns
df2 = pd.DataFrame(columns = columns_list)
print(df2)

# Add rows to an empty DataFrame
# (DataFrame.append() was removed in pandas 2.0; use pd.concat())
df3 = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
row = pd.DataFrame([{"Courses":"Spark","Fee":20000,"Duration":'30days',"Discount":1000}])
df3 = pd.concat([df3, row], ignore_index = True)
print(df3)

Conclusion

In this article, you have learned how to create a DataFrame with zero rows, with or without columns, how to add rows to it, and how to check whether it is empty, with examples.

Happy Learning !!
