  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas Empty DataFrame with Column Names & Types

Sometimes you need to create an empty DataFrame with column names and specific types in pandas. In this article, I will explain how to do this with several examples. In my last article, I explained different ways to create a pandas DataFrame.

1. Quick Examples

If you are in a hurry, below are quick examples.


# Below are the quick examples
import numpy as np
import pandas as pd

# Example 1: Create empty DataFrame with specific column names & types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})

# Example 2: Using NumPy
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit instead of the unit-less np.datetime64
    ]
)
df = pd.DataFrame(np.empty(0, dtype=dtypes))

2. Pandas Empty DataFrame with Column Names & Types

You can assign column names and data types to an empty DataFrame in pandas either at creation time or by updating an existing DataFrame.

Note that when you create an empty pandas DataFrame by passing only column names, every column gets the object dtype by default.


# Pandas Empty DataFrame with Column Names & Types
import pandas as pd

# Create empty DataFrame
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
print("Create an empty DataFrame:\n", df) 
print("Get the type of the columns:\n", df.dtypes)

This prints an empty DataFrame with the four column names and no rows, and every column is reported with the object dtype.

To assign column types to a DataFrame, pass a dict whose keys are the column names and whose values are empty Series of the desired type, as in the example below. Here Fee is int, Discount is float, and the rest are strings. Note that pandas has traditionally represented strings with the object dtype.


# Create empty DataFrame with specific column types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})
print("Get specific data type of the columns:\n", df.dtypes)

This prints object for Courses and Duration, an integer dtype for Fee (int64 on most platforms), and float64 for Discount.
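The section above also mentions updating the types on an existing DataFrame. A minimal sketch of one way to do that, assuming you start from the untyped empty DataFrame and want the same types as in the previous example, is to call astype() with a dict of column names and dtypes:

```python
import pandas as pd

# Start from an untyped empty DataFrame; no column types are declared here
df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"])

# Map each column name to the dtype it should have
df = df.astype({"Courses": "object", "Fee": "int64",
                "Duration": "object", "Discount": "float64"})

print(df.dtypes)
```

astype() returns a new DataFrame, so the result has to be assigned back to df.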

3. Using NumPy

If you already use NumPy, you can create an empty structured array that carries the column names and types and pass it to the DataFrame constructor. NumPy is a Python library for scientific computing built around the ndarray object, an n-dimensional array of homogeneous data types whose operations run in compiled code for performance.


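# Using NumPy
import pandas as pd
import numpy as np

# Structured dtype: one (column name, type) pair per column
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit instead of the unit-less np.datetime64
    ]
)

# A zero-length structured array becomes a DataFrame with no rows
df = pd.DataFrame(np.empty(0, dtype=dtypes))
print(df.dtypes)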

This yields the same column types as above, plus a datetime type for the additional date column.
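If you do not need NumPy for anything else, a date column can also be declared with pandas alone. The sketch below assumes nanosecond resolution, written as the dtype string "datetime64[ns]", and reuses only some of the article's column names for brevity:

```python
import pandas as pd

# One empty Series per column; the date column uses an explicit datetime unit
df = pd.DataFrame({
    "Courses": pd.Series(dtype="object"),
    "Fee": pd.Series(dtype="int64"),
    "date": pd.Series(dtype="datetime64[ns]"),
})

print(df.dtypes)
```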

Frequently Asked Questions on Pandas Empty DataFrame

How can I create an empty DataFrame with specific column names and data types?

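You can pass the pd.DataFrame constructor a dict that maps each column name to an empty pd.Series with the desired dtype. The constructor's columns parameter sets only the names, and its dtype parameter accepts a single dtype that applies to every column, so it cannot set a different type per column.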
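If you create typed empty DataFrames often, the pattern can be wrapped in a small helper. This is only a sketch: empty_frame and schema are hypothetical names chosen for illustration, not part of pandas. It builds one empty Series per column, because the constructor's dtype argument takes a single dtype for the whole DataFrame.

```python
import pandas as pd

def empty_frame(schema):
    """Return an empty DataFrame whose columns follow schema (name -> dtype).

    Hypothetical helper for illustration; not a pandas function.
    """
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )

# Column names taken from the article; dtype strings are explicit on purpose
schema = {"Courses": "object", "Fee": "int64", "Discount": "float64"}
df = empty_frame(schema)

print(df.dtypes)
```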

How can I create an empty DataFrame without specifying data types?

If you don’t specify data types, pandas creates every column with the object dtype, and the dtypes may change once you add data. For example,
column_names = ['Courses', 'Fee', 'Discount']
df = pd.DataFrame(columns=column_names)

How can I add rows to an empty DataFrame?

DataFrame.append() was removed in pandas 2.0, so use pd.concat() to add rows to an empty DataFrame, or assign a single row with df.loc[len(df)]. Make sure the data you add matches the column names.
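A minimal sketch of adding rows with pd.concat(), which works on current pandas versions. The schema dict and the sample values are made up for illustration. The new rows are cast to the same dtypes before concatenation so that the column types of the empty DataFrame are preserved.

```python
import pandas as pd

# Typed empty DataFrame; schema is an illustrative name -> dtype mapping
schema = {"Courses": "object", "Fee": "int64", "Discount": "float64"}
df = pd.DataFrame({name: pd.Series(dtype=dtype) for name, dtype in schema.items()})

# Build the new rows as their own DataFrame and cast them to the same dtypes
new_rows = pd.DataFrame(
    [
        {"Courses": "Spark", "Fee": 20000, "Discount": 1000.0},
        {"Courses": "Pandas", "Fee": 25000, "Discount": 2300.0},
    ]
).astype(schema)

# pd.concat returns a new DataFrame; ignore_index renumbers the rows from 0
df = pd.concat([df, new_rows], ignore_index=True)

print(df)
print(df.dtypes)
```

Collecting rows in a list and concatenating once is usually cheaper than growing the DataFrame one row at a time, since each concatenation copies the data.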

What if I want to create an empty DataFrame with a specific index?

You can set the index parameter when creating the DataFrame or use the set_index method later. For example,
column_names = ['Courses', 'Fee', 'Discount']
index_values = ['a', 'b', 'c']
df = pd.DataFrame(columns=column_names, index=index_values)
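One thing to keep in mind is that index labels create rows, so a DataFrame built this way is filled with NaN values rather than being empty. The sketch below contrasts it with an empty, named index; the name row_id is an arbitrary choice for illustration.

```python
import pandas as pd

column_names = ["Courses", "Fee", "Discount"]

# Index labels create rows, so this DataFrame holds NaN values and is not empty
df_labeled = pd.DataFrame(columns=column_names, index=["a", "b", "c"])
print(df_labeled.shape, df_labeled.empty)

# An empty, named index keeps the DataFrame empty
df_empty = pd.DataFrame(columns=column_names, index=pd.Index([], name="row_id"))
print(df_empty.shape, df_empty.empty)
```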

Conclusion

In summary, you have learned how to create an empty DataFrame with column names and specific data types. If you do not assign the types, pandas uses the object dtype for all columns by default.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
