Sometimes you need to create an empty DataFrame in pandas with column names and specific data types. In this article, I will explain how to do this with several examples. In my last article, I explained the different ways to create a pandas DataFrame.
1. Quick Examples
If you are in a hurry, below are quick examples.
# Below are the quick examples
import pandas as pd
import numpy as np

# Example 1: Create empty DataFrame with specific column names & types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})

# Example 2: Using NumPy (a structured dtype lists each column name and type)
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit for the datetime column
    ]
)
df = pd.DataFrame(np.empty(0, dtype=dtypes))
2. Pandas Empty DataFrame with Column Names & Types
You can assign column names and data types to an empty DataFrame in pandas either when you create it or later by updating the existing DataFrame.
Note that when you create an empty pandas DataFrame with only column names, pandas assigns the object dtype to every column by default.
# Pandas Empty DataFrame with Column Names & Types
import pandas as pd
# Create empty DataFrame
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
print("Create an empty DataFrame:\n", df)
print("Get the type of the columns:\n", df.dtypes)
This prints an empty DataFrame in which every column has the object dtype.
To assign column types to the DataFrame, pass a dict in which each key is a column name and each value is an empty pd.Series with the required dtype. In the example below, I have used int for Fee and float for Discount, and the rest are strings. Note that in pandas strings are represented as the object type.
# Create empty DataFrame with specific column types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})
print("Get specific data type of the columns:\n", df.dtypes)
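The output shows an integer type for Fee, a float type for Discount, and a string (object) type for Courses and Duration.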
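This section also mentions updating the types on an existing DataFrame. A minimal sketch of that route, assuming you start from the untyped empty DataFrame created earlier and only want to change Fee and Discount, is to call astype() with a dict of column names and types:

```python
import pandas as pd

# Start with an untyped empty DataFrame; every column is object by default.
df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"])

# Convert only the columns that need a specific type.
# Columns left out of the dict keep their current dtype.
df = df.astype({"Fee": "int64", "Discount": "float64"})

print(df.dtypes)
```

Columns that are not listed in the dict are left unchanged, so this is convenient when only a few columns need a non-default type.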
3. Using NumPy
If you are using NumPy, use the approach below to create an empty DataFrame with column names and types. NumPy is a Python library for scientific computing that provides a multidimensional array object. At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance.
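# Using NumPy
import pandas as pd
import numpy as np

# A structured dtype lists each column name with its type
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit for the datetime column
    ]
)
# A zero-length structured array becomes a DataFrame with no rows
df = pd.DataFrame(np.empty(0, dtype=dtypes))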
print(df.dtypes)
This yields the same column types as above, plus a datetime type for the additional date column.
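If you would rather not depend on a NumPy structured dtype, the same result can be sketched with pandas alone. The helper name empty_frame and the schema dict below are hypothetical, introduced only for illustration; the helper builds one empty typed pd.Series per column, as in section 2.

```python
import pandas as pd

def empty_frame(schema):
    """Return a zero-row DataFrame with one typed column per schema entry.

    `schema` maps column names to dtype strings (hypothetical helper,
    not part of pandas).
    """
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )

# Illustrative schema; the dtype strings are standard pandas/NumPy names.
schema = {
    "Courses": "object",
    "Fee": "int64",
    "Discount": "float64",
    "date": "datetime64[ns]",
}
df = empty_frame(schema)
print(df.dtypes)
```

Keeping the schema in a plain dict makes it easy to reuse the same column definitions wherever an empty DataFrame of that shape is needed.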
Frequently Asked Questions on Pandas Empty DataFrame
You can use the pd.DataFrame constructor and pass a list of column names to the columns parameter. The constructor's dtype parameter accepts only a single type, which is applied to every column; to give each column its own type, build the DataFrame from typed pd.Series objects or call astype() with a dict.
If you don't specify data types, every column starts as object, and the resulting types then depend on the data you add later. For example:
column_names = ['Courses', 'Fee', 'Discount']
df = pd.DataFrame(columns=column_names)
In older pandas versions you could add rows to an empty DataFrame using the append() method, but it was removed in pandas 2.0; use pd.concat() instead. Make sure the data you add matches the column names.
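Because DataFrame.append() is no longer available in pandas 2.0 and later, here is a minimal sketch of adding rows with pd.concat(). The row values are made up for illustration, and the new rows are cast to the empty DataFrame's dtypes first so the column types stay the same after concatenation.

```python
import pandas as pd

# Empty DataFrame with typed columns.
df = pd.DataFrame({
    "Courses": pd.Series(dtype="object"),
    "Fee": pd.Series(dtype="int64"),
    "Discount": pd.Series(dtype="float64"),
})

# Illustrative rows; the keys must match the column names of df.
new_rows = pd.DataFrame([
    {"Courses": "Spark", "Fee": 20000, "Discount": 1000.0},
    {"Courses": "PySpark", "Fee": 25000, "Discount": 2300.0},
])

# Align the new rows with the declared column types before concatenating.
new_rows = new_rows.astype(df.dtypes.to_dict())

df = pd.concat([df, new_rows], ignore_index=True)
print(df)
print(df.dtypes)
```

Passing ignore_index=True renumbers the rows from 0, which is usually what you want when the original DataFrame was empty.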
You can set the index parameter when creating the DataFrame or use the set_index() method later. Note that passing index values creates one row per index label, filled with NaN, so the result is no longer empty. For example:
column_names = ['Courses', 'Fee', 'Discount']
index_values = ['a', 'b', 'c']
df = pd.DataFrame(columns=column_names, index=index_values)
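The set_index() route keeps the DataFrame empty. As a minimal sketch, assuming you want the Courses column to act as the index, you can promote it after creating the typed empty DataFrame:

```python
import pandas as pd

# Empty DataFrame with typed columns.
df = pd.DataFrame({
    "Courses": pd.Series(dtype="object"),
    "Fee": pd.Series(dtype="int64"),
    "Discount": pd.Series(dtype="float64"),
})

# Promote an existing column to the index; the DataFrame still has no rows.
df = df.set_index("Courses")

print(df.index.name)
print(df.dtypes)
```

After this call, Courses is the index name and no longer appears among the columns.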
Conclusion
In summary, you have learned how to create an empty DataFrame with column names and specific data types. If you do not assign the types, pandas assigns the object dtype to all columns in the DataFrame by default.
Happy Learning !!