Sometimes you need to create an empty DataFrame with column names and specific data types in pandas. In this article, I will explain how to do this with several examples. In my last article, I explained different ways to create a pandas DataFrame.
1. Quick Examples
If you are in a hurry, below are quick examples.
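import numpy as np
import pandas as pd

# Create empty DataFrame with specific column names & types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})

# Using NumPy: a structured dtype lists (column name, type) pairs
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit instead of unit-less np.datetime64
    ]
)
# A zero-length structured array becomes a zero-row DataFrame
df = pd.DataFrame(np.empty(0, dtype=dtypes))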
2. Pandas Empty DataFrame with Column Names & Types
You can assign column names and data types to an empty DataFrame in pandas either at creation time or afterwards, by updating an existing DataFrame.
Note that when you create an empty pandas DataFrame by passing only column names, every column gets the object dtype by default.
# Pandas Empty DataFrame with Column Names & Types
import pandas as pd
# Create empty DataFrame
df = pd.DataFrame(columns = ["Courses", "Fee", "Duration","Discount"])
print(df)
# Output:
# Empty DataFrame
# Columns: [Courses, Fee, Duration, Discount]
# Index: []
print(df.dtypes)
# Output:
# Courses     object
# Fee         object
# Duration    object
# Discount    object
# dtype: object
To assign column types, pass a dict whose keys are the column names and whose values are empty Series of the desired dtype. In the example below, Fee is an int, Discount is a float, and the rest are strings. Note that pandas represents strings with the object dtype here (newer pandas releases may report a dedicated string dtype instead).
# Create empty DataFrame with specific column types
df = pd.DataFrame({'Courses': pd.Series(dtype='str'),
                   'Fee': pd.Series(dtype='int'),
                   'Duration': pd.Series(dtype='str'),
                   'Discount': pd.Series(dtype='float')})
print(df.dtypes)
# Output (the width of 'int' is platform-dependent, so Fee may show int64 instead):
# Courses      object
# Fee           int32
# Duration     object
# Discount    float64
# dtype: object
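The section above mentions that types can also be set by updating an existing DataFrame, but only shows the creation-time route. A minimal sketch of the other route, assuming `DataFrame.astype` with a column-to-dtype mapping is acceptable for your case; the `schema` dict is a name introduced here for illustration, and the explicit `int64`/`float64` widths are chosen to avoid platform differences.

```python
import pandas as pd

# Start from an untyped empty DataFrame; every column is object by default.
df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"])

# Hypothetical mapping of column name -> dtype (illustrative, not from the article).
# Columns left out of the mapping keep their current dtype.
schema = {"Fee": "int64", "Discount": "float64"}

# astype returns a new DataFrame, so assign the result back.
df = df.astype(schema)
print(df.dtypes)
```

Because `astype` returns a new object rather than modifying in place, the reassignment to `df` is what makes the change stick.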
3. Using NumPy
If you are using NumPy, use the approach below to create an empty DataFrame with column names and types. NumPy is a Python library for scientific computing that provides a multidimensional array object. At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. Here, a structured dtype describes each column, and a zero-length array of that dtype becomes an empty DataFrame.
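# Using NumPy
import pandas as pd
import numpy as np

# Structured dtype: one (column name, type) pair per column
dtypes = np.dtype(
    [
        ("Courses", str),
        ("Fee", int),
        ("Duration", str),
        ("Discount", float),
        ("date", "datetime64[ns]"),  # explicit unit instead of unit-less np.datetime64
    ]
)
# A zero-length structured array becomes a zero-row DataFrame
df = pd.DataFrame(np.empty(0, dtype=dtypes))
print(df.dtypes)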
This yields the same types for the four columns used above, plus an additional date column with a datetime64 type.
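The same five-column layout, including the date column, can also be built without NumPy. The sketch below wraps the dict-of-empty-Series approach from section 2 in a small helper; `empty_frame` and `schema` are hypothetical names introduced for illustration, not part of pandas.

```python
import pandas as pd

def empty_frame(schema):
    """Return a zero-row DataFrame with one typed column per schema entry.

    `schema` maps column name -> dtype string (hypothetical helper).
    """
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in schema.items()}
    )

# Illustrative schema mirroring the columns used in the NumPy example.
schema = {
    "Courses": "object",
    "Fee": "int64",
    "Duration": "object",
    "Discount": "float64",
    "date": "datetime64[ns]",
}

df = empty_frame(schema)
print(df.dtypes)
```

Keeping the schema in one dict makes it easy to reuse the same column layout in several places, which is the main reason to prefer a helper over repeating the constructor call.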
Conclusion
In summary, you have learned how to create an empty DataFrame with column names and specific data types. If you do not assign types, pandas uses the object dtype for all columns of the DataFrame by default.
Happy Learning !!