You can get/select a list of pandas DataFrame columns based on data type in several ways. In this article, I will explain different ways to get all column names of a given data type (for example object), and how to get column names of multiple data types, with examples. To select integer columns use int64, to select float columns use float64, and to select DateTime columns use datetime64[ns].
1. Quick Examples of Getting a List of DataFrame Columns Based on Data Type
If you are in a hurry, below are some quick examples of how to get a list of DataFrame columns based on the data type.
# Below are the quick examples
# Select column names of object data type
sel_cols = list(df.select_dtypes(include='object'))
# Returns DataFrame with columns of the selected data type
df2=df.select_dtypes(include='object')
# Alternate way to get column names by data type
sel_cols = [column for column, is_type in (df.dtypes=="object").items() if is_type]
# Get DataFrame Column Names of Multiple Data Types
sel_cols = list(df.select_dtypes(include=['object', 'datetime64[ns]' ]).columns)
# Another way to get DataFrame Column Names of Multiple Data Types
sel_cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]
# By Using groupby
col = df.columns.to_series().groupby(df.dtypes).groups
col2 = {k.name: v for k, v in col.items()}
Now, let’s create a DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, Discount, and StartDate.
# Create DataFrame
import pandas as pd
import numpy as np
technologies = [
("Spark", 22000,'30days',1000.0,"2021-11-21"),
("PySpark",25000,'50days',2300.0,"2020-08-21"),
("Hadoop",23000,'55days',1500.0,"2021-10-02")
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount', 'StartDate'])
df['StartDate'] = pd.to_datetime(df['StartDate'], format='%Y-%m-%d')
print(df)
# Use DataFrame.dtypes to get data types of all columns
print(df.dtypes)
Yields below output.
# Output:
Courses Fee Duration Discount StartDate
0 Spark 22000 30days 1000.0 2021-11-21
1 PySpark 25000 50days 2300.0 2020-08-21
2 Hadoop 23000 55days 1500.0 2021-10-02
Courses object
Fee int64
Duration object
Discount float64
StartDate datetime64[ns]
dtype: object
As you can see above, you can get the data types of all columns using df.dtypes. You can also get the same result using df.infer_objects().dtypes.
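For example, a quick check with infer_objects(), which soft-converts object columns to more specific types where possible, yields the same dtype listing shown above for this DataFrame:
# Infer more specific dtypes where possible, then print them
print(df.infer_objects().dtypes)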
2. Get DataFrame Column Names of a Selected Data Type
Using the DataFrame.select_dtypes() method, you can get the pandas DataFrame column names based on the data type.
# Select column names of object data type
sel_cols = list(df.select_dtypes(include='object'))
print(sel_cols)
# Output:
# ['Courses', 'Duration']
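Similarly, you can pass a numeric dtype name to the include parameter. Below is a minimal sketch, using the same df, that selects the integer and float column names:
# Select column names of int64 and float64 types
int_cols = list(df.select_dtypes(include='int64'))
float_cols = list(df.select_dtypes(include='float64'))
print(int_cols)    # ['Fee']
print(float_cols)  # ['Discount']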
In case you want to select the DataFrame columns themselves (not just the names) based on the data type, use the following.
# Returns DataFrame with selected columns.
df2=df.select_dtypes(include='object')
print(df2)
# Output:
# Courses Duration
# 0 Spark 30days
# 1 PySpark 50days
# 2 Hadoop 55days
Alternatively, if you are using an older pandas version, you can get the column names by data type as shown below.
# Alternate way to get column names by data type
sel_cols = [column for column, is_type in (df.dtypes=="object").items() if is_type]
3. Get DataFrame Column Names of Multiple Data Types
You can use the DataFrame.select_dtypes() method to get the pandas DataFrame column names of multiple data types.
# Get DataFrame Column Names of Multiple Data Types
sel_cols = list(df.select_dtypes(include=['object', 'datetime64[ns]' ]).columns)
print(sel_cols)
# Output:
# ['Courses', 'Duration', 'StartDate']
Another way to get the same output is with a list comprehension over df.columns.
sel_cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]
print(sel_cols)
4. Use DataFrame.columns.to_series() & groupby() Function
Let’s see another approach to get column names grouped by data type.
# By Using groupby
col = df.columns.to_series().groupby(df.dtypes).groups
print(col)
# Output:
{int64: ['Fee'], float64: ['Discount'], datetime64[ns]: ['StartDate'], object: ['Courses', 'Duration']}
To get the column names keyed by the data type's name, convert the dtype keys to strings.
# Get all columns for each data type.
col2 = {k.name: v for k, v in col.items()}
print(col2)
# Output:
# {'int64': Index(['Fee'], dtype='object'), 'float64':
# Index(['Discount'], dtype='object'), 'datetime64[ns]':
# Index(['StartDate'], dtype='object'), 'object': Index(['Courses',
# 'Duration'], dtype='object')}
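For instance, assuming the col2 dictionary built above, you can look up all column names of a given type by its name:
# Look up all columns of the 'object' dtype from the grouped dictionary
object_cols = list(col2['object'])
print(object_cols)
# Output:
# ['Courses', 'Duration']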
5. Use DataFrame.dtypes & DataFrame.loc[] Method
You can also build a boolean mask from the dtypes attribute.
# Use DataFrame.dtypes method
mask = df.dtypes == np.float64
print(mask)
# Output:
# Courses      False
# Fee          False
# Duration     False
# Discount      True
# StartDate    False
# dtype: bool
You can use df.loc[:, mask] to look at just the columns with the desired dtype.
# Use DataFrame.loc[] Method
mask = df.dtypes == np.float64
df2 =df.loc[:, mask]
print(df2)
# Output:
# Discount
# 0 1000.0
# 1 2300.0
# 2 1500.0
Now you can apply Numpy.round() (or any other function) to those columns and assign the result back.
# Use Numpy.round() Method
mask = df.dtypes == np.float64
df2 = np.round(df.loc[:, mask], 2)
print(df2)
# Output:
# Discount
# 0 1000.0
# 1 2300.0
# 2 1500.0
# Use DataFrame.loc[] & Numpy.round() method
mask = df.dtypes == np.float64
df.loc[:, mask] = np.round(df.loc[:, mask], 2)
print(df)
# Output:
#    Courses    Fee Duration  Discount  StartDate
# 0    Spark  22000   30days    1000.0 2021-11-21
# 1  PySpark  25000   50days    2300.0 2020-08-21
# 2   Hadoop  23000   55days    1500.0 2021-10-02
6. Use DataFrame.dtypes to Get Data Types of All Columns
If you want to know the data types of all the columns at once, use the plural form dtypes, for example df.dtypes.
# Use DataFrame.dtypes to get data types of all columns
df2 = df.dtypes
print(df2)
# Use DataFrame.infer_objects().dtypes method
df2 = df.infer_objects().dtypes
print(df2)
Yields below output.
# Output:
Courses              object
Fee                   int64
Duration             object
Discount            float64
StartDate    datetime64[ns]
dtype: object
You can also use dtypes to get the data type of a single column, either by indexing DataFrame.dtypes with the column name or by accessing the column's dtype attribute.
# Get data type of single column
df2 = df.dtypes['Discount']
print(df2)
# Output:
# float64
# Use Series.dtype to get data type of single column
df2 = df['Discount'].dtype
print(df2)
# Output:
# float64
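Checking a single column's dtype like this is handy when you want to branch on the type before processing. Below is a minimal sketch that rounds the Discount column only if it holds float values:
# Round the Discount values only when the column is a float type
if df['Discount'].dtype == np.float64:
    df['Discount'] = df['Discount'].round(2)
print(df['Discount'])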
7. Complete Example of Getting a List of DataFrame Columns Based on Data Type
import pandas as pd
import numpy as np
technologies = [
("Spark", 22000,'30days',1000.0),
("PySpark",25000,'50days',2300.0),
("Hadoop",23000,'55days',1500.0)
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])
print(df)
# Use DataFrame.dtypes to get data types of all columns
df2 = df.dtypes
print(df2)
# Use DataFrame.infer_objects().dtypes method
df2 = df.infer_objects().dtypes
print(df2)
# Get data type of single column
df2 = df.dtypes['Discount']
print(df2)
# Use Series.dtype to get data type of single column
df2 = df['Discount'].dtype
print(df2)
# Use DataFrame.columns.to_series() & groupby() function
df2 = df.columns.to_series().groupby(df.dtypes).groups
print(df2)
# Get all 'object' dtype columns
df2 = df.select_dtypes(include='object').columns
print(df2)
# Get list of columns using DataFrame.select_dtypes()
df2 = list(df.select_dtypes(include='object').columns)
print(df2)
# Use DataFrame.dtypes method
mask = df.dtypes == np.float64
print(mask)
# Use DataFrame.loc[] Method
mask = df.dtypes == np.float64
df2 =df.loc[:, mask]
print(df2)
# Use Numpy.round() Method
mask = df.dtypes == np.float64
df2 = np.round(df.loc[:, mask], 2)
print(df2)
# Use DataFrame.loc[] & Numpy.round() method
mask = df.dtypes == np.float64
df.loc[:, mask] = np.round(df.loc[:, mask], 2)
print(df)
Conclusion
In this article, you have learned how to get a list of pandas DataFrame columns based on data type using DataFrame.dtypes, DataFrame.columns.to_series(), groupby(), DataFrame.loc[], and DataFrame.select_dtypes(), with several examples.
Happy Learning !!
Related Articles
- Change the Order of Pandas DataFrame Columns
- How to Change Position of a Column in Pandas
- Pandas Shuffle DataFrame Rows Examples
- Convert String to Float in Pandas DataFrame
- Convert Float to Integer in Pandas DataFrame
- Count NaN Values in Pandas DataFrame
- Get Unique Rows in Pandas DataFrame
- Apply Multiple Filters to Pandas DataFrame or Series
- Append Pandas DataFrames Using for Loop