Pandas Get DataFrame Columns by Data Type

You can get a list of pandas DataFrame columns of a given data type in several ways. In this article, I will explain different ways to get all the column names of one data type (for example, object) and of multiple data types, with examples. To select integer columns use int64, to select float columns use float64, and to select datetime columns use datetime64[ns].

1. Quick Examples of Getting a List of DataFrame Columns Based on Data Type

If you are in a hurry, below are some quick examples of how to get a list of DataFrame columns based on the data type.


# Below are quick examples

# Select column names of object data type
sel_cols = list(df.select_dtypes(include='object'))

# Returns DataFrame by selected column names
df2=df.select_dtypes(include='object')

# Alternate way to get column names by data type
sel_cols = [column for column, is_type in (df.dtypes=="object").items() if is_type]

# Get DataFrame column names of multiple data types
sel_cols = list(df.select_dtypes(include=['object', 'datetime64[ns]' ]).columns)

# Alternative way to get column names of multiple data types
sel_cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]

# By Using groupby
col = df.columns.to_series().groupby(df.dtypes).groups

col2 = {k.name: v for k, v in col.items()}

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, Discount and StartDate.


import pandas as pd
import numpy as np
technologies = [
            ("Spark", 22000,'30days',1000.0,"2021-11-21"),
            ("PySpark",25000,'50days',2300.0,"2020-08-21"),
            ("Hadoop",23000,'55days',1500.0,"2021-10-02")
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount', "StartDate"])
df['StartDate'] = pd.to_datetime(df['StartDate'], format='%Y-%m-%d')
print(df)

# Use DataFrame.dtypes to get data types of all columns
print(df.dtypes)

Yields below output.


   Courses    Fee Duration  Discount  StartDate
0    Spark  22000   30days    1000.0 2021-11-21
1  PySpark  25000   50days    2300.0 2020-08-21
2   Hadoop  23000   55days    1500.0 2021-10-02

Courses              object
Fee                   int64
Duration             object
Discount            float64
StartDate    datetime64[ns]
dtype: object

As you see above, you can get the data types of all columns using df.dtypes. For this DataFrame, df.infer_objects().dtypes returns the same result; infer_objects() additionally tries to infer a more specific type for object columns.
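
The difference between the two calls only shows up when a column is stored as object but holds values of a single, more specific type. The sketch below uses a small hypothetical DataFrame (raw, not part of the examples in this article) built with dtype=object to illustrate that case.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately stored as generic object
raw = pd.DataFrame(
    {"Fee": [22000, 25000], "Courses": ["Spark", "PySpark"]},
    dtype=object,
)

# Fee holds integers but is reported as object
print(raw.dtypes)

# infer_objects() returns a new DataFrame; raw itself is not modified
inferred = raw.infer_objects()

# Fee is now reported with an integer dtype
print(inferred.dtypes)
```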

2. Get DataFrame Column Names of a Selected Data Type

Using the DataFrame.select_dtypes() method, you can get the pandas DataFrame column names based on the data type.


# Select column names of object data type
sel_cols = list(df.select_dtypes(include='object'))
print(sel_cols)

# Outputs
# ['Courses', 'Duration']

If you want the columns themselves rather than just their names, use the DataFrame that select_dtypes() returns.


# Returns DataFrame with selected columns.
df2=df.select_dtypes(include='object')
print(df2)

# Outputs
#   Courses Duration
#0    Spark   30days
#1  PySpark   50days
#2   Hadoop   55days

Alternatively, if you are using an older version of pandas, you can compare df.dtypes with the type name to get column names by data type.


# Alternate way to get column names by data type
sel_cols = [column for column, is_type in (df.dtypes=="object").items() if is_type]
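
Besides a single type name, select_dtypes() also accepts the generic 'number' selector and an exclude parameter. The following is a minimal sketch on a hypothetical DataFrame (df_demo is an illustrative name, not used elsewhere in this article).

```python
import pandas as pd

# Hypothetical sample data for illustration
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Discount": [1000.0, 2300.0],
})

# 'number' matches every integer and float column
numeric_cols = list(df_demo.select_dtypes(include="number").columns)

# exclude= keeps every column that is NOT of the given type
non_numeric_cols = list(df_demo.select_dtypes(exclude="number").columns)

print(numeric_cols)      # ['Fee', 'Discount']
print(non_numeric_cols)  # ['Courses']
```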

3. Get DataFrame Column Names of Multiple Data Types

To select columns of several data types at once, pass a list of types to the include parameter.


# Get DataFrame column names of multiple data types
sel_cols = list(df.select_dtypes(include=['object', 'datetime64[ns]' ]).columns)
print(sel_cols)

# Output
#['Courses', 'Duration', 'StartDate']

Another way to get the same output.


sel_cols = [c for c in df.columns if df[c].dtype in ['object', 'datetime64[ns]']]
print(sel_cols)
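
If you would rather not spell out exact type names, the helper functions in pandas.api.types can test a column for a whole family of types. This is a sketch on a hypothetical DataFrame (df_demo is an assumed name); note that is_numeric_dtype() also treats boolean columns as numeric.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Hypothetical sample data for illustration
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "StartDate": pd.to_datetime(["2021-11-21", "2020-08-21"]),
})

# Test each column with a type predicate instead of comparing type names
numeric_cols = [c for c in df_demo.columns if is_numeric_dtype(df_demo[c])]
date_cols = [c for c in df_demo.columns if is_datetime64_any_dtype(df_demo[c])]

print(numeric_cols)  # ['Fee']
print(date_cols)     # ['StartDate']
```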

4. Use DataFrame.columns.to_series() & groupby() Function

Let’s see another approach, which groups the column names by their data type.


# By Using groupby
col = df.columns.to_series().groupby(df.dtypes).groups
print(col)

# Outputs
# {int64: ['Fee'], float64: ['Discount'], datetime64[ns]: ['StartDate'], object: ['Courses', 'Duration']}

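The keys of this dictionary are dtype objects. To key the result by data type name instead, rebuild the dictionary using the name of each dtype.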


# Get all columns for each data type.
col2 = {k.name: v for k, v in col.items()}
print(col2)

# Output
# {'int64': Index(['Fee'], dtype='object'), 'float64': Index(['Discount'], dtype='object'), 'datetime64[ns]': Index(['StartDate'], dtype='object'), 'object': Index(['Courses', 'Duration'], dtype='object')}
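
If you prefer plain Python lists as values, a similar mapping can be built without groupby() by looping over df.dtypes. This is a sketch of an alternative, not the approach shown above; df_demo and cols_by_dtype are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data for illustration
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [22000, 25000],
    "Discount": [1000.0, 2300.0],
})

# Map each data type name to the list of columns that have that type
cols_by_dtype = {}
for col, dtype in df_demo.dtypes.items():
    cols_by_dtype.setdefault(str(dtype), []).append(col)

print(cols_by_dtype)
```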

5. Use DataFrame.dtypes & DataFrame.loc[] Method

You can build a boolean mask by comparing the dtypes attribute with the desired data type.


# Build a boolean mask from DataFrame.dtypes
mask = df.dtypes == np.float64
print(mask)

# Output:
# Courses      False
# Fee          False
# Duration     False
# Discount      True
# StartDate    False
# dtype: bool

You can use df.loc[:,mask] to look at just those columns with the desired dtype.


# Use DataFrame.loc[] Method
mask = df.dtypes == np.float64
df2 =df.loc[:, mask]
print(df2)

#   Output:
#   Discount
#0    1000.0
#1    2300.0
#2    1500.0

Now you can apply a function such as numpy.round() to the selected columns and assign the result back to the DataFrame.


# Use Numpy.round() Method
mask = df.dtypes == np.float64
df2 = np.round(df.loc[:, mask], 2)
print(df2)

#   Output:
#   Discount
#0    1000.0
#1    2300.0
#2    1500.0

# Use DataFrame.loc[] & Numpy.round() method
mask = df.dtypes == np.float64
df.loc[:, mask] = np.round(df.loc[:, mask], 2)
print(df)

#   Output:
#   Courses    Fee Duration  Discount  StartDate
#0    Spark  22000   30days    1000.0 2021-11-21
#1  PySpark  25000   50days    2300.0 2020-08-21
#2   Hadoop  23000   55days    1500.0 2021-10-02
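
The Discount values above already have a single decimal place, so rounding leaves them unchanged. The sketch below uses hypothetical values with more decimals to make the effect visible, and selects the float columns with select_dtypes() instead of a mask; df_demo and float_cols are assumed names.

```python
import pandas as pd

# Hypothetical sample data with extra decimal places
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Discount": [1000.456, 2300.123],
})

# Column labels of every float64 column
float_cols = df_demo.select_dtypes(include="float64").columns

# Round only those columns and assign the result back
df_demo[float_cols] = df_demo[float_cols].round(2)

print(df_demo)
```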

6. Use DataFrame.dtypes to Get Data Types of All Columns

If you want to know the data types of all the columns at once, use the dtypes attribute (the plural of dtype), for example df.dtypes.


# Use DataFrame.dtypes to get data types of all columns
df2 = df.dtypes
print(df2)

# Use DataFrame.infer_objects().dtypes method
df2 = df.infer_objects().dtypes
print(df2)

Yields below output.


Courses              object
Fee                   int64
Duration             object
Discount            float64
StartDate    datetime64[ns]
dtype: object

To get the data type of a single column, index dtypes by the column name, or use the dtype attribute of the column itself.


# Get data type of single column
df2 = df.dtypes['Discount']
print(df2)

# Output:
# float64

# Use Series.dtype to get the data type of a single column
df2 = df['Discount'].dtype
print(df2)

# Output:
# float64
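
The dtype object returned for a single column also carries attributes you can branch on. The sketch below assumes a NumPy-backed column and reads its name and kind attributes; df_demo and discount_dtype are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data for illustration
df_demo = pd.DataFrame({"Discount": [1000.0, 2300.0]})

discount_dtype = df_demo["Discount"].dtype

# name is the type name as a string; kind is a one-letter category ('f' = float)
print(discount_dtype.name)  # float64
print(discount_dtype.kind)  # f

if discount_dtype.kind == "f":
    print("Discount is a floating-point column")
```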

7. Complete Example of Getting a List of DataFrame Columns Based on Data Type


import pandas as pd
import numpy as np
technologies = [
            ("Spark", 22000,'30days',1000.0),
            ("PySpark",25000,'50days',2300.0),
            ("Hadoop",23000,'55days',1500.0)
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount'])
print(df)

# Use Dataframe.dtypes to get data types of all columns
df2 = df.dtypes
print(df2)

# Use DataFrame.infer_objects().dtypes method
df2 = df.infer_objects().dtypes
print(df2)

# Get data type of single column
df2 = df.dtypes['Discount']
print(df2)

# Use Series.dtype to get the data type of a single column
df2 = df['Discount'].dtype
print(df2)

# Use DataFrame.columns.to_series() & groupby() function
df2 = df.columns.to_series().groupby(df.dtypes).groups
print(df2)

# Get all 'object' dtype columns
df2 = df.select_dtypes(include='object').columns
print(df2)

# Get list columns Using DataFrame.select_dtypes()
df2 = list(df.select_dtypes(include='object').columns)
print(df2)

# Use DataFrame.dtypes method
mask = df.dtypes == np.float64
print(mask)

# Use DataFrame.loc[] Method
mask = df.dtypes == np.float64
df2 =df.loc[:, mask]
print(df2)

# Use Numpy.round() Method
mask = df.dtypes == np.float64
df2 = np.round(df.loc[:, mask], 2)
print(df2)

# Use DataFrame.loc[] & Numpy.round() method
mask = df.dtypes == np.float64
df.loc[:, mask] = np.round(df.loc[:, mask], 2)
print(df)

Conclusion

In this article, you have learned how to get a list of pandas DataFrame columns based on data type using DataFrame.dtypes, DataFrame.columns.to_series() with groupby(), DataFrame.loc[] and DataFrame.select_dtypes(), with examples.

Happy Learning !!
