Pandas Get Column Names as List From DataFrame

You can get the column names (all header labels) of a Pandas DataFrame as a list using DataFrame.columns.values.tolist(). Each column in a Pandas DataFrame has a label/name that identifies the values it holds. Getting a list of column names is useful when you want to access all columns by name programmatically or manipulate the values of a specific column. In this article, I will explain different ways to get the column names as a list from the DataFrame column headers, with examples.

To get a list of columns from the DataFrame header, use DataFrame.columns.values.tolist(). Below is an explanation of each part of the statement.

  • .columns returns an Index object with column names. This preserves the order of column names.
  • .columns.values returns an array and this has a helper function .tolist() that returns a list of column names.
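The chain described above can be inspected one step at a time. The following is a minimal sketch, assuming a small two-column DataFrame made up only for this illustration; the variable names index_obj, array_obj, and names are hypothetical.

```python
import pandas as pd

# Hypothetical two-column DataFrame used only for this illustration
df = pd.DataFrame({"Courses": ["Spark", "Pandas"], "Fee": [22000, 26000]})

index_obj = df.columns          # Index object holding the column labels, in order
array_obj = df.columns.values   # array holding the same labels
names = array_obj.tolist()      # plain Python list of column names

print(type(index_obj).__name__)
print(names)
```

Each step keeps the labels in the same order as the DataFrame header; only the container type changes.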

To explain with examples, let’s first create a sample DataFrame.

1. Create a Pandas DataFrame

Before we jump into how to get the list of header names, create a Pandas DataFrame with a few rows and columns. Our DataFrame has the column names Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np

technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

2. Get DataFrame Column Headers List Using list(DataFrame.columns.values) Method

You can get the column names as a list from a Pandas DataFrame using list(df.columns.values). Let me take a moment to explain what is happening in this statement. The df.columns attribute returns an Index object, which is the basic object that stores axis labels. The Index object provides a property, Index.values, that returns its data as an array; in our case it returns the column names as an array.

Note that df.columns preserves the order of the columns as-is.

To convert the array of column names into a list, we can either call .tolist() on the array object or pass the array to list().


# Get the list of all column names from headers
column_headers = list(df.columns.values)
print("The Column Header :", column_headers)

Yields below output.


The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']

You can also use df.columns.values.tolist() to get the Pandas DataFrame column names as a list.


# Get the list of all column names from headers
column_headers = df.columns.values.tolist()
print("The Column Header :", column_headers)
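For string labels, list(array) and array.tolist() give the same result, but they can differ when the column labels are numbers. The sketch below assumes a hypothetical DataFrame created without explicit column names, so its labels are the integers 0 and 1; it is only meant to illustrate the difference, not to recommend one form over the other.

```python
import pandas as pd

# Hypothetical DataFrame with default integer column labels (0 and 1)
df_int = pd.DataFrame([[1, 2], [3, 4]])

via_list = list(df_int.columns.values)      # elements stay NumPy integer scalars
via_tolist = df_int.columns.values.tolist() # elements become plain Python ints

print(type(via_list[0]).__name__)
print(type(via_tolist[0]).__name__)
```

If the list is later serialized or compared by type, .tolist() is the safer choice because it converts each label to a native Python object.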

3. Use list(df) to Get the Column Names as List in Pandas DataFrame

Use list(df) to get the list of column headers from a Pandas DataFrame. You can also use list(df.columns) to get the list of column names.


# Using list(df.columns) to get the column headers as a list
column_headers = list(df.columns)

# Using list(df) to get the list of all column names
column_headers = list(df)
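list(df) works because iterating over a DataFrame yields its column labels. As a quick check, the sketch below (using a small DataFrame invented for this illustration) compares it with list(df.columns) and with the Index.tolist() method.

```python
import pandas as pd

# Hypothetical single-row DataFrame used only for this comparison
df = pd.DataFrame({"Courses": ["Spark"], "Fee": [22000], "Discount": [1000]})

from_frame = list(df)              # iterating a DataFrame yields column labels
from_index = list(df.columns)      # iterating the Index yields the same labels
from_method = df.columns.tolist()  # Index also provides tolist() directly

print(from_frame == from_index == from_method)  # prints True
```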

4. Get List of Column Names in Sort Order Using sorted(df)

To get a list of column names in sorted order, use the sorted(df) function. This function returns a list of column names in alphabetical order.


# Get all column names as a sorted list
col_headers=sorted(df)
print(col_headers)

Yields below output. Notice that the order differs from the earlier output, which followed the DataFrame's column order.


['Courses', 'Discount', 'Duration', 'Fee']
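Since sorted() is the Python built-in, its reverse and key arguments also work on a DataFrame. The following sketch assumes an empty DataFrame with the same four column names; the variable names descending and by_length are made up for this example.

```python
import pandas as pd

# Empty DataFrame with the same column names as the sample data
df = pd.DataFrame(columns=["Courses", "Fee", "Duration", "Discount"])

descending = sorted(df, reverse=True)  # reverse alphabetical order
by_length = sorted(df, key=len)        # shortest name first; ties keep column order

print(descending)  # ['Fee', 'Duration', 'Discount', 'Courses']
print(by_length)   # ['Fee', 'Courses', 'Duration', 'Discount']
```

sorted() returns a new list and does not change the column order of the DataFrame itself.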

5. Access All Column Names by Iterating

Sometimes you may need to iterate over all columns and apply some function to each. You can do this as shown below.


# Print each column header label by iterating over df.columns
for column_headers in df.columns: 
    print(column_headers)

Yields below output.


Courses
Fee
Duration
Discount
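As an example of applying a function per column while iterating, the sketch below counts missing values in each column. It assumes a shortened, hypothetical version of the sample data, and the missing_counts dictionary is a name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the sample data
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000, 25000, 23000],
    "Duration": ["30days", None, np.nan],
})

# Iterate over the column names and count missing values per column
missing_counts = {}
for name in df.columns:
    missing_counts[name] = int(df[name].isna().sum())

print(missing_counts)  # {'Courses': 0, 'Fee': 0, 'Duration': 2}
```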

6. Get Column Headers List Using the keys() Method

df.keys() is another approach to get all column names from a Pandas DataFrame. It returns the same Index object as df.columns, so use .values.tolist() to convert the result to a list.


column_headers = df.keys().values.tolist()
print("The Column Header :", column_headers)

Yields below output.


The Column Header : ['Courses', 'Fee', 'Duration', 'Discount']
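To see how keys() relates to df.columns, here is a minimal sketch on a small DataFrame invented for this illustration. Printing df.keys() on its own shows an Index object; a plain list only appears after converting it.

```python
import pandas as pd

# Hypothetical single-row DataFrame used only for this illustration
df = pd.DataFrame({"Courses": ["Spark"], "Fee": [22000]})

keys_index = df.keys()          # for a DataFrame, keys() returns the column Index
keys_list = keys_index.tolist() # convert the Index to a plain Python list

print(keys_index.equals(df.columns))  # prints True
print(keys_list)                      # ['Courses', 'Fee']
```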

7. Get All Numeric column Names From Pandas DataFrame

Sometimes while working on analytics, you may need to work only on numeric columns, so you need to get all columns of a specific data type. For example, you can get all columns of a numeric data type using the undocumented function df._get_numeric_data().


# Get all numeric columns
numeric_columns = df._get_numeric_data().columns.values.tolist()
print(numeric_columns)

Yields below output.


['Fee', 'Discount']

Another simple way to find numeric columns in a Pandas DataFrame is to filter df.dtypes by data type, for example df.dtypes[df.dtypes == "int64"].index. Note that this matches only columns whose dtype is exactly int64.


# Simple Pandas Numeric Columns Code
numeric_columns = df.dtypes[df.dtypes == "int64"].index.values.tolist()
print(numeric_columns)

Yields the same output as above.
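As an alternative to both approaches above, the public DataFrame.select_dtypes() method can select columns by data type. This is not the method used in the examples above; it is a sketch under the assumption of a small DataFrame with a hypothetical float column, Rate, added to show that include="number" matches floats as well as integers.

```python
import pandas as pd

# Hypothetical data; the float column "Rate" is not part of the original sample
df = pd.DataFrame({
    "Courses": ["Spark", "Pandas"],
    "Fee": [22000, 26000],
    "Rate": [0.10, 0.25],
})

# Select every numeric column (integer and float) and list its names
numeric_columns = df.select_dtypes(include="number").columns.tolist()

print(numeric_columns)  # ['Fee', 'Rate']
```

Filtering with df.dtypes == "int64" would return only Fee here, because Rate is a float column.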

8. Complete Example

Below is a complete example of how to get a list of column header labels from Pandas DataFrame for your reference.


import pandas as pd
import numpy as np

technologies= {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas"],
    'Fee' :[22000,25000,23000,24000,26000],
    'Duration':['30days','50days','30days', None,np.nan],
    'Discount':[1000,2300,1000,1200,2500]
          }
df = pd.DataFrame(technologies)
print(df)

# Get the list of all column names from headers
column_headers = list(df.columns.values)
print("The Column Header :", column_headers)

# Get the list of all column names from headers
column_headers = df.columns.values.tolist()
print("The Column Header :", column_headers)

# Using list(df.columns) to get the column headers as a list
column_headers = list(df.columns)

# Using list(df) to get the list of all column names
column_headers = list(df)

# Get all column names as a sorted list
col_headers = sorted(df)
print(col_headers)

# Print each column header label by iterating over df.columns
for column_headers in df.columns:
    print(column_headers)

# Get the column names as a list using keys()
column_headers = df.keys().values.tolist()
print("The Column Header :", column_headers)

# Get all numeric columns
numeric_columns = df._get_numeric_data().columns.values.tolist()
print(numeric_columns)

# Get all int64 columns by filtering df.dtypes
numeric_columns = df.dtypes[df.dtypes == "int64"].index.values.tolist()
print(numeric_columns)

Conclusion

In this article, you have learned how to get DataFrame column labels as a list using df.columns, list(df), and df.keys(). You have also learned how to get the names of all integer columns and how to get column names in sorted order.

Happy Learning !!

NNK
