Pandas Iterate Over Columns of DataFrame

Like any other data structure, a pandas DataFrame can be iterated (looped) over column by column so that you can access the elements of each column. A plain for loop over the DataFrame is enough to do this.

You can iterate over the columns of a pandas DataFrame in several ways: DataFrame.items() (older code uses iteritems(), which was removed in pandas 2.0), the get item syntax ([]), transpose().iterrows(), enumerate(), and the NumPy asarray() function. In this article, I will explain the usage of these methods with examples.

1. Quick Examples of Iterating Over Columns in Pandas DataFrame

If you are in a hurry, below are some quick examples of how to iterate over columns of pandas DataFrame.


# Below are quick examples

# Use getitem ([]) to print each column as a Series
for column in df:
    print(df[column])

# Use getitem ([]) to print the values of each column
for column in df:
    print(df[column].values)
    
# Iterate over columns using DataFrame.items()
# (iteritems() was removed in pandas 2.0)
for (colname,colval) in df.items():
    print(colname, colval.values)

# Use items() and print the second value of each column
for name, values in df.items():
    print('{name}: {value}'.format(name=name, value=values.iloc[1]))
   
# iterate over columns in pandas DataFrame using enumerate()
for (index, colname) in enumerate(df):
    print(index, df[colname].values)
    
# Use enumerate() to print the index and each column as a Series
for (index, column) in enumerate(df):
    print(index, df[column])

# Use enumerate() and numpy.asarray()
import numpy as np
for (index, column) in enumerate(df):
    print(index, np.asarray(df[column]))

# Use DataFrame.columns to skip the first column
for column in df.columns[1:]:
    print(df[column])
    
# Iterate over all the columns in reversed order    
for column in df.columns[::-1]:
    print(df[column])
    
# Get the indices of all columns
for index, column in enumerate(df.columns):
    print(index, column)

# Use DataFrame.transpose().iterrows()
for (column_name, column) in df.transpose().iterrows():
    print(column_name)

Now, let’s create a DataFrame with a few rows and columns, execute these examples and validate results. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = [
            ("Spark", 22000,'30days',1000.0),
            ("PySpark",25000,'50days',2300.0),
            ("Hadoop",23000,'55days',1500.0)
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount'])
print(df)

Yields below output.


   Courses    Fee Duration  Discount
0    Spark  22000   30days    1000.0
1  PySpark  25000   50days    2300.0
2   Hadoop  23000   55days    1500.0

2. Iterate Over DataFrame Columns

One simple way to iterate over the columns of a pandas DataFrame is a for loop. Looping over the DataFrame itself yields the column labels, and you can pass each label to the get item syntax ([]) to retrieve that column as a Series.


# Use getitem ([]) to iterate over columns
for column in df:
    print(df[column])

Yields below output.


0      Spark
1    PySpark
2     Hadoop
Name: Courses, dtype: object
0    22000
1    25000
2    23000
Name: Fee, dtype: int64
0    30days
1    50days
2    55days
Name: Duration, dtype: object
0    1000.0
1    2300.0
2    1500.0
Name: Discount, dtype: float64

The values attribute (it is a property, not a function, so it takes no parentheses) returns the elements of each column as a NumPy array rather than a Python list.


# Use getitem ([]) to iterate over columns in pandas DataFrame
for column in df:
    print(df[column].values)

Yields below output.


['Spark' 'PySpark' 'Hadoop']
[22000 25000 23000]
['30days' '50days' '55days']
[1000. 2300. 1500.]
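If you need a real Python list rather than an array, a minimal sketch like the following may help. It rebuilds the same DataFrame and uses Series.tolist(); the dictionary name column_lists is made up for this illustration.

```python
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

# Collect every column as a plain Python list, keyed by column label
column_lists = {}
for column in df:
    column_lists[column] = df[column].tolist()

# .values on the numeric Fee column is a NumPy ndarray, not a list
print(type(df['Fee'].values).__name__)
print(type(column_lists['Fee']).__name__)
print(column_lists['Fee'])
```

The difference matters when later code relies on list-only behavior such as append(), or on array-only behavior such as element-wise arithmetic.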

3. Iterate Over Columns Using DataFrame.items()

pandas also provides methods that can be used to iterate over DataFrame columns. For example, DataFrame.items() iterates over a DataFrame and returns each column name together with its content as a Series, as in for (colname,colval) in df.items():. Older tutorials use DataFrame.iteritems() for this; it was deprecated in pandas 1.5 and removed in pandas 2.0, so use items() instead.


# Iterate over columns using DataFrame.items()
for (colname,colval) in df.items():
    print(colname, colval.values)

Yields below output.


Courses ['Spark' 'PySpark' 'Hadoop']
Fee [22000 25000 23000]
Duration ['30days' '50days' '55days']
Discount [1000. 2300. 1500.]

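Inside the loop you can also pick a single element from each column. For example, values.iloc[1] returns the value in the second row.


# Use items() and print the second value of each column
for name, values in df.items():
    print('{name}: {value}'.format(name=name, value=values.iloc[1]))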

Yields below output.


Courses: PySpark
Fee: 25000
Duration: 50days
Discount: 2300.0
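Since each iteration hands you the column name and a full Series, you can also run per-column logic inside the loop. The sketch below is one possible use, not the only one: it collects the maximum of every numeric column. The dictionary name column_max is an assumption made for this illustration.

```python
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

# items() yields (column label, Series) pairs in column order
column_max = {}
for name, series in df.items():
    # Skip text columns such as Courses and Duration
    if pd.api.types.is_numeric_dtype(series):
        column_max[name] = series.max()

print(column_max)
```

Here only Fee and Discount pass the numeric check, so the dictionary ends up with two entries.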

4. Iterate Over Columns in DataFrame Using enumerate()

You can also use enumerate() on a DataFrame to get the index (position) and name of each column, and use both in a for loop to iterate over each column.


# iterate over columns in pandas DataFrame using enumerate()
for (index, colname) in enumerate(df):
    print(index, df[colname].values)

Yields below output.


0 ['Spark' 'PySpark' 'Hadoop']
1 [22000 25000 23000]
2 ['30days' '50days' '55days']
3 [1000. 2300. 1500.]

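The same loop works if you want the index of each column printed next to the full Series instead of only its values.


# Use enumerate() to print the index and each column as a Series
for (index, column) in enumerate(df):
    print(index, df[column])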

Yields below output.


0 0      Spark
1    PySpark
2     Hadoop
Name: Courses, dtype: object
1 0    22000
1    25000
2    23000
Name: Fee, dtype: int64
2 0    30days
1    50days
2    55days
Name: Duration, dtype: object
3 0    1000.0
1    2300.0
2    1500.0
Name: Discount, dtype: float64
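The position returned by enumerate() can also be used for positional access with iloc. The following sketch, which assumes the same sample data, checks that selecting a column by label and by position gives the same Series; the list name matches is made up for this illustration.

```python
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

matches = []
for position, label in enumerate(df):
    by_label = df[label]                 # select the column by its name
    by_position = df.iloc[:, position]   # select the column by its position
    matches.append(by_label.equals(by_position))

print(matches)
```

Positional access is mainly useful when column names are duplicated or not known in advance.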

In the above code, df[column] is a Series, which can easily be converted into a NumPy ndarray, for example with np.asarray(df[column]). Remember to import NumPy first.


# Use enumerate() and numpy.asarray()
import numpy as np
for (index, column) in enumerate(df):
    print(index, np.asarray(df[column]))

Yields below output.


0 ['Spark' 'PySpark' 'Hadoop']
1 [22000 25000 23000]
2 ['30days' '50days' '55days']
3 [1000. 2300. 1500.]
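One reason to convert a column to an ndarray is to hand it to NumPy functions. As a rough sketch under the same sample data, the code below converts each column and then averages only the numeric ones; the names arrays and numeric_means are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

# Convert every column to a NumPy array
arrays = {column: np.asarray(df[column]) for column in df}

# dtype.kind is 'i' for integers and 'f' for floats
numeric_means = {}
for index, column in enumerate(df):
    array = arrays[column]
    print(index, column, array.dtype)
    if array.dtype.kind in ('i', 'f'):
        numeric_means[column] = float(np.mean(array))

print(numeric_means)
```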

5. Use DataFrame.columns to Iterate Over Selected Columns

DataFrame.columns is an attribute, not a method, so it is used without parentheses. It returns an Index object containing all the column names of the DataFrame, and it supports the same slicing syntax as a Python list. For instance, to iterate over all columns but the first one, you can use for column in df.columns[1:]:.


# Use DataFrame.columns to skip the first column
for column in df.columns[1:]:
    print(df[column])

Similarly, to iterate over all the columns in reversed order, you can use for column in df.columns[::-1]:.


# Iterate over all the columns in reversed order    
for column in df.columns[::-1]:
    print(df[column])

Also, remember that you can easily get the index (position) of every column using for index, column in enumerate(df.columns):.


# Get the indices of all columns
for index, column in enumerate(df.columns):
    print(index, column)
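To see what these slices actually select, here is a small sketch on the same sample data. It confirms that the columns attribute is a pandas Index and lists the labels each slice produces; the variable names without_first and reversed_columns are made up for this illustration.

```python
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

# columns is an attribute holding an Index, so it is not called with ()
print(type(df.columns).__name__)
print(callable(df.columns))

# Slicing the Index works like slicing a list
without_first = list(df.columns[1:])
reversed_columns = list(df.columns[::-1])

print(without_first)
print(reversed_columns)
```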

6. Use DataFrame.transpose().iterrows()

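You can also transpose the DataFrame with DataFrame.transpose(), which turns the columns into rows, and then iterate over those rows with iterrows(). Each iteration returns the original column name and its content as a Series. Note that transposing a DataFrame with mixed column types, like the one in this article, converts the values to the object dtype.


# Use DataFrame.transpose().iterrows()
for (column_name, column) in df.transpose().iterrows():
    print(column_name)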

Yields below output.


Courses
Fee
Duration
Discount
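The effect of transposing on column types can be checked with a short sketch like the one below. It assumes the same sample data and compares the dtype seen through items() with the dtype seen through transpose().iterrows(); the dictionary names are made up for this illustration.

```python
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])

# dtype of each column when iterating with items()
original_dtypes = {name: str(col.dtype) for name, col in df.items()}

# dtype of each column when iterating over the transposed rows
transposed_dtypes = {
    name: str(col.dtype) for name, col in df.transpose().iterrows()
}

for name in df.columns:
    print(name, original_dtypes[name], transposed_dtypes[name])
```

If you need the numeric columns to keep their int64 and float64 types, iterating with items() or the get item syntax is the safer choice.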

7. Complete Example of Iterating Over Columns in DataFrame


import numpy as np
import pandas as pd
technologies = [
            ("Spark", 22000,'30days',1000.0),
            ("PySpark",25000,'50days',2300.0),
            ("Hadoop",23000,'55days',1500.0)
            ]
df = pd.DataFrame(technologies,columns = ['Courses','Fee','Duration','Discount'])
print(df)

# Use getitem ([]) to print each column as a Series
for column in df:
    print(df[column])

# Use getitem ([]) to print the values of each column
for column in df:
    print(df[column].values)

# Iterate over columns using DataFrame.items()
# (iteritems() was removed in pandas 2.0)
for (colname,colval) in df.items():
    print(colname, colval.values)

# Use items() and print the second value of each column
for name, values in df.items():
    print('{name}: {value}'.format(name=name, value=values.iloc[1]))

# Iterate over columns in pandas DataFrame using enumerate()
for (index, colname) in enumerate(df):
    print(index, df[colname].values)

# Use enumerate() to print the index and each column as a Series
for (index, column) in enumerate(df):
    print(index, df[column])

# Use enumerate() and numpy.asarray()
for (index, column) in enumerate(df):
    print(index, np.asarray(df[column]))

# Use DataFrame.columns to skip the first column
for column in df.columns[1:]:
    print(df[column])

# Iterate over all the columns in reversed order
for column in df.columns[::-1]:
    print(df[column])

# Get the indices of all columns
for index, column in enumerate(df.columns):
    print(index, column)

# Use DataFrame.transpose().iterrows()
for (column_name, column) in df.transpose().iterrows():
    print(column_name)

Conclusion

In this article, you have learned how to iterate over the columns of a pandas DataFrame using DataFrame.items() (the replacement for the removed iteritems()), the get item syntax ([]), enumerate(), the DataFrame.columns attribute, numpy.asarray(), and DataFrame.transpose().iterrows(), with examples of each.

Happy Learning !!
