Pandas – Different Ways to Iterate Over Rows in DataFrame

Like any other data structure, a Pandas DataFrame lets you iterate (loop) over its rows and access the columns/elements of each row. DataFrame provides the iterrows() and itertuples() methods for iterating row by row. In this article, I will explain different ways to iterate over rows (loop through row by row) in a Pandas DataFrame, with examples.

1. Using Pandas DataFrame.iterrows() to Iterate Over Rows

DataFrame.iterrows() is used to iterate over DataFrame rows. It returns an iterator of (index, Series) pairs, where index is the row’s index label and the Series holds that row’s data, indexed by column name. To get a value from the Series, use the column name, for example row["Fee"]. To learn more about Series, see How to use Series with Examples.

First, let’s create a Pandas DataFrame.


import pandas as pd

# Sample data: course name, fee and duration
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Oracle", "Java"],
    'Fee': [20000, 25000, 26000, 22000, 24000, 21000, 22000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days']
}
df = pd.DataFrame(technologies)
print(df)

This yields the result below. As you can see, the DataFrame has three columns: Courses, Fee, and Duration.


   Courses    Fee Duration
0    Spark  20000    30day
1  PySpark  25000   40days
2   Hadoop  26000   35days
3   Python  22000   40days
4   Pandas  24000   60days
5   Oracle  21000   50days
6     Java  22000   55days

The example below iterates over all rows of the DataFrame using iterrows().


# Iterate all rows using DataFrame.iterrows()
for index, row in df.iterrows():
    print(index, row["Fee"], row["Courses"])

This yields the output below.


0 20000 Spark
1 25000 PySpark
2 26000 Hadoop
3 22000 Python
4 24000 Pandas
5 21000 Oracle
6 22000 Java

Let’s see what a row looks like by printing it.


# Row contains the column name and data
row = next(df.iterrows())[1]
print("Data For First Row :")
print(row)

This yields the output below.


Data For First Row :
Courses     Spark
Fee         20000
Duration    30day
Name: 0, dtype: object

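Note that iterrows() returns each row as a single Series, and a Series has only one dtype, so dtypes are not preserved across a row: in this example the row’s dtype is object even though the Fee column is int64. To check column data types, use df.dtypes. If you need values that keep their column types while looping, use DataFrame.itertuples(), which is also generally faster.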
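To see the dtype behavior in isolation, here is a small sketch following the pattern used in the pandas documentation. It assumes a hypothetical all-numeric DataFrame (an int64 Fee column and a float64 Discount column), not the df above; with only numeric columns, iterrows() upcasts the whole row to float64, while itertuples() keeps Fee as an integer.

```python
import pandas as pd

# Hypothetical all-numeric DataFrame (assumed for illustration):
# Fee is int64, Discount is float64
prices = pd.DataFrame({"Fee": [20000, 25000], "Discount": [0.10, 0.15]})
print(prices.dtypes)

# iterrows() packs each row into one Series, so the int64 and float64 values
# are upcast to a common dtype (float64) and Fee comes back as 20000.0
_, row = next(prices.iterrows())
print(row.dtype, row["Fee"])  # float64 20000.0

# itertuples() reads each value from its own column, so Fee stays an integer
tup = next(prices.itertuples(index=False))
print(tup.Fee)  # 20000
```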

Note: The pandas documentation states that “You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.”
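A minimal sketch of that pitfall, assuming a hypothetical two-column DataFrame with mixed dtypes (in this case each row from iterrows() is a copy): assigning to the row Series inside the loop leaves the DataFrame unchanged, while a single vectorized column operation updates it.

```python
import pandas as pd

# Hypothetical mixed-dtype DataFrame (string and integer columns)
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

# Anti-pattern: with these mixed dtypes each row is a copy,
# so this assignment only changes the temporary Series, not df
for index, row in df.iterrows():
    row["Fee"] = row["Fee"] * 2
print(df["Fee"].tolist())  # [20000, 25000]

# Update the column directly instead, in one vectorized operation
df["Fee"] = df["Fee"] * 2
print(df["Fee"].tolist())  # [40000, 50000]
```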

2. Using DataFrame.itertuples() to Iterate Over Rows

DataFrame.itertuples() returns an iterator that yields one namedtuple per row, with the index (optionally) and each column value as fields. It is generally faster than iterrows() and preserves the data types of the values, which makes it a common choice when you do need to loop over rows.

Below is the syntax of itertuples().


#Syntax DataFrame.itertuples()
DataFrame.itertuples(index=True, name='Pandas')
  • index – Defaults to True. If True, the row’s index value is returned as the first element of each tuple (field name Index). Setting it to False omits the index.
  • name – Defaults to ‘Pandas’. The name of the returned namedtuples; you can provide a custom name, or pass None to return regular tuples.

The example below loops through the rows returned by itertuples() and gets the value of each column with getattr().


# Iterate all rows using DataFrame.itertuples()
for row in df.itertuples(index=True):
    print(getattr(row, 'Index'), getattr(row, "Fee"), getattr(row, "Courses"))

This yields the output below.


0 20000 Spark
1 25000 PySpark
2 26000 Hadoop
3 22000 Python
4 24000 Pandas
5 21000 Oracle
6 22000 Java
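Because each element returned by itertuples() is a namedtuple, getattr() is not strictly needed: the same values can be read with plain attribute access (row.Index, row.Fee) or by position. A minimal sketch, using a two-row stand-in for the df created earlier:

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30day", "40days"],
})

for row in df.itertuples(index=True):
    # Attribute access by field name; row[0] holds the same value as row.Index
    print(row.Index, row.Fee, row.Courses)

# Collect one field from every row, e.g. for later processing
fees = [row.Fee for row in df.itertuples(index=False)]
print(fees)  # [20000, 25000]
```

Attribute access works best with simple column names: columns whose names are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional field names.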

Let’s give the tuple a custom name.


# Display one row from iterator
row = next(df.itertuples(index=True, name='Tution'))
print(row)

This yields the output below.


Tution(Index=0, Courses='Spark', Fee=20000, Duration='30day')

If you set the index parameter to False, it removes the index as the first element of the tuple.
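For example, here is a sketch combining index=False with name=None, which returns plain tuples instead of namedtuples; each tuple can then be unpacked by column position. The two-row DataFrame is a stand-in for the df above.

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30day", "40days"],
})

# index=False drops the index; name=None yields plain tuples
for row in df.itertuples(index=False, name=None):
    courses, fee, duration = row  # unpack by column position
    print(courses, fee, duration)

first = next(df.itertuples(index=False, name=None))
print(first)  # ('Spark', 20000, '30day')
```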

4. Using DataFrame.apply() to Iterate Over Rows

You can also use the DataFrame.apply() method with axis=1 to run a function, such as a lambda, on each row; the function receives each row as a Series. For more details, refer to DataFrame.apply().


#Syntax of DataFrame.apply()
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs)

Example:


# Another alternate approach by using DataFrame.apply()
print(df.apply(lambda row: str(row["Fee"]) + " " + str(row["Courses"]), axis = 1))

This yields the output below.


0      20000 Spark
1    25000 PySpark
2     26000 Hadoop
3     22000 Python
4     24000 Pandas
5     21000 Oracle
6       22000 Java
dtype: object
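apply() with axis=1 still calls the lambda once per row in Python. For simple string building like this, a vectorized expression on whole columns is a common alternative. The sketch below, on a two-row stand-in for df, checks that both approaches give the same values.

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

# Row-wise: the lambda runs once per row
via_apply = df.apply(lambda row: str(row["Fee"]) + " " + str(row["Courses"]), axis=1)

# Column-wise: convert Fee to strings and concatenate whole columns at once
via_vectorized = df["Fee"].astype(str) + " " + df["Courses"]

print(via_vectorized.tolist())  # ['20000 Spark', '25000 PySpark']
print(via_apply.tolist() == via_vectorized.tolist())  # True
```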

5. Iterating DataFrame using For & DataFrame.index

You can also loop over the row labels in df.index with a for loop and look up each value by column name and label. For example, df['Fee'][0] returns the Fee value of the row labeled 0.


# Using DataFrame.index
for idx in df.index:
    print(df['Fee'][idx], df['Courses'][idx])

This yields the output below.


20000 Spark
25000 PySpark
26000 Hadoop
22000 Python
24000 Pandas
21000 Oracle
22000 Java
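df['Fee'][idx] is chained indexing: it first selects the Fee column and then looks up the label in it. To read a single value by row label and column name in one call, DataFrame.at is an alternative. A sketch on a two-row stand-in for df:

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

for idx in df.index:
    # .at[row_label, column_label] returns a single scalar
    print(df.at[idx, "Fee"], df.at[idx, "Courses"])
```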

6. Iterating DataFrame using For & DataFrame.loc


# Another alternate approach by using DataFrame.loc
for i in range(len(df)):
    print(df.loc[i, "Fee"], df.loc[i, "Courses"])

This yields the same output as the previous example.
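df.loc looks rows up by index label, so looping over range(len(df)) only works here because df has the default index 0 to 6. As a sketch, with a hypothetical DataFrame indexed by course name, the same lookup raises a KeyError, while looping over the actual labels in df.index works for any index.

```python
import pandas as pd

# Hypothetical DataFrame indexed by course name instead of 0..n-1
df = pd.DataFrame(
    {"Fee": [20000, 25000], "Duration": ["30day", "40days"]},
    index=["Spark", "PySpark"],
)

try:
    df.loc[0, "Fee"]  # 0 is not a label in this index
    lookup_failed = False
except KeyError as err:
    lookup_failed = True
    print("KeyError:", err)

# Looping over the real labels works for any index
for label in df.index:
    print(label, df.loc[label, "Fee"])
```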

7. Iterating DataFrame using For & DataFrame.iloc


# Another alternate approach by using DataFrame.iloc
# (column position 0 is Courses, position 2 is Duration)
for i in range(len(df)):
    print(df.iloc[i, 0], df.iloc[i, 2])

This yields the output below.


Spark 30day
PySpark 40days
Hadoop 35days
Python 40days
Pandas 60days
Oracle 50days
Java 55days
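Hard-coded positions such as 0 and 2 silently read the wrong column if the column order changes. A sketch that looks the positions up once with Index.get_loc() before the loop (the two-row DataFrame stands in for df):

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30day", "40days"],
})

# Translate column names to integer positions once, outside the loop
courses_pos = df.columns.get_loc("Courses")
duration_pos = df.columns.get_loc("Duration")

for i in range(len(df)):
    print(df.iloc[i, courses_pos], df.iloc[i, duration_pos])
```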

8. Using DataFrame.items() to Iterate Over Columns in Pandas

DataFrame.items() iterates over a Pandas DataFrame column by column. For each column it yields a tuple (column name, Series): the first element is the column label, and the second is the column’s content as a Series.


# Iterate column by column using DataFrame.items()
for label, content in df.items():
    print(f'label: {label}')
    print(f'content: {content}')

This yields the output below.


label: Courses
content: 0      Spark
1    PySpark
2     Hadoop
3     Python
4     Pandas
5     Oracle
6       Java
Name: Courses, dtype: object
label: Fee
content: 0    20000
1    25000
2    26000
3    22000
4    24000
5    21000
6    22000
Name: Fee, dtype: int64
label: Duration
content: 0     30day
1    40days
2    35days
3    40days
4    60days
5    50days
6    55days
Name: Duration, dtype: object
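Because items() hands you each column as a Series, it fits per-column work well. A small sketch, on a two-row stand-in for df, that collects each column’s dtype name and number of distinct values:

```python
import pandas as pd

# Two-row stand-in for the df created earlier
df = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

summary = {}
for label, content in df.items():
    # content is a Series holding the whole column
    summary[label] = (str(content.dtype), content.nunique())

print(summary)
```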

9. Performance of Iterating DataFrame in Pandas

Iterating over a Pandas DataFrame row by row is generally not recommended, because it runs Python-level code for every row instead of pandas’ optimized column-wise operations, and performance degrades badly on large datasets. Use it only after you have exhausted the other options. Before using the examples in this article, check whether one of these approaches works instead: 1) vectorization, 2) Cython routines, 3) list comprehensions (a vanilla for loop).

Figure: Pandas Iterate Rows Performance
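To compare these approaches on your own machine, here is a rough timing sketch using the standard-library timeit module. The 10,000-row random DataFrame and the doubling of Fee are assumptions made only for this benchmark; absolute timings depend on hardware and pandas version, so treat them as indicative. You would typically expect the vectorized version to be fastest and iterrows() the slowest.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows of random fees
rng = np.random.default_rng(0)
df = pd.DataFrame({"Fee": rng.integers(20000, 30000, size=10_000)})

def with_iterrows():
    return [row["Fee"] * 2 for _, row in df.iterrows()]

def with_itertuples():
    return [row.Fee * 2 for row in df.itertuples(index=False)]

def with_list_comprehension():
    return [fee * 2 for fee in df["Fee"]]

def with_vectorization():
    return (df["Fee"] * 2).tolist()

# Time each approach once; all four build the same list of doubled fees
for func in (with_iterrows, with_itertuples, with_list_comprehension, with_vectorization):
    seconds = timeit.timeit(func, number=1)
    print(f"{func.__name__:<25} {seconds:.4f} s")
```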

10. Complete Example of Iterate over Rows in Pandas DataFrame


import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Oracle", "Java"],
    'Fee': [20000, 25000, 26000, 22000, 24000, 21000, 22000],
    'Duration': ['30day', '40days', '35days', '40days', '60days', '50days', '55days']
}
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.iterrows()
row = next(df.iterrows())[1]
print("Data For First Row :")
print(row)

for index, row in df.iterrows():
    print(index, row["Fee"], row["Courses"])

# Using DataFrame.itertuples()
row = next(df.itertuples(index=True, name='Tution'))
print("Data For First Row :")
print(row)

for row in df.itertuples(index=True):
    print(getattr(row, 'Index'), getattr(row, "Fee"), getattr(row, "Courses"))

# Another alternate approach by using DataFrame.apply
print(df.apply(lambda row: str(row["Fee"]) + " " + str(row["Courses"]), axis = 1))

# Using DataFrame.index
for idx in df.index:
    print(df['Fee'][idx], df['Courses'][idx])

# Another alternate approach by using DataFrame.loc
for i in range(len(df)):
    print(df.loc[i, "Fee"], df.loc[i, "Courses"])

# Another alternate approach by using DataFrame.iloc
for i in range(len(df)):
    print(df.iloc[i, 0], df.iloc[i, 2])

# Using DataFrame.items()
for label, content in df.items():
    print(f'label: {label}')
    print(f'content: {content}')

Conclusion

Pandas DataFrame provides several methods to iterate over rows (loop over row by row) and access columns/cells. However, manually looping over rows is not recommended, because it degrades application performance on large datasets. Each approach explained in this article behaves differently, so choose the one that suits your use case.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.
