  • Post category: Pandas
  • Post last modified: March 27, 2024
Pandas Difference Between loc[] vs iloc[]

The main difference between pandas loc[] and iloc[] is that loc[] selects DataFrame rows and columns by label (name), while iloc[] selects them by integer position. With loc[], a label that is not present raises a KeyError. With iloc[], a single position that is out of range raises an IndexError (slices are the exception: an out-of-range slice bound is silently truncated). In this article, I will cover the differences and similarities between loc[] and iloc[] in a pandas DataFrame with examples.
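The two error types can be seen side by side in a minimal sketch like the one below. The three-row DataFrame is a hypothetical stand-in for the sample data used later in the article.

```python
import pandas as pd

# Hypothetical three-row DataFrame with string row labels
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# loc[] looks up labels: a missing label raises KeyError
try:
    df.loc["r9"]
except KeyError as exc:
    loc_error = type(exc).__name__

# iloc[] looks up positions: an out-of-range position raises IndexError
try:
    df.iloc[9]
except IndexError as exc:
    iloc_error = type(exc).__name__

# Slices are the exception for iloc[]: an out-of-range stop is truncated
tail = df.iloc[1:100]

print(loc_error, iloc_error)  # KeyError IndexError
print(list(tail.index))       # ['r2', 'r3']
```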

A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e., rows and columns. Selecting columns from a DataFrame returns a new object containing only the specified columns: a Series when you select a single column label, and a DataFrame when you select a list of labels.
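Whether you get a Series or a DataFrame back depends only on whether the column selector is a single label or a list. A short sketch, using a hypothetical two-row DataFrame:

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# A single column label returns a Series
one_col = df.loc[:, "Fee"]
# A list of labels (even with one element) returns a DataFrame
sub_df = df.loc[:, ["Fee"]]

print(type(one_col).__name__)  # Series
print(type(sub_df).__name__)   # DataFrame
```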

Let’s look at the differences and similarities between loc[] and iloc[] with examples.

Key Points –

  • loc[] is label-based: it selects rows and columns by their index and column labels. iloc[] is integer-position based: it selects rows and columns by their zero-based positions, regardless of their labels.
  • loc[] includes the endpoint when slicing, whereas iloc[] excludes it, following the standard Python slicing convention.
  • loc[] accepts a boolean Series or array as a mask in addition to labels, which makes condition-based selection straightforward. iloc[] accepts a plain boolean list or array but not a boolean Series, so a condition has to be converted first.
  • loc[] tends to produce clearer, more readable code when the index is meaningful, whereas iloc[] is the better fit when the order of rows/columns matters more than their labels.
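The last point shows up clearly after reordering a DataFrame: labels travel with their rows, positions do not. A minimal sketch with a hypothetical three-row DataFrame:

```python
import pandas as pd

# Hypothetical sample data with meaningful row labels
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# Sort by Fee, highest first; row labels move along with their rows
by_fee = df.sort_values("Fee", ascending=False)

# iloc[0] is whichever row is now first: the most expensive course
print(by_fee.iloc[0]["Courses"])    # Hadoop
# loc["r1"] still finds the row labelled r1, wherever it ended up
print(by_fee.loc["r1", "Courses"])  # Spark
```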

1. Difference Between loc[] vs iloc[] in DataFrame

The difference between loc[] and iloc[] comes down to how you refer to the rows and columns you want to select from a pandas DataFrame.

  • loc[] is used to select rows and columns by names/labels.
  • iloc[] is used to select rows and columns by integer index/position (zero-based).
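The distinction matters most when the row labels are themselves integers that do not match the positions. In the hypothetical DataFrame below, the label 0 sits at position 1, so loc[0] and iloc[0] return different rows:

```python
import pandas as pd

# Hypothetical DataFrame whose integer labels do not match positions
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]}, index=[10, 0, 5])

# loc[0] means "the row whose label is 0"
print(df.loc[0, "Courses"])  # PySpark
# iloc[0] means "the first row", whatever its label
print(df.iloc[0, 0])         # Spark
```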

One of the main advantages of a pandas DataFrame is its ease of use, which you can see for yourself when you use the loc[] or iloc[] attributes to select or filter rows and columns. These are among the most frequently used DataFrame attributes. Let’s look at how each one is used before comparing them.

1.1 pandas.DataFrame.loc[] Usage

DataFrame.loc[] selects rows and/or columns in pandas by label. It accepts a single label, a list of labels, a range (slice) between two labels, a boolean array, and more.

Syntax: DataFrame.loc[START:STOP:STEP, START:STOP:STEP], where the first slice applies to rows and the second to columns.
  • START is the name of the first row/column label to take,
  • STOP is the name of the last row/column label to take, and
  • STEP is the number of indices to advance after each extraction.

Some points to note about loc[]:

  • If you omit START, loc[] selects from the first row/column.
  • If you omit STOP, loc[] selects all rows/columns from the START label to the end.
  • If you provide both START and STOP, loc[] selects all rows/columns between them, including both endpoints.
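The three cases above can be sketched on a hypothetical single-column DataFrame that reuses the article's r1 to r5 labels:

```python
import pandas as pd

# Hypothetical DataFrame reusing the article's row labels
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# No START: from the first row through 'r3' (inclusive)
print(df.loc[:"r3"].index.tolist())      # ['r1', 'r2', 'r3']
# No STOP: from 'r3' through the last row
print(df.loc["r3":].index.tolist())      # ['r3', 'r4', 'r5']
# Both: from 'r2' through 'r4', both endpoints included
print(df.loc["r2":"r4"].index.tolist())  # ['r2', 'r3', 'r4']
```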

1.2 pandas.DataFrame.iloc[] usage

DataFrame.iloc[] selects rows and/or columns in pandas by integer position. It accepts a single position, a list of positions, a range (slice) of positions, a boolean list or array, and more.

Syntax: DataFrame.iloc[START:STOP:STEP, START:STOP:STEP], where the first slice applies to rows and the second to columns.
  • START is the integer position of the first row/column to take,
  • STOP is the integer position at which to stop; the row/column at STOP itself is excluded, and
  • STEP is the number of indices to advance after each extraction.

Some points to note about iloc[]:

  • If you omit START, iloc[] selects from the first row/column.
  • If you omit STOP, iloc[] selects all rows/columns from the START position to the end.
  • If you provide both START and STOP, iloc[] selects the rows/columns from START up to, but not including, STOP.
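The same three cases with iloc[], again on a hypothetical single-column DataFrame. The last line shows that, as with Python lists, negative positions count from the end:

```python
import pandas as pd

# Hypothetical DataFrame reusing the article's row labels
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# No START: positions 0, 1, 2 (STOP position 3 is excluded)
print(df.iloc[:3].index.tolist())   # ['r1', 'r2', 'r3']
# No STOP: from position 2 to the end
print(df.iloc[2:].index.tolist())   # ['r3', 'r4', 'r5']
# Both: positions 1, 2, 3 (position 4 is excluded)
print(df.iloc[1:4].index.tolist())  # ['r2', 'r3', 'r4']
# Negative START: the last two rows
print(df.iloc[-2:].index.tolist())  # ['r4', 'r5']
```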

Let’s create a DataFrame and explore the differences between loc[] and iloc[].


# Create a sample DataFrame with custom row labels r1..r5
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

2. Select a Single Row or Column Using loc[] vs iloc[]

With loc[] and iloc[] you can select a single row or column by label and by position, respectively. The example below selects the same row first by its label and then by its position.


# Select Single Row by Index Label
print(df.loc['r2'])

# Select Single Row by Index
print(df.iloc[1])

# Outputs:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object
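A scalar selector returns the row as a Series; wrapping the label or position in a list keeps a one-row DataFrame instead. A small sketch on a hypothetical two-row DataFrame:

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# Scalar row selector: a Series whose index is the column names
row_series = df.loc["r2"]
# List selector: a one-row DataFrame, by label or by position
row_frame_loc = df.loc[["r2"]]
row_frame_iloc = df.iloc[[1]]

print(type(row_series).__name__)             # Series
print(row_frame_loc.equals(row_frame_iloc))  # True
```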

To select a single column by label or by position, use the following.


# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Single Column by Index
print(df.iloc[:, 0])

# Outputs:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object
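To pick out a single value (one cell), give both a row and a column selector. The sketch below assumes a hypothetical two-row DataFrame; at[] and iat[] are the scalar-only counterparts of loc[] and iloc[] in pandas.

```python
import pandas as pd

# Hypothetical two-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

# Row label + column label with loc[]
fee_loc = df.loc["r2", "Fee"]
# Row position + column position with iloc[]
fee_iloc = df.iloc[1, 1]
# Scalar-only accessors: at[] by label, iat[] by position
fee_at = df.at["r2", "Fee"]
fee_iat = df.iat[1, 1]

print(fee_loc, fee_iloc, fee_at, fee_iat)  # 25000 25000 25000 25000
```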

3. Select Multiple Rows/Columns using loc[] vs iloc[]

To select multiple rows or columns, pass a list of labels to loc[] or a list of integer positions to iloc[]. The example below selects rows by label and by position.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Outputs:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
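Row and column lists can also be combined in a single call: the row selector comes first, the column selector second. A sketch on a hypothetical subset of the sample data:

```python
import pandas as pd

# Hypothetical subset of the article's sample data
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
        "Discount": [1000, 2300, 1200],
    },
    index=["r1", "r2", "r3"],
)

# Row labels first, then column labels
by_label = df.loc[["r2", "r3"], ["Courses", "Fee"]]
# The same cells addressed by position
by_position = df.iloc[[1, 2], [0, 1]]

print(by_label.equals(by_position))  # True
```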

Similarly, you can select multiple columns from a DataFrame.


# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])

# Outputs:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000
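When you know column names but need positions for iloc[] (or the reverse), the column Index can translate between the two. A sketch, assuming a hypothetical two-row version of the sample DataFrame:

```python
import pandas as pd

# Hypothetical two-row version of the sample data
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark"],
        "Fee": [20000, 25000],
        "Duration": ["30day", "40days"],
        "Discount": [1000, 2300],
    },
    index=["r1", "r2"],
)

# Column labels -> positions, for use with iloc[]
positions = df.columns.get_indexer(["Courses", "Fee", "Discount"])
print(positions.tolist())  # [0, 1, 3]

# Positions -> column labels, for use with loc[]
labels = df.columns[[0, 1, 3]].tolist()
print(labels)  # ['Courses', 'Fee', 'Discount']

print(df.iloc[:, positions].equals(df.loc[:, labels]))  # True
```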

4. Select Range of Values Between Two Rows or Columns

With loc[] and iloc[] you can also select rows and columns by range, for example all items between two rows/columns, or all items starting from a given row/column. The example below selects the rows from r1 through r4.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Rows Between two Positions
# Includes position 0 and excludes position 4
print(df.iloc[0:4])

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
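Ranges can be applied to both axes at once. Because loc[] includes its stop label and iloc[] excludes its stop position, the iloc[] stops below are one past the last row/column wanted. The DataFrame repeats the article's sample data.

```python
import pandas as pd

# Same sample data as the article
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Rows r2..r4 and columns Fee..Duration, all endpoints included
by_label = df.loc["r2":"r4", "Fee":"Duration"]
# Positions 1..3 and 1..2: each stop is one past the last wanted
by_position = df.iloc[1:4, 1:3]

print(by_label.equals(by_position))  # True
print(by_label.shape)                # (3, 2)
```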

You can also select the columns between two column names. The example below selects all columns from Fee through Discount.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Columns between two Positions
# Includes position 1 and excludes position 4
print(df.iloc[:,1:4])

# Outputs:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

5. Select Alternate Rows or Columns

Similarly, by adding a step to a range, you can select every alternate row of a DataFrame.


# Select Alternate rows By Labels
print(df.loc['r1':'r4':2])

# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Outputs:
#   Courses    Fee Duration  Discount
# r1   Spark  20000    30day      1000
# r3  Hadoop  26000   35days      1200
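The step can also be negative, which walks the rows backwards. A sketch on a hypothetical single-column DataFrame; note that with loc[] both labels are still included when stepping in reverse.

```python
import pandas as pd

# Hypothetical DataFrame reusing the article's row labels
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# loc[]: from 'r4' back to 'r1', both endpoints included
print(df.loc["r4":"r1":-1].index.tolist())  # ['r4', 'r3', 'r2', 'r1']
# iloc[]: from position 3 back to the beginning
print(df.iloc[3::-1].index.tolist())        # ['r4', 'r3', 'r2', 'r1']
# Every second row, in reverse order
print(df.iloc[::-2].index.tolist())         # ['r5', 'r3', 'r1']
```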

To select alternate columns, use:


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

6. Using Conditions with loc[] vs iloc[]

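Both loc[] and iloc[] can also select rows by condition. loc[] accepts the boolean Series produced by a comparison directly; iloc[] does not accept a boolean Series, so the example converts it to a plain list first.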


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

# iloc[] needs a plain boolean list/array, not a boolean Series
print(df.iloc[list(df['Fee'] >= 24000)])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
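A condition can be combined with a column selection in the same call. The sketch below, on a hypothetical subset of the sample data, also checks that iloc[] rejects the boolean Series itself (in the pandas releases I am aware of this raises an error) and uses .to_numpy() as an alternative to list().

```python
import pandas as pd

# Hypothetical subset of the article's sample data
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

mask = df["Fee"] >= 24000  # boolean Series aligned on the row labels

# loc[]: the mask filters rows, the second argument picks columns
by_label = df.loc[mask, ["Courses", "Fee"]]

# iloc[]: passing the Series itself is rejected, so pass a plain array
try:
    df.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False
by_position = df.iloc[mask.to_numpy(), [0, 1]]

print(iloc_accepts_series)           # False
print(by_label.equals(by_position))  # True
print(by_label.index.tolist())       # ['r2', 'r3', 'r5']
```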

Frequently Asked Questions on Difference Between loc[] vs iloc[]

What is the primary difference between loc[] and iloc[] in Pandas?

The primary difference lies in the method of indexing data. loc[] is label-based, meaning it accesses data based on row and column labels, while iloc[] is integer-based, accessing data based on integer positions.

How does the slicing behavior differ between loc[] and iloc[]?

When slicing with loc[], both the start and stop labels are included in the result. iloc[] slicing, by contrast, excludes the stop position, following the Python slicing convention.

What types of indexing are supported by loc[] and iloc[]?

loc[] supports label-based indexing, allowing explicit selection of rows and columns based on their labels in the index. On the other hand, iloc[] supports integer-based indexing, where rows and columns are accessed by their integer positions.

How do loc[] and iloc[] differ in their handling of boolean indexing?

loc[] accepts a boolean Series or array as a mask in addition to labels, which makes condition-based selection straightforward. iloc[] accepts a boolean list or NumPy array but not a boolean Series, so you need to convert the condition first, for example with list() or .to_numpy().

When should I use loc[] versus iloc[] in my Pandas code?

Use loc[] when working with labeled data, especially when the index is meaningful, as it leads to clearer and more readable code. Conversely, iloc[] is preferred for operations where the order of rows/columns is more important than their labels, or when working with integer-based data.

Conclusion

In this article, you have learned the differences and similarities between loc[] and iloc[] in a pandas DataFrame through examples. DataFrame.loc[] selects rows and/or columns by label. It accepts a single label, a list of labels, a range (slice) between two labels, and more. DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single position, a list of positions, a range of positions, and more.

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
