  • Post last modified:April 18, 2024
Pandas Difference Between loc[] vs iloc[]

In pandas, loc[] and iloc[] are indexers used to select specific rows and columns from a DataFrame. The main difference is that loc[] selects rows and columns by label (name), whereas iloc[] selects them by integer position. With loc[], a label that does not exist raises a KeyError. With iloc[], a single position that is out of bounds raises an IndexError, although an out-of-range slice is simply truncated.
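To make the error behavior concrete, here is a minimal sketch. The two-row DataFrame and the variable names are made up for this illustration only and are not part of the examples later in the article.

```python
import pandas as pd

# Hypothetical two-row frame used only for this illustration
df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

try:
    df.loc["r9"]      # label 'r9' does not exist in the index
except KeyError as err:
    loc_error = type(err).__name__

try:
    df.iloc[5]        # position 5 is past the last row
except IndexError as err:
    iloc_error = type(err).__name__

print(loc_error, iloc_error)

# An out-of-range slice does not raise; it returns whatever rows fall in range
empty = df.iloc[5:10]
print(len(empty))
```

The slice at the end returns an empty DataFrame rather than raising, which mirrors how slicing a Python list past its end behaves.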


In this article, I will cover the differences and similarities between loc[] and iloc[] in a pandas DataFrame, with examples.

Let’s walk through the differences and similarities between loc[] and iloc[] in the topics below, with examples.

Key Points –

  • loc[] is primarily label-based indexing, meaning it uses row and column labels to access data, while iloc[] is integer-based indexing, using zero-based integer positions to access data.
  • loc[] is inclusive of the endpoint when slicing, whereas iloc[] is exclusive of the endpoint, similar to the Python slicing convention.
  • loc[] raises a KeyError when a label is missing from the index, while iloc[] raises an IndexError when a single position is out of bounds.
  • loc[] accepts a boolean Series or array as a mask in addition to labels, which makes condition-based selection straightforward; iloc[] accepts a boolean list or array but not a boolean Series.

Difference Between loc[] vs iloc[] in DataFrame

The difference between loc[] and iloc[] comes down to how you select rows and columns from a pandas DataFrame.

  • loc[] is used to select rows and columns by names/labels.
  • iloc[] is used to select rows and columns by integer position, which is zero-based.

You use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. They are among the most commonly used attributes of a pandas DataFrame. Let’s see how each is used before jumping into the differences and similarities.

pandas.DataFrame.loc[] Usage

DataFrame.loc[] selects rows and/or columns by label. It accepts a single label, a list of labels, a slice between two labels, and more, such as a boolean mask.

Slicing with loc[] follows the pattern df.loc[START:STOP:STEP] for rows, with the same pattern for columns after a comma:
  • START is the label of the first row/column to include in the slice.
  • STOP is the label of the last row/column to include in the slice.
  • STEP indicates the interval between labels to include in the slice.

Some points to note about loc[].

  • When you don’t provide a start column, loc[] selects columns from the beginning.
  • If you don’t provide a stop column, loc[] selects all columns from the start label to the end.
  • When you provide both start and stop columns, loc[] selects all columns in between those two columns, inclusive of both start and stop columns.
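The three points above can be sketched as follows. The small DataFrame here is a hypothetical stand-in with the same column names as the article's data, so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical frame with the same column names as the article's example
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark"],
        "Fee": [20000, 25000],
        "Duration": ["30days", "40days"],
        "Discount": [1000, 2300],
    },
    index=["r1", "r2"],
)

no_start = df.loc[:, :"Fee"]          # first column through 'Fee'
no_stop = df.loc[:, "Duration":]      # 'Duration' through the last column
both = df.loc[:, "Fee":"Duration"]    # both endpoints are included

print(list(no_start.columns))   # ['Courses', 'Fee']
print(list(no_stop.columns))    # ['Duration', 'Discount']
print(list(both.columns))       # ['Fee', 'Duration']
```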

pandas.DataFrame.iloc[] usage

DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single integer, a list of integers, a slice of positions, and more.

Slicing with iloc[] follows the pattern df.iloc[START:STOP:STEP] for rows, with the same pattern for columns after a comma:
  • START is the integer position of the first row/column to include.
  • STOP is the integer position at which the selection stops; the row/column at STOP itself is excluded.
  • STEP is the number of positions to advance after each extraction.

Some points to note about iloc[].

  • By not providing a start index, iloc[] selects from the first row/column.
  • By not providing stop, iloc[] selects all rows/columns from the start index to the end.
  • Providing both start and stop selects all rows/columns from start up to, but not including, stop.
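Here is a short sketch of the same three cases for iloc[], this time on rows. The five-row frame is a hypothetical stand-in that reuses the r1 to r5 labels so it is easy to see which rows come back.

```python
import pandas as pd

# Hypothetical frame; only the row labels matter for this illustration
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

no_start = df.iloc[:2]    # positions 0 and 1
no_stop = df.iloc[3:]     # position 3 through the last row
both = df.iloc[1:3]       # positions 1 and 2; position 3 is excluded

print(list(no_start.index))   # ['r1', 'r2']
print(list(no_stop.index))    # ['r4', 'r5']
print(list(both.index))       # ['r2', 'r3']
```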

Let’s create a DataFrame and explore the differences between loc[] and iloc[].


# Create the sample DataFrame used in the examples
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

Select Single Value Using loc[] vs iloc[]

With loc[] and iloc[] you can select a single row or column by label and by integer position, respectively. The example below shows how to select a row by label and by position.


# Select Single Row by Index Label
print(df.loc['r2'])

# Select Single Row by Index
print(df.iloc[1])

# Outputs:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object

To select a single column by label and by position, use the following.


# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Single Column by Index
print(df.iloc[:, 0])

# Outputs:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object
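A single cell can be picked the same way by passing both a row and a column. The snippet below is a minimal sketch on a hypothetical two-row frame; the variable names are mine.

```python
import pandas as pd

# Hypothetical two-row frame for illustration
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

fee_by_label = df.loc["r2", "Fee"]    # row label 'r2', column label 'Fee'
fee_by_position = df.iloc[1, 1]       # second row, second column

print(fee_by_label, fee_by_position)  # 25000 25000
```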

Select Multiple Rows/Columns using loc[] vs iloc[]

To select multiple rows or columns, pass a list of labels to loc[] or a list of integer positions to iloc[]. Below is an example of how to select rows by label and by position.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Outputs:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

Similarly, to select multiple columns from a pandas DataFrame, use a colon (:) to specify all rows, followed by a list of column labels for loc[] or a list of column positions for iloc[].


# Select multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select multiple columns by index
print(df.iloc[:, [0,1,3]])

# Outputs:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000

Select Range of Values Between Two Rows or Columns

To select a range of values between two rows or columns in a Pandas DataFrame, you can use the slice notation within the loc[] and iloc[] methods.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select rows between two index positions
# Includes position 0 and excludes position 4
print(df.iloc[0:4])

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

In the above example,

  • df.loc['r1':'r4'] selects rows from index label r1 to r4, including both r1 and r4.
  • df.iloc[0:4] selects rows from index position 0 up to, but excluding, index position 4. It includes rows at index positions 0, 1, 2, and 3.

To select columns between two column names in a Pandas DataFrame, you can use the loc[] & iloc[] method with slice notation.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select columns between two index positions
# Includes position 1 and excludes position 4
print(df.iloc[:,1:4])

# Outputs:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

In both of the above examples,

  • df.loc[:,'Fee':'Discount'] selects columns from Fee to Discount, including both Fee and Discount.
  • df.iloc[:,1:4] selects columns from index position 1 up to, but excluding, index position 4. It includes columns at index positions 1, 2, and 3 (Fee, Duration, and Discount).

Select Alternate Rows or Columns

Similarly, you can select alternate rows or columns with both loc[] and iloc[] by adding a step to the slice. For instance,

  • With loc[], slice by index labels and use a step size of 2.
  • With iloc[], slice by index positions and use a step size of 2.

# Select alternate rows by index labels
print(df.loc['r1':'r4':2])

# Select alternate rows by index positions
print(df.iloc[0:4:2])

# Outputs:
#   Courses    Fee Duration  Discount
# r1   Spark  20000    30day      1000
# r3  Hadoop  26000   35days      1200

To select alternate columns between two labels or two index positions, add a step to the column slice in loc[] and iloc[].


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

In the above examples,

  • The first example is selecting alternate columns starting from Fee up to Discount, with a step size of 2. It includes Fee and Discount columns.
  • The second example is alternate columns starting from index position 1 up to 4 (excluding), with a step size of 2. It includes columns at index positions 1 and 3 (Fee and Discount).

Using Conditions with loc[] vs iloc[]

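Conditions can be used to select rows with both loc[] and iloc[]. loc[] accepts a boolean Series directly, while iloc[] needs the mask converted to a plain list or array first, which is what the list() call below does.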


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

print(df.iloc[list(df['Fee'] >= 24000)])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
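The sketch below shows why the mask is converted before it is handed to iloc[]. It uses a hypothetical three-row frame, and to_numpy() is used here as an alternative to the list() call above. As far as I know, passing the boolean Series itself to iloc[] raises a ValueError (or NotImplementedError when the index is integer-typed), so both are caught.

```python
import pandas as pd

# Hypothetical three-row frame for illustration
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

mask = df["Fee"] >= 24000          # boolean Series aligned on the index

by_loc = df.loc[mask]              # loc[] takes the Series as-is

try:
    df.iloc[mask]                  # iloc[] rejects an indexed boolean mask
    series_mask_rejected = False
except (ValueError, NotImplementedError):
    series_mask_rejected = True

by_iloc = df.iloc[mask.to_numpy()] # a plain boolean array is accepted

print(series_mask_rejected)
print(by_loc.equals(by_iloc))
```

Both selections return the same rows once the mask is a plain array.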

Frequently Asked Questions on Difference Between loc[] vs iloc[]

What is the primary difference between loc[] and iloc[] in Pandas?

The primary difference lies in the method of indexing data. loc[] is label-based, meaning it accesses data based on row and column labels, while iloc[] is integer-based, accessing data based on integer positions.

How does the slicing behavior differ between loc[] and iloc[]?

When slicing with loc[], it is inclusive of the endpoint, meaning the rows and columns specified in the slice are included in the output. Conversely, iloc[] slicing is exclusive of the endpoint, following the convention of Python slicing.

What types of indexing are supported by loc[] and iloc[]?

loc[] supports label-based indexing, allowing explicit selection of rows and columns based on their labels in the index. On the other hand, iloc[] supports integer-based indexing, where rows and columns are accessed by their integer positions.

How do loc[] and iloc[] differ in their handling of boolean indexing?

loc[] accepts a boolean Series, array, or list as a mask in addition to labels, which makes condition-based selection straightforward. iloc[] also accepts a boolean array or list, but it does not accept a boolean Series, so the mask must first be converted, for example with list().

When should I use loc[] versus iloc[] in my Pandas code?

Use loc[] when working with labeled data, especially when the index is meaningful, as it leads to clearer and more readable code. Prefer iloc[] when the position of rows/columns matters more than their labels, for example when taking the first few rows regardless of how the index is labeled.
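The choice matters most when the index labels are themselves integers, because a label and a position can then look alike. The following is a minimal sketch on a hypothetical frame whose labels are 10, 20 and 30.

```python
import pandas as pd

# Hypothetical frame with integer labels that do not match positions
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=[10, 20, 30])

# Filtering keeps the original labels (20 and 30) but shifts positions
filtered = df[df["Fee"] > 20000]

first_by_position = filtered.iloc[0, 0]   # first remaining row
by_label = filtered.loc[20, "Fee"]        # row whose label is 20

try:
    filtered.loc[0]                       # there is no row labeled 0
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(first_by_position, by_label, label_zero_exists)
```

Here iloc[0] and loc[20] point at the same row, while loc[0] fails because 0 is a position, not a label.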

Conclusion

In this article, I have explained the differences and similarities between loc[] and iloc[] in a pandas DataFrame using examples. DataFrame.loc[] selects rows and/or columns by label. It supports single labels, lists of labels, slices between two labels, and additional selection methods such as boolean masks. DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single integer, a list of integers, a slice of positions, and more.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
