  • Post category:Pandas
  • Post last modified:May 30, 2024
  • Reading time:21 mins read
Pandas Difference Between loc[] vs iloc[]

In Pandas, both loc[] and iloc[] are indexers used to select specific rows and columns from a DataFrame. The main difference is that loc[] selects rows and columns by label (name), while iloc[] selects them by integer position. With loc[], looking up a label that is not present raises a KeyError. With iloc[], looking up a position that does not exist raises an IndexError (slices are the exception: iloc[] clips out-of-bounds slice bounds instead of raising).
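A minimal sketch of both errors, using a small hypothetical DataFrame (the labels r1 to r3 and the missing label r9 are only for illustration). It also shows that an iloc[] slice, unlike a single position, clips out-of-range bounds instead of raising.

```python
import pandas as pd

# Small hypothetical DataFrame with string index labels
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# loc[] looks up labels: a label missing from the index raises KeyError
loc_error = None
try:
    df.loc["r9"]
except KeyError as err:
    loc_error = type(err).__name__

# iloc[] looks up positions: a single position past the end raises IndexError
iloc_error = None
try:
    df.iloc[10]
except IndexError as err:
    iloc_error = type(err).__name__

# Slices are the exception for iloc[]: out-of-bounds slice bounds are clipped
clipped = df.iloc[1:10]

print(loc_error, iloc_error)  # KeyError IndexError
print(list(clipped.index))    # ['r2', 'r3']
```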

In this article, I will explain the differences and similarities between loc[] and iloc[] in a Pandas DataFrame with examples.

Let’s look at the differences and similarities between loc[] and iloc[] through the examples below.

Key Points –

  • loc[] is label-based: it accesses rows and columns by their index and column labels. iloc[] is integer position-based: it accesses rows and columns by their zero-based positions.
  • loc[] includes the stop label when slicing, whereas iloc[] excludes the stop position, following the usual Python slicing convention.
  • loc[] accepts a boolean Series (for example, the result of a condition such as df['Fee'] >= 24000) as a mask. iloc[] accepts only a plain boolean list or array and raises an error when given a boolean Series.

Difference Between loc[] vs iloc[] in DataFrame

The difference between loc[] and iloc[] comes down to how each one identifies the rows and columns to select from a Pandas DataFrame.

  • loc[] selects rows and columns by name/label.
  • iloc[] selects rows and columns by integer position, using zero-based positions.

Use the loc[] or iloc[] attributes to select or filter DataFrame rows and columns. They are among the most commonly used attributes of a pandas DataFrame.
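The difference is easiest to see when the index labels are integers that do not match the positions. A minimal sketch, assuming a hypothetical DataFrame whose index labels run 2, 1, 0:

```python
import pandas as pd

# Hypothetical DataFrame whose integer labels run opposite to the positions
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]}, index=[2, 1, 0])

# loc[] reads 0 as a LABEL: the row labelled 0 is the last row
by_label = df.loc[0, "Courses"]

# iloc[] reads 0 as a POSITION: the first row, whatever its label
by_position = df.iloc[0, 0]

print(by_label)     # Hadoop
print(by_position)  # Spark
```

The same key, 0, lands on different rows, which is why it helps to pick the indexer that matches what the key means.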

pandas.DataFrame.loc[] Usage

DataFrame.loc[] selects rows and/or columns by label. It accepts a single label, a list of labels, a slice between two labels, a boolean array, and more.

Slicing with loc[] follows the form df.loc[START:STOP:STEP] for rows, optionally followed by a second slice for columns (df.loc[rows, columns]), where:
  • START: the starting label of the slice.
  • STOP: the ending label of the slice; loc[] includes this label in the result.
  • STEP: the interval between the labels included in the slice.

Some points to note about loc[].

  • If you don’t provide a start row/column label, it selects from the beginning.
  • If you don’t provide a stop row/column label, it selects all rows/columns from the start label.
  • By providing both start and stop labels, it selects all rows/columns in between, including both the start and stop labels.
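The three cases above can be sketched with a small hypothetical DataFrame that uses the same r1 to r5 labels as the examples later in this article:

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

head = df.loc[:"r2"]        # no start label: first row through 'r2' (inclusive)
tail = df.loc["r4":]        # no stop label: 'r4' through the last row
middle = df.loc["r2":"r4"]  # both labels: 'r2' through 'r4', both included

print(list(head.index))    # ['r1', 'r2']
print(list(tail.index))    # ['r4', 'r5']
print(list(middle.index))  # ['r2', 'r3', 'r4']
```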

pandas.DataFrame.iloc[] Usage

DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single position, a list of positions, a slice of positions, a boolean array, and more.

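Slicing with iloc[] follows the form df.iloc[START:STOP:STEP] for rows, optionally followed by a second slice for columns (df.iloc[rows, columns]), where:
  • START is the integer position of the first row/column to include.
  • STOP is the integer position at which the selection stops; the row/column at STOP itself is excluded.
  • STEP is the number of positions to advance after each selected row/column.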

Some points to note about iloc[].

  • If you don’t provide a start position, iloc[] selects from the first row/column.
  • If you don’t provide a stop position, iloc[] selects all rows/columns from the start position through the last one.
  • If you provide both start and stop, iloc[] selects the rows/columns from start up to, but not including, stop.
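The same cases by position can be sketched on a small hypothetical DataFrame with labels r1 to r5. The last line, which uses a negative start position, is an extra illustration of ordinary Python slicing that iloc[] also follows:

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

head = df.iloc[:2]       # no start: positions 0 and 1
tail = df.iloc[3:]       # no stop: position 3 through the last row
middle = df.iloc[1:4]    # positions 1, 2, 3 (stop position 4 is excluded)
last_two = df.iloc[-2:]  # negative positions count back from the end

print(list(head.index))      # ['r1', 'r2']
print(list(tail.index))      # ['r4', 'r5']
print(list(middle.index))    # ['r2', 'r3', 'r4']
print(list(last_two.index))  # ['r4', 'r5']
```

Note that iloc[1:4] needs a stop one past the last wanted row, whereas the equivalent label slice would name r4 itself.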

To run some examples comparing loc[] and iloc[], let’s create a DataFrame with custom index labels.


# Create a DataFrame with custom index labels r1 to r5
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

Select Single Row or Column Using loc[] vs iloc[]

Both loc[] and iloc[] can select a single row or column; loc[] identifies it by label and iloc[] by integer position.


# Select Single Row by Index Label
print(df.loc['r2'])

# Select Single Row by Index
print(df.iloc[1])

# Outputs:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object

You can select a single column by label or by position. In both cases, the colon (:) selects all rows. With loc[], you specify the column label directly; with iloc[], you use the column’s zero-based position.


# Select single column by label
print(df.loc[:, "Courses"])

# Select single column by index
print(df.iloc[:, 0])

# Outputs:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object
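The examples above return a whole row or column as a Series. To select a single value (one cell), pass both a row key and a column key. A sketch using a cut-down, hypothetical version of the article’s DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# Row label + column label -> a single scalar value
fee_by_label = df.loc["r2", "Fee"]

# Row position + column position -> the same scalar value
fee_by_position = df.iloc[1, 1]

print(fee_by_label, fee_by_position)  # 25000 25000

# A scalar row key returns a Series; a one-element list keeps a DataFrame
print(type(df.loc["r2"]).__name__)    # Series
print(type(df.loc[["r2"]]).__name__)  # DataFrame
```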

Using loc[] vs iloc[] To Select Multiple Values

To select multiple values using loc[] and iloc[], pass a list of the rows (and, optionally, columns) you want. The example below selects rows by label and by position.


# Select multiple rows by label
print(df.loc[['r2','r3']])

# Select multiple rows by index
print(df.iloc[[1,2]])

# Outputs:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

You can also select multiple columns. With loc[], use a colon (:) to select all rows, followed by a list of column labels in square brackets; with iloc[], pass a list of column positions instead.


# Select multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select multiple columns by index
print(df.iloc[:, [0,1,3]])

# Outputs:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000
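Row and column lists can also be combined in a single call. A sketch on a cut-down, hypothetical version of the example DataFrame, checking that the label-based and position-based selections match:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
        "Discount": [1000, 2300, 1200],
    },
    index=["r1", "r2", "r3"],
)

# loc[rows, columns]: lists of row labels and column labels
by_label = df.loc[["r1", "r3"], ["Courses", "Fee"]]

# iloc[rows, columns]: lists of row positions and column positions
by_position = df.iloc[[0, 2], [0, 1]]

print(by_label)
print(by_label.equals(by_position))  # True
```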

Select Range of Values

To select a range of values between two rows or columns in a Pandas DataFrame, you can use the slice notation within the loc[] and iloc[] methods.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Rows Between two Positions
# Includes position 0 & excludes position 4
print(df.iloc[0:4])

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

In the above example,

  • df.loc['r1':'r4'] selects rows from index label r1 to r4, including both r1 and r4.
  • df.iloc[0:4] selects rows from index position 0 up to, but excluding, index position 4. It includes rows at index positions 0, 1, 2, and 3.

To select columns between two column names, use loc[] with label-based slice notation; to select columns between two positions, use iloc[] with position-based slice notation.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Columns between two Indexes
# Includes position 1 & excludes position 4
print(df.iloc[:,1:4])

# Outputs:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

In the above examples,

  • df.loc[:,'Fee':'Discount'] selects columns from Fee to Discount, including both Fee and Discount.
  • df.iloc[:,1:4] selects columns from index position 1 up to, but excluding, index position 4. It includes columns at index positions 1, 2, and 3 (Fee, Duration, and Discount).
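loc[] does not accept positions and iloc[] does not accept labels, so to combine a column name with a positional row range, one option is to translate one into the other first. A minimal sketch using the pandas Index method get_loc() on a cut-down, hypothetical version of the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Translate the column label 'Fee' into its position, then slice with iloc[]
fee_pos = df.columns.get_loc("Fee")      # position 1
fees_by_position = df.iloc[:3, fee_pos]  # rows at positions 0, 1, 2

# Or translate row positions into labels, then select with loc[]
first_three_labels = df.index[:3]        # labels r1, r2, r3
fees_by_label = df.loc[first_three_labels, "Fee"]

print(fee_pos)                                 # 1
print(fees_by_position.equals(fees_by_label))  # True
```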

Select Every Other Row or Column

You can select every other row or column with both loc[] and iloc[] by giving the slice a step of 2:

  • With loc[], slice between two index labels with a step of 2.
  • With iloc[], slice between two index positions with a step of 2.

# Select Alternate rows By index labels
print(df.loc['r1':'r4':2])

# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Outputs:
#   Courses    Fee Duration  Discount
# r1   Spark  20000    30day      1000
# r3  Hadoop  26000   35days      1200

You can also select alternate columns between two labels or two positions using loc[] and iloc[].


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

In the above examples,

  • The first example selects alternate columns from Fee through Discount with a step of 2, which returns the Fee and Discount columns.
  • The second example selects alternate columns from position 1 up to, but excluding, position 4 with a step of 2. It returns the columns at positions 1 and 3 (Fee and Discount).
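STEP can also be negative to walk backwards. A sketch, assuming a unique, sorted index like r1 to r5: with iloc[] the stop position is still excluded, while with loc[] both end labels are still included.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# iloc[]: start at position 3 and step backwards to the first row
rev_by_position = df.iloc[3::-1]

# loc[]: start at label 'r4' and step backwards to label 'r1' (both included)
rev_by_label = df.loc["r4":"r1":-1]

# Every other row, walking backwards from the last row
alt_reversed = df.iloc[::-2]

print(list(rev_by_position.index))  # ['r4', 'r3', 'r2', 'r1']
print(list(rev_by_label.index))     # ['r4', 'r3', 'r2', 'r1']
print(list(alt_reversed.index))     # ['r5', 'r3', 'r1']
```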

Using Conditions with loc[] vs iloc[]

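Conditions can be used with both loc[] and iloc[] to select rows. loc[] accepts the boolean Series produced by a comparison such as df['Fee'] >= 24000 directly. iloc[] does not accept a boolean Series, so the iloc[] example converts it to a plain list of booleans with list().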


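# loc[] accepts the boolean Series produced by the comparison
print(df.loc[df['Fee'] >= 24000])

# iloc[] needs a plain list (or array) of booleans, not a Series
print(df.iloc[list(df['Fee'] >= 24000)])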

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
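A sketch of what happens when iloc[] is given the boolean Series without conversion, along with the to_numpy() alternative and a loc[] call that applies a condition and picks a column at the same time. It uses a cut-down, hypothetical DataFrame; the exception type noted in the comments reflects recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)
mask = df["Fee"] >= 24000  # boolean Series labelled r1, r2, r3

# loc[] takes the boolean Series for rows and a column label for columns
high_fee_courses = df.loc[mask, "Courses"]

# iloc[] refuses a boolean Series because the mask carries its own index
iloc_error = None
try:
    df.iloc[mask]
except (ValueError, NotImplementedError) as err:
    # ValueError for a string index like this one;
    # an integer-labelled mask raises NotImplementedError instead
    iloc_error = type(err).__name__

# to_numpy() strips the index, leaving a plain boolean array iloc[] accepts
high_fee_rows = df.iloc[mask.to_numpy()]

print(list(high_fee_courses))     # ['PySpark', 'Hadoop']
print(iloc_error)                 # ValueError
print(list(high_fee_rows.index))  # ['r2', 'r3']
```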

Frequently Asked Questions on Difference Between loc[] vs iloc[]

What is the primary difference between loc[] and iloc[] in Pandas?

The primary difference lies in the method of indexing data. loc[] is label-based, meaning it accesses data based on row and column labels, while iloc[] is integer-based, accessing data based on integer positions.

How does the slicing behavior differ between loc[] and iloc[]?

When slicing with loc[], the endpoint is inclusive: the row or column named by the stop label is included in the output. iloc[] slicing is exclusive of the endpoint, following the Python slicing convention.

What types of indexing are supported by loc[] and iloc[]?

loc[] supports label-based indexing, allowing explicit selection of rows and columns based on their labels in the index. On the other hand, iloc[] supports integer-based indexing, where rows and columns are accessed by their integer positions.

How do loc[] and iloc[] differ in their handling of boolean indexing?

loc[] accepts a boolean Series or array as a mask in addition to labels, which makes condition-based selection straightforward. iloc[] accepts a plain boolean list or NumPy array, but raises an error if you pass a boolean Series directly; convert the Series first, for example with list() or .to_numpy().

When should I use loc[] versus iloc[] in my Pandas code?

Use loc[] when working with labeled data, especially when the index is meaningful, as it leads to clearer and more readable code. Conversely, iloc[] is preferred for operations where the order of rows/columns is more important than their labels, or when working with integer-based data.
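As an illustration of that guideline (the sorting scenario below is an assumption chosen for the example, using a cut-down, hypothetical DataFrame), iloc[] suits "the first N rows in the current order", while loc[] keeps finding a row by its label after the order changes:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Reorder rows by Fee, highest first; each label travels with its row
by_fee = df.sort_values("Fee", ascending=False)

# Order matters: iloc[] takes the first two positions of the new order
top_two = by_fee.iloc[:2]

# Label matters: loc[] still finds row 'r2' wherever it ended up
pyspark = by_fee.loc["r2", "Courses"]

print(list(top_two["Courses"]))  # ['Hadoop', 'PySpark']
print(pyspark)                   # PySpark
```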

Conclusion

In this article, I have explained the differences and similarities between loc[] and iloc[] in a pandas DataFrame using examples. DataFrame.loc[] selects rows and/or columns by label. It supports single labels, lists of labels, ranges between two index labels (with both endpoints included), and additional selection methods. DataFrame.iloc[] selects rows and/or columns by integer position. It can take a single position, a list of positions, a range of positions (with the stop position excluded), and various other options.

Happy Learning !!
