Site icon Spark By {Examples}

Pandas Difference Between loc[] vs iloc[]

Pandas loc vs iloc

In Pandas, both loc[] and iloc[] are indexing methods used to select specific rows and columns from a DataFrame. The main difference between pandas loc[] vs iloc[] is loc gets DataFrame rows & columns by labels/names and iloc[] gets by integer Index/position. For loc[], if the label is not present it gives a key error. For iloc[], if the position is not present it gives an index error.

In this article, I will cover the differences and similarities between loc[] and iloc[] in Pandas DataFrame by exploring with examples.

pandas dataframe loc vs iloc
Difference Between pandas DataFrame loc vs iloc

Let’s see the differences and similarities between loc[] vs iloc[] by using the below topics with examples.

Key Points –

Difference Between loc[] vs iloc[] in DataFrame

The difference between loc[] vs iloc[] is described by how you select rows and columns from pandas DataFrame.

You utilize the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are mostly used attributes in pandas DataFrame. Let’s see the usage of these before jumping into differences and similarities.

pandas.DataFrame.loc[] Usage

DataFrame.loc[] is label-based to select rows and/or columns in pandas. It accepts single labels, multiple labels from the list, indexes by a range (between two indexes labels), and many more.

pandas Difference between loc and iloc

Some points to note about loc[].

pandas.DataFrame.iloc[] usage

DataFrame.iloc[] is a index-based to select rows and/or columns in pandas. It accepts a single index, multiple indexes from the list, indexes by a range, and many more.

pandas dataframe loc vs iloc

Some point to note about iloc[].

Let’s create a DataFrame and explore the differences of loc[] and iloc[].


# Pandas.DataFrame.iloc[] usage 
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs:
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

Select Single Value Using loc[] vs iloc[]

By using loc[] and iloc[] you can select the single row and column by name and index respectively. The below example demonstrates how to select row by label and index.


# Select Single Row by Index Label
print(df.loc['r2'])

# Select Single Row by Index
print(df.iloc[1])

# Outputs:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object

In order to select column by label and Index use below.


# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Single Column by Index
print(df.iloc[:, 0])

# Outputs:
#    Courses
# r1    Spark
# r2  PySpark
# r3   Hadoop
# r4   Python
# r5   pandas

Select Multiple Rows/Columns using loc[] vs iloc[]

To select multiple rows and columns, use the labels or integer index as a list to loc[] and iloc[] attributes. Below is an example of how to select rows by label and index.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Outputs:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

Similarly, to select multiple columns from a Pandas DataFrame, you can use the loc[] method with a colon (:) to specify all rows, followed by a list of column labels enclosed in square brackets


# Select multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select multiple columns by index
print(df.iloc[:, [0,1,3]])

# Outputs:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000

Select Range of Values Between Two Rows or Columns

To select a range of values between two rows or columns in a Pandas DataFrame, you can use the slice notation within the loc[] and iloc[] methods.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Rows Between two Indexs
# Includes Index 0 & Execludes 4
print(df.iloc[0:4])

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

In the above example,

To select columns between two column names in a Pandas DataFrame, you can use the loc[] & iloc[] method with slice notation.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Columns between two Indexes
# Includes Index 1 & Execludes 4
print(df.iloc[:,1:4])

# Outputs:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

In the both above examples,

Select Alternate Rows or Columns

Similarly, You can select alternate rows or columns using both loc[] and iloc[] methods. For instance,


# Select Alternate rows By indeces
print(df.loc['r1':'r4':2])

# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Outputs:
#   Courses    Fee Duration  Discount
# r1   Spark  20000    30day      1000
# r3  Hadoop  26000   35days      1200

To selects alternate columns between two labels and two index positions using the loc[] and iloc[] methods


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

In the above examples,

Using Conditions with loc[] vs iloc[]

Conditions can be applied to select specific rows or columns from a DataFrame using both loc[] and iloc[] methods.


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

print(df.iloc[list(df['Fee'] >= 24000)])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000

Frequently Asked Questions on Difference Between loc[] vs iloc[]

What is the primary difference between loc[] and iloc[] in Pandas?

The primary difference lies in the method of indexing data. loc[] is label-based, meaning it accesses data based on row and column labels, while iloc[] is integer-based, accessing data based on integer positions.

How does the slicing behavior differ between loc[] and iloc[]?

When slicing with loc[], it is inclusive of the endpoint, meaning the rows and columns specified in the slice are included in the output. Conversely, iloc[] slicing is exclusive of the endpoint, following the convention of Python slicing.

What types of indexing are supported by loc[] and iloc[]?

loc[] supports label-based indexing, allowing explicit selection of rows and columns based on their labels in the index. On the other hand, iloc[] supports integer-based indexing, where rows and columns are accessed by their integer positions.

How do loc[] and iloc[] differ in their handling of boolean indexing?

loc[] enables boolean array or mask indexing in addition to label-based indexing, facilitating flexible data selection based on conditions. iloc[], however, strictly relies on integer positions and doesn’t support boolean indexing directly.

When should I use loc[] versus iloc[] in my Pandas code?

Use loc[] when working with labeled data, especially when the index is meaningful, as it leads to clearer and more readable code. Conversely, iloc[] is preferred for operations where the order of rows/columns is more important than their labels, or when working with integer-based data.

Conclusion

In this article, I have explained the differences and similarities between loc and iloc in pandas DataFrame using examples. DataFrame.loc[] facilitates label-based selection of rows and/or columns in pandas. It supports single labels, lists of labels, ranges specified by two index labels, and additional selection methods. DataFrame.iloc[] is index-based to select rows and/or columns in pandas. it accepts a single index, multiple indexes from the list, indexes by a range, and many more.

Happy Learning !!

Related Articles

References

Exit mobile version