Pandas Difference Between loc[] vs iloc[]

The main difference between pandas loc[] and iloc[] is that loc[] selects DataFrame rows and columns by labels/names, while iloc[] selects them by integer index/position. With loc[], a label that is not present raises a KeyError. With iloc[], a single position (or a list of positions) that is out of bounds raises an IndexError. In this article, I will cover the differences and similarities between loc[] and iloc[] in a pandas DataFrame by exploring examples.
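To see the two errors side by side, here is a minimal sketch on a tiny two-row DataFrame (the data is made up for illustration). It also shows that an out-of-range *slice* passed to iloc[] is simply truncated rather than raising an error.

```python
import pandas as pd

# Tiny illustrative DataFrame (hypothetical data) with string row labels
df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

loc_error = None
iloc_error = None

# loc[] looks up labels: a label that does not exist raises KeyError
try:
    df.loc["r9"]
except KeyError as err:
    loc_error = type(err).__name__

# iloc[] looks up positions: a single position past the end raises IndexError
try:
    df.iloc[5]
except IndexError as err:
    iloc_error = type(err).__name__

# An out-of-range slice does not raise; iloc[] just stops at the last row
truncated = df.iloc[0:10]

print(loc_error, iloc_error, len(truncated))  # KeyError IndexError 2
```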

A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e. rows and columns. Selecting columns from a DataFrame returns a new object that contains only the selected columns: a DataFrame when you select a list of columns, or a Series when you select a single column.


Let’s look at the differences and similarities between loc[] and iloc[] through the topics below, with examples.

1. Difference Between loc[] and iloc[] in pandas DataFrame

The difference between loc[] and iloc[] lies in how they select rows and columns from a pandas DataFrame.

  • loc[] selects rows and columns by names/labels.
  • iloc[] selects rows and columns by integer position (zero-based).
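The distinction matters most when the row labels are themselves integers. In the sketch below (hypothetical data), the labels 10, 20 and 30 do not match the positions 0, 1 and 2, so loc[] and iloc[] need different numbers to reach the same cell.

```python
import pandas as pd

# Integer row labels that differ from the positions (hypothetical data)
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]}, index=[10, 20, 30])

# loc[] uses the label 20, which happens to sit at position 1
by_label = df.loc[20, "Courses"]
# iloc[] uses position 1 (second row) and position 0 (first column)
by_position = df.iloc[1, 0]
# Position 0 is the row labeled 10
first_by_position = df.iloc[0, 0]

print(by_label, by_position, first_by_position)  # PySpark PySpark Spark
```

Here df.loc[1] would raise a KeyError, because no row is labeled 1.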

One of the main advantages of pandas DataFrame is its ease of use, which you can see for yourself when you use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are among the most commonly used attributes of a pandas DataFrame. Let’s look at how each one is used before jumping into the differences and similarities.

1.1 pandas.DataFrame.loc[] Usage

DataFrame.loc[] is a label-based indexer for selecting rows and/or columns in pandas. It accepts a single label, a list of labels, a range (slice) between two labels, a boolean array, and more.

When slicing, loc[] takes the form loc[START:STOP:STEP] for rows and loc[:, START:STOP:STEP] for columns, where:
  • START is the label of the first row/column to take,
  • STOP is the label of the last row/column to take, and
  • STEP is the number of rows/columns to advance after each selection.

Some points to note about loc[]:

  • If you omit START, loc[] selects from the first row/column.
  • If you omit STOP, loc[] selects all rows/columns from the START label to the end.
  • If you provide both START and STOP, loc[] selects all rows/columns in between, including the STOP label itself.
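The following sketch checks these loc[] slicing rules on a one-column DataFrame that reuses the article's row labels and Fee values. Note that the STOP label is always part of the result.

```python
import pandas as pd

# One-column DataFrame using the article's row labels and Fee values
df = pd.DataFrame({"Fee": [20000, 25000, 26000, 22000, 24000]},
                  index=["r1", "r2", "r3", "r4", "r5"])

head = df.loc[:"r2"]            # no START: from the first label through 'r2'
tail = df.loc["r4":]            # no STOP: from 'r4' through the last label
middle = df.loc["r2":"r4"]      # START and STOP: both end labels included
stepped = df.loc["r1":"r5":2]   # STEP 2: every second label

print(list(head.index))     # ['r1', 'r2']
print(list(tail.index))     # ['r4', 'r5']
print(list(middle.index))   # ['r2', 'r3', 'r4']
print(list(stepped.index))  # ['r1', 'r3', 'r5']
```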

1.2 pandas.DataFrame.iloc[] Usage

DataFrame.iloc[] is an integer position-based indexer for selecting rows and/or columns in pandas. It accepts a single integer position, a list of positions, a range (slice) of positions, a boolean array, and more.

When slicing, iloc[] takes the form iloc[START:STOP:STEP] for rows and iloc[:, START:STOP:STEP] for columns, where:
  • START is the integer position of the first row/column to take,
  • STOP is the integer position at which the selection stops; this position itself is excluded, and
  • STEP is the number of positions to advance after each selection.

Some points to note about iloc[]:

  • If you omit START, iloc[] selects from the first row/column.
  • If you omit STOP, iloc[] selects all rows/columns from the START position to the end.
  • If you provide both START and STOP, iloc[] selects rows/columns from START up to, but not including, STOP.
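Here are the same rules for iloc[], sketched on a one-column DataFrame built from the article's row labels and Fee values. Unlike loc[], the STOP position is never part of the result, and negative positions count back from the end.

```python
import pandas as pd

# One-column DataFrame using the article's row labels and Fee values
df = pd.DataFrame({"Fee": [20000, 25000, 26000, 22000, 24000]},
                  index=["r1", "r2", "r3", "r4", "r5"])

head = df.iloc[:2]         # no START: positions 0 and 1 (STOP 2 excluded)
tail = df.iloc[3:]         # no STOP: positions 3 and 4
middle = df.iloc[1:4]      # positions 1, 2, 3 (STOP 4 excluded)
stepped = df.iloc[0:5:2]   # STEP 2: positions 0, 2, 4
last_two = df.iloc[-2:]    # negative START counts from the end

print(list(middle.index))    # ['r2', 'r3', 'r4']
print(list(last_two.index))  # ['r4', 'r5']
```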

Let’s create a DataFrame and explore the differences between loc[] and iloc[].


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500
#r5   pandas  24000   60days      2000

2. Select a Single Row or Column Using loc[] vs iloc[]

With loc[] and iloc[], you can select a single row or column by label and by position, respectively. The example below selects a single row by its label and by its position.


# Select Single Row by Index Label
print(df.loc['r2'])

# Select Single Row by Index
print(df.iloc[1])

# Outputs
#Courses     PySpark
#Fee           25000
#Duration     40days
#Discount       2300
#Name: r2, dtype: object
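You can also pass a row and a column together to pick a single cell (a scalar value) instead of a whole row. This sketch rebuilds a trimmed copy of the article's DataFrame (Courses and Fee only) so that it runs on its own.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# loc[row_label, column_label] returns one scalar
fee_by_label = df.loc["r2", "Fee"]
# iloc[row_position, column_position] reaches the same cell by position
fee_by_position = df.iloc[1, 1]

print(fee_by_label, fee_by_position)  # 25000 25000
```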

To select a single column by label or by position, use the following.


# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Single Column by Index
print(df.iloc[:, 0])

# Outputs
#r1      Spark
#r2    PySpark
#r3     Hadoop
#r4     Python
#r5     pandas
#Name: Courses, dtype: object
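Note that a single column label or position returns a pandas Series, while wrapping it in a list returns a one-column DataFrame. A minimal sketch on a trimmed copy of the article's DataFrame:

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

as_series = df.loc[:, "Courses"]   # single label -> Series
as_frame = df.iloc[:, [0]]         # list with one position -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```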

3. Select Multiple Rows/Columns Using loc[] vs iloc[]

To select multiple rows or columns, pass a list of labels to loc[] or a list of integer positions to iloc[]. The example below selects multiple rows by label and by position.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Outputs
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200

Similarly, you can select multiple columns from a pandas DataFrame.


# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])

# Outputs
#    Courses    Fee  Discount
#r1    Spark  20000      1000
#r2  PySpark  25000      2300
#r3   Hadoop  26000      1200
#r4   Python  22000      2500
#r5   pandas  24000      2000
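Row and column selections can also be combined in a single call. The sketch below, on a trimmed copy of the article's DataFrame, picks rows r2 and r3 with the Courses and Fee columns both ways and checks that the two results match.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Lists of labels for rows and columns
by_label = df.loc[["r2", "r3"], ["Courses", "Fee"]]
# Lists of positions for rows and columns
by_position = df.iloc[[1, 2], [0, 1]]

print(by_label.equals(by_position))  # True
```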

4. Select Range of Values Between Two Rows or Columns

With loc[] and iloc[], you can also select rows and columns by range, for example all items between two rows/columns, or all items from a given row/column onward. The example below selects the rows between the r1 and r4 index labels.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Rows Between two Index Positions
# Includes position 0 & excludes position 4
print(df.iloc[0:4])

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500
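Because loc[] includes the STOP label and iloc[] excludes the STOP position, the label range 'r1':'r4' corresponds to the position range 0:4, not 0:3. The sketch below checks this on a trimmed copy of the article's DataFrame, along with an open-ended range from 'r3' (position 2) to the end.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

by_label = df.loc["r1":"r4"]    # 4 rows: r4 is included
by_position = df.iloc[0:4]      # 4 rows: position 4 (r5) is excluded
off_by_one = df.iloc[0:3]       # common slip: stops before r4, only 3 rows

# Open-ended ranges: everything from 'r3' / position 2 to the end
from_label = df.loc["r3":]
from_position = df.iloc[2:]

print(by_label.equals(by_position), len(off_by_one))  # True 3
print(list(from_label.index))  # ['r3', 'r4', 'r5']
```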

You can also select the columns between two column names. The example below selects all columns from Fee through Discount.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Columns between two Positions
# Includes position 1 & excludes position 4
print(df.iloc[:,1:4])

# Outputs
#      Fee Duration  Discount
#r1  20000    30day      1000
#r2  25000   40days      2300
#r3  26000   35days      1200
#r4  22000   40days      2500
#r5  24000   60days      2000
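If you know the column names but need iloc[], one way to bridge the two is Index.get_loc(), which returns a label's integer position. The sketch below rebuilds the article's DataFrame and adds 1 to the STOP position to make up for iloc[] excluding it.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Translate the column labels into integer positions
start = df.columns.get_loc('Fee')       # 1
stop = df.columns.get_loc('Discount')   # 3

# +1 because iloc[] excludes the STOP position, while loc[] includes it
by_position = df.iloc[:, start:stop + 1]
by_label = df.loc[:, 'Fee':'Discount']

print(list(by_position.columns))     # ['Fee', 'Duration', 'Discount']
print(by_label.equals(by_position))  # True
```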




5. Select Alternate Rows or Columns

Similarly, by adding a STEP to a range, you can select every other (alternate) row of a DataFrame.


# Select Alternate Rows by Index Labels
print(df.loc['r1':'r4':2])

# Select Alternate Rows by Index Positions
print(df.iloc[0:4:2])

# Outputs
#   Courses    Fee Duration  Discount
#r1   Spark  20000    30day      1000
#r3  Hadoop  26000   35days      1200
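A STEP also works without START or STOP, and a negative STEP walks backwards. This sketch, on a trimmed copy of the article's DataFrame, compares loc[] and iloc[] for every other row and for reversed rows. With loc[], the reversed range is written from 'r4' down to 'r1', and both end labels are still included.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Every other row over the whole DataFrame
every_other_label = df.loc[::2]
every_other_pos = df.iloc[::2]

# Negative STEP: r4 back to r1 (inclusive) vs position 3 back to position 0
reversed_label = df.loc["r4":"r1":-1]
reversed_pos = df.iloc[3::-1]

print(list(every_other_label.index))  # ['r1', 'r3', 'r5']
print(list(reversed_label.index))     # ['r4', 'r3', 'r2', 'r1']
```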

To select alternate columns, use the following.


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output
#      Fee  Discount
#r1  20000      1000
#r2  25000      2300
#r3  26000      1200
#r4  22000      2500
#r5  24000      2000
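The same STEP options apply to columns. Below is a sketch, on a copy of the article's DataFrame, that takes every other column from the start and reverses the column order with both indexers.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Every other column, starting from the first one
alt_label = df.loc[:, ::2]
alt_pos = df.iloc[:, ::2]

# Reverse the column order
rev_label = df.loc[:, ::-1]
rev_pos = df.iloc[:, ::-1]

print(list(alt_label.columns))  # ['Courses', 'Duration']
print(list(rev_label.columns))  # ['Discount', 'Duration', 'Fee', 'Courses']
```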

6. Using Conditions with loc[] vs iloc[]

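You can also use loc[] and iloc[] to select rows by a condition. loc[] accepts the boolean Series produced by the condition directly, whereas iloc[] does not accept a boolean Series, so the condition is first converted to a plain list of booleans with list().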


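# Using Conditions
# loc[] accepts the boolean Series directly
print(df.loc[df['Fee'] >= 24000])

# iloc[] needs a plain list/array of booleans, not a Series
print(df.iloc[list(df['Fee'] >= 24000)])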

# Output
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r5   pandas  24000   60days      2000
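A condition can also be combined with a column selection. The sketch below, on a trimmed copy of the article's DataFrame, uses Series.to_numpy() as an alternative to list() for turning the mask into a plain boolean array for iloc[], and confirms that passing the Series itself to iloc[] is rejected. The exact exception type can depend on the index type and pandas version, so the sketch catches both ValueError and NotImplementedError.

```python
import pandas as pd

# Trimmed copy of the article's DataFrame (Courses and Fee only)
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
     "Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

mask = df["Fee"] >= 24000  # boolean Series aligned to the row labels

# loc[]: boolean Series for rows, list of labels for columns
by_label = df.loc[mask, ["Courses", "Fee"]]
# iloc[]: plain boolean array for rows, list of positions for columns
by_position = df.iloc[mask.to_numpy(), [0, 1]]
print(by_label.equals(by_position))  # True

# Passing the Series itself to iloc[] raises an error
try:
    df.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False
print(iloc_accepts_series)  # False
```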

Conclusion

In this article, you have learned the differences and similarities between loc[] and iloc[] in a pandas DataFrame using examples. DataFrame.loc[] is label-based and selects rows and/or columns by label. It accepts a single label, a list of labels, a range (slice) between two labels, and more; a label slice includes both the START and STOP labels. DataFrame.iloc[] is position-based and selects rows and/or columns by integer position. It accepts a single position, a list of positions, a range of positions, and more; a position slice excludes the STOP position.

Happy Learning !!



NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment.

