In pandas, loc[] and iloc[] are indexers used to select specific rows and columns from a DataFrame. The main difference between them is that loc[] selects rows and columns by label (name), while iloc[] selects them by zero-based integer position. With loc[], a label that is not present raises a KeyError. With iloc[], a single position that does not exist raises an IndexError.
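The two error cases can be reproduced with a short sketch. The frame `demo` and the label 'r9' are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical two-row frame, used only for this illustration
demo = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

try:
    demo.loc["r9"]      # label 'r9' is not in the index
except KeyError as err:
    loc_error = type(err).__name__

try:
    demo.iloc[5]        # position 5 is past the last row
except IndexError as err:
    iloc_error = type(err).__name__

print(loc_error, iloc_error)  # KeyError IndexError
```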
In this article, I will explain the differences and similarities between loc[] and iloc[] in a pandas DataFrame, with examples for each of the topics below.
- Pandas loc vs iloc Usage
- Select Single Value
- Select Multiple Values
- Select Range of Values
- Select Alternate Rows & Columns
- Using Conditions
Key Points –
- loc[] is primarily label-based: it uses the row and column labels in the index to access data. iloc[] is integer-based: it uses integer positions to access data.
- loc[] includes the endpoint when slicing, whereas iloc[] excludes it, following the Python slicing convention.
- loc[] accepts a boolean array or boolean Series as a mask in addition to labels, which makes condition-based selection straightforward. iloc[] accepts a boolean array or list, but not a boolean Series.
Difference Between loc[] vs iloc[] in DataFrame
The difference between loc[] and iloc[] lies in how you select rows and columns from a pandas DataFrame.
- loc[] selects rows and columns by name/label.
- iloc[] selects rows and columns by integer position, counted from zero.
You use the loc[] or iloc[] attribute to select or filter DataFrame rows or columns. They are among the most commonly used attributes of a pandas DataFrame.
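The distinction matters most when the index labels are themselves integers that do not match the row order. The following is a minimal sketch with a hypothetical frame `demo`, not part of the article's own data.

```python
import pandas as pd

# Hypothetical frame whose integer labels do not match the row positions
demo = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"]}, index=[2, 0, 1])

by_label = demo.loc[0, "Courses"]   # row whose label is 0
by_position = demo.iloc[0, 0]       # first row, first column

print(by_label)     # PySpark
print(by_position)  # Spark
```

Here the same number 0 picks different rows, because loc[] reads it as a label and iloc[] reads it as a position.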
pandas.DataFrame.loc[] Usage
DataFrame.loc[] selects rows and/or columns by label. It accepts a single label, a list of labels, a slice between two index labels, and more. In a slice of the form START:STOP:STEP:
- START is the label where the slice begins.
- STOP is the label where the slice ends.
- STEP is the interval between the labels included in the slice.
Some points to note about loc[].
- If you don’t provide a start row/column label, it selects from the beginning.
- If you don’t provide a stop row/column label, it selects all rows/columns from the start label.
- By providing both start and stop labels, it selects all rows/columns in between, including both the start and stop labels.
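The three points above can be sketched as follows, assuming a small hypothetical frame `demo` with labels r1 to r4.

```python
import pandas as pd

# Hypothetical frame used only to illustrate label slices
demo = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

up_to_r2 = demo.loc[:"r2"]       # no start label: begins at the first row
from_r3 = demo.loc["r3":]        # no stop label: runs to the last row
r2_to_r3 = demo.loc["r2":"r3"]   # both labels: stop label is included

print(list(up_to_r2.index))   # ['r1', 'r2']
print(list(from_r3.index))    # ['r3', 'r4']
print(list(r2_to_r3.index))   # ['r2', 'r3']
```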
pandas.DataFrame.iloc[] Usage
DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single integer, a list of integers, a slice of positions, and more. In a slice of the form START:STOP:STEP:
- START is the integer position of the first row/column to select.
- STOP is the integer position at which the selection stops. The row/column at STOP itself is not included.
- STEP is the number of positions to advance after each selection.
Some points to note about iloc[].
- If you omit the start position, iloc[] selects from the first row/column.
- If you omit the stop position, iloc[] selects all rows/columns from the start position to the end.
- If you provide both start and stop, iloc[] selects the rows/columns from start up to, but not including, stop.
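The same points can be checked for positional slices. This is a small sketch on a hypothetical frame `demo`. The last line also shows that a slice that runs past the end is tolerated, unlike a single out-of-range position.

```python
import pandas as pd

# Hypothetical frame used only to illustrate positional slices
demo = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

first_two = demo.iloc[:2]       # no start: positions 0 and 1
from_two = demo.iloc[2:]        # no stop: positions 2 and 3
one_to_three = demo.iloc[1:3]   # both: positions 1 and 2 (3 is excluded)
beyond = demo.iloc[2:100]       # out-of-range stop is clipped to the end

print(list(first_two.index))     # ['r1', 'r2']
print(list(from_two.index))      # ['r3', 'r4']
print(list(one_to_three.index))  # ['r2', 'r3']
print(list(beyond.index))        # ['r3', 'r4']
```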
To run some examples that show the difference between loc[] and iloc[], let’s create a DataFrame.
# Create a DataFrame with custom row labels
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30day','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000
Select Single Value Using loc[] vs iloc[]
Both loc[] and iloc[] can select a single row or a single column, but they address it differently: loc[] by label and iloc[] by integer position.
# Select Single Row by Index Label
print(df.loc['r2'])
# Select single row by integer position
print(df.iloc[1])
# Outputs:
# Courses PySpark
# Fee 25000
# Duration 40days
# Discount 2300
# Name: r2, dtype: object
You can select a single column by label or by position. In both cases, the colon (:) selects all rows from the specified column. With loc[], you give the column label directly, while with iloc[], you give the column's integer position.
# Select single column by label
print(df.loc[:, "Courses"])
# Select single column by integer position
print(df.iloc[:, 0])
# Outputs (a Series named Courses):
# r1 Spark
# r2 PySpark
# r3 Hadoop
# r4 Python
# r5 pandas
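A single cell can be selected by giving both a row and a column. The sketch below rebuilds the article's DataFrame so that it runs on its own. The variable names `cell_by_label` and `cell_by_position` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Row label 'r2' and column label 'Fee'
cell_by_label = df.loc["r2", "Fee"]

# Row position 1 and column position 1 point at the same cell
cell_by_position = df.iloc[1, 1]

print(cell_by_label, cell_by_position)  # 25000 25000
```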
Using loc[] vs iloc[] To Select Multiple Values
To select multiple values with loc[] and iloc[], pass a list of the rows or columns you want. Below is an example that selects rows by label and by position.
# Select multiple rows by label
print(df.loc[['r2','r3']])
# Select multiple rows by integer position
print(df.iloc[[1,2]])
# Outputs:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
You can also select multiple columns from a pandas DataFrame. Use a colon (:) to select all rows, followed by a list of column labels with loc[] or a list of column positions with iloc[].
# Select multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])
# Select multiple columns by integer position
print(df.iloc[:, [0,1,3]])
# Outputs:
# Courses Fee Discount
# r1 Spark 20000 1000
# r2 PySpark 25000 2300
# r3 Hadoop 26000 1200
# r4 Python 22000 2500
# r5 pandas 24000 2000
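Row lists and column lists can also be combined in one call. The following sketch rebuilds the article's DataFrame and checks that the label-based and position-based selections return the same block.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Rows r2 and r3, columns Courses and Fee, by label
by_label = df.loc[["r2", "r3"], ["Courses", "Fee"]]

# The same rows and columns by integer position
by_position = df.iloc[[1, 2], [0, 1]]

print(by_label.equals(by_position))  # True
```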
Select Range of Values
To select a range of rows or columns in a pandas DataFrame, use slice notation with loc[] and iloc[].
# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])
# Select rows between two positions
# Includes position 0 and excludes position 4
print(df.iloc[0:4])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
In the above example,
- df.loc['r1':'r4'] selects rows from index label r1 to r4, including both r1 and r4.
- df.iloc[0:4] selects rows from index position 0 up to, but excluding, index position 4. It includes the rows at positions 0, 1, 2, and 3.
To select columns between two column names or two positions, use slice notation with loc[] and iloc[].
# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])
# Select columns between two positions
# Includes position 1 and excludes position 4
print(df.iloc[:,1:4])
# Outputs:
# Fee Duration Discount
# r1 20000 30day 1000
# r2 25000 40days 2300
# r3 26000 35days 1200
# r4 22000 40days 2500
# r5 24000 60days 2000
In both of the above examples,
- df.loc[:,'Fee':'Discount'] selects columns from Fee to Discount, including both Fee and Discount.
- df.iloc[:,1:4] selects columns from index position 1 up to, but excluding, index position 4. It includes the columns at positions 1, 2, and 3 (Fee, Duration, and Discount).
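One way to see how the two slices relate is to translate labels into positions with Index.get_loc(). The sketch below rebuilds the article's DataFrame. The names `start` and `stop` are illustrative, and the `+ 1` accounts for iloc[] excluding its endpoint.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Look up the integer position of each column label
start = df.columns.get_loc("Fee")       # 1
stop = df.columns.get_loc("Discount")   # 3

by_label = df.loc[:, "Fee":"Discount"]      # stop label included
by_position = df.iloc[:, start:stop + 1]    # stop position excluded, so add 1

print(by_label.equals(by_position))  # True
```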
Select Every Other Row or Column
You can select every other row or column with both loc[] and iloc[] by adding a step to the slice. For instance,
- With loc[], slice between two index labels with a step size of 2.
- With iloc[], slice between two index positions with a step size of 2.
# Select alternate rows by index labels
print(df.loc['r1':'r4':2])
# Select alternate rows by position
print(df.iloc[0:4:2])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r3 Hadoop 26000 35days 1200
You can also select alternate columns between two labels or two positions using loc[] and iloc[].
# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])
# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])
# Output:
# Fee Discount
# r1 20000 1000
# r2 25000 2300
# r3 26000 1200
# r4 22000 2500
# r5 24000 2000
In the above examples,
- The first example selects alternate columns starting from Fee up to Discount, with a step size of 2. It includes the Fee and Discount columns.
- The second example selects alternate columns starting from index position 1 up to, but excluding, 4, with a step size of 2. It includes the columns at index positions 1 and 3 (Fee and Discount).
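When the start and stop are left out, the step applies to the whole axis. The following sketch rebuilds the article's DataFrame and takes every other row and every other column across the full frame.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Every other row over the whole frame; both indexers accept a bare step
alt_rows_loc = df.loc[::2]
alt_rows_iloc = df.iloc[::2]

# Every other column over the whole frame, by position
alt_cols = df.iloc[:, ::2]

print(list(alt_rows_loc.index))   # ['r1', 'r3', 'r5']
print(list(alt_cols.columns))     # ['Courses', 'Duration']
```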
Using Conditions with loc[] vs iloc[]
Both loc[] and iloc[] can select rows that satisfy a condition. loc[] accepts the boolean Series produced by the condition directly, while iloc[] needs it converted to a plain list or array of booleans first.
# Using Conditions
print(df.loc[df['Fee'] >= 24000])
print(df.iloc[list(df['Fee'] >= 24000)])
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r5 pandas 24000 60days 2000
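The sketch below shows why the iloc[] call above wraps the condition in list(). It rebuilds the article's DataFrame, and the names `mask`, `picked`, and `series_mask_ok` are illustrative. Passing the boolean Series straight to iloc[] raises an error, whereas a NumPy array made with to_numpy() is accepted.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

mask = df["Fee"] >= 24000   # boolean Series indexed by r1..r5

# loc[] takes the Series as-is and can pick columns in the same call
picked = df.loc[mask, ["Courses", "Fee"]]

# iloc[] rejects a boolean Series; catch the error to show the behaviour
try:
    df.iloc[mask]
    series_mask_ok = True
except (ValueError, NotImplementedError):
    series_mask_ok = False

# A plain boolean array works with iloc[]
picked_by_position = df.iloc[mask.to_numpy(), [0, 1]]

print(series_mask_ok)                     # False
print(picked.equals(picked_by_position))  # True
```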
Frequently Asked Questions on Difference Between loc[] vs iloc[]
The primary difference lies in how data is indexed. loc[] is label-based, meaning it accesses data by row and column labels, while iloc[] is integer-based, accessing data by integer positions.
When slicing with loc[], the endpoint is included, so the rows and columns named at both ends of the slice appear in the output. Conversely, iloc[] slicing excludes the endpoint, following the convention of Python slicing.
loc[] supports label-based indexing, allowing explicit selection of rows and columns based on their labels in the index. On the other hand, iloc[] supports integer-based indexing, where rows and columns are accessed by their integer positions.
loc[] accepts a boolean array or Series as a mask in addition to labels, which allows flexible data selection based on conditions. iloc[] also accepts a boolean array or list, but not a boolean Series, so a condition must be converted (for example with list()) before it is passed to iloc[].
Use loc[] when working with labeled data, especially when the index is meaningful, as it leads to clearer and more readable code. Conversely, iloc[] is preferred when the order of rows/columns matters more than their labels, or when you need to address data by position regardless of the labels.
Conclusion
In this article, I have explained the differences and similarities between loc[] and iloc[] in a pandas DataFrame using examples. DataFrame.loc[] selects rows and/or columns by label. It supports single labels, lists of labels, slices between two index labels, and other selection methods. DataFrame.iloc[] selects rows and/or columns by integer position. It accepts a single integer, a list of integers, a slice of positions, and other options.
Happy Learning !!