pandas.DataFrame.iloc[] is used to select rows and columns by their integer position. If a requested position is out of bounds, it raises an IndexError (slice indexers are the exception: they tolerate out-of-bounds positions, as in standard Python slicing). In this article, I will explain the usage of pandas iloc[] with examples.
Pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e., rows and columns. Selecting multiple columns from a DataFrame returns a new DataFrame containing only the selected columns from the original DataFrame.
pandas.DataFrame.loc[] is a property used to select data by row and column labels. To better understand loc[] and iloc[], it is important to learn their differences and similarities. The main distinction between loc[] and iloc[] is how they select rows and columns from a Pandas DataFrame: loc[] uses labels, while iloc[] uses integer positions.
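To make the difference concrete, here is a minimal sketch that reads the same cell with both indexers and contrasts their slicing rules. It assumes the sample DataFrame built later in this article (recreated here so the snippet runs on its own); `fee_by_label` and `fee_by_position` are illustrative names.

```python
import pandas as pd

# Same sample data as the DataFrame created in this article
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Same cell, two ways: label-based (loc) vs position-based (iloc)
fee_by_label = df.loc["r2", "Fee"]    # row label 'r2', column label 'Fee'
fee_by_position = df.iloc[1, 1]       # row position 1, column position 1
print(fee_by_label, fee_by_position)  # 25000 25000

# loc slices include the end label; iloc slices exclude the end position
print(df.loc["r1":"r3"].index.tolist())  # ['r1', 'r2', 'r3']
print(df.iloc[0:2].index.tolist())       # ['r1', 'r2'] (position 2 excluded)
```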
Key Points –
- iloc[] is primarily used for integer-location-based indexing, allowing selection by row and column positions.
- The syntax is df.iloc[row_index, column_index], where you can specify individual positions or ranges.
- iloc[] supports slicing, so you can retrieve subsets of rows and columns by specifying start and end positions.
- It accepts row and column indices (both start at 0) rather than labels or names.
- iloc[] is exclusive of the end position in slices, similar to standard Python slicing.
- You can use iloc[] to set values by assigning directly to specific positions or slices.
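The last point, assigning through iloc[], can be sketched as follows. This is a minimal example assuming the article's sample DataFrame; it modifies a copy named `updated` (an illustrative name) so the original `df` stays unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Work on a copy so the original df is not modified
updated = df.copy()

# Set a single cell: row position 0, column position 1 (Fee)
updated.iloc[0, 1] = 21000

# Set a slice: Discount (column position 3) for row positions 1 and 2
updated.iloc[1:3, 3] = 0

print(updated)
```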
1. pandas.DataFrame.iloc[] Syntax & Usage
DataFrame.iloc[] in pandas is position-based and is used to select rows and/or columns. It can accept a single integer position, a list of positions, a slice (range) of positions, a boolean array, or a callable.
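Each of these input types can be tried on the article's sample DataFrame. The sketch below is illustrative only; the variable names are hypothetical, and the callable receives the DataFrame and returns positions for iloc[] to use.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

single = df.iloc[2]                                     # int -> Series (row r3)
from_list = df.iloc[[0, 4]]                             # list -> rows r1, r5
from_slice = df.iloc[1:3]                               # slice -> rows r2, r3
from_bools = df.iloc[[True, False, True, False, False]] # boolean list -> r1, r3
from_callable = df.iloc[lambda d: [0, 1]]               # callable -> rows r1, r2

print(single)
print(from_list)
print(from_slice)
print(from_bools)
print(from_callable)
```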
One of the main advantages of DataFrame is its ease of use. You can see this yourself when you use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are among the most commonly used DataFrame attributes.
When slicing with iloc[], the notation is START:STOP:STEP. START is the integer position of the first row/column to select, STOP is the integer position where the selection ends (this position itself is excluded), and STEP is the number of positions to advance after each selection.
Some points to note about iloc[]:
- If a start index is not provided, iloc[] selects from the first row/column.
- If no stop index is provided, iloc[] selects all rows/columns from the start index through the last one.
- When both start and stop indices are provided, iloc[] selects the rows/columns from the start index up to, but not including, the stop index.
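These defaults can be checked with a short sketch on the article's sample DataFrame (recreated here so it runs independently). The `*_rows` and `last_cols` names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

no_start_rows = df.iloc[:2].index.tolist()    # starts at the first row
no_stop_rows = df.iloc[3:].index.tolist()     # runs to the last row
both_rows = df.iloc[1:4].index.tolist()       # stop position 4 is excluded
step_rows = df.iloc[::2].index.tolist()       # every second row
last_cols = df.iloc[:, 2:].columns.tolist()   # same rules apply to columns

print(no_start_rows)  # ['r1', 'r2']
print(no_stop_rows)   # ['r4', 'r5']
print(both_rows)      # ['r2', 'r3', 'r4']
print(step_rows)      # ['r1', 'r3', 'r5']
print(last_cols)      # ['Duration', 'Discount']
```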
Now, let’s create a Pandas DataFrame.
# Create pandas DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n",df)
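Yields below output.
# Output:
# Create DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000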
2. Select Single Row & Column By Index
Using iloc[], you can select a single row or column by its position. The example below selects the second row (label 'r2') with iloc[1] and prints the value of each column in that row as a Series.
# Select single row by index
print(df.iloc[1])
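Yields below output.
# Output:
# Courses PySpark
# Fee 25000
# Duration 40days
# Discount 2300
# Name: r2, dtype: object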
You can also use iloc[] to select a single column by position. The example below selects all rows (:) for the column at position 0, which returns the entire first column (Courses) as a Series. Each element in the result is the value of that column for the corresponding row.
# Select single column by index
print(df.iloc[:, 0])
# Outputs:
# r1 Spark
# r2 PySpark
# r3 Hadoop
# r4 Python
# r5 pandas
# Name: Courses, dtype: object
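Combining a row position and a column position returns a single value, and wrapping a position in a list changes the result type. The following minimal sketch assumes the article's sample DataFrame; `iat` is the pandas accessor for a single value by integer position.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Row position 1 and column position 0 -> a scalar value
print(df.iloc[1, 0])  # PySpark

# iat accesses a single value by integer position
print(df.iat[1, 0])   # PySpark

# An int returns a Series; a one-element list keeps the result a DataFrame
print(type(df.iloc[1]).__name__)    # Series
print(type(df.iloc[[1]]).__name__)  # DataFrame
```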
3. Select Multiple Rows & Columns by Index
To select multiple rows or columns, pass a list of integer positions to iloc[]. The example below selects the rows at positions 1 and 2 and returns a DataFrame containing just those rows.
# Select multiple rows by index
print(df.iloc[[1,2]])
# Outputs:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
Similarly, you can select multiple columns by passing a list of column positions. The example below selects the columns at positions 0, 1, and 3 (Courses, Fee, and Discount) for all rows in the DataFrame.
# Select multiple columns by index
print(df.iloc[:, [0,1,3]])
# Outputs:
# Courses Fee Discount
# r1 Spark 20000 1000
# r2 PySpark 25000 2300
# r3 Hadoop 26000 1200
# r4 Python 22000 2500
# r5 pandas 24000 2000
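Row and column selections can be combined in a single call by passing one selector per axis. This is a sketch on the article's sample DataFrame; `subset` and `subset2` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Rows at positions 1 and 2, columns at positions 0 and 3
subset = df.iloc[[1, 2], [0, 3]]
print(subset)

# A slice for rows can be mixed with a list for columns
subset2 = df.iloc[0:2, [1, 3]]
print(subset2)
```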
4. Select Rows or Columns by Index Range
By using iloc[], you can also select rows and columns by range, for example, all items between two positions or all items starting from a given position. The example below selects rows from position 0 up to, but not including, position 4.
# Select rows between two indexes
# Includes index 0 & excludes 4
print(df.iloc[0:4])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
To select columns between two column positions, use a slice for the column part. The example below selects the columns from position 1 up to, but not including, position 4.
# Select columns between two indexes
# Includes index 1 & excludes 4
print(df.iloc[:,1:4])
# Outputs:
# Fee Duration Discount
# r1 20000 30days 1000
# r2 25000 40days 2300
# r3 26000 35days 1200
# r4 22000 40days 2500
# r5 24000 60days 2000
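Ranges can also be open-ended or counted from the end with negative positions, and, as noted at the start of the article, slices tolerate out-of-bounds stops while a single out-of-bounds position raises IndexError. A minimal sketch on the sample DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

from_two = df.iloc[2:].index.tolist()        # position 2 to the end
last_two = df.iloc[-2:].index.tolist()       # last two rows
last_two_cols = df.iloc[:, -2:].columns.tolist()
past_end = df.iloc[3:100].index.tolist()     # stop beyond the end is fine

# A single out-of-bounds position raises IndexError
raised = False
try:
    df.iloc[10]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

print(from_two)       # ['r3', 'r4', 'r5']
print(last_two)       # ['r4', 'r5']
print(last_two_cols)  # ['Duration', 'Discount']
print(past_end)       # ['r4', 'r5']
```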
5. Select Alternate Rows or Columns
Similarly, by adding a step to a range, you can choose every other row from the DataFrame. df.iloc[0:4:2] uses slicing to select alternate rows by position: the notation 0:4:2 selects rows starting at position 0 up to (but not including) position 4, with a step of 2.
# Select alternate rows by index
print(df.iloc[0:4:2])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Hadoop 26000 35days 1200
To select alternate columns, use df.iloc[:, 1:4:2], which uses slicing to select alternate columns between two positions. The notation 1:4:2 selects columns starting at position 1 up to (but not including) position 4, with a step of 2.
# Select alternate columns between two indexes
print(df.iloc[:,1:4:2])
# Output:
# Fee Discount
# r1 20000 1000
# r2 25000 2300
# r3 26000 1200
# r4 22000 2500
# r5 24000 2000
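The step also works without explicit start and stop positions, and a negative step walks backwards. The sketch below assumes the article's sample DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

every_other_row = df.iloc[::2]      # positions 0, 2, 4
odd_rows = df.iloc[1::2]            # positions 1, 3
every_other_col = df.iloc[:, ::2]   # column positions 0, 2
reversed_rows = df.iloc[::-1]       # all rows in reverse order

print(every_other_row.index.tolist())    # ['r1', 'r3', 'r5']
print(odd_rows.index.tolist())           # ['r2', 'r4']
print(every_other_col.columns.tolist())  # ['Courses', 'Duration']
print(reversed_rows.index.tolist())      # ['r5', 'r4', 'r3', 'r2', 'r1']
```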
6. Using Conditions with iloc[]
By using iloc[], you can also select rows by a condition. iloc[] accepts a plain boolean array but not a boolean Series that carries an index, so the condition df['Fee'] >= 24000 is wrapped in list() before it is passed in, as in df.iloc[list(df['Fee'] >= 24000)]. The result is a DataFrame containing only the rows where the Fee column is greater than or equal to 24000.
# By Condition
print(df.iloc[list(df['Fee'] >= 24000)])
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r5 pandas 24000 60days 2000
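Besides list(), the mask can be converted with Series.to_numpy(), or the condition can be supplied as a callable that returns a boolean array. This is a sketch under the assumption of the article's sample DataFrame; `mask`, `by_list`, `by_array`, and `by_callable` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

mask = df["Fee"] >= 24000  # boolean Series indexed by r1..r5

# The approach used in the article: a plain Python list of booleans
by_list = df.iloc[list(mask)]

# Alternative: a NumPy boolean array, which carries no index
by_array = df.iloc[mask.to_numpy()]

# Alternative: a callable that builds the boolean array from the DataFrame
by_callable = df.iloc[lambda d: (d["Fee"] >= 24000).to_numpy()]

print(by_array)
```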
7. Complete Example
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Select Single Row by Index
print(df.iloc[1])
# Select Single Column by Index
print(df.iloc[:, 0])
# Select Multiple Rows by Index
print(df.iloc[[1,2]])
# Select multiple columns by index
print(df.iloc[:, [0,1,3]])
# Select rows between two indexes (includes index 0 & excludes 4)
print(df.iloc[0:4])
# Select columns between two indexes (includes index 1 & excludes 4)
print(df.iloc[:,1:4])
# Select alternate rows by index
print(df.iloc[0:4:2])
# Select alternate columns between two indexes
print(df.iloc[:,1:4:2])
# Select rows by condition
print(df.iloc[list(df['Fee'] >= 24000)])
Frequently Asked Questions on Pandas iloc[]
What is iloc[] in pandas?
In pandas, iloc[] is an indexer property used for integer-location based indexing. It is primarily used to select specific rows and columns in a DataFrame using integer positions, providing a way to access data based on its numerical position.
How does iloc[] differ from loc[]?
While iloc[] uses integer-based indexing, loc[] uses label-based indexing. iloc[] is used when you want to select data by numerical position, whereas loc[] is used when you want to select data by labels (row or column names).
How do I select a range of rows or columns using iloc[]?
You can select a range of rows or columns using iloc[] in pandas by slicing, which lets you specify a range of positions, for example df.iloc[0:4] for rows or df.iloc[:, 1:4] for columns.
How do I select alternate rows or columns using iloc[]?
To select alternate rows or columns using iloc[] in pandas, use slicing with a step. The step specifies the interval between selected elements; for example, df.iloc[::2] selects every other row, and df.iloc[:, ::2] selects every other column.
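Can I use iloc[] with row and column labels?
No. iloc[] is designed specifically for integer-location based indexing and expects integer positions (or boolean arrays) for both rows and columns, so passing row or column labels raises an error. Use loc[] for label-based selection, or convert labels to positions first.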
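If you have labels but want to use iloc[], one option is to convert them to positions with the Index methods get_loc() and get_indexer(). A minimal sketch, assuming the article's sample DataFrame; `row_pos`, `col_pos`, and `col_positions` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30days", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Convert single labels to integer positions
row_pos = df.index.get_loc("r2")     # 1
col_pos = df.columns.get_loc("Fee")  # 1
print(df.iloc[row_pos, col_pos])     # 25000

# Convert several column labels at once (returns an integer array)
col_positions = df.columns.get_indexer(["Courses", "Discount"])
print(df.iloc[[0, 1], col_positions])

# The reverse direction: turn a position into a label for loc[]
print(df.loc[df.index[1], "Fee"])    # 25000
```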
Conclusion
In conclusion, iloc[] in pandas provides a versatile way to select rows and columns based on their integer positions. It allows precise control over the selection process, accommodating single positions, lists of positions, and ranges.
Happy Learning !!