Pandas iloc[] Usage with Examples

pandas.DataFrame.iloc[] selects rows and columns by their integer position. If a requested integer position is out of bounds, it raises an IndexError; a slice that runs past the end is simply truncated. In this article, I will explain the usage of pandas iloc[] with examples.
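
As a quick illustration of the out-of-bounds behavior described above, here is a minimal sketch. The small frame and the names `raised` and `tail` are made up for this example only.

```python
import pandas as pd

# Hypothetical frame with three rows and two columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# A single integer position past the end raises IndexError
try:
    df.iloc[5]
    raised = False
except IndexError:
    raised = True

# A slice that runs past the end is truncated instead of raising
tail = df.iloc[1:10]

print(raised)     # whether IndexError was raised
print(len(tail))  # number of rows the slice actually returned
```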

1. pandas.DataFrame.iloc[] Syntax & Usage

DataFrame.iloc[] in pandas is position-based (integer location) and is used to select rows and/or columns. It accepts a single integer, a list of integers, a slice of positions, a boolean array, and more.

One of the main advantages of DataFrame is its ease of use. You can see this yourself when you use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are among the most commonly used attributes of DataFrame.

iloc[] accepts slices of the form START:STOP:STEP for rows and for columns, where:

START is the integer position of the first row/column to select,
STOP is the integer position at which the selection ends (the row/column at STOP itself is not included), and
STEP is the number of positions to advance after each selected row/column.

Some points to note about iloc[].

If a start index is not provided, iloc[] selects from the first row/column.
If no stop index is provided, iloc[] selects all rows/columns from the start index through the last one.
When both start and stop indices are provided, iloc[] selects the rows/columns from start up to, but not including, stop.
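
The three points above can be sketched as follows. The frame and variable names are hypothetical and exist only to show which positions each slice picks.

```python
import pandas as pd

# Hypothetical frame whose column "a" holds its own row positions 0..4
df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

first_three = df.iloc[:3]   # no start: begins at position 0
from_two = df.iloc[2:]      # no stop: runs through the last row
middle = df.iloc[1:4]       # positions 1, 2, 3 (position 4 is excluded)

print(list(first_three["a"]))
print(list(from_two["a"]))
print(list(middle["a"]))
```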

Now, let’s create a Pandas DataFrame.


# Create pandas DataFrame 
import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(technologies, index=index_labels)
print("Create DataFrame:\n", df)

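This yields the output below. The header row is shifted by one space because print() inserts a separator between its two arguments.

# Outputs:
# Create DataFrame:
#      Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000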

2. Select Single Row & Column By Index

Using iloc[] you can select a single row or a single column by position. The example below selects one row: iloc[1] picks the second row (label ‘r2’, since positions start at 0) and returns it as a Series holding that row's value for each column.


# Select single row by index
print(df.iloc[1])

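This yields the output below.

# Outputs:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object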

You can also use iloc[] to select a single column by position. In the example below, the row selector : keeps all rows and the column selector 0 picks the first column (Courses). Each element in the resulting Series is the value of that column for the respective row.


# Select single column by index
print(df.iloc[:, 0])

# Outputs:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object
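
Combining a row position and a column position selects a single cell. The sketch below uses a two-row subset of the sample data; it also shows that wrapping the position in a list keeps the result as a DataFrame rather than a Series. The variable names are illustrative only.

```python
import pandas as pd

# Two-row subset of the sample data
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

cell = df.iloc[1, 0]          # row position 1, column position 0
row_as_series = df.iloc[1]    # a single integer returns a Series
row_as_frame = df.iloc[[1]]   # a one-element list returns a DataFrame

print(cell)
print(type(row_as_series).__name__, type(row_as_frame).__name__)
```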

3. Select Multiple Rows & Columns by Index

To select multiple rows or columns, pass a list of integer positions to iloc[]. The example below selects the rows at positions 1 and 2 and returns a DataFrame containing just those rows.


# Select multiple rows by index
print(df.iloc[[1,2]])

# Outputs:
#     Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

Similarly, you can select multiple columns by passing a list of column positions. The example below selects the columns at positions 0, 1, and 3 (Courses, Fee, and Discount) for all rows in the DataFrame.


# Select multiple columns by index
print(df.iloc[:, [0,1,3]])

# Outputs:
#     Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000
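
Row and column lists can also be combined in one call. The following is a small sketch using the same sample data; `sub` is just an illustrative name.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
    "Duration": ["30days", "40days", "35days", "40days", "60days"],
    "Discount": [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

# Rows at positions 1 and 2, columns at positions 0 and 3
sub = df.iloc[[1, 2], [0, 3]]
print(sub)
```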

4. Select Rows or Columns by Index Range

By using iloc[], you can also select rows and columns by range, for example all items between two rows/columns, or all items starting from a given position. The example below selects the rows from position 0 up to, but not including, position 4.


# Select rows between two indexes
# Includes index 0 & excludes 4
print(df.iloc[0:4])

# Outputs:
#     Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

You can select a range of columns in the same way, using positions rather than column names. The example below selects the columns from position 1 up to, but not including, position 4.


# Select columns between two indexes
# Includes index 1 & excludes 4
print(df.iloc[:,1:4])

# Outputs:
#       Fee Duration  Discount
# r1  20000   30days      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000
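
Row and column ranges can be given together, and negative positions count from the end, as they do in ordinary Python slicing. A brief sketch with the sample data follows; the names `block` and `last_two` are hypothetical.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
    "Duration": ["30days", "40days", "35days", "40days", "60days"],
    "Discount": [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

# Rows at positions 1-2 and columns at positions 0-1
block = df.iloc[1:3, 0:2]

# Negative start: the last two rows
last_two = df.iloc[-2:]

print(block)
print(last_two)
```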

5. Select Alternate Rows or Columns

Similarly, by using a slice with a step, you can choose every other row from the DataFrame. df.iloc[0:4:2] selects rows starting from position 0 up to (but not including) position 4, with a step of 2.


# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Outputs:
#    Courses    Fee Duration  Discount
# r1   Spark  20000   30days      1000
# r3  Hadoop  26000   35days      1200

To select alternate columns, use df.iloc[:, 1:4:2]. The slice 1:4:2 selects columns starting from position 1 up to (but not including) position 4, with a step of 2.


# Select alternate columns between two indexes
print(df.iloc[:,1:4:2])

# Outputs:
#       Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000
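
The start and stop can be left out when a step is used, so the step applies to the whole axis. The sketch below assumes the same sample data; a negative step, shown last, walks the rows in reverse order.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
    "Duration": ["30days", "40days", "35days", "40days", "60days"],
    "Discount": [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

every_other_row = df.iloc[::2]      # positions 0, 2, 4
every_other_col = df.iloc[:, ::2]   # column positions 0, 2
reversed_rows = df.iloc[::-1]       # all rows, last to first

print(every_other_row)
print(every_other_col)
print(reversed_rows)
```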

6. Using Conditions with iloc[]

By using iloc[] you can also select rows by condition. iloc[] does not accept a boolean Series as a mask, so the condition df['Fee'] >= 24000 is converted to a plain list of True/False values with list() before it is passed to iloc[]. The result is a DataFrame containing only the rows where the ‘Fee’ column is greater than or equal to 24000.


# By Condition
print(df.iloc[list(df['Fee'] >= 24000)])

# Outputs:
#     Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
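
As an alternative sketch, the boolean Series can be converted to a NumPy array with to_numpy() instead of list(). For comparison, the same rows can be obtained with plain boolean indexing or loc[], which accept the Series directly. The variable names here are illustrative.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
    "Duration": ["30days", "40days", "35days", "40days", "60days"],
    "Discount": [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

mask = df["Fee"] >= 24000            # boolean Series aligned to df's index

via_iloc = df.iloc[mask.to_numpy()]  # iloc[] needs a plain boolean array
via_loc = df.loc[mask]               # loc[] accepts the Series itself
via_brackets = df[mask]              # so does plain boolean indexing

print(via_iloc)
```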

7. Complete Example


import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(technologies, index=index_labels)
print(df)

# Select Single Row by Index
print(df.iloc[1])

# Select Single Column by Index
print(df.iloc[:, 0])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])

# Select rows between two indexes (includes index 0 & excludes 4)
print(df.iloc[0:4])

# Select columns between two indexes (includes index 1 & excludes 4)
print(df.iloc[:,1:4])

# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Select rows by condition
print(df.iloc[list(df['Fee'] >= 24000)])

Frequently Asked Questions on Pandas iloc[]

What is iloc[] in pandas?

In pandas, iloc[] is an indexer (an attribute used with square brackets, not a method that you call) for integer-location based indexing. It is primarily used to select specific rows and columns in a DataFrame using integer positions, providing a way to access data based on its numerical position.

How does iloc[] differ from loc[]?

While iloc[] uses integer-based indexing, loc[] uses label-based indexing. iloc[] is used when you want to select data based on numerical positions, whereas loc[] is used when you want to select data based on labels (row or column names).
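
The difference can be sketched with the sample data from this article. Note that a loc[] slice includes its end label, while an iloc[] slice excludes its stop position. The variable names are illustrative.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

by_position = df.iloc[1]     # second row, by position
by_label = df.loc["r2"]      # the same row, by label

pos_slice = df.iloc[0:2]          # positions 0 and 1 (stop excluded)
label_slice = df.loc["r1":"r3"]   # labels r1 through r3 (end included)

print(by_position.equals(by_label))
print(list(pos_slice.index), list(label_slice.index))
```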

Can I select a range of rows or columns using iloc[]?

Yes. You can select a range of rows or columns using iloc[] in pandas by using slicing. A slice specifies a range of positions, and the stop position is excluded; for example, df.iloc[0:4] selects the rows at positions 0 through 3.

How do I select alternate rows or columns with iloc[]?

To select alternate rows or columns using iloc[] in pandas, you can use slicing with a step parameter. The step parameter allows you to specify the interval between selected elements. For example, df.iloc[::2] selects every other row and df.iloc[:, ::2] selects every other column.

Can I use iloc[] with both row labels and column labels simultaneously?

You cannot use iloc[] with both row labels and column labels simultaneously. iloc[] is specifically designed for integer-location based indexing, and it expects integer positions for both rows and columns. To select by labels, use loc[] instead.
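
If you know a label but need a position for iloc[], one possible approach is to look the position up with Index.get_loc() first. The sketch below assumes unique row and column labels, in which case get_loc() returns a single integer.

```python
import pandas as pd

technologies = {
    "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    "Fee": [20000, 25000, 26000, 22000, 24000],
}
df = pd.DataFrame(technologies, index=["r1", "r2", "r3", "r4", "r5"])

# Translate labels to integer positions (labels assumed unique)
row_pos = df.index.get_loc("r2")
col_pos = df.columns.get_loc("Fee")

value = df.iloc[row_pos, col_pos]   # positional lookup
same_value = df.loc["r2", "Fee"]    # equivalent label-based lookup

print(row_pos, col_pos, value)
```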

Conclusion

In conclusion, iloc in pandas provides a versatile method for selecting rows and columns based on their integer indices. It allows for precise control over the selection process, accommodating single indices, lists of indices, and ranges.

Happy Learning !!
