Pandas iloc[] Usage with Examples

pandas.DataFrame.iloc[] is a property used to select rows and columns by integer position. If a requested position is out of bounds, a scalar or list indexer raises an IndexError, while a slice is simply truncated to the available range. In this article, I will cover the usage of pandas iloc[] with examples.
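As a minimal sketch of the out-of-bounds behavior, the snippet below uses a small hypothetical frame named df_demo (not part of the article's examples). It tries to read a position that does not exist and then takes a slice that runs past the end.

```python
import pandas as pd

# Hypothetical three-row frame used only for this illustration.
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})

# Valid row positions are 0, 1 and 2, so position 5 is out of bounds.
try:
    df_demo.iloc[5]
    out_of_bounds = False
except IndexError as err:
    out_of_bounds = True
    print("IndexError:", err)

# A slice that runs past the end is truncated instead of raising.
clipped = df_demo.iloc[1:10]
print(clipped)
```

The scalar lookup fails, while the slice quietly returns the two rows that do exist.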

Pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e. columns and rows. Selecting columns with a list or a slice returns a new DataFrame containing only the selected columns, while selecting a single column with a scalar returns a Series.

pandas loc[] is another property, used to select rows and columns by label. For a better understanding of the two, learn the differences and similarities between pandas loc[] vs iloc[]. The difference lies in how you refer to rows and columns: loc[] uses labels, whereas iloc[] uses integer positions.
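To make the label versus position distinction concrete, here is a small sketch on a hypothetical frame df_demo with row labels r1 to r3. It reads the same cell both ways and also shows that a loc[] slice includes its end label while an iloc[] slice excludes its stop position.

```python
import pandas as pd

# Hypothetical frame with string row labels, for illustration only.
df_demo = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# Same cell, addressed by label and by position.
by_label = df_demo.loc["r2", "Fee"]
by_position = df_demo.iloc[1, 1]
print(by_label, by_position)

# loc[] slices include the end label; iloc[] slices exclude the stop.
label_slice = df_demo.loc["r1":"r2"]   # rows r1 and r2
position_slice = df_demo.iloc[0:1]     # row r1 only
print(len(label_slice), len(position_slice))
```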

1. pandas.DataFrame.iloc[] Syntax & Usage

DataFrame.iloc[] is an integer position-based indexer for selecting rows and/or columns in pandas. It accepts a single position, a list of positions, a slice (range) of positions, and more.

One of the main advantages of DataFrame is its ease of use. You can see this yourself when you use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are among the most used attributes of DataFrame.

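The slice form is df.iloc[START:STOP:STEP], and it can be applied to rows, to columns, or to both as df.iloc[rows, columns].
  • START is the integer position of the first row/column to select.
  • STOP is the integer position at which the selection stops; the row/column at STOP itself is excluded, and
  • STEP is the number of positions to advance after each extraction.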

Some points to note about iloc[].

  • By not providing a start, iloc[] selects from the first row/column.
  • By not providing a stop, iloc[] selects all rows/columns from the start position to the end.
  • Providing both start and stop selects all rows/columns from start up to, but not including, stop.
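The points above can be sketched on a hypothetical single-column frame df_demo, where the values make it easy to see which rows each slice returns. The variable names are only illustrative.

```python
import pandas as pd

# Hypothetical frame: five rows at positions 0..4.
df_demo = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

no_start = df_demo.iloc[:2]    # positions 0 and 1
no_stop = df_demo.iloc[3:]     # positions 3 and 4
both = df_demo.iloc[1:3]       # positions 1 and 2 (3 is excluded)
stepped = df_demo.iloc[1::2]   # positions 1 and 3

for name, part in [("no_start", no_start), ("no_stop", no_stop),
                   ("both", both), ("stepped", stepped)]:
    print(name, part["value"].tolist())
```

These are the same rules as Python list slicing, so anything you know about slicing a list carries over.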

Let’s create a DataFrame and run some examples of pandas iloc.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500
#r5   pandas  24000   60days      2000

2. Select Single Row & Column By Index

Using iloc[] you can select a single row or a single column by position. The example below selects the row at position 1 and returns it as a Series.


# Select Single Row by Index
print(df.iloc[1])

# Outputs
#Courses     PySpark
#Fee           25000
#Duration     40days
#Discount       2300
#Name: r2, dtype: object

To select a single column by position, use the following. The result is a Series named after the column.


# Select Single Column by Index
print(df.iloc[:, 0])

# Outputs
#r1      Spark
#r2    PySpark
#r3     Hadoop
#r4     Python
#r5     pandas
#Name: Courses, dtype: object
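A scalar position returns a Series, as shown above. If you would rather keep a DataFrame, or read just one cell, the following sketch shows the variants on a hypothetical frame df_demo.

```python
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

row_as_series = df_demo.iloc[1]       # scalar position -> Series
row_as_frame = df_demo.iloc[[1]]      # one-element list -> one-row DataFrame
col_as_frame = df_demo.iloc[:, [0]]   # one-element list -> one-column DataFrame
cell = df_demo.iloc[1, 0]             # row and column position -> single value

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(col_as_frame.shape)
print(cell)
```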

3. Select Multiple Rows & Columns by Index

To select multiple rows or columns, pass a list of integer positions to iloc[]. Below is an example of how to select rows by position.


# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Outputs
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200

Similarly, you can select multiple columns from a pandas DataFrame.


# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])

# Outputs
#    Courses    Fee  Discount
#r1    Spark  20000      1000
#r2  PySpark  25000      2300
#r3   Hadoop  26000      1200
#r4   Python  22000      2500
#r5   pandas  24000      2000
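Row and column lists can also be combined in one call. The sketch below, on a hypothetical frame df_demo, picks two rows and two columns at once and shows that a list returns rows in the order you give them.

```python
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
        "Duration": ["30day", "40days", "35days"],
        "Discount": [1000, 2300, 1200],
    },
    index=["r1", "r2", "r3"],
)

# Rows at positions 1 and 2, columns at positions 0 and 3.
selected = df_demo.iloc[[1, 2], [0, 3]]
print(selected)

# A list keeps the order you pass, so this returns r3 before r1.
reordered = df_demo.iloc[[2, 0]]
print(reordered)
```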

4. Select Rows or Columns by Index Range

By using iloc[], you can also select rows and columns by range, for example, all items between two positions, or all items starting from a given position. The example below selects the rows at positions 0 through 3; the stop position 4 is excluded.


# Select Rows Between Two Indexes
# Includes Index 0 & Excludes 4
print(df.iloc[0:4])

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500

You can select columns by a range of positions in the same way. The example below selects the columns at positions 1 through 3; the stop position 4 is excluded.


# Select Columns between two Indexes
# Includes Index 1 & Excludes 4
print(df.iloc[:,1:4])

# Outputs
#      Fee Duration  Discount
#r1  20000    30day      1000
#r2  25000   40days      2300
#r3  26000   35days      1200
#r4  22000   40days      2500
#r5  24000   60days      2000
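Ranges can also count from the end. The following sketch assumes Python-style negative positions, which iloc[] accepts, and uses a hypothetical frame df_demo to select the last rows and columns without knowing the frame's size.

```python
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
        "Discount": [1000, 2300, 1200, 2500],
    },
    index=["r1", "r2", "r3", "r4"],
)

last_two_rows = df_demo.iloc[-2:]        # rows r3 and r4
last_column = df_demo.iloc[:, -1]        # the Discount column as a Series
all_but_last_col = df_demo.iloc[:, :-1]  # every column except the last

print(last_two_rows)
print(last_column.name)
print(all_but_last_col.columns.tolist())
```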




5. Select Alternate Rows or Columns

Similarly, by adding a step of 2 to the range, you can select every alternate row from a DataFrame.


# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Outputs
#   Courses    Fee Duration  Discount
#r1   Spark  20000    30day      1000
#r3  Hadoop  26000   35days      1200

To select alternate columns, apply the same step to the column slice.


# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# Output
#      Fee  Discount
#r1  20000      1000
#r2  25000      2300
#r3  26000      1200
#r4  22000      2500
#r5  24000      2000
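The row and column steps can be used together. As a small sketch on a hypothetical frame df_demo, this takes every second row and every second column across the whole frame by leaving out start and stop.

```python
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Every second row and every second column, starting at position 0.
alternate = df_demo.iloc[::2, ::2]
print(alternate)
```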

6. Using Conditions with iloc[]

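By using iloc[] you can also select rows by condition. iloc[] does not accept a boolean Series directly, so the example below first converts the condition to a plain list of booleans.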


# By Condition
print(df.iloc[list(df['Fee'] >= 24000)])

# Output
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r5   pandas  24000   60days      2000
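As an alternative sketch to building a list, the condition can be turned into a NumPy boolean array with to_numpy(), or into integer positions with numpy.flatnonzero. The frame df_demo is hypothetical, and loc[] is included only to check that all three approaches return the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only.
df_demo = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Boolean NumPy array: one True/False per row, no index attached.
mask = (df_demo["Fee"] >= 24000).to_numpy()
by_mask = df_demo.iloc[mask]

# Integer positions of the rows where the condition holds.
positions = np.flatnonzero(mask)
by_positions = df_demo.iloc[positions]

# Label-based equivalent, which accepts the boolean Series as is.
by_loc = df_demo.loc[df_demo["Fee"] >= 24000]

print(positions)
print(by_mask)
```

For plain filtering, loc[] with the boolean Series is usually the simpler choice; the iloc[] forms are handy when you need the positions themselves.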

7. pandas iloc[] Complete Example


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select Single Row by Index
print(df.iloc[1])

# Select Single Column by Index
print(df.iloc[:, 0])

# Select Multiple Rows by Index
print(df.iloc[[1,2]])

# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])

# Includes Index 0 & Excludes 4
print(df.iloc[0:4])

# Includes Index 1 & Excludes 4
print(df.iloc[:,1:4])

# Select Alternate rows By Index
print(df.iloc[0:4:2])

# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])

# By Condition
print(df.iloc[list(df['Fee'] >= 24000)])

Conclusion

In this article, you have learned that iloc[] in pandas is position-based and selects rows and/or columns by integer position. It accepts a single position, a list of positions, a slice (range) of positions, and more.

Happy Learning !!

NNK
