pandas.DataFrame.iloc[] is a property used to select rows and columns by integer position. If a single requested position is out of bounds, it raises an IndexError (out-of-range slices are clipped instead). In this article, I will cover the usage of pandas iloc with examples.
A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e., rows and columns. Selecting several columns from a DataFrame returns a new DataFrame containing only those columns, while selecting a single column by an integer returns a Series.
pandas loc[] is another property, used to select by row and column labels. For a better understanding of the two, learn the differences and similarities between pandas loc[] vs iloc[]. The difference between loc[] and iloc[] lies in how you select rows and columns from a pandas DataFrame: by label or by integer position.
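As a minimal sketch of that difference, the snippet below fetches the same row once by label with loc[] and once by position with iloc[]. The three-row frame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical three-row frame; the labels r1..r3 are illustrative only
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2"]     # label-based: the row whose index label is 'r2'
by_position = df.iloc[1]    # position-based: the second row (positions start at 0)

# Both calls return the same row as a Series
print(by_label.equals(by_position))

# Slices behave differently: loc[] includes the end label,
# while iloc[] excludes the end position
print(len(df.loc["r1":"r2"]))   # two rows: r1 and r2
print(len(df.iloc[0:1]))        # one row: r1
```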
1. pandas.DataFrame.iloc[] Syntax & Usage
DataFrame.iloc[] selects rows and/or columns in pandas by integer position. It accepts a single index, a list of indexes, a range (slice) of indexes, and more.
One of the main advantages of DataFrame is its ease of use. You can see this yourself when you use the loc[] or iloc[] attributes to select or filter DataFrame rows or columns. These are among the most commonly used attributes of a DataFrame.
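A range passed to iloc[] takes the form START:STOP:STEP. START is the integer index of the first row/column to select, STOP is the integer index of the row/column where you want the selection to stop (the item at STOP itself is excluded), and STEP is the number of indices to advance after each extraction.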
Some points to note about iloc[].
- By not providing a start index, iloc[] selects from the first row/column.
- By not providing stop, iloc[] selects all rows/columns from the start index.
- Providing both start and stop selects all rows/columns in between (the stop position itself is excluded).
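The points above can be sketched as follows. This is a small illustration on a one-column frame invented for the purpose; it also shows how a single out-of-range position raises an IndexError while an out-of-range slice is clipped.

```python
import pandas as pd

# Illustrative one-column frame with five rows
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

first_two = df.iloc[:2]    # no start: begins at the first row
from_fourth = df.iloc[3:]  # no stop: runs through the last row
middle = df.iloc[1:3]      # start and stop: positions 1 and 2 (stop excluded)

print(list(first_two.index))
print(list(from_fourth.index))
print(list(middle.index))

# A single position that does not exist raises IndexError
try:
    df.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)

# An out-of-range slice is clipped to the available rows, as with Python lists
clipped = df.iloc[3:10]
print(list(clipped.index))
```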
Let’s create a DataFrame and run some examples of pandas iloc.
# Create pandas DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n",df)
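# Outputs:
# Create DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000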
2. Select Single Row & Column By Index
Using iloc[] you can select a single row or a single column by index. The example below selects a row: iloc[1] picks the second row (index label ‘r2’) of the DataFrame and prints the value of each column in that row.
# Select single row by index
print(df.iloc[1])
# Outputs:
# Courses PySpark
# Fee 25000
# Duration 40days
# Discount 2300
# Name: r2, dtype: object
You can also use iloc[] to select a single column by index. The example below selects all rows (:) of the column at index 0, that is, the entire first column (Courses). The result is a Series in which each element is the first-column value for the respective row.
# Select single column by index
print(df.iloc[:, 0])
# Outputs:
# r1 Spark
# r2 PySpark
# r3 Hadoop
# r4 Python
# r5 pandas
# Name: Courses, dtype: object
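One detail worth illustrating is the return type. In the sketch below, a single integer gives a Series, a one-element list gives a one-row DataFrame, and a row/column pair gives a scalar. The small frame and variable names are assumptions made for the example.

```python
import pandas as pd

# Small illustrative frame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

row_as_series = df.iloc[1]    # single integer: the row comes back as a Series
row_as_frame = df.iloc[[1]]   # one-element list: a one-row DataFrame
cell = df.iloc[1, 0]          # row position 1, column position 0: a scalar value

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(cell)
```

Use the list form when later code expects a DataFrame, for example when concatenating the result with other frames.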
3. Select Multiple Rows & Columns by Index
To select multiple rows or columns, pass a list of integer indexes to the iloc[] attribute. Below is an example of how to select rows by index. It selects the rows at indexes 1 and 2 and returns a DataFrame that contains only those rows.
# Select multiple rows by index
print(df.iloc[[1,2]])
# Outputs:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
Similarly, you can select multiple columns from a pandas DataFrame by passing a list of column indexes. The example below selects the columns at indexes 0, 1, and 3 (Courses, Fee, and Discount) for all rows in the DataFrame.
# Select multiple columns by index
print(df.iloc[:, [0,1,3]])
# Outputs:
# Courses Fee Discount
# r1 Spark 20000 1000
# r2 PySpark 25000 2300
# r3 Hadoop 26000 1200
# r4 Python 22000 2500
# r5 pandas 24000 2000
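Row and column lists can also be combined in one call. The following sketch reuses the article's sample data; the variable names `subset` and `reordered` are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Rows at positions 1 and 2, columns at positions 0 and 3
subset = df.iloc[[1, 2], [0, 3]]
print(subset)

# The order of the list is preserved in the result
reordered = df.iloc[[3, 0]]
print(list(reordered.index))
```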
4. Select Rows or Columns by Index Range
By using iloc[], you can also select rows and columns by range, for example all items between two indexes or all items from a given index onward. The example below selects the rows from index 0 up to, but not including, index 4.
# Select rows between two indexes
# Includes index 0 & excludes index 4
print(df.iloc[0:4])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
To select columns between two column indexes, apply the range to the column axis. The example below selects the columns from index 1 up to, but not including, index 4.
# Select columns between two indexes
# Includes index 1 & excludes index 4
print(df.iloc[:,1:4])
# Outputs:
# Fee Duration Discount
# r1 20000 30days 1000
# r2 25000 40days 2300
# r3 26000 35days 1200
# r4 22000 40days 2500
# r5 24000 60days 2000
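Ranges can be applied to rows and columns at the same time, and a range on one axis can be mixed with a list on the other. A brief sketch on the article's sample data, with illustrative variable names:

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Rows 1..2 and columns 0..1 (both stops excluded)
block = df.iloc[1:3, 0:2]
print(block)

# A list of rows combined with a range of columns
mixed = df.iloc[[0, 4], 1:3]
print(mixed)
```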
5. Select Alternate Rows or Columns
Similarly, by using ranges you can also select every alternate row from a DataFrame. df.iloc[0:4:2] uses slicing to select alternate rows by index. The slicing notation 0:4:2 selects rows starting from index 0 up to (but not including) index 4, with a step of 2.
# Select Alternate rows By Index
print(df.iloc[0:4:2])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Hadoop 26000 35days 1200
To select alternate columns, use df.iloc[:, 1:4:2], which slices the column axis between two indexes. The slicing notation 1:4:2 selects columns starting from index 1 up to (but not including) index 4, with a step of 2.
# Select alternate columns between two indexes
print(df.iloc[:,1:4:2])
# Output:
# Fee Discount
# r1 20000 1000
# r2 25000 2300
# r3 26000 1200
# r4 22000 2500
# r5 24000 2000
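The start and stop can be left out when stepping, and the step can be negative. The sketch below shows a few such variations on the article's sample data; the variable names are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

even_rows = df.iloc[::2]       # positions 0, 2, 4 across the whole frame
odd_rows = df.iloc[1::2]       # positions 1, 3
reversed_rows = df.iloc[::-1]  # negative step walks the rows backwards
alt_cols = df.iloc[:, ::2]     # columns at positions 0 and 2

print(list(even_rows.index))
print(list(odd_rows.index))
print(list(reversed_rows.index))
print(list(alt_cols.columns))
```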
6. Using Conditions with iloc[]
By using iloc[] you can also select rows by condition from a pandas DataFrame. df.iloc[list(df['Fee'] >= 24000)] evaluates the condition df['Fee'] >= 24000, which produces a boolean Series, converts it to a plain list of True/False values, and passes that list to iloc[] as a mask. The conversion is needed because iloc[] does not accept the boolean Series itself.
The result is a DataFrame containing only the rows where the ‘Fee’ column is greater than or equal to 24000.
# By Condition
print(df.iloc[list(df['Fee'] >= 24000)])
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r5 pandas 24000 60days 2000
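As an alternative sketch to the list() conversion, the mask can be turned into a NumPy array with to_numpy(), and the same rows can be obtained with loc[], which accepts the boolean Series directly. The variable names below are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

mask = df['Fee'] >= 24000              # boolean Series labeled r1..r5

via_iloc = df.iloc[mask.to_numpy()]    # pass the underlying boolean array
via_loc = df.loc[mask]                 # loc[] takes the Series as-is

print(via_iloc.equals(via_loc))

# The mask can be combined with positional column selection
courses_and_fee = df.iloc[mask.to_numpy(), [0, 1]]
print(courses_and_fee)
```

When the filter is purely a condition, loc[] is usually the simpler choice; the iloc[] form is mainly useful when the columns must be picked by position in the same call.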
7. Complete Example of pandas iloc[]
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30days','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Select Single Row by Index
print(df.iloc[1])
# Select Single Column by Index
print(df.iloc[:, 0])
# Select Multiple Rows by Index
print(df.iloc[[1,2]])
# Select Multiple Columns by Index
print(df.iloc[:, [0,1,3]])
# Select rows by range: includes index 0 & excludes index 4
print(df.iloc[0:4])
# Select columns by range: includes index 1 & excludes index 4
print(df.iloc[:,1:4])
# Select Alternate rows By Index
print(df.iloc[0:4:2])
# Select Alternate Columns between two Indexes
print(df.iloc[:,1:4:2])
# Select rows by condition
print(df.iloc[list(df['Fee'] >= 24000)])
Frequently Asked Questions on Pandas iloc[]
In pandas, iloc[] is a property used for integer-location based indexing. It is primarily used to select specific rows and columns in a DataFrame using integer indices, providing a way to access data based on its numerical position.
You can use negative indices with iloc[] in pandas. Negative indices are interpreted as positions counting from the end of the DataFrame. For example, -1 refers to the last element, -2 refers to the second-to-last element, and so on.
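A short sketch of negative positions on the article's sample data; the variable names are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

last_row = df.iloc[-1]            # the last row, returned as a Series
last_two_rows = df.iloc[-2:]      # the last two rows
without_last_col = df.iloc[:, :-1]  # every column except the last one
last_cell = df.iloc[-1, -1]       # last row, last column

print(last_row.name)
print(list(last_two_rows.index))
print(list(without_last_col.columns))
print(last_cell)
```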
While iloc[] uses integer-based indexing, loc[] uses label-based indexing. iloc[] is used when you want to select data based on numerical positions, whereas loc[] is used when you want to select data based on labels (row or column names).
You can select a range of rows or columns using iloc[] in pandas by using slicing. Slicing allows you to specify a range of indices or positions.
To select alternate rows or columns using iloc[] in pandas, you can use slicing with a step parameter. The step parameter allows you to specify the interval between selected elements, for example df.iloc[0:4:2] for rows or df.iloc[:, 1:4:2] for columns.
While iloc[] is primarily designed for integer-location based indexing and does not accept a boolean Series directly, you can pass it a boolean list or array to select rows based on a condition.
You cannot use iloc[] with row labels or column labels. iloc[] is specifically designed for integer-location based indexing, and it expects integer positions for both rows and columns.
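When only labels are known, one possible workaround is to translate them to positions first and then pass the positions to iloc[]. The sketch below uses Index.get_loc() and Index.get_indexer() for that translation; this is one approach among several, and the variable names are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30days', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Translate a single column label into its integer position
fee_pos = df.columns.get_loc('Fee')

# First two rows by position, 'Fee' column via its looked-up position
first_two_fees = df.iloc[:2, fee_pos]
print(list(first_two_fees))

# Translate several row labels into positions at once
row_positions = df.index.get_indexer(['r2', 'r5'])
picked = df.iloc[row_positions, [0, fee_pos]]
print(picked)
```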
Conclusion
In this article, you have learned that iloc[] in pandas selects rows and/or columns by integer position. It accepts a single index, a list of indexes, a range of indexes, and more.
Happy Learning !!