Use Pandas DataFrame.iloc[] and DataFrame.loc[] to select rows by integer position and by index label, respectively. iloc[] accepts a single integer position, a list of positions, a slice of positions, and more. loc[] works with labels and accepts a single index label, a list of labels, a slice between two labels, and more. Passing a single position or label that doesn’t exist raises an error (IndexError for iloc[], KeyError for loc[]).
Related: Filter Pandas DataFrame Rows Based on Condition
In this article, I will explain how to select rows from a Pandas DataFrame by integer index and by label (single and multiple rows), by range, and how to select the first and last n rows, with several examples. The loc[] and iloc[] attributes can also select columns from a Pandas DataFrame; refer to the related articles on how to get a cell value from a Pandas DataFrame.
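As a quick illustration of the error behaviour mentioned in the introduction, here is a minimal sketch. The small DataFrame is a stand-in made up for this example, not the one used in the rest of the article. It shows that a single missing position or label raises an error, while an out-of-range positional slice is simply clipped.

```python
import pandas as pd

# Small stand-in DataFrame; labels and values are illustrative only.
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# A single out-of-range position raises IndexError.
try:
    df.iloc[10]
except IndexError as err:
    iloc_error = type(err).__name__

# A single missing label raises KeyError.
try:
    df.loc["r9"]
except KeyError as err:
    loc_error = type(err).__name__

# An out-of-range positional slice does not raise;
# it is clipped to the rows that actually exist.
clipped = df.iloc[1:10]

print(iloc_error, loc_error, len(clipped))  # IndexError KeyError 2
```

If missing rows are expected, wrapping the lookup in try/except as above is one option; checking `label in df.index` before calling loc[] is another.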
1. Quick Examples of Select Rows by Index Position & Labels
If you are in a hurry, below are some quick examples of how to select a row of Pandas DataFrame by index.
# Below are the quick examples.
# Select Rows by Integer Index
df2 = df.iloc[2] # Select Row by Index
df2 = df.iloc[[2,3,6]] # Select Rows by Index List
df2 = df.iloc[1:5] # Select Rows by Integer Index Range
df2 = df.iloc[:1] # Select First Row
df2 = df.iloc[:3] # Select First 3 Rows
df2 = df.iloc[-1:] # Select Last Row
df2 = df.iloc[-3:] # Select Last 3 Rows
df2 = df.iloc[::2] # Selects alternate rows
# Select Rows by Index Labels
df2 = df.loc['r2'] # Select Row by Index Label
df2 = df.loc[['r2','r3','r6']] # Select Rows by Index Label List
df2 = df.loc['r1':'r5'] # Select Rows by Label Index Range
df2 = df.loc['r1':'r5':2] # Select Alternate Rows within Index Labels
Let’s create a DataFrame with a few rows and columns and execute some examples to learn how to use an index. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Oracle","Java"],
'Fee' :[20000,25000,26000,22000,24000,21000,22000],
'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)
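Running this prints the full DataFrame: seven rows labeled r1 through r7 with the columns Courses, Fee, Duration, and Discount.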
2. Select Rows by Index using Pandas iloc[]
The pandas.DataFrame.iloc[] attribute is used for integer-location-based indexing to select rows and columns in a DataFrame. Remember that positions start from 0. You can use pandas.DataFrame.iloc[] with the syntax [start:stop:step], where start is the position of the first row to take, stop is the position at which to stop (the row at stop itself is not included), and step is the number of positions to advance after each extraction. Or use the syntax [[indices]], with indices as a list of row positions to take.
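The two syntaxes can be sketched as follows. This is a minimal illustration on a made-up five-row DataFrame (not the article's DataFrame), showing in particular that the row at the stop position is left out of a slice.

```python
import pandas as pd

# Illustrative DataFrame with the default 0-based integer index.
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas"]})

# start=0, stop=4, step=2: takes positions 0 and 2; position 4 is excluded.
sliced = df.iloc[0:4:2]

# A list of positions takes exactly those rows, in the order given.
picked = df.iloc[[0, 3]]

print(sliced["Courses"].tolist())  # ['Spark', 'Hadoop']
print(picked["Courses"].tolist())  # ['Spark', 'Python']
```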
2.1 Select Row by Integer Index
You can select a single row from a Pandas DataFrame by integer index using df.iloc[n]. Replace n with the position you want to select.
# Select Row by Integer Index
df1 = df.iloc[2]
print("After selecting a row by index position:\n", df1)
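One detail worth knowing is the type of the result. The sketch below, on a small made-up DataFrame, suggests how a single integer returns the row as a Series, while wrapping the same integer in a list returns a one-row DataFrame.

```python
import pandas as pd

# Illustrative DataFrame; values are stand-ins for this example.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

row_as_series = df.iloc[2]    # a Series whose name is the row label
row_as_frame = df.iloc[[2]]   # a DataFrame with a single row

print(type(row_as_series).__name__, row_as_series.name)  # Series r3
print(type(row_as_frame).__name__, row_as_frame.shape)   # DataFrame (1, 2)
```

Use the list form when later code expects a DataFrame, for example when concatenating the result with other rows.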
2.2. Get Multiple Rows by Index List
Sometimes you may need to get multiple rows from a DataFrame by specifying their positions as a list. For example, df.iloc[[2,3,6]] selects the 3rd, 4th, and 7th rows, since the index starts from zero.
# Select Rows by Index List
df1 = df.iloc[[2,3,6]]
print("After selecting rows by index position:\n", df1)
# Output:
# After selecting rows by index position:
# Courses Fee Duration Discount
# r3 Hadoop 26000 35days 1500
# r4 Python 22000 40days 1200
# r7 Java 22000 55days 2000
2.3. Get DataFrame Rows by Index Range
When you want to select rows of a DataFrame by a range of indexes, provide start and stop indexes.
- By not providing a start index, iloc[] selects from the first row.
- By not providing a stop index, iloc[] selects all rows from the start index to the end.
- By providing both start and stop, iloc[] selects the rows from start up to, but not including, stop.
# Select Rows by Integer Index Range
df1 = df.iloc[1:5]
print("After selecting rows by index range:\n", df1)
# Output:
# After selecting rows by index range:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1500
# r4 Python 22000 40days 1200
# r5 Pandas 24000 NaN 2500
# Select First Row by Index
print(df.iloc[:1])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# Select First 3 Rows
print(df.iloc[:3])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1500
# Select Last Row by Index
print(df.iloc[-1:])
# Outputs:
# Courses Fee Duration Discount
# r7 Java 22000 55days 2000
# Select Last 3 Rows
print(df.iloc[-3:])
# Output:
# Courses Fee Duration Discount
# r5 Pandas 24000 NaN 2500
# r6 Oracle 21000 None 2100
# r7 Java 22000 55days 2000
# Selects alternate rows
print(df.iloc[::2])
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Hadoop 26000 35days 1500
# r5 Pandas 24000 NaN 2500
# r7 Java 22000 55days 2000
3. Select Rows by Index Labels using Pandas loc[]
By using pandas.DataFrame.loc[] you can get rows by index names or labels. To select rows, the syntax is df.loc[start:stop:step], where start is the label of the first row to take, stop is the label of the last row to take (unlike iloc[], the stop row is included), and step is the number of rows to advance after each extraction; for example, you can use it to select alternate rows. Or use the syntax [[labels]], with labels as a list of row labels to take.
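The difference in how the two attributes treat the end of a slice is easy to trip over, so here is a minimal side-by-side sketch on a made-up four-row DataFrame: a label slice with loc[] keeps the stop label, while a positional slice with iloc[] drops the stop position.

```python
import pandas as pd

# Illustrative DataFrame with custom row labels.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

by_label = df.loc["r1":"r3"]   # label slice: r3 is included
by_position = df.iloc[0:2]     # positional slice: position 2 is excluded

print(by_label.index.tolist())     # ['r1', 'r2', 'r3']
print(by_position.index.tolist())  # ['r1', 'r2']
```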
3.1. Get Row by Label
If you have custom index labels on a DataFrame, you can use these labels to select a row. For example, df.loc['r2'] returns the row with label ‘r2’.
# Select Row by Index Label
df1 = df.loc['r2']
print("After selecting a row by index label:\n", df1)
# Output:
# After selecting a row by index label:
# Courses PySpark
# Fee 25000
# Duration 40days
# Discount 2300
# Name: r2, dtype: object
3.2. Get Multiple Rows by Label List
If you have a list of row labels, you can use this to select multiple rows from Pandas DataFrame.
# Select Rows by Index Label List
df1 = df.loc[['r2','r3','r6']]
print("After selecting rows by index label:\n", df1)
# Output:
# After selecting rows by index label:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1500
# r6 Oracle 21000 None 2100
3.3. Get Rows Between Two Labels
You can also select rows between two index labels. Both the start and the end label are included in the result.
# Select Rows by Label Index Range
df1 = df.loc['r1':'r5']
print("After selecting rows by index label range:\n", df1)
# Output:
# After selecting rows by index label range:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1500
# r4 Python 22000 40days 1200
# r5 Pandas 24000 NaN 2500
# Select Alternate Rows within Index Labels
print(df.loc['r1':'r5':2])
# Outputs:
# Courses Fee Duration Discount
# r1 Spark 20000 30days 1000
# r3 Hadoop 26000 35days 1500
# r5 Pandas 24000 NaN 2500
You can get the first two rows using df.loc[:'r2'], but this approach is rarely used because you need to know the row labels. To select the first n rows, it is recommended to use the positional form df.iloc[:n], replacing n with the value you want. The same applies to getting the last n rows.
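The positional approach to the first and last n rows might look like the sketch below, on a made-up DataFrame. The comparison with head() and tail() is an addition not covered in this article; they are standard DataFrame methods that return the same rows for a positive n.

```python
import pandas as pd

# Illustrative DataFrame with custom row labels.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

n = 2
first_n = df.iloc[:n]    # first n rows, no labels needed
last_n = df.iloc[-n:]    # last n rows

print(first_n.index.tolist())  # ['r1', 'r2']
print(last_n.index.tolist())   # ['r4', 'r5']

# head(n) and tail(n) select the same rows for positive n.
print(first_n.equals(df.head(n)), last_n.equals(df.tail(n)))  # True True
```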
4. Complete Example
import pandas as pd
import numpy as np
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Oracle","Java"],
'Fee' :[20000,25000,26000,22000,24000,21000,22000],
'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Select Row by Index
print(df.iloc[2])
# Select Rows by Index List
print(df.iloc[[2,3,6]])
# Select Rows by Integer Index Range
print(df.iloc[1:5])
# Select First Row
print(df.iloc[:1])
# Select First 3 Rows
print(df.iloc[:3])
# Select Last Row
print(df.iloc[-1:])
# Select Last 3 Rows
print(df.iloc[-3:])
# Selects alternate rows
print(df.iloc[::2])
# Select Row by Index Label
print(df.loc['r2'])
# Select Rows by Index Label List
print(df.loc[['r2','r3','r6']])
# Select Rows by Label Index Range
print(df.loc['r1':'r5'])
# Select Alternate Rows within Index Labels
print(df.loc['r1':'r5':2])
Frequently Asked Questions on Pandas Select Rows by Index
To select a single row by index, you can use either the loc[] or iloc[] attribute. loc[] selects a single row by index label, whereas iloc[] selects it by index position. For example, row = df.loc[index_label] or row = df.iloc[index_position].
To select multiple rows by index, you can use either the loc[] or iloc[] attribute. For example, rows = df.loc[[index_label1, index_label2, index_label3]] or rows = df.iloc[[index_position1, index_position2, index_position3]].
To select a range of rows by index, you can use either the loc[] or iloc[] attribute. For example, rows = df.loc[start_indexlabel:end_indexlabel] or rows = df.iloc[start_indexposition:end_indexposition].
You can use boolean indexing to select rows based on a condition. For example, rows = df[df['column_name'] > value] selects the rows where the column ‘column_name’ is greater than value.
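A minimal sketch of boolean indexing, assuming a made-up DataFrame with a Fee column and a hypothetical threshold of 22000. The same boolean mask can be passed either to df[...] or to df.loc[...].

```python
import pandas as pd

# Illustrative DataFrame; the column and threshold are assumptions for this example.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

value = 22000              # hypothetical threshold
mask = df["Fee"] > value   # boolean Series aligned with the row index

rows = df[mask]            # keeps the rows where the mask is True
same_rows = df.loc[mask]   # loc[] accepts the same mask

print(rows.index.tolist())     # ['r2', 'r3']
print(rows.equals(same_rows))  # True
```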
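You can use the reset_index() method to reset the index of a DataFrame. For example, rows = df.loc[[index_label1, index_label2]].reset_index(drop=True) selects two rows and replaces their labels with a default integer index starting at 0.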
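As a small sketch of resetting the index after a selection (on a made-up DataFrame), drop=True discards the old labels, while the default drop=False keeps them as a new column, which pandas names 'index' when the index has no name.

```python
import pandas as pd

# Illustrative DataFrame with custom row labels.
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

rows = df.loc[["r2", "r3"]]

renumbered = rows.reset_index(drop=True)  # old labels are discarded
kept = rows.reset_index()                 # old labels move into a column

print(renumbered.index.tolist())  # [0, 1]
print(kept.columns.tolist())      # ['index', 'Fee']
```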
5. Conclusion
In this article, you have learned how to select a single row or multiple rows from a Pandas DataFrame by integer index and by label using iloc[] and loc[], respectively. Using these, you can also select rows by ranges, select the first and last n rows, and more.
Happy Learning !!
Related Articles
- How to Get a Cell Value From Pandas DataFrame
- How to Take Column-Slices of Pandas DataFrame
- Add an Empty Column to a Pandas DataFrame
- Drop Rows with NaN Values in Pandas DataFrame
- Combine Two Columns of Text in Pandas DataFrame
- Pandas Select Rows Based on Column Values
- Pandas Select Columns by Name or Index
- Pandas Select Multiple Columns in DataFrame