  • Post category: Pandas
  • Post last modified: March 27, 2024
Pandas Select Rows by Index (Position/Label)

Use Pandas DataFrame.iloc[] and DataFrame.loc[] to select rows by integer position and by index label, respectively. iloc[] accepts a single position, a list of positions, a range (slice) of positions, and more. loc[] works with labels: a single label, a list of labels, a range between two labels, and more. Selecting a single position or label that doesn't exist raises an error: iloc[] raises IndexError and loc[] raises KeyError. Slices passed to iloc[] are more forgiving and simply stop at the last row.
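The minimal sketch below shows these errors in practice. It assumes a small, hypothetical three-row DataFrame (not the one built later in this article) and catches each exception only to record its type.

```python
import pandas as pd

# Hypothetical three-row DataFrame with custom string labels
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# A single position past the end raises IndexError with iloc[]
try:
    df.iloc[10]
except IndexError as e:
    iloc_error = type(e).__name__

# A label that is not in the index raises KeyError with loc[]
try:
    df.loc["r9"]
except KeyError as e:
    loc_error = type(e).__name__

# Slices are more forgiving: iloc[] silently stops at the last row
tail_slice = df.iloc[1:10]

print(iloc_error, loc_error)   # IndexError KeyError
print(list(tail_slice.index))  # ['r2', 'r3']
```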

Related: Filter Pandas DataFrame Rows Based on Condition

In this article, I will explain how to select single and multiple rows from a Pandas DataFrame by integer index and by label, by range, and how to select the first and last n rows, with several examples. The loc[] and iloc[] attributes can also select columns; refer to the related articles on how to get a cell value from a Pandas DataFrame.

1. Quick Examples of Selecting Rows by Index Position & Labels

If you are in a hurry, below are some quick examples of how to select rows of a Pandas DataFrame by index.


# Below are the quick examples.
# (df is the DataFrame created in the next step.)

# Select Rows by Integer Index
df2 = df.iloc[2]             # Select Row by Index (returns a Series)
df2 = df.iloc[[2,3,6]]       # Select Rows by Index List
df2 = df.iloc[1:5]           # Select Rows by Integer Index Range (stop excluded)
df2 = df.iloc[:1]            # Select First Row
df2 = df.iloc[:3]            # Select First 3 Rows
df2 = df.iloc[-1:]           # Select Last Row
df2 = df.iloc[-3:]           # Select Last 3 Rows
df2 = df.iloc[::2]           # Select Alternate Rows

# Select Rows by Index Labels
df2 = df.loc['r2']               # Select Row by Index Label (returns a Series)
df2 = df.loc[['r2','r3','r6']]   # Select Rows by Index Label List
df2 = df.loc['r1':'r5']          # Select Rows by Label Index Range (stop included)
df2 = df.loc['r1':'r5':2]        # Select Alternate Rows within Index Labels

Let’s create a DataFrame with a few rows and columns and run some examples to learn how to select rows by index. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print("Create DataFrame:\n", df)

Running this prints the DataFrame with its seven rows, labeled r1 through r7.

2. Select Rows by Index using Pandas iloc[]

The DataFrame.iloc[] attribute is used for integer-location-based indexing to select rows and columns in a DataFrame. Positions start from 0. You can use DataFrame.iloc[] with the slice syntax [start:stop:step], where start is the position of the first row to include, stop is the position at which to stop (that row itself is not included), and step is the number of positions to advance after each row. Alternatively, use the syntax [[indices]], passing a list of row positions to take.
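To make the start, stop, and step rules concrete, here is a small sketch that uses a hypothetical single-column DataFrame with the same seven labels as the example in this article. Note that the row at the stop position is left out.

```python
import pandas as pd

# Hypothetical DataFrame: seven rows labeled r1..r7, positions 0..6
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python", "Pandas", "Oracle", "Java"]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

# start=1, stop=5: positions 1, 2, 3, 4 (stop is excluded)
middle = df.iloc[1:5]

# start=0, stop=7, step=3: positions 0, 3, 6
every_third = df.iloc[0:7:3]

# A list of positions picks exactly those rows
picked = df.iloc[[0, 6]]

print(list(middle.index))       # ['r2', 'r3', 'r4', 'r5']
print(list(every_third.index))  # ['r1', 'r4', 'r7']
print(list(picked.index))       # ['r1', 'r7']
```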

2.1 Select Row by Integer Index

You can select a single row from a Pandas DataFrame by integer position using df.iloc[n]. Replace n with the position of the row you want to select; the row is returned as a Series.


# Select Row by Integer Index
df1 = df.iloc[2]
print("After selecting a row by index position:\n", df1)
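A detail that is easy to miss: a scalar position returns the row as a Series, while a one-element list returns a one-row DataFrame. The sketch below assumes a trimmed, hypothetical three-row DataFrame to show the difference.

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

row_as_series = df.iloc[2]    # scalar position -> Series named after the row label
row_as_frame = df.iloc[[2]]   # one-element list -> one-row DataFrame

print(type(row_as_series).__name__)  # Series
print(row_as_series.name)            # r3
print(type(row_as_frame).__name__)   # DataFrame
print(row_as_frame.shape)            # (1, 2)
```

Wrapping the position in a list is handy when you want to keep the column layout or continue with DataFrame methods.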

2.2. Get Multiple Rows by Index List

Sometimes you may need to get multiple rows from a DataFrame by specifying their positions as a list. For example, df.iloc[[2,3,6]] selects the 3rd, 4th, and 7th rows, because positions start from zero.


# Select Rows by Index List
df1 = df.iloc[[2,3,6]]
print("After selecting rows by index position:\n", df1)

# Output:
# After selecting rows by index position:
#   Courses    Fee Duration  Discount
# r3  Hadoop  26000   35days      1500
# r4  Python  22000   40days      1200
# r7    Java  22000   55days      2000
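The list passed to iloc[] doesn't have to be sorted. As a minimal sketch on a hypothetical four-row DataFrame, rows are returned in the order you list them, a repeated position returns the row again, and negative positions count from the end.

```python
import pandas as pd

# Hypothetical four-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python"]},
    index=["r1", "r2", "r3", "r4"],
)

# Rows come back in the order the positions are listed, repeats included
reordered = df.iloc[[3, 0, 0]]

# Negative positions count from the end, so -1 is the last row
ends = df.iloc[[-1, 0]]

print(list(reordered.index))  # ['r4', 'r1', 'r1']
print(list(ends.index))       # ['r4', 'r1']
```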

2.3. Get DataFrame Rows by Index Range

To select DataFrame rows by a range of positions, provide the start and stop positions. As with Python slices, the row at the stop position is not included.

  • If you omit start, iloc[] selects from the first row.
  • If you omit stop, iloc[] selects all rows from start through the last row.
  • If you provide both, iloc[] selects the rows from start up to, but not including, stop.

# Select Rows by Integer Index Range
df1 = df.iloc[1:5]
print("After selecting rows by index range:\n", df1)

# Output:
# After selecting rows by index range:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1500
# r4   Python  22000   40days      1200
# r5   Pandas  24000      NaN      2500

# Select First Row by Index
print(df.iloc[:1])

# Outputs:
# Courses    Fee Duration  Discount
# r1   Spark  20000   30days      1000

# Select First 3 Rows
print(df.iloc[:3])

# Outputs:
#    Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1500

# Select Last Row by Index
print(df.iloc[-1:])

# Outputs:
#   Courses    Fee Duration  Discount
# r7    Java  22000   55days      2000

# Select Last 3 Rows
print(df.iloc[-3:])

# Output:
#   Courses    Fee Duration  Discount
# r5  Pandas  24000      NaN      2500
# r6  Oracle  21000     None      2100
# r7    Java  22000   55days      2000

# Selects alternate rows
print(df.iloc[::2])

# Output:
#   Courses    Fee Duration  Discount
# r1   Spark  20000   30days      1000
# r3  Hadoop  26000   35days      1500
# r5  Pandas  24000      NaN      2500
# r7    Java  22000   55days      2000
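Two more iloc[] variations follow the same slicing rules: a negative step reverses the order, and a second argument after the comma selects columns by position. This sketch assumes a hypothetical four-row DataFrame with two columns.

```python
import pandas as pd

# Hypothetical four-row, two-column DataFrame
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# A negative step walks backwards, so ::-1 reverses the row order
reversed_rows = df.iloc[::-1]

# A second positional argument selects columns: rows at positions 1-2, first column only
subset = df.iloc[1:3, [0]]

print(list(reversed_rows.index))  # ['r4', 'r3', 'r2', 'r1']
print(subset)
```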

3. Select Rows by Index Labels using Pandas loc[]

By using pandas.DataFrame.loc[] you can get rows by index names or labels. To select a range of rows, the syntax is df.loc[start:stop:step], where start is the label of the first row to take, stop is the label of the last row to take, and step is the number of rows to advance after each extraction; for example, you can use it to select alternate rows. Unlike iloc[], the stop label is included in the result. Alternatively, use the syntax [[labels]], passing a list of row labels to take.
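The inclusive stop is the main difference from iloc[]. The sketch below, which assumes a hypothetical five-row DataFrame, selects the rows labeled r2 to r4 both ways.

```python
import pandas as pd

# Hypothetical five-row DataFrame
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

by_label = df.loc["r2":"r4"]   # stop label 'r4' is included
by_position = df.iloc[1:3]     # stop position 3 (label 'r4') is excluded

print(list(by_label.index))     # ['r2', 'r3', 'r4']
print(list(by_position.index))  # ['r2', 'r3']
```

To get the same rows with iloc[], the stop position has to be one past the last row you want, here df.iloc[1:4].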

3.1. Get Row by Label

If your DataFrame has custom index labels, you can use them to select a row. For example, df.loc['r2'] returns the row labeled 'r2' as a Series.


# Select Row by Index Label
df1 = df.loc['r2']
print("After selecting a row by index label:\n", df1)

# Output:
# After selecting a row by index label:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object

3.2. Get Multiple Rows by Label List

If you have a list of row labels, you can use this to select multiple rows from Pandas DataFrame.


# Select Rows by Index Label List
df1 = df.loc[['r2','r3','r6']]
print("After selecting rows by index label:\n", df1)

# Output:
# After selecting rows by index label:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1500
# r6   Oracle  21000     None      2100
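If any label in the list is missing from the index, loc[] raises a KeyError. One way to select only the labels that exist is to filter the list with Index.intersection() first. The sketch below assumes a hypothetical three-row DataFrame and a made-up missing label 'r9'.

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

wanted = ["r2", "r9"]  # 'r9' is a hypothetical label that does not exist

# Passing the list straight to loc[] fails because 'r9' is missing
try:
    df.loc[wanted]
except KeyError:
    raised = True
else:
    raised = False

# Keep only the labels that actually exist before selecting
existing = df.index.intersection(wanted)
safe = df.loc[existing]

print(raised)            # True
print(list(safe.index))  # ['r2']
```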

3.3. Get Rows Between Two Labels

You can also select all rows between two index labels; both the start and stop labels are included.


# Select Rows by Label Index Range
df1 = df.loc['r1':'r5']
print("After selecting rows by index label range:\n", df1)

# Output:
# After selecting rows by index label range:
#    Courses    Fee Duration  Discount
# r1    Spark  20000   30days      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1500
# r4   Python  22000   40days      1200
# r5   Pandas  24000      NaN      2500

# Select Alternate Rows within Index Labels
print(df.loc['r1':'r5':2])

# Outputs:
#   Courses    Fee Duration  Discount
# r1   Spark  20000   30days      1000
# r3  Hadoop  26000   35days      1500
# r5  Pandas  24000      NaN      2500
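Label ranges follow the order of the rows in the index. As a sketch on a hypothetical five-row DataFrame, a range whose start comes after its stop returns no rows, while a negative step walks the range backwards.

```python
import pandas as pd

# Hypothetical five-row DataFrame
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000]},
    index=["r1", "r2", "r3", "r4", "r5"],
)

# 'r5' comes after 'r1' in the index, so this forward range is empty
backwards = df.loc["r5":"r1"]

# A negative step walks the index in reverse, still including both endpoints
reversed_range = df.loc["r5":"r1":-1]

print(len(backwards))              # 0
print(list(reversed_range.index))  # ['r5', 'r4', 'r3', 'r2', 'r1']
```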

You can get the first two rows using df.loc[:'r2'], but this approach is rarely used because you need to know the row labels. To select the first n rows, it is simpler to use positions with df.iloc[:n], replacing n with the number of rows you want. Likewise, use df.iloc[-n:] to get the last n rows.
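As a small sketch, assuming a hypothetical seven-row, single-column DataFrame, the positional slices match what DataFrame.head() and DataFrame.tail() return.

```python
import pandas as pd

# Hypothetical seven-row DataFrame
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
)

n = 2
first_n = df.iloc[:n]   # first n rows by position, no labels needed
last_n = df.iloc[-n:]   # last n rows by position

# DataFrame.head(n) and DataFrame.tail(n) return the same rows
print(first_n.equals(df.head(n)))  # True
print(last_n.equals(df.tail(n)))   # True
print(list(last_n.index))          # ['r6', 'r7']

# Caveat: -0 is just 0, so iloc[-0:] returns every row, while tail(0) returns none
print(len(df.iloc[-0:]), len(df.tail(0)))  # 7 0
```

Because of the n == 0 edge case, df.tail(n) is the safer choice when n is computed at runtime.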

4. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","Pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select Row by Index
print(df.iloc[2])

# Select Rows by Index List
print(df.iloc[[2,3,6]])

# Select Rows by Integer Index Range
print(df.iloc[1:5])

# Select First Row
print(df.iloc[:1])

# Select First 3 Rows
print(df.iloc[:3])

# Select Last Row
print(df.iloc[-1:])

# Select Last 3 Rows
print(df.iloc[-3:])

# Selects alternate rows
print(df.iloc[::2])

# Select Row by Index Label
print(df.loc['r2'])

# Select Rows by Index Label List
print(df.loc[['r2','r3','r6']])

# Select Rows by Label Index Range
print(df.loc['r1':'r5'])

# Select Alternate Rows within Index Labels
print(df.loc['r1':'r5':2])

Frequently Asked Questions on Pandas Select Rows by Index

How do I select a single row by index in a Pandas DataFrame?

To select a single row, you can use either loc[] or iloc[]: loc[] selects the row by index label, whereas iloc[] selects it by integer position. For example, row = df.loc[index_label] or row = df.iloc[index_position].
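If you have a label but need its position, or the other way around, the index can translate between them. This sketch assumes a hypothetical three-row DataFrame with a unique index; for a non-unique index, get_loc() can return a slice or boolean mask instead of an integer.

```python
import pandas as pd

# Hypothetical three-row DataFrame with a unique index
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"]},
    index=["r1", "r2", "r3"],
)

position = df.index.get_loc("r3")  # label -> integer position
label = df.index[position]         # integer position -> label

# Both selections refer to the same row
same_row = df.loc[label].equals(df.iloc[position])
print(position, label, same_row)   # 2 r3 True
```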

How can I select multiple rows by a list of indices?

To select multiple rows by index, you can use either loc[] or iloc[]. For example, rows = df.loc[[index_label1, index_label2, index_label3]] or rows = df.iloc[[index_position1, index_position2, index_position3]].

How can I select a range of rows by index?

To select a range of rows by index, you can use either loc[] or iloc[]. For example, rows = df.loc[start_indexlabel:end_indexlabel] includes the end label, while rows = df.iloc[start_indexposition:end_indexposition] excludes the end position.

What if I want to select rows based on a condition?

You can use boolean indexing to select rows based on a condition. For example, to select rows where the column 'column_name' is greater than a certain value, use rows = df[df['column_name'] > value].
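Here is a short sketch of boolean indexing on a hypothetical four-row version of this article's DataFrame, selecting the courses with a Fee above 22000.

```python
import pandas as pd

# Hypothetical four-row DataFrame
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python"],
        "Fee": [20000, 25000, 26000, 22000],
    },
    index=["r1", "r2", "r3", "r4"],
)

# The comparison returns a boolean Series aligned on the index
mask = df["Fee"] > 22000
expensive = df[mask]

# loc[] accepts the same mask and can also pick a column
expensive_names = df.loc[mask, "Courses"]

print(list(expensive.index))   # ['r2', 'r3']
print(list(expensive_names))   # ['PySpark', 'Hadoop']
```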

How can I reset the index after selecting rows by index?

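You can use the reset_index() method to reset the index of the selected rows. Select with a list of labels (or a range) so that the result is a DataFrame, for example rows = df.loc[[index_label1, index_label2]].reset_index(drop=True). With drop=True the old labels are discarded instead of being added as a column.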
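The sketch below, assuming a hypothetical four-row DataFrame, contrasts reset_index(drop=True) with the default, which keeps the old labels in a new column named index.

```python
import pandas as pd

# Hypothetical four-row DataFrame
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop", "Python"]},
    index=["r1", "r2", "r3", "r4"],
)

selected = df.loc[["r2", "r4"]]

# drop=True discards the old labels; the default keeps them as an 'index' column
renumbered = selected.reset_index(drop=True)
kept_labels = selected.reset_index()

print(list(renumbered.index))     # [0, 1]
print(list(kept_labels.columns))  # ['index', 'Courses']
```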

5. Conclusion

In this article, you have learned how to select a single row or multiple rows from Pandas DataFrame by integer index and labels by using iloc[] and loc[] respectively. Using these you can also select rows by ranges, select first and last n rows, etc.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
