Pandas Select Rows by Index (Position/Label)

Use pandas.DataFrame.iloc[] and pandas.DataFrame.loc[] to select a single row or multiple rows from a DataFrame by integer position and by index label, respectively. iloc[] accepts a single position, a list of positions, a slice (range) of positions, and more. loc[] works with labels and accepts a single index label, a list of labels, a slice between two labels, and more. Selecting a single position or label that doesn't exist raises an error: IndexError from iloc[] and KeyError from loc[].
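
To make the error behaviour concrete, here is a minimal sketch. The small DataFrame is made up for illustration (it is not the article's example data), and the variable names are assumptions.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data).
df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

# A single position past the end of the DataFrame raises IndexError.
try:
    df.iloc[10]
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__

# A single label that is not in the index raises KeyError.
try:
    df.loc["r9"]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

print(iloc_error, loc_error)
```

Catching the specific exception type, rather than a bare except, keeps unrelated bugs visible.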

Related: Filter pandas DataFrame Rows Based on Condition

In this article, I will explain how to select rows from a pandas DataFrame by integer index and by label, by range, and how to select the first and last n rows, with several examples. loc[] and iloc[] can also be used to select columns; see the related article on how to get a cell value from a pandas DataFrame.
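
As a brief aside on the column and cell selection mentioned above, the following sketch shows the two-argument form [rows, columns]. The data and variable names are illustrative assumptions only.

```python
import pandas as pd

# Hypothetical two-row DataFrame for illustration.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

fee_by_label = df.loc[:, "Fee"]   # all rows, column labelled "Fee"
first_col = df.iloc[:, 0]         # all rows, first column by position
cell = df.loc["r2", "Fee"]        # single cell: row label "r2", column "Fee"

print(cell)
```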

1. Quick Examples of Selecting Rows by Index Position & Labels

If you are in a hurry, below are some quick examples of how to select a row of pandas DataFrame by index.


# Below are quick examples
# Select Rows by Integer Index
df2 = df.iloc[2]     # Select Row by Index
df2 = df.iloc[[2,3,6]]    # Select Rows by Index List
df2 = df.iloc[1:5]   # Select Rows by Integer Index Range
df2 = df.iloc[:1]    # Select First Row
df2 = df.iloc[:3]    # Select First 3 Rows
df2 = df.iloc[-1:]   # Select Last Row
df2 = df.iloc[-3:]   # Select Last 3 Rows
df2 = df.iloc[::2]   # Selects alternate rows

# Select Rows by Index Labels
df2 = df.loc['r2']          # Select Row by Index Label
df2 = df.loc[['r2','r3','r6']]    # Select Rows by Index Label List
df2 = df.loc['r1':'r5']     # Select Rows by Label Index Range
df2 = df.loc['r1':'r5':2]   # Select Alternate Rows Within Index Labels

Let’s create a DataFrame with a few rows and columns and run some examples to learn how to select rows by index. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
               }
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

2. Use pandas.DataFrame.iloc[] to Select Rows by Integer Index

The pandas iloc[] operator selects DataFrame rows by integer position. Remember that positions start from 0. You can use pandas.DataFrame.iloc[] with the slice syntax [start:stop:step], where start is the position of the first row to take, stop is the position at which to stop (the row at stop itself is not included), and step is the number of positions to advance after each extraction. Or use the syntax [[indices]], with indices as a list of row positions to take.

2.1 Select Row by Integer Index

You can select a single row from a pandas DataFrame by integer index using df.iloc[n]. Replace n with the position you want to select.


# Select Row by Integer Index
print(df.iloc[2])
# Outputs
#Courses     Hadoop
#Fee          26000
#Duration    35days
#Discount      1500
#Name: r3, dtype: object
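
Note that the output above is a Series, not a DataFrame. The sketch below, using a small made-up DataFrame and illustrative variable names, shows how wrapping the position in a list keeps the result as a one-row DataFrame.

```python
import pandas as pd

# Hypothetical three-row DataFrame for illustration.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

as_series = df.iloc[2]     # scalar position -> Series indexed by column names
as_frame = df.iloc[[2]]    # one-element list -> DataFrame with a single row

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 2)
```

The list form is handy when later code expects a DataFrame regardless of how many rows were selected.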

2.2. Select Multiple Rows by Index List

Sometimes you may need to select multiple rows from a DataFrame by specifying their indexes as a list. For example, df.iloc[[2,3,6]] selects the 3rd, 4th, and 7th rows, since the index starts from zero.


# Select Rows by Index List
print(df.iloc[[2,3,6]])
# Outputs
#   Courses    Fee Duration  Discount
#r3  Hadoop  26000   35days      1500
#r4  Python  22000   40days      1200
#r7    Java  22000   55days      2000

2.3. Select DataFrame Rows by Index Range

When you want to select DataFrame rows by a range of indexes, provide start and stop indexes.

  • By not providing a start index, iloc[] selects from the first row.
  • By not providing stop, iloc[] selects all rows from the start index to the end.
  • Providing both start and stop selects the rows from start up to, but not including, stop.

# Select Rows by Integer Index Range
print(df.iloc[1:5])
# Output
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1500
#r4   Python  22000   40days      1200
#r5   pandas  24000      NaN      2500

# Select First Row by Index
print(df.iloc[:1])
# Outputs
#   Courses    Fee Duration  Discount
#r1   Spark  20000   30days      1000

# Select First 3 Rows
print(df.iloc[:3])
# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000   30days      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1500

# Select Last Row by Index
print(df.iloc[-1:])
# Outputs
#   Courses    Fee Duration  Discount
#r7    Java  22000   55days      2000

# Select Last 3 Rows
print(df.iloc[-3:])
# Outputs
#   Courses    Fee Duration  Discount
#r5  pandas  24000      NaN      2500
#r6  Oracle  21000     None      2100
#r7    Java  22000   55days      2000

# Selects alternate rows
print(df.iloc[::2])
# Output
#   Courses    Fee Duration  Discount
#r1   Spark  20000   30days      1000
#r3  Hadoop  26000   35days      1500
#r5  pandas  24000      NaN      2500
#r7    Java  22000   55days      2000
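
The slice rules used in this section can be checked with a short sketch. It assumes a reduced version of the DataFrame (only the Fee column) and illustrative variable names; the point is that the stop position is excluded and that a slice running past the end is simply truncated instead of raising an error.

```python
import pandas as pd

# Reduced, illustrative version of the example data (Fee column only).
labels = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000, 24000, 21000, 22000]},
    index=labels,
)

middle = df.iloc[1:5]         # positions 1, 2, 3, 4 (position 5 is excluded)
beyond = df.iloc[5:100]       # slice past the end is truncated, no error
reversed_rows = df.iloc[::-1] # negative step walks the rows backwards

print(list(middle.index))         # ['r2', 'r3', 'r4', 'r5']
print(list(beyond.index))         # ['r6', 'r7']
print(list(reversed_rows.index))  # ['r7', 'r6', 'r5', 'r4', 'r3', 'r2', 'r1']
```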

3. Use pandas.DataFrame.loc[] to Select Rows by Index Labels

By using pandas.DataFrame.loc[] you can select rows by index names or labels. To select a range of rows, the syntax is df.loc[start:stop:step], where start is the label of the first row to take, stop is the label of the last row to take (unlike iloc[], the row at stop is included), and step is the number of rows to advance after each extraction; for example, you can use it to select alternate rows. Or use the syntax [[labels]], with labels as a list of row labels to take.

3.1. Select Row by Label

If you have custom index labels on a DataFrame, you can use these labels to select a row. For example, df.loc['r2'] returns the row with label 'r2'.


# Select Row by Index Label
print(df.loc['r2'])
# Outputs
#Courses     PySpark
#Fee           25000
#Duration     40days
#Discount       2300
#Name: r2, dtype: object

3.2. Select Multiple Rows by Label List

If you have a list of row labels, you can use this to select multiple rows from pandas DataFrame.


# Select Rows by Index Label List
print(df.loc[['r2','r3','r6']])
# Outputs
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1500
#r6   Oracle  21000     None      2100
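
One caveat with label lists: in recent pandas versions (1.0 and later, as far as I know), loc[] raises KeyError if any label in the list is missing from the index. The sketch below shows one possible way to keep only the labels that exist; the data and the names wanted and present are illustrative assumptions.

```python
import pandas as pd

# Hypothetical DataFrame for illustration.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 21000]},
    index=["r1", "r2", "r3", "r6"],
)

wanted = ["r2", "r6", "r9"]   # "r9" is not in the index

# Selecting with a list that contains a missing label raises KeyError.
try:
    df.loc[wanted]
    raised = False
except KeyError:
    raised = True

# Keep only the labels that are actually present, preserving their order.
present = df.loc[[label for label in wanted if label in df.index]]

print(raised)
print(list(present.index))  # ['r2', 'r6']
```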

3.3. Select Rows Between Two Labels

You can also select rows between two index labels; both the start and stop labels are included in the result.


# Select Rows by Label Index Range
print(df.loc['r1':'r5'])
# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000   30days      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1500
#r4   Python  22000   40days      1200
#r5   pandas  24000      NaN      2500

# Select Alternate Rows Within Index Labels
print(df.loc['r1':'r5':2])
# Outputs
#   Courses    Fee Duration  Discount
#r1   Spark  20000   30days      1000
#r3  Hadoop  26000   35days      1500
#r5  pandas  24000      NaN      2500

You can get the first two rows using df.loc[:'r2'], but this approach is rarely used because you need to know the row labels. To select the first n rows, it is recommended to select by position with df.iloc[:n], replacing n with the number of rows you want. The same applies to the last n rows, which you can get with df.iloc[-n:].
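
A short sketch comparing these options, using a reduced, illustrative DataFrame and made-up variable names. It also shows head() and tail(), which are standard DataFrame methods that give the same first and last rows.

```python
import pandas as pd

# Reduced, illustrative version of the example data.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

first_two_by_label = df.loc[:"r2"]   # label slice: 'r2' is included
first_two_by_pos = df.iloc[:2]       # position slice: position 2 is excluded
first_two_head = df.head(2)          # same rows via head()

last_two_by_pos = df.iloc[-2:]       # last two rows by position
last_two_tail = df.tail(2)           # same rows via tail()

print(first_two_by_label.equals(first_two_by_pos))  # True
print(last_two_by_pos.equals(last_two_tail))        # True
```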

4. Complete Example


import pandas as pd
import numpy as np
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days','40days','35days','40days',np.nan,None,'55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
               }
index_labels=['r1','r2','r3','r4','r5','r6','r7']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select Row by Index
print(df.iloc[2])

# Select Rows by Index List
print(df.iloc[[2,3,6]])

# Select Rows by Integer Index Range
print(df.iloc[1:5])

# Select First Row
print(df.iloc[:1])

# Select First 3 Rows
print(df.iloc[:3])

# Select Last Row
print(df.iloc[-1:])

# Select Last 3 Rows
print(df.iloc[-3:])

# Selects alternate rows
print(df.iloc[::2])

# Select Row by Index Label
print(df.loc['r2'])

# Select Rows by Index Label List
print(df.loc[['r2','r3','r6']])

# Select Rows by Label Index Range
print(df.loc['r1':'r5'])

# Select Alternate Rows Within Index Labels
print(df.loc['r1':'r5':2])

Conclusion

In this article, you have learned how to select a single row or multiple rows from a pandas DataFrame by integer index and by label using iloc[] and loc[], respectively. Using these, you can also select rows by range and select the first and last n rows.

Happy Learning !!

NNK

SparkByExamples.com is a Big Data and Spark examples community page, all examples are simple and easy to understand and well tested in our development environment Read more ..

Leave a Reply

Pandas Select Rows by Index (Position/Label)