Pandas DataFrame loc[] Syntax and Examples

pandas.DataFrame.loc[] is a property used to access a group of rows and columns by label(s) or a boolean array. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). Selecting a list of columns with loc[] returns a new DataFrame containing only those columns from the original DataFrame.

pandas.DataFrame.loc[] Syntax & Usage

loc[] selects rows and columns of a pandas DataFrame by name/label. One of the main advantages of a DataFrame is its ease of use, which you can see for yourself when you use the pandas.DataFrame.loc[] attribute to select or filter rows or columns. It is one of the most frequently used attributes of a pandas DataFrame.

When slicing, loc[] takes the form loc[START:STOP:STEP], where:
START denotes the label of the initial row or column,
STOP represents the label of the final row or column to include, and
STEP defines how many positions to advance after each selected row or column.

Key points

When start is not provided, loc[] selects from the first row/column.
When stop is not provided, loc[] selects all rows/columns from the start label through the last one.
When both start and stop are provided, loc[] selects all rows/columns between them, including both the start and stop labels.
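The key points above can be sketched on a tiny frame. The frame `demo` and its labels 'a' through 'd' are assumptions made only for this illustration; they are not part of the article's dataset.

```python
import pandas as pd

# Hypothetical frame used only for this sketch
demo = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

from_start = demo.loc[:"b"]         # no start: begins at the first row
to_end = demo.loc["c":]             # no stop: runs through the last row
between = demo.loc["b":"c"]         # both given: 'b' and 'c' are included
every_other = demo.loc["a":"d":2]   # step of 2: every second row

print(from_start.index.tolist())    # ['a', 'b']
print(to_end.index.tolist())        # ['c', 'd']
print(between.index.tolist())       # ['b', 'c']
print(every_other.index.tolist())   # ['a', 'c']
```

Unlike ordinary Python slicing, the stop label is part of the result.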

First, let’s create a Pandas DataFrame.


# Pandas.DataFrame.loc[] Syntax & Usage
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Output:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

Select Single Row & Column By Label using loc[]

By using pandas loc[] you can select rows and columns by name. It also supports selecting multiple rows and columns, the records between two rows, the columns between two column labels, etc. The example below demonstrates how to select a row by label. Alternatively, you can also select rows using the DataFrame.query() method.


# Select Single Row by Label
print(df.loc['r2'])

# Output:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object
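As a rough comparison of the two approaches mentioned above, the sketch below selects the same record once with loc[] and once with DataFrame.query(). The shortened frame and the choice to filter on the Courses column are assumptions for illustration.

```python
import pandas as pd

# Shortened version of the article's data, for illustration only
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

by_label = df.loc["r2"]                      # one row label -> Series
by_query = df.query("Courses == 'PySpark'")  # condition on a column -> DataFrame

print(type(by_label).__name__)   # Series
print(type(by_query).__name__)   # DataFrame
```

loc[] looks the row up by its index label, while query() filters on column values, so the two return different types even when they match the same record.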

To select a column by label, pass : for the rows and the column name for the columns.


# Select Single Column by label
print(df.loc[:, "Courses"])

# Output:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object
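One detail worth checking when selecting a single column is the return type. The sketch below, using a shortened copy of the data as an assumption, contrasts a bare label with a one-element list.

```python
import pandas as pd

# Shortened version of the article's data, for illustration only
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]},
    index=["r1", "r2"],
)

as_series = df.loc[:, "Courses"]    # bare label -> Series
as_frame = df.loc[:, ["Courses"]]   # list with one label -> DataFrame

print(type(as_series).__name__)     # Series
print(type(as_frame).__name__)      # DataFrame
```

Wrapping the label in a list is a simple way to keep the result two-dimensional.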

Select Multiple Rows & Columns

Now, let’s see how to select multiple rows and columns by labels using the DataFrame.loc[] property.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

Similarly, to select multiple columns from a pandas DataFrame, you can use plain indexing or the loc[] or iloc[] indexers. With loc[], pass a list of column labels.


# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Output:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000

Select Rows Between Two Index Labels

loc[] also supports selecting rows and columns by range, for example all items between two rows/columns, or all items starting from a given label. The example below selects the rows from r1 through r4.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Output:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

You can select a range of columns in the same way. The example below selects all columns from Fee through Discount.


# Select Columns between two Labels
# Include both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Output:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

Select Alternate Rows or Columns

Similarly, by adding a step to the range you can select every alternate row from a DataFrame.


# Select alternate rows by index labels
print(df.loc['r1':'r4':2])

# Output:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r3   Hadoop  26000   35days      1200

To select alternate columns between two labels, use loc[] with a stepped slice on the column axis.


# Select alternate columns between two labels
print(df.loc[:,'Fee':'Discount':2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

Using Conditions with Pandas loc

Using conditions with Pandas loc[] allows you to filter rows based on specific criteria.


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
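Conditions can also be combined and paired with a column selection. The sketch below is one way to do that; the second condition on Discount and the chosen columns are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Each condition needs its own parentheses; & means "and", | means "or"
mask = (df["Fee"] >= 24000) & (df["Discount"] >= 2000)

# Rows come from the boolean mask, columns from the label list
result = df.loc[mask, ["Courses", "Fee"]]
print(result)
```

Only r2 and r5 satisfy both conditions, so the result has two rows and two columns.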

Complete Examples of Pandas DataFrame loc


# Examples of pandas DataFrame loc
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select single Row
print(df.loc['r2'])

# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select alternate rows by index labels
print(df.loc['r1':'r4':2])

# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Using Conditions
print(df.loc[df['Fee'] >= 24000])

Frequently Asked Questions on DataFrame loc[]

What is loc[] in Pandas DataFrame?

In a pandas DataFrame, loc[] is a property (an indexer, not a method) used for selecting rows and columns by label(s). You can use it to slice and retrieve specific subsets of data from a DataFrame based on their row and column labels.

How is loc[] different from iloc[]?

While loc[] is label-based, meaning you specify index and column names, iloc[] is integer-position-based and uses integer positions to access data. Slices in loc[] include the stop label, whereas slices in iloc[] exclude the stop position.
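A minimal side-by-side sketch of the two indexers, using a shortened frame that is assumed here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000, 26000]}, index=["r1", "r2", "r3"])

by_label = df.loc["r2", "Fee"]     # row label 'r2', column label 'Fee'
by_position = df.iloc[1, 0]        # second row, first column

label_slice = df.loc["r1":"r2"]    # stop label included -> 2 rows
position_slice = df.iloc[0:1]      # stop position excluded -> 1 row

print(by_label, by_position)                   # 25000 25000
print(len(label_slice), len(position_slice))   # 2 1
```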

How to use loc[] to select specific rows and columns?

You can use loc[] to select specific rows and columns by providing row labels and column labels, in that order. For example, df.loc[[1, 2, 3], ['column1', 'column2']] selects the rows labeled 1, 2, and 3 and the columns column1 and column2.
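The same pattern applied to string row labels like the ones used earlier in this article might look like this (a sketch on a shortened copy of the data):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
        "Discount": [1000, 2300, 1200],
    },
    index=["r1", "r2", "r3"],
)

# First list picks the rows, second list picks the columns
subset = df.loc[["r1", "r3"], ["Courses", "Fee"]]
print(subset.shape)   # (2, 2)
```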

What happens if a label is not present in the DataFrame using loc[]?

If a label is not present, loc[] raises a KeyError. Make sure the labels exist in your DataFrame.
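One possible way to handle a missing label is sketched below. The label 'r9' is a hypothetical value chosen because it is not in the index.

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

label = "r9"  # hypothetical label that does not exist in df.index

# Option 1: catch the KeyError raised by loc[]
try:
    row = df.loc[label]
    found = True
except KeyError:
    row = None
    found = False

# Option 2: check membership in the index before selecting
exists = label in df.index

print(found, exists)   # False False
```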

How can I use slicing with loc[]?

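You can use slicing with loc[] for both rows and columns. For example, df.loc[1:5, 'column1':'column3'] selects the rows labeled 1 through 5 and the columns from column1 through column3, with both endpoints included.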
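With an integer index it is easy to confuse labels with positions. The sketch below builds a hypothetical frame with row labels 1 through 6 (an assumption for illustration) and compares the loc[] slice from the example with a similar-looking iloc[] slice.

```python
import pandas as pd

# Hypothetical frame with integer row labels 1..6 and three columns
df = pd.DataFrame(
    {
        "column1": list(range(6)),
        "column2": list(range(10, 16)),
        "column3": list(range(20, 26)),
    },
    index=[1, 2, 3, 4, 5, 6],
)

by_label = df.loc[1:5, "column1":"column3"]   # labels 1..5, endpoints included
by_position = df.iloc[1:5]                    # positions 1..4, stop excluded

print(by_label.index.tolist())      # [1, 2, 3, 4, 5]
print(by_position.index.tolist())   # [2, 3, 4, 5]
```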

Conclusion

In this article, you have learned the syntax, usage, and examples of the pandas DataFrame loc[] property. DataFrame.loc[] operates on labels to extract rows and/or columns in Pandas. It can accept a single label, multiple labels from a list, a range (between two index labels), and more.

Happy Learning !!

References

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html