Post last modified: March 27, 2024

pandas.DataFrame.loc[] is a property used to access a group of rows and columns by label(s) or a boolean array. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). Selecting columns from a DataFrame returns a new DataFrame containing only the specified columns from the original.

Pandas DataFrame loc[] Key Points

  • loc[] is used to select/filter rows and columns by labels.
  • When using it to select rows, you need to provide the row index labels.
  • It also provides a way to select rows and columns between ranges, every alternate row/column, etc.

pandas iloc[] is another DataFrame property that operates on integer row and column positions rather than labels. For a better understanding of these two, learn the differences and similarities between pandas loc[] vs iloc[].
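To make the contrast concrete, here is a minimal sketch (using a small DataFrame of my own, not from the article) showing that loc[] answers by label while iloc[] answers by position:

```python
import pandas as pd

# A small DataFrame with custom index labels, similar in shape to the examples below
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# loc[] selects by label; iloc[] selects by integer position.
# Both of these return the same row here:
print(df.loc["r2"])   # label-based: the row labeled 'r2'
print(df.iloc[1])     # position-based: the second row
```

Both calls print the PySpark row; they only diverge when labels and positions disagree (e.g. after sorting or filtering).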

1. pandas.DataFrame.loc[] Syntax & Usage

loc[] is used to select rows and columns by the names/labels of a pandas DataFrame. One of the main advantages of a DataFrame is its ease of use, which you can see for yourself when you use the pandas.DataFrame.loc[] attribute to select or filter rows or columns. It is the most commonly used attribute of a pandas DataFrame.

The general form is df.loc[START:STOP:STEP], where

  • START is the label of the first row/column to take,
  • STOP is the label of the last row/column to take (inclusive), and
  • STEP is the number of indices to advance after each extraction

Key points

  • By not providing a start row/column, loc[] selects from the beginning.
  • By not providing a stop, loc[] selects all rows/columns from the start label onward.
  • Providing both start and stop selects all rows/columns in between, inclusive of both.

Let’s create a DataFrame and explore how to use pandas loc[].


# Pandas.DataFrame.loc[] Syntax & Usage
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Output:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000

2. Select Single Row & Column By Label using loc[]

By using pandas loc[] you can select rows and columns by name. It also supports selecting multiple rows and columns, records between two rows, between two columns, etc. The example below demonstrates how to select a row by label. Alternatively, you can also select rows using the DataFrame.query() method.


# Select Single Row by Label
print(df.loc['r2'])

# Output:
# Courses     PySpark
# Fee           25000
# Duration     40days
# Discount       2300
# Name: r2, dtype: object
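As mentioned above, DataFrame.query() is an alternative way to filter rows. A quick sketch (on a small DataFrame of my own): inside a query expression, an unnamed index can be referred to as index, and column names can be used directly.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# query() filters rows with a boolean expression string.
print(df.query('index == "r2"'))  # same row as df.loc[['r2']]
print(df.query("Fee > 24000"))    # filter on a column value
```

query() always returns a DataFrame, so it matches df.loc[['r2']] (list of labels) rather than df.loc['r2'] (single label, which returns a Series).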

To select a single column by label:


# Select Single Column by label
print(df.loc[:, "Courses"])

# Output:
# r1      Spark
# r2    PySpark
# r3     Hadoop
# r4     Python
# r5     pandas
# Name: Courses, dtype: object

3. Select Multiple Rows & Columns

Now, let’s see how to select multiple rows and columns by labels using DataFrame.loc[] property


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200

Similarly, you can select multiple columns from a pandas DataFrame.


# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Output:
#    Courses    Fee  Discount
# r1    Spark  20000      1000
# r2  PySpark  25000      2300
# r3   Hadoop  26000      1200
# r4   Python  22000      2500
# r5   pandas  24000      2000

4. Select Between Two Rows or Columns

loc[] also supports selecting rows and columns by range, for example all items between two rows/columns, all items starting from a given label, etc. The example below selects the rows between r1 and r4.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Output:
#    Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500

You can also select all columns between two column names. The example below selects all columns between the Fee and Discount column labels.


# Select Columns between two Labels
# Include both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Output:
#      Fee Duration  Discount
# r1  20000    30day      1000
# r2  25000   40days      2300
# r3  26000   35days      1200
# r4  22000   40days      2500
# r5  24000   60days      2000

5. Select Alternate Rows or Columns

Similarly, by using a range with a step you can also select every alternate row from the DataFrame.


# Select Alternate Rows by Index Labels
print(df.loc['r1':'r4':2])

# Output:
#   Courses    Fee Duration  Discount
# r1   Spark  20000    30day      1000
# r3  Hadoop  26000   35days      1200

To select alternate columns, use:


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Output:
#      Fee  Discount
# r1  20000      1000
# r2  25000      2300
# r3  26000      1200
# r4  22000      2500
# r5  24000      2000

6. Using Conditions with pandas loc

By using loc[], you can select DataFrame rows that match a condition.


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

# Output:
#    Courses    Fee Duration  Discount
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r5   pandas  24000   60days      2000
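Conditions can also be combined, and paired with a column selection in the same loc[] call. A hedged sketch (re-creating the same DataFrame so the snippet is self-contained): use & for "and", | for "or", and wrap each condition in parentheses.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Combine conditions with & (and) / | (or); each condition needs parentheses.
# The second argument to loc[] selects which columns to return.
print(df.loc[(df["Fee"] >= 24000) & (df["Discount"] < 2000), ["Courses", "Fee"]])
```

Only the Hadoop row (r3) satisfies both conditions, so the result is a one-row DataFrame with just the Courses and Fee columns.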

7. Complete Examples of pandas DataFrame loc


# Examples of pandas DataFrame loc
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select single Row
print(df.loc['r2'])

# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Alternate Rows by Index Labels
print(df.loc['r1':'r4':2])

# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Using Conditions
print(df.loc[df['Fee'] >= 24000])

Frequently Asked Questions on Pandas DataFrame loc[]

What is loc[] in Pandas DataFrame?

loc[] is a label-based indexing attribute in Pandas that is used to access a particular selection of rows and columns by labels.

How is loc[] different from iloc[]?

While loc[] is label-based, meaning you can specify the index and column names, iloc[] is integer-location-based, and uses integer indices to access data.

How to use loc[] to select specific rows and columns?

You can use loc[] to select specific rows and columns by providing row and column labels as arguments. For example, df.loc[[1, 2, 3], ['column1', 'column2']] selects the rows with index labels 1, 2, and 3 and the two named columns.

What happens if a label is not present in the DataFrame using loc[]?

If a label is not present, loc raises a KeyError. Make sure the labels exist in your DataFrame.
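A minimal sketch of how to guard against a missing label (using a tiny hypothetical DataFrame; the label 'r9' is deliberately absent):

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

# Option 1: check membership in the index before using loc[]
label = "r9"
if label in df.index:
    print(df.loc[label])
else:
    print(f"label {label!r} not found")

# Option 2: catch the KeyError directly
try:
    df.loc["r9"]
except KeyError:
    print("KeyError raised for missing label 'r9'")
```

The membership check is usually preferable when a missing label is an expected case rather than a programming error.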

How can I use slicing with loc[]?

You can use slicing with loc[] for both rows and columns. For example, df.loc[1:5, 'column1':'column3']. Note that, unlike regular Python slicing, both endpoints of a loc[] slice are included.

Conclusion

In this article, you learned the syntax, usage, and examples of the pandas DataFrame loc[] property. DataFrame.loc[] is label-based and selects rows and/or columns of a pandas DataFrame. It accepts a single label, a list of labels, a range (between two index labels), and more.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium