Pandas DataFrame loc[] Syntax and Examples

pandas.DataFrame.loc[] is a property used to access a group of rows and columns by label(s) or a boolean array. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e., rows and columns. Selecting a list of columns from a DataFrame returns a new DataFrame containing only the selected columns, while selecting a single column label returns a Series.

pandas DataFrame loc Key Points

  • loc is used to select/filter rows and columns by labels.
  • To select rows, you provide the row index labels.
  • It also lets you select ranges of rows and columns, every alternate row or column, etc.

pandas iloc[] is another DataFrame property; it selects rows and columns by integer position rather than by label. For a better understanding of the two, learn the differences and similarities between pandas loc[] and iloc[].
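As a rough illustration of the difference, the sketch below reads the same cell once by label and once by position. The small DataFrame and its labels r1 to r3 are made up for this example.

```python
import pandas as pd

# Small illustrative frame; the labels 'r1'..'r3' are assumptions for this sketch.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# loc[] looks up by label: the row labeled 'r2' and the column named 'Fee'.
by_label = df.loc["r2", "Fee"]

# iloc[] looks up by integer position: second row (1) and second column (1).
by_position = df.iloc[1, 1]

print(by_label, by_position)  # both expressions refer to the same cell
```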

1. pandas.DataFrame.loc[] Syntax & Usage

loc is used to select rows and columns of a pandas DataFrame by name/label. One of the main advantages of a DataFrame is its ease of use, and you can see this for yourself when you use the pandas.DataFrame.loc[] attribute to select or filter DataFrame rows or columns. It is one of the most frequently used attributes of a pandas DataFrame.

loc[] accepts a label slice of the form START:STOP:STEP for rows, for columns, or for both, where:
  • START is the label of the first row/column to take,
  • STOP is the label of the last row/column to take, and
  • STEP is the number of positions to advance after each extraction.

Key points

  • By not providing a start row/column, loc[] selects from the beginning.
  • By not providing stop, loc[] selects all rows/columns from the start label.
  • Providing both start and stop selects all rows/columns between them, including the start and stop labels themselves.
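The points above can be sketched on a small frame as follows. The single Fee column and the labels r1 to r4 are chosen only for illustration.

```python
import pandas as pd

# Illustrative frame; column and row labels are assumptions for this sketch.
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000, 22000]},
    index=["r1", "r2", "r3", "r4"],
)

from_start = df.loc[:"r2"]          # no START: begins at the first row
to_end = df.loc["r3":]              # no STOP: runs through the last row
between = df.loc["r2":"r3"]         # both given: includes 'r2' and 'r3'
every_other = df.loc["r1":"r4":2]   # STEP of 2: every second row

for name, part in [("from_start", from_start), ("to_end", to_end),
                   ("between", between), ("every_other", every_other)]:
    print(name, list(part.index))
```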

Let’s create a DataFrame and explore how to use pandas loc[].


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500
#r5   pandas  24000   60days      2000

2. Select Single Row & Column By Label using loc[]

With pandas loc[] you can select rows and columns by name. It also supports selecting multiple rows and columns, all records between two rows, all columns between two column labels, etc. The example below selects a single row by label. Alternatively, you can select rows using the DataFrame.query() method.

# Select Single Row by Label
print(df.loc['r2'])

# Outputs
#Courses     PySpark
#Fee           25000
#Duration     40days
#Discount       2300
#Name: r2, dtype: object
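For comparison, here is a minimal sketch of the DataFrame.query() alternative mentioned above. It assumes an unnamed index, which query() exposes under the name index, and uses a shortened version of the example data.

```python
import pandas as pd

# Shortened, illustrative version of the article's data.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# loc[] with a single label returns the row as a Series.
row_series = df.loc["r2"]

# query() filters with an expression and returns a one-row DataFrame.
row_frame = df.query("index == 'r2'")

print(type(row_series).__name__)
print(type(row_frame).__name__)
```

Note the difference in return type: loc[] with a single label gives a Series, while query() always gives a DataFrame.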

To select a single column by label, use:


# Select Single Column by label
print(df.loc[:, "Courses"])

# Outputs
#r1      Spark
#r2    PySpark
#r3     Hadoop
#r4     Python
#r5     pandas
#Name: Courses, dtype: object
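The type of the result depends on how the labels are passed. The sketch below, using a shortened version of the example data, contrasts a single label, a one-element list, and a row/column label pair.

```python
import pandas as pd

# Shortened, illustrative version of the article's data.
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

as_series = df.loc[:, "Courses"]    # single label -> Series
as_frame = df.loc[:, ["Courses"]]   # list of labels -> DataFrame with one column
cell = df.loc["r2", "Fee"]          # row label and column label -> single value

print(type(as_series).__name__, type(as_frame).__name__, cell)
```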

3. Select Multiple Rows & Columns

Now, let's see how to select multiple rows and columns by label using the DataFrame.loc[] property.


# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Outputs
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200

Similarly, you can select multiple columns from a pandas DataFrame by passing a list of column labels.


# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Outputs
#    Courses    Fee  Discount
#r1    Spark  20000      1000
#r2  PySpark  25000      2300
#r3   Hadoop  26000      1200
#r4   Python  22000      2500
#r5   pandas  24000      2000
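Row and column lists can also be combined in a single call. The sketch below uses a shortened version of the example data and also shows that, in recent pandas versions, a list containing a label that does not exist raises a KeyError. The label r9 is deliberately made up.

```python
import pandas as pd

# Shortened, illustrative version of the article's data.
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
        "Discount": [1000, 2300, 1200],
    },
    index=["r1", "r2", "r3"],
)

# Select specific rows and specific columns at the same time.
subset = df.loc[["r2", "r3"], ["Courses", "Fee"]]
print(subset)

# 'r9' is a made-up label that is not in the index.
missing_label_raised = False
try:
    df.loc[["r2", "r9"]]
except KeyError:
    missing_label_raised = True

print(missing_label_raised)
```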

4. Select Between Two Rows or Columns

loc[] also supports selecting rows and columns by range, for example all items between two rows or columns, or all items from a given label onward. The example below selects the rows from r1 through r4.


# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Outputs
#    Courses    Fee Duration  Discount
#r1    Spark  20000    30day      1000
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r4   Python  22000   40days      2500

You can also select a range between two column labels. The example below selects all columns from Fee through Discount.


# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Outputs
#      Fee Duration  Discount
#r1  20000    30day      1000
#r2  25000   40days      2300
#r3  26000   35days      1200
#r4  22000   40days      2500
#r5  24000   60days      2000
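Because label slices include the stop label, they behave differently from ordinary Python slices and from iloc[]. As a sketch using the same data as the article, the two selections below return the same rows and columns, even though the positional stop has to be one past the last position wanted.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Label slices: 'r4' and 'Discount' are both included.
by_label = df.loc["r2":"r4", "Fee":"Discount"]

# Position slices: the stop position 4 is excluded, so this covers positions 1..3.
by_position = df.iloc[1:4, 1:4]

print(by_label.equals(by_position))
```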

5. Select Alternate Rows or Columns

Similarly, by adding a step to the range you can select every alternate row of a DataFrame.


# Select Alternate Rows by Index Labels
print(df.loc['r1':'r4':2])

# Outputs
#   Courses    Fee Duration  Discount
#r1   Spark  20000    30day      1000
#r3  Hadoop  26000   35days      1200

To select alternate columns, use:


# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Output
#      Fee  Discount
#r1  20000      1000
#r2  25000      2300
#r3  26000      1200
#r4  22000      2500
#r5  24000      2000

6. Using Conditions with pandas loc

You can also use loc[] to select the DataFrame rows that satisfy a condition.


# Using Conditions
print(df.loc[df['Fee'] >= 24000])

# Output
#    Courses    Fee Duration  Discount
#r2  PySpark  25000   40days      2300
#r3   Hadoop  26000   35days      1200
#r5   pandas  24000   60days      2000
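Conditions can be combined, and a column selection can be added in the same call. The following is a minimal sketch using the same data as the article; the thresholds are chosen only for illustration. Each condition needs its own parentheses because & binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Boolean mask: both conditions must hold for a row to be kept.
mask = (df["Fee"] >= 24000) & (df["Discount"] > 2000)

# Rows where the mask is True, restricted to two columns.
result = df.loc[mask, ["Courses", "Fee"]]
print(result)
```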

7. Complete Examples of pandas DataFrame loc


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
    'Fee' :[20000,25000,26000,22000,24000],
    'Duration':['30day','40days','35days','40days','60days'],
    'Discount':[1000,2300,1200,2500,2000]
              }
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)

# Select single Row
print(df.loc['r2'])

# Select Single Column by label
print(df.loc[:, "Courses"])

# Select Multiple Rows by Label
print(df.loc[['r2','r3']])

# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])

# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])

# Select Alternate Rows by Index Labels
print(df.loc['r1':'r4':2])

# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])

# Using Conditions
print(df.loc[df['Fee'] >= 24000])

Conclusion

In this article, you have learned the syntax, usage, and examples of the pandas DataFrame loc property. DataFrame.loc[] is a label-based way to select rows and/or columns in pandas. It accepts a single label, a list of labels, a range between two labels, a boolean condition, and more.

Happy Learning !!


NNK
