pandas.DataFrame.loc[] is a property used to access a group of rows and columns by label(s) or a boolean array. A pandas DataFrame is a two-dimensional tabular data structure with labeled axes (rows and columns). Selecting a list or slice of columns with loc[] returns a new DataFrame containing only those columns, while selecting a single column label returns a Series.
Key Points –
- Pandas DataFrame loc[] is a label-based indexer used for selecting rows and columns by label.
- It allows you to select data using row and column labels, rather than integer-based indexing.
- You can use boolean arrays with loc[] to filter rows based on conditions.
- loc[] can be used not only for accessing data but also for setting values in specific rows and columns of the DataFrame.
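The last point, setting values, can be sketched as follows. This is a minimal illustration on a small made-up DataFrame (the labels and numbers are assumptions chosen for the example), showing one assignment to a single cell and one assignment driven by a boolean condition.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {"Fee": [20000, 25000, 26000], "Discount": [1000, 2300, 1200]},
    index=["r1", "r2", "r3"],
)

# Set a single cell, addressed by row label and column label
df.loc["r2", "Fee"] = 27000

# Set one column for every row that matches a boolean condition
df.loc[df["Fee"] > 25000, "Discount"] = 3000

print(df)
```

Assigning through a single loc[] call like this modifies the DataFrame in place, which is generally preferred over chained indexing such as df[df["Fee"] > 25000]["Discount"] = 3000.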
pandas.DataFrame.loc[] Syntax & Usage
loc[] selects rows and columns of a pandas DataFrame by name/label. One of the main advantages of a DataFrame is its ease of use, and you can see this for yourself when you use the pandas.DataFrame.loc[] attribute to select or filter rows or columns. It is one of the most commonly used attributes of a pandas DataFrame.
A label slice passed to loc[] has the form START:STOP:STEP. START denotes the label of the first row or column, STOP represents the label of the last row or column to include, and STEP defines how many positions to advance after each selected item.
Key points
- By not providing a start row/column, loc[] selects from the beginning.
- When stop is not provided, loc[] selects all rows/columns starting from the specified label.
- When both start and stop are provided, loc[] selects all rows/columns in between them, including both endpoints.
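The three cases above can be sketched as follows. This is a minimal example that builds a small two-column DataFrame just for the illustration; the variable names (from_start, to_end, and so on) are assumptions, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

from_start = df.loc[:"r3"]     # no START: rows r1, r2, r3
to_end = df.loc["r3":]         # no STOP: rows r3, r4, r5
between = df.loc["r2":"r4"]    # START and STOP: rows r2, r3, r4
every_other = df.loc[::2]      # STEP only: rows r1, r3, r5

print(list(from_start.index))
print(list(to_end.index))
print(list(between.index))
print(list(every_other.index))
```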
First, let’s create a Pandas DataFrame.
# Pandas.DataFrame.loc[] Syntax & Usage
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30day','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000
Select Single Row & Column By Label using loc[]
By using pandas loc[], you can select rows and columns by name. It also supports selecting multiple rows and columns, all records between two rows or between two columns, etc. The example below demonstrates how to select a row by label. Alternatively, you can also select rows using the DataFrame.query() method.
# Select Single Row by Label
print(df.loc['r2'])
# Output:
# Courses PySpark
# Fee 25000
# Duration 40days
# Discount 2300
# Name: r2, dtype: object
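Note that the return type depends on how the label is passed. The sketch below, which rebuilds a two-column version of the sample data so it runs on its own, contrasts a single label (a Series), a one-element list (a one-row DataFrame), and a row/column pair (a scalar). The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop"],
        "Fee": [20000, 25000, 26000],
    },
    index=["r1", "r2", "r3"],
)

row_as_series = df.loc["r2"]         # single label -> Series
row_as_frame = df.loc[["r2"]]        # list with one label -> DataFrame
single_value = df.loc["r2", "Fee"]   # row label + column label -> scalar

print(type(row_as_series).__name__)  # Series
print(type(row_as_frame).__name__)   # DataFrame
print(single_value)                  # 25000
```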
To select a single column by label, pass : for the rows and the column label for the columns.
# Select Single Column by label
print(df.loc[:, "Courses"])
# Output:
# r1 Spark
# r2 PySpark
# r3 Hadoop
# r4 Python
# r5 pandas
# Name: Courses, dtype: object
Select Multiple Rows & Columns
Now, let’s see how to select multiple rows and columns by label using the DataFrame.loc[] property.
# Select Multiple Rows by Label
print(df.loc[['r2','r3']])
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
Similarly, to select multiple columns from a pandas DataFrame by label, pass a list of column names to loc[]. You can also use plain [] indexing, or iloc[] if you want to select by position instead.
# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])
# Output:
# Courses Fee Discount
# r1 Spark 20000 1000
# r2 PySpark 25000 2300
# r3 Hadoop 26000 1200
# r4 Python 22000 2500
# r5 pandas 24000 2000
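Row and column lists can also be combined in one call. The following is a minimal sketch that recreates the sample DataFrame so it runs on its own and selects two rows and two columns at once; the name subset is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# First list selects rows by label, second list selects columns by label
subset = df.loc[["r2", "r3"], ["Courses", "Fee"]]
print(subset)
```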
Select Rows Between Two Index Labels
loc[] also supports selecting rows and columns by range, for example all items between two rows/columns, or all items starting from a given label. The example below selects the rows between r1 and r4, inclusive.
# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
You can select between two column names in the same way. The example below selects all columns between the Fee and Discount column labels.
# Select Columns between two Labels
# Include both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])
# Output:
# Fee Duration Discount
# r1 20000 30day 1000
# r2 25000 40days 2300
# r3 26000 35days 1200
# r4 22000 40days 2500
# r5 24000 60days 2000
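Because label slices include the stop label, they behave differently from ordinary Python slices and from position-based iloc[] slices, which exclude the stop position. The sketch below recreates the sample DataFrame and compares the two; the variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

rows_by_label = df.loc["r1":"r4"]           # r4 is included: 4 rows
rows_by_position = df.iloc[0:3]             # position 3 is excluded: 3 rows

cols_by_label = df.loc[:, "Fee":"Discount"]  # Fee, Duration, Discount
cols_by_position = df.iloc[:, 1:3]           # Fee, Duration only

print(list(rows_by_label.index))
print(list(rows_by_position.index))
print(list(cols_by_label.columns))
print(list(cols_by_position.columns))
```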
Select Alternate Rows or Columns
Similarly, by adding a step to the range you can select every alternate row from the DataFrame.
# Select Alternate rows By indices
print(df.loc['r1':'r4':2])
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r3 Hadoop 26000 35days 1200
To select alternate columns between two labels, use loc[] with a stepped slice on the column axis.
# Select alternate columns between two labels
print(df.loc[:,'Fee':'Discount':2])
# Output:
# Fee Discount
# r1 20000 1000
# r2 25000 2300
# r3 26000 1200
# r4 22000 2500
# r5 24000 2000
Using Conditions with Pandas loc
Using conditions with pandas loc[] allows you to filter rows based on specific criteria. The condition produces a boolean Series, and loc[] keeps the rows where it is True.
# Using Conditions
print(df.loc[df['Fee'] >= 24000])
# Output:
# Courses Fee Duration Discount
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r5 pandas 24000 60days 2000
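Conditions can be combined with & (and) or | (or), and a column selection can be added as the second argument. The following is a minimal sketch on the same sample data; the thresholds and the names mask and result are assumptions chosen for illustration. Each condition needs its own parentheses because & binds more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Boolean Series: True where both conditions hold
mask = (df["Fee"] >= 24000) & (df["Discount"] > 1500)

# Filter rows with the mask and keep only two columns
result = df.loc[mask, ["Courses", "Fee"]]
print(result)
```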
Complete Examples of Pandas DataFrame loc
# Examples of pandas DataFrame loc
import pandas as pd
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30day','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies,index=index_labels)
print(df)
# Select single Row
print(df.loc['r2'])
# Select Single Column by label
print(df.loc[:, "Courses"])
# Select Multiple Rows by Label
print(df.loc[['r2','r3']])
# Select Multiple Columns by labels
print(df.loc[:, ["Courses","Fee","Discount"]])
# Select Rows Between two Index Labels
# Includes both r1 and r4 rows
print(df.loc['r1':'r4'])
# Select Columns between two Labels
# Includes both 'Fee' and 'Discount' columns
print(df.loc[:,'Fee':'Discount'])
# Select Alternate rows By indices
print(df.loc['r1':'r4':2])
# Select Alternate Columns between two Labels
print(df.loc[:,'Fee':'Discount':2])
# Using Conditions
print(df.loc[df['Fee'] >= 24000])
Frequently Asked Questions on DataFrame loc[]
In a pandas DataFrame, loc[] is an indexer, accessed as a property, used for selecting rows and columns by label(s). It allows you to access a group of rows and columns by specifying their labels. You can use it to slice and retrieve specific subsets of data from a DataFrame based on row and column labels.
loc[] differs from iloc[] in how it addresses data: loc[] is label-based, meaning you specify index and column names, whereas iloc[] is integer-location-based and uses integer positions to access data.
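The difference is easiest to see when the index labels are themselves integers that do not match the row positions. The sketch below uses a small made-up DataFrame (the column name value and the index values are assumptions for illustration).

```python
import pandas as pd

# Integer labels deliberately in reverse order of position
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[2, 1, 0])

by_label = df.loc[0, "value"]    # row whose label is 0 -> "c"
by_position = df.iloc[0, 0]      # first row, first column -> "a"

print(by_label)
print(by_position)
```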
To select specific rows and columns, pass the row labels and column labels to loc[]. For example, df.loc[[1, 2, 3], ['column1', 'column2']] selects the rows labeled 1, 2, and 3 and the columns column1 and column2.
If a label is not present, loc[] raises a KeyError. Make sure the labels exist in your DataFrame.
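A few ways of handling a possibly missing label are sketched below, using a small made-up DataFrame and a hypothetical label r9 that does not exist. Catching KeyError and checking membership in df.index are plain Python patterns; DataFrame.reindex() is shown as one option when you would rather get a row of NaN values than an error.

```python
import pandas as pd

df = pd.DataFrame({"Fee": [20000, 25000]}, index=["r1", "r2"])

# Option 1: catch the KeyError raised for a missing label
try:
    df.loc["r9"]
    found = True
except KeyError:
    found = False

# Option 2: check that the label exists before selecting
label = "r9"
row = df.loc[label] if label in df.index else None

# Option 3: reindex() returns NaN for labels that are not present
safe = df.reindex(["r1", "r9"])

print(found)
print(row)
print(safe)
```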
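You can use slicing with loc[] for both rows and columns. For example, df.loc[1:5, 'column1':'column3'] selects the rows labeled 1 through 5 and the columns from column1 through column3, with both endpoints included.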
Conclusion
In this article, you have learned the syntax, usage, and examples of the pandas DataFrame loc[] property. DataFrame.loc[] operates on labels to extract rows and/or columns in pandas. It can accept a single label, multiple labels from a list, a range (between two index labels), and more.
Happy Learning !!