Use DataFrame.loc[] and DataFrame.iloc[] to slice the columns of a pandas DataFrame: loc[] works with column labels/names, and iloc[] works with column index/position. You can also use these indexers to select rows from a pandas DataFrame.
In this article, I will explain how to slice or select a subset of a DataFrame by column labels, by specific column positions, and by range, with examples.
Key Points –
- Use the bracket notation with the column name to slice a single column.
- Use the loc or iloc accessor to slice rows based on index labels or integer positions respectively, and specify the desired columns by name or index.
- Utilize boolean indexing to slice rows based on conditions and select specific columns simultaneously.
- Employ the slice object within iloc to slice both rows and columns simultaneously.
- Take advantage of the loc and iloc accessors to slice columns based on labels or integer positions respectively, allowing for versatile column selection.
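The key points above can be sketched in a few lines. This is a minimal illustration that assumes the same four-column sample data used later in this article; the variable names (fee, expensive, block) are only for demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Bracket notation with a single column name returns a Series
fee = df["Fee"]

# Boolean mask for the rows, list of labels for the columns
expensive = df.loc[df["Fee"] > 22000, ["Courses", "Fee"]]

# slice objects inside iloc[]: first row, columns at positions 1 and 2
block = df.iloc[slice(0, 1), slice(1, 3)]

print(fee)
print(expensive)
print(block)
```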
Quick Examples of Column-Slices of Pandas
Following are quick examples of taking column slices of Pandas DataFrame.
# Quick examples of column-slices
# Example 1: Using loc[] to take column slices
# Slice selected multiple columns
df2=df.loc[:, ["Courses","Fee","Duration"]]
# Example 2: Slice random selected columns
df2=df.loc[:, ["Courses","Fee","Discount"]]
# Example 3: Slice columns by range
df2=df.loc[:,'Fee':'Discount']
df2=df.loc[:,'Duration':]
df2=df.loc[:,:'Duration']
# Example 4: Slice every alternate column
df2 = df.loc[:,::2]
# Example 5: Using iloc[] to take column slices
# Slice by selected column position
df2 = df.iloc[:,[1,2,3]]
# Example 6: Slice between indexes 1 and 4 (1, 2, 3)
df2 = df.iloc[:,1:4]
# Example 7: Slice From 3rd to end
df2 = df.iloc[:,2:]
# Example 8: Slice First Two Columns
df2 = df.iloc[:,:2]
To run some examples of how to slice columns in a pandas DataFrame, let’s create a DataFrame from a dictionary.
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300]
}
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)
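Yields below output.
# Output:
# Create DataFrame:
# Courses Fee Duration Discount
# 0 Spark 20000 30days 1000
# 1 PySpark 25000 40days 2300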
Using Pandas.DataFrame.loc[] to Slice Columns by Names or Labels
By using pandas.DataFrame.loc[] you can slice columns by names or labels. The syntax is df.loc[:,start:stop:step], where start is the name of the first column to include, stop is the name of the last column to include (inclusive, unlike ordinary Python slices), and step is the number of columns to advance after each selection, which allows selecting alternate columns. Alternatively, employ the syntax df.loc[:, [labels]], where labels is a list of column names.
# loc[] syntax to slice columns
df.loc[:,start:stop:step]
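One detail worth checking when using this syntax: a label slice with loc[] includes the stop label, while a positional slice with iloc[] excludes the stop position. The short sketch below compares the two on the sample DataFrame from this article (rebuilt here so the snippet runs on its own).

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Label slice: the stop label 'Duration' is included
by_label = df.loc[:, "Fee":"Duration"]

# Positional slice: the stop position 2 is excluded
by_position = df.iloc[:, 1:2]

print(list(by_label.columns))
print(list(by_position.columns))
```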
Slice DataFrame Columns by Labels
To slice DataFrame columns by labels or names, you only need to provide the labels you want to slice as a list. Here we use the list of labels instead of the start:stop:step approach.
# Slice Columns by labels
df1 = df.loc[:, ["Courses","Fee","Duration"]]
print("Get selection of columns by labels:\n", df1)
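Yields below output.
# Output:
# Get selection of columns by labels:
# Courses Fee Duration
# 0 Spark 20000 30days
# 1 PySpark 25000 40days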
Slice Certain Selective Columns in Pandas
If you want to select specific columns that are not next to each other in a pandas DataFrame, you can achieve this by passing the selected column names or labels as a list to loc[].
# Slice by Certain Columns
df.loc[:, ["Courses","Fee","Discount"]]
# Output:
# Courses Fee Discount
# 0 Spark 20000 1000
# 1 PySpark 25000 2300
Slice DataFrame Columns by Range
When slicing a DataFrame by the range of columns in Pandas, you can specify the start and stop column names.
- If you don’t provide a start column, loc[] selects columns from the beginning.
- If you don’t provide a stop column, loc[] selects all columns from the start label through the last column.
- By providing both start and stop column names, loc[] selects all columns in between, inclusive of both start and stop.
# Slice all columns between Fee and Discount columns
df2 = df.loc[:,'Fee':'Discount']
# Output
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Slice start from 'Duration' column
df2 = df.loc[:,'Duration':]
# Output:
# Duration Discount
# 0 30days 1000
# 1 40days 2300
# Slice Start from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
# Output:
# Courses Fee Duration
# 0 Spark 20000 30days
# 1 PySpark 25000 40days
Slice Alternate Column
Similarly, to select every alternate column in a pandas DataFrame using the loc[] accessor, you can use Python’s slicing notation with a step size of 2.
# Slice every alternate column
df2 = df.loc[:,::2]
# Output:
# Courses Duration
# 0 Spark 30days
# 1 PySpark 40days
In the above example, the loc[] accessor selects all rows (:) and every other column (::2), starting from the first column. The resulting DataFrame df2 contains every other column from the original DataFrame df.
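The step can also be combined with a start label, so the alternation begins at a column other than the first. The following is a small sketch on the same sample data; the variable name from_fee is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Start at the 'Fee' label and take every second column from there
from_fee = df.loc[:, "Fee"::2]

print(from_fee)
```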
Using Pandas.DataFrame.iloc[] to Slice Columns by Index
By using pandas.DataFrame.iloc[] you can slice a DataFrame by column position/index. Remember that the index starts from 0. The syntax is df.iloc[:,start:stop:step], where start is the position of the first column to include, stop is the position at which to stop (exclusive), and step is the number of positions to advance after each selection, enabling selection of columns at regular intervals.
Alternatively, you can use the syntax df.iloc[:,[indices]] with indices as a list of column positions to include.
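As a quick sketch of this syntax, the snippet below applies a step, a negative start, and a list of positions to the sample DataFrame from this article. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Step of 2: columns at positions 0 and 2
every_other = df.iloc[:, ::2]

# Negative start: the last two columns
last_two = df.iloc[:, -2:]

# List of positions: first and last columns
picked = df.iloc[:, [0, 3]]

print(every_other)
print(last_two)
print(picked)
```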
Slice Columns by Index Position
We are going to select columns by index position and retrieve slices of the DataFrame. The example below retrieves the Fee, Duration, and Discount columns of the DataFrame.
# Slice by selected column position
df1 = df.iloc[:,[1,2,3]]
print("Get selection of columns by indexes:\n", df1)
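Yields below output.
# Output:
# Get selection of columns by indexes:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300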
Column Slices by Position Range
Like slices by column labels, slicing a DataFrame by a range of positions allows you to select a subset of columns based on their positional indices.
# Slice between indexes 1 and 4 (1, 2, 3)
print(df.iloc[:,1:4])
# Output:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Slice From 3rd to end
print(df.iloc[:,2:])
# Output:
# Duration Discount
# 0 30days 1000
# 1 40days 2300
# Slice First Two Columns
print(df.iloc[:,:2])
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
The first example above uses iloc[] to slice the DataFrame columns from index 1 to index 4 (exclusive). The colon : before the comma indicates that we’re selecting all rows, while 1:4 specifies the range of column indices to select. The resulting DataFrame includes the columns at positions 1, 2, and 3.
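A related behavior that is easy to trip over: a positional slice that runs past the last column is simply clipped, whereas a list containing an out-of-range position raises an IndexError. The sketch below shows both cases on the four-column sample DataFrame; the names clipped and list_error are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# A slice past the end is clipped to the existing columns
clipped = df.iloc[:, 1:10]

# A list with a position that does not exist (4) raises IndexError
try:
    df.iloc[:, [1, 3, 4]]
    list_error = None
except IndexError as exc:
    list_error = exc

print(list(clipped.columns))
print(type(list_error).__name__)
```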
Complete Example
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300],
'Tutor':['Michel','Sam']
}
df = pd.DataFrame(technologies)
print(df)
# Slice selected multiple columns
print(df.loc[:, ["Courses","Fee","Duration"]])
# Slice random selected columns
print(df.loc[:, ["Courses","Fee","Discount"]])
# Slice columns by range
print(df.loc[:,'Fee':'Discount'])
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])
# Slice every alternate column
print(df.loc[:,::2])
# Slice by selected column position
print(df.iloc[:,[1,3,4]])
# Slice between indexes 1 and 4 (1, 2, 3)
print(df.iloc[:,1:4])
# Slice from 3rd to end
print(df.iloc[:,2:])
# Slice first two columns
print(df.iloc[:,:2])
FAQ on Slice Columns in Pandas DataFrame
You can slice specific columns by name using the loc[] accessor. Example: df.loc[:, ["column1", "column2", "column3"]]
To select columns based on their position or index in a DataFrame, you can use the iloc[] accessor in pandas. For instance, df.iloc[:, 0:3] selects the first three columns.
To slice every other column in a DataFrame, you can use slicing notation with a step size of 2. For instance, df.iloc[:, ::2] selects every alternate column.
To select the last column in a DataFrame, you can use negative indexing with iloc[]. For example, you can use df.iloc[:,-1:] to select the last column. The -1 index refers to the last column.
To slice columns based on a specific range of index positions in a DataFrame, you can use slicing notation within the iloc[] accessor. For instance, df.iloc[:, 1:4] selects columns at index positions 1, 2, and 3.
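When selecting the last column, note that the slice form -1: keeps the result as a DataFrame, while a bare -1 returns a Series. Here is a minimal sketch on the sample data from this article, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Slice form: one-column DataFrame
last_as_frame = df.iloc[:, -1:]

# Scalar form: Series
last_as_series = df.iloc[:, -1]

print(type(last_as_frame).__name__)
print(type(last_as_series).__name__)
```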
Conclusion
In this article, I have explained how to take column slices of a pandas DataFrame using the loc[] and iloc[] accessors with multiple approaches.
Happy Learning !!