Use DataFrame.loc[] and DataFrame.iloc[] to slice columns in a pandas DataFrame: loc[] selects columns by label/name, and iloc[] selects columns by integer position. You can also use these indexers to select rows from a pandas DataFrame.
A pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e. rows and columns. Taking a column slice of a DataFrame results in a new DataFrame containing only the specified columns from the original DataFrame.
In this article, I will explain how to slice, or select a subset of, a DataFrame by column labels, by specific column positions, and by range, with examples.
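As a minimal sketch of the point that a column slice is itself a DataFrame, the snippet below builds a small frame and takes a slice from it. The names df and subset, and the sample values, are assumptions chosen only for illustration.

```python
import pandas as pd

# Small sample frame; the values are illustrative only
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# Selecting columns with a list of labels returns a new DataFrame
subset = df.loc[:, ["Courses", "Fee"]]

print(type(subset).__name__)   # DataFrame
print(list(subset.columns))    # ['Courses', 'Fee']
print(df.shape, subset.shape)  # (2, 3) (2, 2)
```

The original df keeps all of its columns; only subset is restricted to the two selected ones.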
1. Quick Examples of Column-Slices of Pandas DataFrame
If you are in a hurry, below are some quick examples of how to take column slices of a pandas DataFrame.
# Below are quick examples
# Using loc[] to take column slices
# Slice selected multiple columns
df2=df.loc[:, ["Courses","Fee","Duration"]]
# Slice selected non-adjacent columns
df2=df.loc[:, ["Courses","Fee","Discount"]]
# Slice columns by range
df2=df.loc[:,'Fee':'Discount']
df2=df.loc[:,'Duration':]
df2=df.loc[:,:'Duration']
# slice every alternate column
df2 = df.loc[:,::2]
# Using iloc[] to take column slices
# Slice by selected column position
df2 = df.iloc[:,[1,3,4]]
# Slice positions 1 to 3 (the 2nd, 3rd, and 4th columns); stop position 4 is excluded
df2 = df.iloc[:,1:4]
# Slice From 3rd to end
df2 = df.iloc[:,2:]
# Slice First Two Columns
df2 = df.iloc[:,:2]
Now, let’s create a DataFrame with a few rows and columns and execute some examples of how to slice columns in pandas. Our DataFrame contains the columns Courses, Fee, Duration, Discount, and Tutor.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300],
'Tutor':['Michel','Sam']
}
df = pd.DataFrame(technologies)
print(df)
This yields the output below.
   Courses    Fee Duration  Discount   Tutor
0    Spark  20000   30days      1000  Michel
1  PySpark  25000   40days      2300     Sam
2. Using Pandas.DataFrame.loc[] – Slice Columns by Names or Labels
By using pandas.DataFrame.loc[] you can slice columns by names or labels. To slice the columns, the syntax is df.loc[:,start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (it is included in the result), and step is the number of positions to advance after each selection; for example, a step of 2 selects alternate columns. Alternatively, use the syntax df.loc[:,[labels]], with labels as a list of column names to take.
#loc[] syntax to slice columns
df.loc[:,start:stop:step]
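To make the start:stop:step syntax concrete, here is a small sketch using the article's sample columns. The variable names stepped and ranged are assumptions introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# start='Fee', stop='Discount', step=2:
# every second column from Fee through Discount
stepped = df.loc[:, "Fee":"Discount":2]
print(list(stepped.columns))  # ['Fee', 'Discount']

# Without a step, every column from start through stop is taken;
# the stop label is part of the result, unlike a Python list slice
ranged = df.loc[:, "Fee":"Discount"]
print(list(ranged.columns))   # ['Fee', 'Duration', 'Discount']
```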
2.1 Slice DataFrame Columns by Labels
To slice DataFrame columns by labels or names, provide the labels you want as a list. Here we use a list of labels instead of the start:stop:step approach.
# Slice Columns by labels
df.loc[:, ["Courses","Fee","Duration"]]
#Output
# Courses Fee Duration
#0 Spark 20000 30days
#1 PySpark 25000 40days
2.2 Slice Selected Columns in pandas
Sometimes you may want to select specific columns that are not next to each other in a pandas DataFrame. You can do this by passing the selected column names/labels as a list.
# Slice by Certain Columns
df.loc[:, ["Courses","Fee","Discount"]]
#Output
# Courses Fee Discount
#0 Spark 20000 1000
#1 PySpark 25000 2300
2.3 Slice DataFrame Columns by Range
When you want to slice a DataFrame by a range of columns, provide the start and stop column names.
- By not providing a start column, loc[] selects from the beginning.
- By not providing stop, loc[] selects all columns from the start label.
- Providing both start and stop selects all columns from start through stop, both included.
# Slice all columns between the Fee and Discount columns
df2 = df.loc[:,'Fee':'Discount']
#Output
# Fee Duration Discount
#0 20000 30days 1000
#1 25000 40days 2300
# Slice start from 'Duration' column
df2 = df.loc[:,'Duration':]
#Output
# Duration Discount Tutor
#0 30days 1000 Michel
#1 40days 2300 Sam
# Slice Start from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
#Output
# Courses Fee Duration
#0 Spark 20000 30days
#1 PySpark 25000 40days
2.4 Select Every Alternate Column
Using loc[], you can also slice columns by selecting every other column from a pandas DataFrame.
# Slice every alternate column
df2 = df.loc[:,::2]
#Output
# Courses Duration Tutor
#0 Spark 30days Michel
#1 PySpark 40days Sam
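The step can also be combined with a start label, or made negative. The following is a rough sketch of both variations on the same sample data; the names from_fee and reversed_cols are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# Every second column, starting at 'Fee' instead of the first column
from_fee = df.loc[:, "Fee"::2]
print(list(from_fee.columns))       # ['Fee', 'Discount']

# A step of -1 walks the columns in reverse order
reversed_cols = df.loc[:, ::-1]
print(list(reversed_cols.columns))  # ['Tutor', 'Discount', 'Duration', 'Fee', 'Courses']
```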
3. Pandas DataFrame.iloc[] – Column Slices by Index or Position
By using pandas.DataFrame.iloc[] you can slice a DataFrame by column position/index. Remember that positions start from 0. You can use pandas.DataFrame.iloc[] with the syntax [:,start:stop:step], where start is the position of the first column to take, stop is the position at which to stop (the column at stop is not included, unlike with loc[]), and step is the number of positions to advance after each selection. Alternatively, use the syntax [:,[indices]], with indices as a list of column positions to take.
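The difference between the exclusive stop of iloc[] and the inclusive stop of loc[] is easy to miss, so here is a short sketch comparing the two. It uses Index.get_loc to translate labels into positions; the variable names are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# iloc: positions 1, 2, 3 are taken; position 4 (the stop) is excluded
by_position = df.iloc[:, 1:4]
print(list(by_position.columns))  # ['Fee', 'Duration', 'Discount']

# loc: the stop label 'Discount' is included
by_label = df.loc[:, "Fee":"Discount"]

# Translate labels to positions; add 1 to the stop to mimic loc's inclusive end
start = df.columns.get_loc("Fee")       # 1
stop = df.columns.get_loc("Discount")   # 3
same = df.iloc[:, start:stop + 1]
print(same.equals(by_label))            # True
```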
3.1 Slice Columns by Index Position
Here we select columns by index position to retrieve a slice of the DataFrame. The example below retrieves the "Fee", "Discount", and "Tutor" columns (positions 1, 3, and 4).
#slice by selected column position
print(df.iloc[:,[1,3,4]])
#Output
# Fee Discount Tutor
#0 20000 1000 Michel
#1 25000 2300 Sam
3.2 Column Slices by Position Range
Like slices by column labels, you can also slice a DataFrame by a range of positions.
# Slice positions 1 to 3 (the 2nd, 3rd, and 4th columns); stop position 4 is excluded
print(df.iloc[:,1:4])
# Fee Duration Discount
#0 20000 30days 1000
#1 25000 40days 2300
# Slice From 3rd to end
print(df.iloc[:,2:])
# Duration Discount Tutor
#0 30days 1000 Michel
#1 40days 2300 Sam
# Slice First Two Columns
print(df.iloc[:,:2])
# Courses Fee
#0 Spark 20000
#1 PySpark 25000
To get the last column, use df.iloc[:,-1:], and to get just the first column, use df.iloc[:,:1].
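Note that the trailing or leading colon matters in these expressions. As a small sketch (variable names are assumptions for illustration), a slice such as -1: keeps the result a DataFrame, while a bare integer position returns a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
    "Tutor": ["Michel", "Sam"],
})

# Slice: one-column DataFrame
last_as_frame = df.iloc[:, -1:]
print(type(last_as_frame).__name__, list(last_as_frame.columns))  # DataFrame ['Tutor']

# Single integer: Series named after the column
last_as_series = df.iloc[:, -1]
print(type(last_as_series).__name__, last_as_series.name)         # Series Tutor

# First column as a one-column DataFrame
first_as_frame = df.iloc[:, :1]
print(list(first_as_frame.columns))                               # ['Courses']
```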
4. Complete Example To Take Column-Slices From DataFrame
Below is a complete example of how to take column slices from a pandas DataFrame.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300],
'Tutor':['Michel','Sam']
}
df = pd.DataFrame(technologies)
print(df)
# Slice selected multiple columns
print(df.loc[:, ["Courses","Fee","Duration"]])
# Slice selected non-adjacent columns
print(df.loc[:, ["Courses","Fee","Discount"]])
# Slice columns by range
print(df.loc[:,'Fee':'Discount'])
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])
# slice every alternate column
print(df.loc[:,::2])
#slice by selected column position
print(df.iloc[:,[1,3,4]])
# Slice positions 1 to 3 (the 2nd, 3rd, and 4th columns); stop position 4 is excluded
print(df.iloc[:,1:4])
# Slice From 3rd to end
print(df.iloc[:,2:])
# Slice First Two Columns
print(df.iloc[:,:2])
Conclusion
In this article, you have learned how to take column slices of a pandas DataFrame using DataFrame.loc[] and DataFrame.iloc[]. loc[] is used with column labels, and iloc[] is used with column positions.
Happy Learning !!