By using df[], loc[], iloc[] and get() you can select multiple columns from pandas DataFrame. When working with a table-like structure we are often required to retrieve the data from columns. Similar to SQL, selecting multiple columns in pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to efficiently retrieve subsets of data from your DataFrame.
Pandas DataFrame is a two-dimensional tabular data structure with labeled axes. i.e. columns and rows. Selecting multiple columns from DataFrame results in a new DataFrame containing only specified selected columns from the original DataFrame.
loc[]
is used to select columns by name and iloc[]
is used to select columns by index. You can also use these operators to select rows from pandas DataFrame. Also, refer to a related article how to get cell value from pandas DataFrame.
1. Quick Examples of Select Multiple Columns in pandas
If you are in a hurry, below are some quick examples of how to select multiple columns from pandas DataFrame by column name and index.
# Below are the quick examples
# By using df[] Notation to select multiple columns
df2 = df[["Courses","Fee","Duration"]]
# Using loc[] to take column slices
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Selecte multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select Random columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns
df2 = df.loc[:,'Duration':] # Select columns by range
df2 = df.loc[:,:'Duration'] # Select columns by range
df2 = df.loc[:,::2] # Select every alternate column
# Using iloc[] to select column by Index
df2 = df.iloc[:,[1,3,4]] # Select columns by Index
df2 = df.iloc[:,1:4] # Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,2:] # Select From 3rd to end
df2 = df.iloc[:,:2] # Select First Two Columns
# Using get()
df2 = df.get(['Courses','Fee'])
Now, let’s run these examples by creating the sample DataFrame with a few rows and columns. Our DataFrame contains column names Courses
, Fee
, Duration
, and Discount
.
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Courses Fee Duration Discount Tutor
0 Spark 20000 30days 1000 Michel
1 PySpark 25000 40days 2300 Sam
2. Using df[] & loc[] to Select Multiple Columns by Name
By using df[]
& pandas.DataFrame.loc[] you can select multiple columns by names or labels. To select the columns by names, the syntax is df.loc[:,start:stop:step]
; where start
is the name of the first column to take, stop
is the name of the last column to take, and step
as the number of indices to advance after each extraction; for example, you can select alternate columns. Or, use the syntax: [:,[labels]]
with labels as a list of column names to take.
# loc[] syntax to slice columns
df.loc[:,start:stop:step]
2.1 Select Columns Using df[] Notation
To select single or multiple columns by labels or names, all you need is to provide the names of the columns as a list. Here we use the []
notation instead of df.loc[,start:stop:step]
approach.
# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print(df2)
# Output:
# Courses Fee Duration
# 0 Spark 20000 30days
# 1 PySpark 25000 40days
2.2 Select Multiple Columns using df.loc[]
Sometimes you may want to select multiple columns from pandas DataFrame, you can do this by passing multiple column names/labels as a list.
# Select Multiple Columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print(df2)
# Output:
# Courses Fee Discount
# 0 Spark 20000 1000
# 1 PySpark 25000 2300
2.3 Select DataFrame Columns by Range
When you wanted to select columns by the range, provide start and stop column names.
- By not providing a start column, loc[] selects from the beginning.
- By not providing stop, loc[] selects all columns from the start label.
- Providing both start and stop, selects all columns in between.
# Select all columns between Fee an Discount columns
df2 = df.loc[:,'Fee':'Discount']
print(df2)
# Output:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print(df2)
# Output:
# Duration Discount Tutor
# 0 30days 1000 Michel
# 1 40days 2300 Sam
# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print(df2)
# Output:
# Courses Fee Duration
# 0 Spark 20000 30days
# 1 PySpark 25000 40days
2.4 Select Every Other Column
Using loc[], you can also select every alternate column from pandas DataFrame.
# Select every alternate column
df2 = df.loc[:,::2]
print(df2)
# Output:
# Courses Duration Tutor
# 0 Spark 30days Michel
# 1 PySpark 40days Sam
3. Using iloc[] – Select Multiple Columns by Index
By using pandas.DataFrame.iloc[] you can also select multiple columns from DataFrame by position/index. ; Remember index starts from 0. You can use
pandas.DataFrame.iloc[] with the syntax [:,start:stop:step]
where start
indicates the index of the first column to take, stop
indicates the index of the last column to take, and step
indicates the number of indices to advance after each extraction. Or, use the syntax: [:,[indices]]
with indices as a list of column indices to take.
3.1. Select Multiple Columns using df.iloc[]
Below example retrieves "Fee"
,"Discount"
and "Duration"
and returns a new DataFrame with the columns selected.
# Selected by column position
df2 = df.iloc[:,[1,3,4]]
print(df2)
# Output:
# Fee Discount Tutor
# 0 20000 1000 Michel
# 1 25000 2300 Sam
3.2 Select Columns by Position Range
You can also slice a DataFrame by a range of positions.
# Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,1:4]
print(df2)
# Output:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Select From 3rd to end
df2 = df.iloc[:,2:]
# Returns
# Output:
# Duration Discount Tutor
# 0 30days 1000 Michel
# 1 40days 2300 Sam
# Select First Two Columns
df2 = df.iloc[:,:2]
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
To get the last column use df.iloc[:,-1:]
and to get just the first column df.iloc[:,:1]
4. Using filter() & get() methods
You can also select multiple columns by using DataFrame.filter() method.
# Using DataFrame.filter() to select multiple columns
df2 = df.filter(['Courses','Fee'])
print(df2)
# Using DataFrame.get() method
df2 = df.get(['Courses','Fee'])
print(df2)
5. Complete Example of Selecting Multiple Columns
Below is a complete example of how to select multiple columns from pandas DataFrame.
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300],
'Tutor':['Michel','Sam']
}
df = pd.DataFrame(technologies)
print(df)
# Select multiple columns
print(df[["Courses","Fee","Duration"]])
# Select Random columns
print(df.loc[:, ["Courses","Fee","Discount"]])
# Select columns by range
print(df.loc[:,'Fee':'Discount'])
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])
# Select every alternate column
print(df.loc[:,::2])
# Selected by column position
print(df.iloc[:,[1,3,4]])
# Select between indexes 1 and 4 (2,3,4)
print(df.iloc[:,1:4])
# Select From 3rd to end
print(df.iloc[:,2:])
# Select First Two Columns
print(df.iloc[:,:2])
6. Conclusion
In this article, you have learned how to select multiple columns from pandas DataFrame using DataFrame.loc[], and DataFrame.iloc[] functions and using get() and filter().
Happy Learning !!
Related Articles
- Pandas Select DataFrame Rows Between Two Dates
- Pandas Select Rows Based on List Index
- Pandas Select Columns by Name or Index
- Pandas Select Rows Based on Column Values
- Pandas Select Rows by Index (Position/Label)
- Pandas Create New DataFrame By Selecting Specific Columns
- Select Rows From List of Values in Pandas DataFrame
- Pandas – Select All Columns Except One Column