By using df[], loc[], iloc[], and get() you can select multiple columns from pandas DataFrame. When working with a table-like structure we are often required to retrieve the data from columns. Similar to SQL, selecting multiple columns in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to efficiently retrieve subsets of data from your DataFrame.
loc[]
is used to select columns by name and iloc[]
is used to select columns by index. You can also use these operators to select rows from pandas DataFrame. Also, refer to a related article how to get cell value from pandas DataFrame.
Key Points –
- Use square brackets or the
.loc
accessor with a list of column names to select multiple columns efficiently. - Alternatively, employ the
.loc
accessor with a colon to select all rows and a list of column names to select multiple columns - Utilize the
.iloc
accessor with a colon to select all rows and a list of column indices to select multiple columns. - Combine these methods with boolean indexing to select specific columns based on conditions.
Quick Examples of Select Multiple Columns
If you are in a hurry, below are some quick examples of how to select multiple columns in Pandas DataFrame.
# Quick examples of select multiple columns
# By using df[] Notation to select multiple columns
df2 = df[["Courses","Fee","Duration"]]
# Using loc[] to take column slices
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Selecte multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select Random columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns
df2 = df.loc[:,'Duration':] # Select columns by range
df2 = df.loc[:,:'Duration'] # Select columns by range
df2 = df.loc[:,::2] # Select every alternate column
# Using iloc[] to select column by Index
df2 = df.iloc[:,[1,3,4]] # Select columns by Index
df2 = df.iloc[:,1:4] # Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,2:] # Select From 3rd to end
df2 = df.iloc[:,:2] # Select First Two Columns
# Using get()
df2 = df.get(['Courses','Fee'])
To run some examples of selecting multiple columns in pandas DataFrame. Now, let’s run these examples by creating the sample DataFrame with a few rows and columns.
# Create DataFrame
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300]
}
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)
Yields below output.
Using df[] & loc[] to Select Multiple Columns by Name
By using df[]
& pandas.DataFrame.loc[] you can select multiple columns by names or labels. To select the columns by name, you can use the syntax [:, start:stop:step]
where start
is the name of the first column to include, stop
is the name of the last column to include, and step
determines the number of indices to advance after each extraction, allowing for selecting alternate columns.
Another syntax available with pandas.DataFrame.loc[]
is [:, [labels]]
, where a label is a list of column names to include. This approach provides flexibility in selecting multiple columns by names or labels.
# loc[] syntax to slice columns
df.loc[:,start:stop:step]
Select Columns Using df[] Notation
The []
notation in Pandas DataFrame allows for a concise selection of single or multiple columns by their labels or names simply by providing the names of the columns as a list within the brackets.
# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print("Selecting specific columns by labels:\n",df2)
In the above examples, Select the columns with labels Courses
, Fee
, and Duration
from the original DataFrame df
. The result is a new DataFrame (df2
) containing only these columns.
Select Multiple Columns using df.loc[]
You can select multiple columns from a pandas DataFrame by passing a list of column names or labels as an argument.
# Select Multiple Columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print(df2)
# Output:
# Courses Fee Discount
# 0 Spark 20000 1000
# 1 PySpark 25000 2300
In the above example, select the columns labeled Courses
, Fee
,and Discount
from the original DataFrame df
. The :
before the comma indicates that you want to select all rows for these columns.
Select DataFrame Columns by Range
When selecting columns by range using the loc[] indexer in Pandas, it’s essential to provide both the start and stop column names.
- When you don’t provide a start column, loc[] selects columns from the beginning.
- If you don’t provide a stop column, loc[] selects all columns from the start label to the end.
- When you provide both start and stop columns, loc[] selects all columns in between those two columns, inclusive of both start and stop columns..
# Select all columns between Fee an Discount columns
df2 = df.loc[:,'Fee':'Discount']
print(df2)
# Output:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print(df2)
# Output:
# Duration Discount Tutor
# 0 30days 1000 Michel
# 1 40days 2300 Sam
# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print(df2)
# Output:
# Courses Fee Duration
# 0 Spark 20000 30days
# 1 PySpark 25000 40days
Select Every Other Column
Using loc[], to select every alternate column in a DataFrame, you can use slicing with a step size.
# Select every alternate column
df2 = df.loc[:,::2]
print(df2)
# Output:
# Courses Duration Tutor
# 0 Spark 30days Michel
# 1 PySpark 40days Sam
The above example, uses loc[]
with slice notation to select every column with a step of 2. The :
before the comma indicates that you want to select all rows, and ::2
specifies the step, which means every alternate column.
Using iloc[] – Select Multiple Columns by Index
By using pandas.DataFrame.iloc[], you can select multiple columns from a DataFrame by their positional indices. Remember index starts from 0. You can use the syntax [:, start:stop:step]
with iloc[]
, where start
indicates the index of the first column to include, stop
indicates the index of the last column to include, step
indicates the number of indices to advance after each extraction, allowing for selecting alternate columns. Or, you can use the syntax [:, [indices]]
with iloc[]
, where indices is a list of column indices to include.
Select Multiple Columns using df.iloc[]
You can use the iloc
indexer to select specific rows and columns from a DataFrame by integer position. To select multiple columns using iloc
, you can provide a comma-separated list of column indices.
# Selected by column position
df2 = df.iloc[:,[1,3,4]]
print(df2)
# Output:
# Fee Discount Tutor
# 0 20000 1000 Michel
# 1 25000 2300 Sam
In the above example, df.iloc[:,[1,3,4]]
select all rows (:
) and the columns at positions 1, 3, and 4. The result will be a DataFrame containing only columns Fee
, Discount
and Tutor
and it returns a new DataFrame with these selected columns.
Select Columns by Position Range
To select columns by position range using iloc
in Pandas, you can provide a range of column indices. For instance, df.iloc[:, 1:4]
select all rows (:) and columns at positions 1 through 4 (exclusive). The result will be a DataFrame containing columns ‘Fee’, ‘Duration’, and ‘Discount’.
# Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,1:4]
print(df2)
# Output:
# Fee Duration Discount
# 0 20000 30days 1000
# 1 25000 40days 2300
# Select From 3rd to end
df2 = df.iloc[:,2:]
# Returns
# Output:
# Duration Discount Tutor
# 0 30days 1000 Michel
# 1 40days 2300 Sam
# Select First Two Columns
df2 = df.iloc[:,:2]
print(df2)
# Output:
# Courses Fee
# 0 Spark 20000
# 1 PySpark 25000
Using df.iloc[:,-1:]
retrieves the last column of the DataFrame, and df.iloc[:,:1]
retrieves just the first column.
Using filter() & get() Methods
Similarly, you can also use the filter() method to select multiple columns based on their labels or names and the get()
method to retrieve a single/multiple columns from a DataFrame.
# Using DataFrame.filter()
# To select multiple columns
df2 = df.filter(['Courses','Fee'])
print(df2)
# Using DataFrame.get() method
df2 = df.get(['Courses','Fee'])
print(df2)
Complete Example of Selecting Multiple Columns
import pandas as pd
technologies = {
'Courses':["Spark","PySpark"],
'Fee' :[20000,25000],
'Duration':['30days','40days'],
'Discount':[1000,2300],
'Tutor':['Michel','Sam']
}
df = pd.DataFrame(technologies)
print(df)
# Select multiple columns
print(df[["Courses","Fee","Duration"]])
# Select Random columns
print(df.loc[:, ["Courses","Fee","Discount"]])
# Select columns by range
print(df.loc[:,'Fee':'Discount'])
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])
# Select every alternate column
print(df.loc[:,::2])
# Selected by column position
print(df.iloc[:,[1,3,4]])
# Select between indexes 1 and 4 (2,3,4)
print(df.iloc[:,1:4])
# Select From 3rd to end
print(df.iloc[:,2:])
# Select First Two Columns
print(df.iloc[:,:2])
FAQ on Select Multiple Columns in DataFrame
To select multiple columns at once from a Pandas DataFrame, you can use the double square bracket notation. In this case, loc
allows you to specify both rows and columns. The colon :
before the comma indicates that you want to select all rows.
You can use the loc
method to select multiple columns from a Pandas DataFrame. The loc
method is commonly used for label-based indexing, and you can use it to select specific columns based on their names.
Use the iloc
method to select specific columns by index in a Pandas DataFrame. The iloc
method is primarily used for integer-location based indexing.
To use the filter
method along with a callable function to select columns in a Pandas DataFrame. The filter
method allows you to subset the DataFrame based on a callable function that takes the column labels as input.
You can use regular expressions (regex) to select columns in a Pandas DataFrame using the filter
method. The filter
method in Pandas allows you to specify a regular expression to match column names.
You can select columns from a Pandas DataFrame based on their data types using the select_dtypes
method. This method allows you to specify the data types you want to include or exclude.
Conclusion
In this article, I have explained how to select multiple columns from pandas DataFrame using loc[], and iloc[] functions and using get()
and filter() with examples.
Happy Learning !!
Related Articles
- Pandas Select Rows Based on List Index
- Pandas Select Columns by Name or Index
- Pandas Select Rows Based on Column Values
- Pandas Select Rows by Index (Position/Label)
- How to Read Excel Multiple Sheets in Pandas
- How to Transpose() DataFrame in Pandas?
- Select Rows From List of Values in Pandas DataFrame
- Pandas Select All Columns Except One Column
- Pandas Join DataFrames on Columns
- How to Change Column Name in Pandas
- Pandas Drop Level From Multi-Level Column Index
- Pandas Create New DataFrame By Selecting Specific Columns