• Post author:
  • Post category:Pandas
  • Post last modified:April 2, 2024
  • Reading time:20 mins read
You are currently viewing Pandas Select Multiple Columns in DataFrame

By using df[], loc[], iloc[], and get() you can select multiple columns from pandas DataFrame. When working with a table-like structure we are often required to retrieve the data from columns. Similar to SQL, selecting multiple columns in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to efficiently retrieve subsets of data from your DataFrame.

loc[] is used to select columns by name and iloc[] is used to select columns by index. You can also use these operators to select rows from pandas DataFrame. Also, refer to a related article how to get cell value from pandas DataFrame.

Key Points –

  • Use square brackets or the .loc accessor with a list of column names to select multiple columns efficiently.
  • Alternatively, employ the .loc accessor with a colon to select all rows and a list of column names to select multiple columns
  • Utilize the .iloc accessor with a colon to select all rows and a list of column indices to select multiple columns.
  • Combine these methods with boolean indexing to select specific columns based on conditions.

Quick Examples of Select Multiple Columns

If you are in a hurry, below are some quick examples of how to select multiple columns in Pandas dataframe.


# Quick examples of select multiple columns 

# By using df[] Notation to select multiple columns
df2 = df[["Courses","Fee","Duration"]] 

# Using loc[] to take column slices
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Selecte multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select Random columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns
df2 = df.loc[:,'Duration':]  # Select columns by range
df2 = df.loc[:,:'Duration']  # Select columns by range
df2 = df.loc[:,::2]          # Select every alternate column

# Using iloc[] to select column by Index
df2 = df.iloc[:,[1,3,4]] # Select columns by Index
df2 = df.iloc[:,1:4] # Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,2:] # Select From 3rd to end
df2 = df.iloc[:,:2] # Select First Two Columns

# Using get()
df2 = df.get(['Courses','Fee'])

Now, let’s run these examples by creating the sample DataFrame with a few rows and columns. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300]
              }
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)

Yields below output.

pandas select multiple columns

Using df[] & loc[] to Select Multiple Columns by Name

By using df[] & pandas.DataFrame.loc[] you can select multiple columns by names or labels. To select the columns by name, you can use the syntax [:, start:stop:step] where start is the name of the first column to include, stop is the name of the last column to include, and step determines the number of indices to advance after each extraction, allowing for selecting alternate columns.

Another syntax available with pandas.DataFrame.loc[] is [:, [labels]], where a label is a list of column names to include. This approach provides flexibility in selecting multiple columns by names or labels.


# loc[] syntax to slice columns
df.loc[:,start:stop:step]

Select Columns Using df[] Notation

The [] notation in Pandas DataFrame allows for a concise selection of single or multiple columns by their labels or names simply by providing the names of the columns as a list within the brackets. Here you can use the [] notation instead of df.loc[,start:stop:step] approach.

In the below examples, Select the columns with labels “Courses,” “Fee,” and “Duration” from the original DataFrame df. The result is a new DataFrame (df2) containing only these columns.


# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print("Selecting specific columns by labels:\n",df2)

Yields below output.

pandas select multiple columns

Select Multiple Columns using df.loc[]

You can select multiple columns from a pandas DataFrame by passing a list of column names or labels as an argument. For instance, select the columns labeled “Courses,” “Fee,” and “Discount” from the original DataFrame df. The : before the comma indicates that you want to select all rows for these columns.


# Select Multiple Columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print(df2)


# Output:
#   Courses    Fee  Discount
# 0    Spark  20000      1000
# 1  PySpark  25000      2300

Select DataFrame Columns by Range

When selecting columns by range using the loc[] indexer in Pandas, it’s essential to provide both the start and stop column names.

  • When you don’t provide a start column, loc[] selects columns from the beginning.
  • If you don’t provide a stop column, loc[] selects all columns from the start label to the end.
  • When you provide both start and stop columns, loc[] selects all columns in between those two columns, inclusive of both start and stop columns..

# Select all columns between Fee an Discount columns
df2 = df.loc[:,'Fee':'Discount']
print(df2)

# Output:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print(df2)

# Output:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print(df2)

# Output:
#   Courses    Fee Duration
# 0    Spark  20000   30days
# 1  PySpark  25000   40days

Select Every Other Column

Using loc[], to select every alternate column in a DataFrame, you can use slicing with a step size. For instance, uses loc[] with slice notation to select every column with a step of 2. The : before the comma indicates that you want to select all rows, and ::2 specifies the step, which means every alternate column.


# Select every alternate column
df2 = df.loc[:,::2]
print(df2)

# Output:
#   Courses Duration   Tutor
# 0    Spark   30days  Michel
# 1  PySpark   40days     Sam

Using iloc[] – Select Multiple Columns by Index

By using pandas.DataFrame.iloc[], you can select multiple columns from a DataFrame by their positional indices. Remember index starts from 0. You can use the syntax [:, start:stop:step] with iloc[], where start indicates the index of the first column to include, stop indicates the index of the last column to include, step indicates the number of indices to advance after each extraction, allowing for selecting alternate columns. Or, you can use the syntax [:, [indices]] with iloc[], where indices is a list of column indices to include.

Select Multiple Columns using df.iloc[]

You can use the iloc indexer to select specific rows and columns from a DataFrame by integer position. To select multiple columns using iloc, you can provide a comma-separated list of column indices.


# Selected by column position
df2 = df.iloc[:,[1,3,4]]
print(df2)

# Output:
#     Fee  Discount   Tutor
# 0  20000      1000  Michel
# 1  25000      2300     Sam

In the above example, df.iloc[:,[1,3,4]] select all rows (:) and the columns at positions 1, 3, and 4. The result will be a DataFrame containing only columns "Fee","Discount" and "Tutor" and it returns a new DataFrame with these selected columns.

Select Columns by Position Range

To select columns by position range using iloc in Pandas, you can provide a range of column indices. For instance, df.iloc[:, 1:4] select all rows (:) and columns at positions 1 through 4 (exclusive). The result will be a DataFrame containing columns ‘Fee’, ‘Duration’, and ‘Discount’.


# Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,1:4]
print(df2)

# Output:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select From 3rd to end
df2 = df.iloc[:,2:]
# Returns

# Output:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select First Two Columns
df2 = df.iloc[:,:2]
print(df2)

# Output:
#   Courses    Fee
# 0    Spark  20000
# 1  PySpark  25000

Using df.iloc[:,-1:] retrieves the last column of the DataFrame, and df.iloc[:,:1] retrieves just the first column.

Using filter() & get() Methods

Similarly, you can also use the filter() method to select multiple columns based on their labels or names and the get() method to retrieve a single/multiple columns from a DataFrame.


# Using DataFrame.filter() 
# To select multiple columns
df2 = df.filter(['Courses','Fee'])
print(df2)

# Using DataFrame.get() method
df2 = df.get(['Courses','Fee'])
print(df2)

Complete Example of Selecting Multiple Columns


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print(df)

# Select multiple columns
print(df[["Courses","Fee","Duration"]])

# Select Random columns
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select columns by range
print(df.loc[:,'Fee':'Discount']) 
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])

# Select every alternate column
print(df.loc[:,::2])

# Selected by column position
print(df.iloc[:,[1,3,4]])

# Select between indexes 1 and 4 (2,3,4)
print(df.iloc[:,1:4])

# Select From 3rd to end
print(df.iloc[:,2:])

# Select First Two Columns
print(df.iloc[:,:2])

FAQ on Select Multiple Columns in DataFrame

How can I select multiple columns at once?

To select multiple columns at once from a Pandas DataFrame, you can use the double square bracket notation. In this case, loc allows you to specify both rows and columns. The colon : before the comma indicates that you want to select all rows.

Can I use the loc method to select multiple columns?

You can use the loc method to select multiple columns from a Pandas DataFrame. The loc method is commonly used for label-based indexing, and you can use it to select specific columns based on their names.

Can I use iloc to select specific columns by index?

Use the iloc method to select specific columns by index in a Pandas DataFrame. The iloc method is primarily used for integer-location based indexing.

Is there a way to select columns using a callable function?

To use the filter method along with a callable function to select columns in a Pandas DataFrame. The filter method allows you to subset the DataFrame based on a callable function that takes the column labels as input.

Can I use regular expressions to select columns?

You can use regular expressions (regex) to select columns in a Pandas DataFrame using the filter method. The filter method in Pandas allows you to specify a regular expression to match column names.

How do I select columns based on their data types?

You can select columns from a Pandas DataFrame based on their data types using the select_dtypes method. This method allows you to specify the data types you want to include or exclude.

Conclusion

In this article, you have learned how to select multiple columns from pandas DataFrame using DataFrame.loc[], and DataFrame.iloc[] functions and using get() and filter().

Happy Learning !!

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium

Leave a Reply