Site icon Spark By {Examples}

Pandas Select Multiple Columns in DataFrame

pandas select multiple columns

By using df[], loc[], iloc[], and get() you can select multiple columns from pandas DataFrame. When working with a table-like structure we are often required to retrieve the data from columns. Similar to SQL, selecting multiple columns in Pandas DataFrame is one of the most frequently performed tasks while manipulating data. Pandas provide several techniques to efficiently retrieve subsets of data from your DataFrame.

loc[] is used to select columns by name and iloc[] is used to select columns by index. You can also use these operators to select rows from pandas DataFrame. Also, refer to a related article how to get cell value from pandas DataFrame.

Key Points –

Quick Examples of Select Multiple Columns

If you are in a hurry, below are some quick examples of how to select multiple columns in Pandas DataFrame.


# Quick examples of select multiple columns 

# By using df[] Notation to select multiple columns
df2 = df[["Courses","Fee","Duration"]] 

# Using loc[] to take column slices
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Selecte multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select Random columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns
df2 = df.loc[:,'Duration':]  # Select columns by range
df2 = df.loc[:,:'Duration']  # Select columns by range
df2 = df.loc[:,::2]          # Select every alternate column

# Using iloc[] to select column by Index
df2 = df.iloc[:,[1,3,4]] # Select columns by Index
df2 = df.iloc[:,1:4] # Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,2:] # Select From 3rd to end
df2 = df.iloc[:,:2] # Select First Two Columns

# Using get()
df2 = df.get(['Courses','Fee'])

To run some examples of selecting multiple columns in pandas DataFrame. Now, let’s run these examples by creating the sample DataFrame with a few rows and columns.


# Create DataFrame
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300]
              }
df = pd.DataFrame(technologies)
print("Create DataFrame\n",df)

Yields below output.

pandas select multiple columns

Using df[] & loc[] to Select Multiple Columns by Name

By using df[] & pandas.DataFrame.loc[] you can select multiple columns by names or labels. To select the columns by name, you can use the syntax [:, start:stop:step] where start is the name of the first column to include, stop is the name of the last column to include, and step determines the number of indices to advance after each extraction, allowing for selecting alternate columns.

Another syntax available with pandas.DataFrame.loc[] is [:, [labels]], where a label is a list of column names to include. This approach provides flexibility in selecting multiple columns by names or labels.


# loc[] syntax to slice columns
df.loc[:,start:stop:step]

Select Columns Using df[] Notation

The [] notation in Pandas DataFrame allows for a concise selection of single or multiple columns by their labels or names simply by providing the names of the columns as a list within the brackets.


# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print("Selecting specific columns by labels:\n",df2)

In the above examples, Select the columns with labels Courses, Fee, and Duration from the original DataFrame df. The result is a new DataFrame (df2) containing only these columns.

pandas select multiple columns

Select Multiple Columns using df.loc[]

You can select multiple columns from a pandas DataFrame by passing a list of column names or labels as an argument.


# Select Multiple Columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print(df2)


# Output:
#   Courses    Fee  Discount
# 0    Spark  20000      1000
# 1  PySpark  25000      2300

In the above example, select the columns labeled Courses, Fee,and Discount from the original DataFrame df. The : before the comma indicates that you want to select all rows for these columns.

Select DataFrame Columns by Range

When selecting columns by range using the loc[] indexer in Pandas, it’s essential to provide both the start and stop column names.


# Select all columns between Fee an Discount columns
df2 = df.loc[:,'Fee':'Discount']
print(df2)

# Output:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print(df2)

# Output:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print(df2)

# Output:
#   Courses    Fee Duration
# 0    Spark  20000   30days
# 1  PySpark  25000   40days

Select Every Other Column

Using loc[], to select every alternate column in a DataFrame, you can use slicing with a step size.


# Select every alternate column
df2 = df.loc[:,::2]
print(df2)

# Output:
#   Courses Duration   Tutor
# 0    Spark   30days  Michel
# 1  PySpark   40days     Sam

The above example, uses loc[] with slice notation to select every column with a step of 2. The : before the comma indicates that you want to select all rows, and ::2 specifies the step, which means every alternate column.

Using iloc[] – Select Multiple Columns by Index

By using pandas.DataFrame.iloc[], you can select multiple columns from a DataFrame by their positional indices. Remember index starts from 0. You can use the syntax [:, start:stop:step] with iloc[], where start indicates the index of the first column to include, stop indicates the index of the last column to include, step indicates the number of indices to advance after each extraction, allowing for selecting alternate columns. Or, you can use the syntax [:, [indices]] with iloc[], where indices is a list of column indices to include.

Select Multiple Columns using df.iloc[]

You can use the iloc indexer to select specific rows and columns from a DataFrame by integer position. To select multiple columns using iloc, you can provide a comma-separated list of column indices.


# Selected by column position
df2 = df.iloc[:,[1,3,4]]
print(df2)

# Output:
#     Fee  Discount   Tutor
# 0  20000      1000  Michel
# 1  25000      2300     Sam

In the above example, df.iloc[:,[1,3,4]] select all rows (:) and the columns at positions 1, 3, and 4. The result will be a DataFrame containing only columns Fee, Discount and Tutor and it returns a new DataFrame with these selected columns.

Select Columns by Position Range

To select columns by position range using iloc in Pandas, you can provide a range of column indices. For instance, df.iloc[:, 1:4] select all rows (:) and columns at positions 1 through 4 (exclusive). The result will be a DataFrame containing columns ‘Fee’, ‘Duration’, and ‘Discount’.


# Select between indexes 1 and 4 (2,3,4)
df2 = df.iloc[:,1:4]
print(df2)

# Output:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select From 3rd to end
df2 = df.iloc[:,2:]
# Returns

# Output:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select First Two Columns
df2 = df.iloc[:,:2]
print(df2)

# Output:
#   Courses    Fee
# 0    Spark  20000
# 1  PySpark  25000

Using df.iloc[:,-1:] retrieves the last column of the DataFrame, and df.iloc[:,:1] retrieves just the first column.

Using filter() & get() Methods

Similarly, you can also use the filter() method to select multiple columns based on their labels or names and the get() method to retrieve a single/multiple columns from a DataFrame.


# Using DataFrame.filter() 
# To select multiple columns
df2 = df.filter(['Courses','Fee'])
print(df2)

# Using DataFrame.get() method
df2 = df.get(['Courses','Fee'])
print(df2)

Complete Example of Selecting Multiple Columns


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print(df)

# Select multiple columns
print(df[["Courses","Fee","Duration"]])

# Select Random columns
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select columns by range
print(df.loc[:,'Fee':'Discount']) 
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])

# Select every alternate column
print(df.loc[:,::2])

# Selected by column position
print(df.iloc[:,[1,3,4]])

# Select between indexes 1 and 4 (2,3,4)
print(df.iloc[:,1:4])

# Select From 3rd to end
print(df.iloc[:,2:])

# Select First Two Columns
print(df.iloc[:,:2])

FAQ on Select Multiple Columns in DataFrame

How can I select multiple columns at once?

To select multiple columns at once from a Pandas DataFrame, you can use the double square bracket notation. In this case, loc allows you to specify both rows and columns. The colon : before the comma indicates that you want to select all rows.

Can I use the loc method to select multiple columns?

You can use the loc method to select multiple columns from a Pandas DataFrame. The loc method is commonly used for label-based indexing, and you can use it to select specific columns based on their names.

Can I use iloc to select specific columns by index?

Use the iloc method to select specific columns by index in a Pandas DataFrame. The iloc method is primarily used for integer-location based indexing.

Is there a way to select columns using a callable function?

To use the filter method along with a callable function to select columns in a Pandas DataFrame. The filter method allows you to subset the DataFrame based on a callable function that takes the column labels as input.

Can I use regular expressions to select columns?

You can use regular expressions (regex) to select columns in a Pandas DataFrame using the filter method. The filter method in Pandas allows you to specify a regular expression to match column names.

How do I select columns based on their data types?

You can select columns from a Pandas DataFrame based on their data types using the select_dtypes method. This method allows you to specify the data types you want to include or exclude.

Conclusion

In this article, I have explained how to select multiple columns from pandas DataFrame using loc[], and iloc[] functions and using get() and filter() with examples.

Happy Learning !!

Exit mobile version