Pandas Select Columns by Name or Index
  • Post category: Pandas
  • Post last modified: April 17, 2024

Use DataFrame.loc[] and DataFrame.iloc[] to select a single column or multiple columns from a pandas DataFrame by column name (label) or by index position, respectively: loc[] works with column labels/names, and iloc[] works with integer column positions. You can also use these accessors to select rows from a pandas DataFrame. Also, refer to the related article on how to get a cell value from a pandas DataFrame.

In this article, I will explain how to select single or multiple columns from a DataFrame by column label and index, by specific column positions, by range, and more, with examples.

Key Points –

  • Pandas allows selecting columns from a DataFrame by name using square bracket notation or the .loc[] accessor.
  • The .loc[] accessor allows for more explicit selection, accepting row and column labels or boolean arrays.
  • Alternatively, you can use the .iloc[] accessor to select columns by their integer index positions.
  • To select the last column, use df.iloc[:,-1:], and to select the first column, use df.iloc[:,:1]; both return a single-column DataFrame.
  • Understanding both column name and index-based selection is essential for efficient data manipulation with Pandas.

Quick Examples of Select Columns by Name or Index

If you are in a hurry, below are some quick examples of how to select columns by name or index.


# Quick examples of selecting columns by name or index

# Example 1: By using df[] notation
df2 = df[["Courses","Fee","Duration"]] # Select multiple columns

# Example 2: Using loc[] to take column slices
df2 = df.loc[:, ["Courses","Fee","Duration"]] # Select multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]] # Select specific columns
df2 = df.loc[:,'Fee':'Discount'] # Select columns between two columns (inclusive)
df2 = df.loc[:,'Duration':]  # Select from 'Duration' to the last column
df2 = df.loc[:,:'Duration']  # Select from the first column to 'Duration'
df2 = df.loc[:,::2]          # Select every alternate column

# Example 3: Using iloc[] to select columns by index
df2 = df.iloc[:,[1,3,4]] # Select columns by index
df2 = df.iloc[:,1:4] # Select positions 1, 2, and 3 (stop position 4 is excluded)
df2 = df.iloc[:,2:] # Select from the 3rd column to the end
df2 = df.iloc[:,:2] # Select the first two columns

Now, let's create a DataFrame with a few rows and columns and execute some examples of how to select columns in pandas. Our DataFrame contains the columns Courses, Fee, Duration, Discount, and Tutor.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

Yields below output.

# Output:
# Create DataFrame:
#   Courses    Fee Duration  Discount   Tutor
# 0    Spark  20000   30days      1000  Michel
# 1  PySpark  25000   40days      2300     Sam

Using loc[] to Select Columns by Name

By using df[] and pandas.DataFrame.loc[], you can select multiple columns by name or label. To select a range of columns by name with loc[], use the syntax [:, start:stop:step], where start is the name of the first column to include, stop is the name of the last column to include (label slices include the stop column), and step is an integer that sets how many positions to advance after each selected column, for example 2 to select alternate columns. The other syntax available with pandas.DataFrame.loc[] is [:, [labels]], where labels is a list of column names to include.


# loc[] syntax to slice columns
df.loc[:,start:stop:step]

# loc[] syntax to select a list of columns
df.loc[:,[labels]]
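A minimal sketch of both loc[] forms, assuming the five-column sample data from the complete example (Courses, Fee, Duration, Discount, Tutor). It shows that a label slice keeps its stop column and that a label list returns the columns in the order you list them.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Label slice: the stop label 'Duration' is included in the result
by_range = df.loc[:, 'Fee':'Duration']
print(list(by_range.columns))  # ['Fee', 'Duration']

# Label list: columns come back in the order they are listed
by_list = df.loc[:, ['Tutor', 'Courses']]
print(list(by_list.columns))  # ['Tutor', 'Courses']
```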

Select DataFrame Columns by Name

You can select single or multiple columns by their labels or names using the square bracket [] notation. To select a single column, pass its name as a string (this returns a Series); to select multiple columns, pass a list of names (this returns a DataFrame).


# Select Columns by labels
df2 = df[["Courses","Fee","Duration"]]
print("Select columns by labels:\n", df2)

Yields below output.

# Output:
# Select columns by labels:
#   Courses    Fee Duration
# 0    Spark  20000   30days
# 1  PySpark  25000   40days
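The difference between passing a string and passing a list is easy to miss, so here is a small sketch, assuming the same sample data as the complete example. A single name gives back a Series, while a one-item list gives back a DataFrame with one column.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

fee = df['Fee']        # a single name returns a Series
fee_df = df[['Fee']]   # a one-item list returns a DataFrame

print(type(fee).__name__)     # Series
print(type(fee_df).__name__)  # DataFrame
print(fee_df.shape)           # (2, 1)
```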

Select Multiple Columns by Name Using loc[]

You can also select multiple columns from a pandas DataFrame by passing a list of column names or labels to loc[] as the column argument, as in df.loc[:, [labels]]. Note that loc[] also supports multiple conditions when selecting rows based on column values.


# Select multiple columns
df2 = df.loc[:, ["Courses","Fee","Discount"]]
print("Select multiple columns by labels:\n", df2)

# Output:
# Select multiple columns by labels:
#   Courses    Fee  Discount
# 0    Spark  20000      1000
# 1  PySpark  25000      2300
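The note above about multiple row conditions can be combined with a column list in a single loc[] call. A minimal sketch, assuming the same sample data as the complete example; the thresholds 20000 and 2000 are arbitrary values chosen only for illustration.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Rows: each condition is wrapped in parentheses and combined with &
# Columns: a list of labels, exactly as in the example above
subset = df.loc[(df['Fee'] > 20000) & (df['Discount'] > 2000), ['Courses', 'Fee']]
print(subset)
```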

Select DataFrame Columns by Range

When selecting columns by range using the loc[] accessor, you can provide a start column name, a stop column name, or both:

  • When you don’t provide a start column, loc[] selects columns from the beginning.
  • If you don’t provide a stop column, loc[] selects all columns from the start label to the end.
  • When you provide both start and stop columns, loc[] selects all columns in between those two columns, inclusive of both start and stop columns.

# Select all columns between Fee and Discount columns
df2 = df.loc[:,'Fee':'Discount']
print("Select columns by labels:\n", df2)

# Output
# Select columns by labels:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select from 'Duration' column
df2 = df.loc[:,'Duration':]
print("Select columns by labels:\n", df2)

# Output
# Select columns by labels:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select from beginning and end at 'Duration' column
df2 = df.loc[:,:'Duration']
print("Select columns by labels:\n", df2)

# Output
# Select columns by labels:
#   Courses    Fee Duration
# 0    Spark  20000   30days
# 1  PySpark  25000   40days
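Label ranges follow the column order of the DataFrame, not alphabetical order. A minimal sketch of two edge cases, assuming the same sample data as the complete example; 'Price' is a hypothetical label that does not exist in the data.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Start label comes after the stop label, so no column falls in between
empty = df.loc[:, 'Discount':'Fee']
print(empty.shape)  # (2, 0)

# A missing label ('Price' is hypothetical) raises KeyError, because the
# column labels are not sorted and pandas cannot infer a position for it
raised = False
try:
    df.loc[:, 'Fee':'Price']
except KeyError as err:
    raised = True
    print("KeyError:", err)
```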

Select Every Alternate Column

To select every alternate column from a DataFrame, you can use the loc[] accessor with the step parameter.


# Select every alternate column
df2 = df.loc[:,::2]
print("Select columns by labels:\n", df2)

# Output:
# Select columns by labels:
#   Courses Duration   Tutor
# 0    Spark   30days  Michel
# 1  PySpark   40days     Sam

This code selects every alternate column, starting from the first column, which results in selecting Courses, Duration, and Tutor.
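To start the alternation from a different column, put a start label in front of the step. A minimal sketch, assuming the same sample data as the complete example, that starts at 'Fee' and compares the result with the equivalent positional slice.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Start at 'Fee' and take every second column from there
from_fee = df.loc[:, 'Fee'::2]
print(list(from_fee.columns))  # ['Fee', 'Discount']

# The same selection expressed by position with iloc[]
from_pos_1 = df.iloc[:, 1::2]
print(from_fee.equals(from_pos_1))  # True
```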

Pandas iloc[] to Select Column by Index or Position

By using pandas.DataFrame.iloc[], you can select multiple columns from a DataFrame by their positional indices. Remember that the index starts from 0. You can use the syntax [:, start:stop:step] with iloc[], where start is the position of the first column to include, stop is the position one past the last column to include (unlike label slices in loc[], the stop position is excluded), and step is the number of positions to advance after each selected column, which allows selecting alternate columns. Alternatively, you can use the syntax [:, [indices]] with iloc[], where indices is a list of column positions to include.
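The inclusive/exclusive difference between loc[] and iloc[] slices is the most common source of off-by-one mistakes. A minimal sketch, assuming the same sample data as the complete example, showing a positional slice and the label slice that selects the same columns.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# iloc[]: stop position 3 is excluded, so positions 1 and 2 are returned
by_pos = df.iloc[:, 1:3]
print(list(by_pos.columns))  # ['Fee', 'Duration']

# loc[]: the equivalent label slice names the last column it keeps
by_label = df.loc[:, 'Fee':'Duration']
print(by_pos.equals(by_label))  # True

# A step works the same way with positions: every second column
print(list(df.iloc[:, ::2].columns))  # ['Courses', 'Duration', 'Tutor']
```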

Select Multiple Columns by Index Position

To select multiple columns from a DataFrame by their index positions, pass a list of positions to the iloc[] accessor. The example below retrieves the "Fee", "Discount", and "Tutor" columns (positions 1, 3, and 4) and returns a new DataFrame with the selected columns.


# Select columns by position
df2 = df.iloc[:,[1,3,4]]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#     Fee  Discount   Tutor
# 0  20000      1000  Michel
# 1  25000      2300     Sam
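Positions in the list can also be negative, counting back from the last column. A minimal sketch, assuming the same sample data as the complete example, that picks the first and last columns.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Negative positions count from the end: 0 is the first column, -1 the last
ends = df.iloc[:, [0, -1]]
print(list(ends.columns))  # ['Courses', 'Tutor']
```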

Select Columns by Position Range

You can also slice a DataFrame by a range of positions using the .iloc[] accessor. The example below selects the columns at positions 1 through 3 (position 4 is excluded) from the DataFrame df and assigns them to df2.


# Select positions 1 to 3 (stop position 4 is excluded)
df2 = df.iloc[:,1:4]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#     Fee Duration  Discount
# 0  20000   30days      1000
# 1  25000   40days      2300

# Select From 3rd to end
df2 = df.iloc[:,2:]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#  Duration  Discount   Tutor
# 0   30days      1000  Michel
# 1   40days      2300     Sam

# Select First Two Columns
df2 = df.iloc[:,:2]
print("Select columns by position:\n", df2)

# Output:
# Select columns by position:
#   Courses    Fee
# 0    Spark  20000
# 1  PySpark  25000
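Position slices and position lists behave differently when they run past the last column. A minimal sketch, assuming the same sample data as the complete example (five columns, so the last valid position is 4); position 10 is an arbitrary out-of-range value.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# A slice that runs past the last column is trimmed silently
tail = df.iloc[:, 3:10]
print(list(tail.columns))  # ['Discount', 'Tutor']

# A list containing an out-of-range position raises IndexError
raised = False
try:
    df.iloc[:, [3, 10]]
except IndexError as err:
    raised = True
    print("IndexError:", err)
```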

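To retrieve the last column of a DataFrame, you can use df.iloc[:,-1:], and to obtain just the first column, you can use df.iloc[:,:1]. Both return a single-column DataFrame; use df.iloc[:,-1] or df.iloc[:,0] instead if you want a Series.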
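A short sketch of the return types, assuming the same sample data as the complete example. A scalar position gives a Series, while a one-column slice keeps the DataFrame shape.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

last_series = df.iloc[:, -1]   # Series named 'Tutor'
last_frame = df.iloc[:, -1:]   # one-column DataFrame
first_frame = df.iloc[:, :1]   # one-column DataFrame holding 'Courses'

print(type(last_series).__name__, last_series.name)  # Series Tutor
print(type(last_frame).__name__, list(last_frame.columns))  # DataFrame ['Tutor']
print(list(first_frame.columns))  # ['Courses']
```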

Complete Example


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark"],
    'Fee' :[20000,25000],
    'Duration':['30days','40days'],
    'Discount':[1000,2300],
    'Tutor':['Michel','Sam']
              }
df = pd.DataFrame(technologies)
print(df)

# Select multiple columns
print(df[["Courses","Fee","Duration"]])

# Select specific columns
print(df.loc[:, ["Courses","Fee","Discount"]])

# Select columns by range
print(df.loc[:,'Fee':'Discount']) 
print(df.loc[:,'Duration':])
print(df.loc[:,:'Duration'])

# Select every alternate column
print(df.loc[:,::2])

# Select by column position
print(df.iloc[:,[1,3,4]])

# Select positions 1 to 3 (stop position 4 is excluded)
print(df.iloc[:,1:4])

# Select From 3rd to end
print(df.iloc[:,2:])

# Select First Two Columns
print(df.iloc[:,:2])

FAQ on Select Columns by Name or Index

How do I select a single column by name in Pandas?

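To select a single column by name, you can use square bracket ([]) or dot (.) notation, for example df['column_name'] or df.column_name. Dot notation only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.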
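A minimal sketch of where dot notation stops working. The column names 'Course Name' and 'count' are hypothetical, chosen only because one contains a space and the other matches the existing DataFrame.count() method.

```python
import pandas as pd

# Hypothetical columns chosen to show the limits of dot notation
df = pd.DataFrame({
    'Fee': [20000, 25000],
    'Course Name': ["Spark", "PySpark"],  # contains a space
    'count': [3, 5],                      # same name as the DataFrame.count() method
})

print(df.Fee.equals(df['Fee']))  # True: dot and bracket access agree here

# df.Course Name is not valid Python, so brackets are required
print(df['Course Name'].tolist())  # ['Spark', 'PySpark']

# df.count resolves to the method, not the column
print(callable(df.count))    # True
print(df['count'].tolist())  # [3, 5]
```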

How do I select multiple columns by name in Pandas?

To select multiple columns by name, you can pass a list of column names within square brackets. For example, df[['column_name1', 'column_name2']].

How do I select columns by index in Pandas?

You can select columns by their positions using the df.iloc[] accessor. For example, df.iloc[:, [0, 2]] selects the first and third columns.

How do I select a single column by name or by index in Pandas?

Use .loc (or plain square brackets) to select a column by name and .iloc to select it by position. For example, df.loc[:, 'column_name'] or df['column_name'] selects by name, and df.iloc[:, 0] selects the first column by position.
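When you know a column's name but need its position (or the reverse), the columns Index can translate between the two. A minimal sketch, assuming the same sample data as the complete example.

```python
import pandas as pd

# Sample data with the same values as the article's complete example
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
    'Tutor': ['Michel', 'Sam'],
})

# Convert a column name to its position, and the position back to a name
pos = df.columns.get_loc('Duration')
name = df.columns[pos]
print(pos, name)  # 2 Duration

# Selecting by name and by position returns the same Series
print(df.loc[:, 'Duration'].equals(df.iloc[:, pos]))  # True
```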

How can I select all columns in a Pandas DataFrame?

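You can select all columns by using a colon : in place of the column labels or positions. For example, df.loc[:, :] or df.iloc[:, :] selects all rows and all columns. df[:] also returns every column, because a slice inside [] selects rows and all rows are kept.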

Conclusion

In this article, I have explained how to select pandas columns by name or index using the DataFrame.loc[] and DataFrame.iloc[] properties. To understand the similarities and differences between these two, refer to pandas loc[] vs iloc[] with examples.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
