Pandas Drop Column(s) from DataFrame

The pandas.DataFrame.drop() method removes one or more columns from a DataFrame. By default it does not modify the existing DataFrame; it returns a new DataFrame without the specified columns. To remove columns from the existing DataFrame object, pass inplace=True.

In this pandas drop columns article, I will explain how to remove/delete/drop a single column, multiple columns, by name, by index, between two columns, etc. The drop() method removes columns or rows according to the given label names and the corresponding axis.

Now, let’s see the drop() syntax and how to delete or drop columns (two or more) from DataFrame with examples.

1. Quick Examples of pandas Drop Column(s)

Below are some quick examples of how to drop column(s) by name, by index, etc.


# Drop single column by Name
df2=df.drop(["Fee"], axis = 1)
df2=df.drop(columns=["Fee"])
df2=df.drop(labels=["Fee"], axis = 1)

# Drop single column by Index
df2=df.drop(df.columns[1], axis = 1)

#Updates the DataFrame in place
df.drop(df.columns[1], axis = 1, inplace=True)

# Drop multiple columns
df.drop(["Courses", "Fee"], axis = 1, inplace=True)
df.drop(df.columns[[1,2]], axis = 1, inplace=True)

# Other ways to drop columns
df.drop(df.loc[:, 'Courses':'Fee'].columns, axis = 1, inplace=True)
df.drop(df.iloc[:, 1:2].columns, axis = 1, inplace=True)

2. pandas.DataFrame.drop() Syntax

Below is the syntax of pandas.DataFrame.drop() method.


# pandas DataFrame drop() Syntax
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
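  • labels – Single label or list-like. Index or column labels to drop.
  • axis – Use 1 (or 'columns') to drop columns and 0 (or 'index') to drop rows from DataFrame. Default 0.
  • index – Single label or list-like. Row labels to drop; equivalent to labels with axis=0.
  • columns – Single label or list-like. Column labels to drop; equivalent to labels with axis=1.
  • level – int or level name, optional. For a MultiIndex, the level from which the labels are removed.
  • inplace – Default False, which returns a new DataFrame. When True, it drops the columns in place and returns None.
  • errors – {‘ignore’, ‘raise’}, default ‘raise’. With ‘ignore’, missing labels are skipped and only the existing labels are dropped.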
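
Since several of these parameters overlap, a short sketch may help show which calls are equivalent. The DataFrame below is a small stand-in built only for this illustration, and the names sample and multi (and the 'info'/'cost' level labels) are assumptions, not part of the examples that follow.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data for this sketch)
sample = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# labels + axis=1 and columns= select the same columns
a = sample.drop(labels=["Fee"], axis=1)
b = sample.drop(columns=["Fee"])
print(a.equals(b))

# index= is shorthand for labels + axis=0, so it drops rows
c = sample.drop(index=[0])
d = sample.drop(labels=[0], axis=0)
print(c.equals(d))

# level= applies to a MultiIndex; here the columns get two levels
multi = sample.copy()
multi.columns = pd.MultiIndex.from_tuples(
    [("info", "Courses"), ("cost", "Fee"), ("info", "Duration")]
)
e = multi.drop(columns="Fee", level=1)
print(list(e.columns))
```

Using columns= (or index=) tends to read more clearly than labels plus axis, because the call itself says which axis is affected.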

Now, let’s see a detailed example. First, create a pandas DataFrame from a dictionary of lists. Our DataFrame has the columns Courses, Fee and Duration.


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day', '40days' ,'35days', '40days', '60days', '50days', '55days']
              })
df = pd.DataFrame(technologies)
print(df)

Yields below output.


   Courses    Fee Duration
0    Spark  20000    30day
1  PySpark  25000   40days
2   Hadoop  26000   35days
3   Python  22000   40days
4   pandas  24000   60days
5   Oracle  21000   50days
6     Java  22000   55days

3. pandas Drop Column

The pandas drop() method removes a column by name or by index from the DataFrame. By default it does not modify the existing DataFrame; it returns a new DataFrame without the specified columns. To remove columns from the existing DataFrame object, pass inplace=True.

If a column you want to remove is not present in the DataFrame, drop() raises a KeyError. You can suppress this error by passing errors='ignore'.

You can also drop rows by their index labels using the index param.
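
As a minimal sketch of the errors and index params, the snippet below uses a reduced copy of the sample data. The name df_demo and the non-existent column "Discount" are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Reduced copy of the sample data (df_demo is an illustrative name)
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 26000],
})

# Dropping a column that does not exist raises KeyError by default
try:
    df_demo.drop(columns=["Discount"])
    raised = False
except KeyError:
    raised = True
print(raised)

# errors='ignore' skips missing labels and drops the ones that exist
safe = df_demo.drop(columns=["Fee", "Discount"], errors="ignore")
print(list(safe.columns))

# index= removes rows by index label rather than columns
rows_dropped = df_demo.drop(index=[0, 2])
print(rows_dropped)
```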

3.1 Drop Column by Name

This example removes the column named Fee from the DataFrame. Note that you need axis=1 in order to delete columns.


# Drops 'Fee' column
df2=df.drop(["Fee"], axis = 1)
print(df2)

# Explicitly using parameter name 'labels'
df2=df.drop(labels=["Fee"], axis = 1)

# Alternatively, use columns instead of labels (axis is then not needed).
df2=df.drop(columns=["Fee"])

Yields below output. Use inplace=True to update the DataFrame itself.


   Courses Duration
0    Spark    30day
1  PySpark   40days
2   Hadoop   35days
3   Python   40days
4   pandas   60days
5   Oracle   50days
6     Java   55days

3.2 Drop Column by Index

To remove DataFrame columns by index, first get the column labels using df.columns and then pick the label at the desired position. Note that the index starts from 0 in Python. In the example below, df.columns[[1]] selects the second column of the DataFrame, which is Fee.


# Drop column by index.
print(df.drop(df.columns[[1]], axis = 1))

# using inplace=True
#df.drop(df.columns[[1]], axis = 1, inplace=True)
#print(df)

Yields the same output as above.
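
One caveat may be worth a sketch: drop() still works on labels, so df.columns[i] is only a positional lookup of a label. If column names repeat, every column carrying that label is dropped. The DataFrame dup below, with its duplicate "A" columns, is a hypothetical case; selecting the positions to keep with iloc[] is one possible workaround, not the only one.

```python
import pandas as pd

# Hypothetical DataFrame with a repeated column name
dup = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "A"])

# Looks up the label at position 0 ("A") and drops every column named "A"
by_label = dup.drop(dup.columns[0], axis=1)
print(list(by_label.columns))

# Workaround: keep all positions except the ones to drop
drop_positions = {0}
keep = [i for i in range(dup.shape[1]) if i not in drop_positions]
by_position = dup.iloc[:, keep]
print(list(by_position.columns))
```

With unique column names, as in the sample DataFrame of this article, both approaches give the same result.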

4. Drop Different Columns From DataFrame

Below are some examples of dropping multiple columns from DataFrame by column name and index.

4.1 Drop Two or More Columns By Label Name

When you have several columns to drop, create a list with the column names and pass it to the drop() method, or pass the list literal directly. The example below deletes the columns Courses and Fee from the DataFrame.


df2=df.drop(["Courses", "Fee"], axis = 1)
print(df2)

Yields below output. Use inplace=True to update the DataFrame itself.


  Duration
0    30day
1   40days
2   35days
3   40days
4   60days
5   50days
6   55days

4.2 Drop Two or More Columns by Index

If you want to drop two or more columns by index, note that the drop() method does not accept positional indexes as a param. You can work around this by looking up the column names by position with df.columns[]. The example below deletes the columns at positions 0 and 1 (index starts from 0).


df2=df.drop(df.columns[[0,1]], axis = 1)
print(df2)

Yields the same output as above.

4.3 Drop Columns from List of Columns

If you have a list of columns and you want to delete all of them, use the below approach.


lisCol = ["Courses","Fee"]
df2=df.drop(lisCol, axis = 1)
print(df2)

5. Other ways to Remove Columns from DataFrame

Above are the most used ways to remove/delete columns from a DataFrame. Below are some other ways to remove one or two columns.

5.1 Remove columns From DataFrame inplace

If you want to remove a column in place, use inplace=True. With this param, drop() modifies the DataFrame itself and returns None. The example below drops the second column, Fee.


df.drop(df.columns[1], axis = 1, inplace=True)
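
The difference between the two modes can be sketched as follows. The DataFrame df_demo is a reduced, illustrative copy of the sample data, and the variable name result is an assumption used only to capture the return value.

```python
import pandas as pd

# Reduced copy of the sample data (df_demo is an illustrative name)
df_demo = pd.DataFrame({"Courses": ["Spark", "PySpark"], "Fee": [20000, 25000]})

# Without inplace, drop() returns a new DataFrame and leaves df_demo unchanged
df2 = df_demo.drop(columns=["Fee"])
print(list(df_demo.columns))
print(list(df2.columns))

# With inplace=True the original is modified and the return value is None
result = df_demo.drop(columns=["Fee"], inplace=True)
print(result)
print(list(df_demo.columns))
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, which is a common mistake.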

5.2 Remove Columns from a List of Columns (iteratively) By Condition.

In one of the above examples, I explained how to remove/delete columns from a list of columns. Now let’s see another example that does the same iteratively. This code removes every column whose name contains Fee, which here is only the Fee column.


for col in df.columns:
    if 'Fee' in col:
        del df[col]
print(df)
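
The same condition can also be applied without deleting inside a loop. The sketch below shows two possible alternatives on an illustrative copy of the data; the names df_demo, to_drop and mask are assumptions made for this example.

```python
import pandas as pd

# Reduced copy of the sample data (df_demo is an illustrative name)
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# Build the list of matching names first, then drop them in one call
to_drop = [col for col in df_demo.columns if "Fee" in col]
df2 = df_demo.drop(columns=to_drop)
print(df2)

# Equivalent selection with a boolean mask over the column names
mask = df_demo.columns.str.contains("Fee")
df3 = df_demo.loc[:, ~mask]
print(df3)
```

Both variants return a new DataFrame and leave df_demo untouched, unlike del, which modifies the DataFrame in place.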

5.3 Using df.loc[] to Remove Columns Between Specified Columns

The drop() method can be combined with loc[] to remove all columns from one column name through another. A label slice includes both endpoints, so [:, 'Courses':'Fee'] selects the first and second columns. With inplace=True, the drop is applied to the original object.


df.drop(df.loc[:, 'Courses':'Fee'].columns, axis = 1, inplace=True)
print(df)

5.4 Using df.iloc[] to Remove Columns Between Specified Column Indexes.

The drop() method can be combined with iloc[] to remove all columns between two positions. A positional slice excludes the end position, so [:, 1:2] selects only the second column. For instance, df.drop(df.iloc[:, 1:2].columns, axis=1, inplace=True) removes the Fee column.


df.drop(df.iloc[:, 1:2].columns, axis=1, inplace=True)
print(df)
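
The two slicing styles differ in how they treat the end of the range, which the following sketch tries to make visible. It uses an illustrative copy of the data (df_demo is an assumed name) and non-inplace drops so that both results can be compared.

```python
import pandas as pd

# Reduced copy of the sample data (df_demo is an illustrative name)
df_demo = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
})

# loc slices by label and includes the end label
loc_cols = list(df_demo.loc[:, "Courses":"Fee"].columns)
# iloc slices by position and excludes the end position
iloc_cols = list(df_demo.iloc[:, 1:2].columns)
print(loc_cols, iloc_cols)

after_loc = df_demo.drop(columns=loc_cols)
after_iloc = df_demo.drop(columns=iloc_cols)
print(list(after_loc.columns))
print(list(after_iloc.columns))
```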

6. Complete Example For Reference


import pandas as pd
technologies = ({
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day', '40days' ,'35days', '40days', '60days', '50days', '55days']
              })
df = pd.DataFrame(technologies)
print(df)

# Drop single column by Name
df2=df.drop(["Fee"], axis = 1)
print(df2)

df2=df.drop(columns=["Fee"])
print(df2)

df2=df.drop(labels=["Fee"], axis = 1)
print(df2)

# Drop column by index
df2=df.drop(df.columns[1], axis = 1)
print(df2)

# Drop multiple columns by Name
df2=df.drop(["Courses", "Fee"], axis = 1)
print(df2)

# Drop multiple columns by Index
df2=df.drop(df.columns[[0,1]], axis = 1)
print(df2)

# Drop Columns from List
lisCol = ["Courses","Fee"]
df2=df.drop(lisCol, axis = 1)
print(df2)

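# Drop columns between two columns (loc label slice includes both ends)
df2=df.drop(df.loc[:, 'Courses':'Fee'].columns, axis = 1)
print(df2)

# Drop columns between two positions (iloc slice excludes the end)
df2=df.drop(df.iloc[:, 1:2].columns, axis=1)
print(df2)

# Drop columns by condition (modifies df in place)
for col in df.columns:
    if 'Fee' in col:
        del df[col]
print(df)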

Conclusion

In this drop column article, you have learned how to remove or delete a single column or two or more columns from a DataFrame by name, label, and index. You have also learned how to remove columns between two columns, along with several other examples.

Happy Learning !!

NNK
