Pandas – Change the Order of DataFrame Columns


You can use DataFrame.reindex() to change the order of pandas DataFrame columns. In this article, I will explain how to change the order of DataFrame columns in pandas and how to sort columns in alphabetical order. One easy way to rearrange columns is to select them from the DataFrame in the desired order and assign the result back to the same DataFrame (or to a new one).

1. Create a DataFrame with a Dictionary of Lists

Now, let's create a DataFrame with a few rows and columns to demonstrate changing the column order. The DataFrame contains the columns Courses, Fee, Duration, and Discount.


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day', '40days' ,'35days', '40days', '60days', '50days', '55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
df = pd.DataFrame(technologies)
print(df)

This yields the following output.


   Courses    Fee Duration  Discount
0    Spark  20000    30day      1000
1  PySpark  25000   40days      2300
2   Hadoop  26000   35days      1500
3   Python  22000   40days      1200
4   pandas  24000   60days      2500
5   Oracle  21000   50days      2100
6     Java  22000   55days      2000

2. Change Order of Columns in pandas DataFrame

You can rearrange the DataFrame columns in any order by passing a list of column names to df[], for example df[['Discount',"Fee","Courses","Duration"]]. The result is a DataFrame whose columns follow the order of the list.


# Using double brackets to change the column order
df = pd.DataFrame(technologies)
df2 = df[['Discount',"Fee","Courses","Duration"]]
print(df2)

This yields the following output.


   Discount    Fee  Courses Duration
0      1000  20000    Spark    30day
1      2300  25000  PySpark   40days
2      1500  26000   Hadoop   35days
3      1200  22000   Python   40days
4      2500  24000   pandas   60days
5      2100  21000   Oracle   50days
6      2000  22000     Java   55days
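Selecting with a list also works when you assign the result back to the same name, which is the "reassign the same DataFrame" approach from the introduction. The minimal sketch below assumes a three-row subset of the technologies data. It also shows that df[] raises a KeyError when a name in the list does not exist ('Fees' is a deliberately misspelled, hypothetical name).

```python
import pandas as pd

# Assumption: a three-row subset of the article's technologies data
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [20000, 25000, 26000],
    'Duration': ['30day', '40days', '35days'],
    'Discount': [1000, 2300, 1500],
}
df = pd.DataFrame(technologies)

# Reassign df itself instead of creating df2; only the column order changes
df = df[['Discount', 'Fee', 'Courses', 'Duration']]
print(df.columns.tolist())  # → ['Discount', 'Fee', 'Courses', 'Duration']

# Every name passed to df[] must exist; 'Fees' is a hypothetical typo
raised = False
try:
    df[['Discount', 'Fees']]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```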

3. Change Columns Order Using DataFrame.reindex()

Use df.reindex(columns=change_column), where change_column is a list of the column names in the desired order.


# Using DataFrame.reindex() to change columns order
change_column = ['Courses','Duration','Fee','Discount']
df = df.reindex(columns=change_column)
print(df)

# Alternatively, pass the list positionally with axis=1
df = df.reindex(['Courses','Duration','Fee','Discount'], axis=1)
print(df)

This yields the following output.


   Courses Duration    Fee  Discount
0    Spark    30day  20000      1000
1  PySpark   40days  25000      2300
2   Hadoop   35days  26000      1500
3   Python   40days  22000      1200
4   pandas   60days  24000      2500
5   Oracle   50days  21000      2100
6     Java   55days  22000      2000
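Unlike selection with df[], reindex() does not fail on labels it cannot find. The following sketch, assuming a two-row subset of the same data and a hypothetical column name 'Tax', shows that unknown names become new columns filled with NaN (or with fill_value), and that columns left out of the list are dropped.

```python
import pandas as pd

# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})

# 'Tax' is a hypothetical name not present in df: reindex() adds it as NaN
df2 = df.reindex(columns=['Courses', 'Tax', 'Fee'])
print(df2)

# 'Duration' and 'Discount' were not listed, so they are dropped
print(df2.columns.tolist())  # → ['Courses', 'Tax', 'Fee']

# fill_value replaces the NaN placeholders in the new column
df3 = df.reindex(columns=['Courses', 'Tax', 'Fee'], fill_value=0)
print(df3['Tax'].tolist())  # → [0, 0]
```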

4. Reorder DataFrame Columns in Sorted Order

You can get the pandas DataFrame column names with df.columns, sort them with Python's built-in sorted() function, and pass the sorted list to DataFrame.reindex() to get a DataFrame whose columns are in alphabetical order.


# Sort columns alphabetically using axis=1
df = df.reindex(sorted(df.columns), axis=1)
print(df)

# Reorder DataFrame columns in sorted order
df = df.reindex(columns=sorted(df.columns))
print(df)

This yields the following output.


   Courses  Discount Duration    Fee
0    Spark      1000    30day  20000
1  PySpark      2300   40days  25000
2   Hadoop      1500   35days  26000
3   Python      1200   40days  22000
4   pandas      2500   60days  24000
5   Oracle      2100   50days  21000
6     Java      2000   55days  22000
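As an alternative to sorted() plus reindex(), DataFrame.sort_index(axis=1) sorts the column labels directly. This sketch, assuming a two-row subset of the data, compares the two approaches, sorts in reverse with ascending=False, and passes a key to sorted() (ordering by name length is only an illustrative choice).

```python
import pandas as pd

# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})

# sort_index(axis=1) sorts the column labels alphabetically
df_sorted = df.sort_index(axis=1)
print(df_sorted.columns.tolist())  # → ['Courses', 'Discount', 'Duration', 'Fee']

# ascending=False gives reverse alphabetical order
df_desc = df.sort_index(axis=1, ascending=False)
print(df_desc.columns.tolist())  # → ['Fee', 'Duration', 'Discount', 'Courses']

# sorted() accepts a key; here columns are ordered by name length
# (sorted() is stable, so equal lengths keep their original order)
df_by_len = df.reindex(columns=sorted(df.columns, key=len))
print(df_by_len.columns.tolist())  # → ['Fee', 'Courses', 'Duration', 'Discount']
```

Both sorted() and sort_index() compare strings case-sensitively, so names starting with an uppercase letter sort before names starting with a lowercase one.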

5. Using DataFrame Constructor

You can also use pd.DataFrame(df, columns=['Courses','Discount','Duration','Fee']) to rearrange the columns of an existing DataFrame. Here df is the existing DataFrame, and the columns argument sets which of its columns appear in the new DataFrame and in what order.


# Using DataFrame constructor
df = pd.DataFrame(df, columns=['Courses','Discount','Duration','Fee'])
print(df)

In this case, it yields the same output as the previous example.
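Passing an existing DataFrame together with columns= to the constructor reorders its columns much like reindex(columns=...). The sketch below, assuming a two-row subset of the data, checks that both approaches produce equal DataFrames.

```python
import pandas as pd

# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})
order = ['Courses', 'Discount', 'Duration', 'Fee']

# Constructor with an existing DataFrame plus columns=
via_constructor = pd.DataFrame(df, columns=order)
# reindex() with the same list of column names
via_reindex = df.reindex(columns=order)

print(via_constructor.columns.tolist())  # → ['Courses', 'Discount', 'Duration', 'Fee']
print(via_constructor.equals(via_reindex))  # → True
```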

6. Pandas Reorder the Columns

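Use df.columns.tolist() to get the column names as a plain Python list, rearrange that list any way you want, and then select the DataFrame with the new list. For instance, cols[-1:] + cols[:-1] moves the last column to the front.


# Start from the original DataFrame
df = pd.DataFrame(technologies)
cols = df.columns.tolist()
# Rearrange the list any way you want; here the last column moves to the front
cols = cols[-1:] + cols[:-1]
print(cols)
# Select the columns in the new order
df2 = df[cols]

This yields the following output.


['Discount', 'Courses', 'Fee', 'Duration']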
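The list-rearranging step can be wrapped in a small function when you need to move one column to an arbitrary position. move_column below is a hypothetical helper, not part of pandas; the sketch assumes a two-row subset of the data and also shows reversing the column order by slicing df.columns.

```python
import pandas as pd


def move_column(df: pd.DataFrame, column: str, position: int) -> pd.DataFrame:
    """Return df with `column` moved to index `position`.

    Hypothetical helper for illustration; not a pandas API.
    """
    cols = df.columns.tolist()
    cols.remove(column)            # take the column out of the list
    cols.insert(position, column)  # put it back at the requested index
    return df[cols]                # select a new DataFrame in that order


# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})

# Same rotation as cols[-1:] + cols[:-1]: the last column goes first
print(move_column(df, 'Discount', 0).columns.tolist())
# → ['Discount', 'Courses', 'Fee', 'Duration']

# Position len(df.columns) - 1 places the column last
print(move_column(df, 'Courses', len(df.columns) - 1).columns.tolist())
# → ['Fee', 'Duration', 'Discount', 'Courses']

# Reverse the column order by slicing the column Index
print(df[df.columns[::-1]].columns.tolist())
# → ['Discount', 'Duration', 'Fee', 'Courses']
```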

7. Create a New List of Columns in the Desired Order

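Alternatively, build a new list of column names in the desired order and select the DataFrame with it. For example, df[['Duration'] + [col for col in df.columns if col != 'Duration']] moves Duration to the front and keeps the remaining columns in their original order.


# Start from the original DataFrame
df = pd.DataFrame(technologies)
# Put 'Duration' first, followed by every other column
df2 = df[['Duration'] + [col for col in df.columns if col != 'Duration']]
print(df2)

This yields the following output.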


  Duration  Courses    Fee  Discount
0    30day    Spark  20000      1000
1   40days  PySpark  25000      2300
2   35days   Hadoop  26000      1500
3   40days   Python  22000      1200
4   60days   pandas  24000      2500
5   50days   Oracle  21000      2100
6   55days     Java  22000      2000
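If you would rather modify the DataFrame in place than build a new one, DataFrame.pop() and DataFrame.insert() can move a single column. A minimal sketch, assuming a two-row subset of the data:

```python
import pandas as pd

# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})

# pop() removes 'Duration' from df and returns it as a Series
duration = df.pop('Duration')
# insert() adds it back at position 0, modifying df in place
df.insert(0, 'Duration', duration)
print(df.columns.tolist())  # → ['Duration', 'Courses', 'Fee', 'Discount']
```

By default, insert() raises a ValueError if a column with that name already exists, which is why the column is popped first.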

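You can also use [df.columns[-2]] + [col for col in df if col != df.columns[-2]] to build a column list in which the second-to-last column (index -2, here Duration) is moved to the front. Iterating over a DataFrame yields its column names, so "for col in df" works the same as "for col in df.columns".


# Move the second-to-last column to the front of the column list
cols = [df.columns[-2]] + [col for col in df if col != df.columns[-2]]
print(cols)

This yields the following output.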


['Duration', 'Courses', 'Fee', 'Discount']
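The same list-comprehension pattern extends to several columns, and the resulting list can be passed straight to df[] to get the reordered DataFrame. In this sketch the choice of Discount and Duration as the leading columns is illustrative, and the data is a two-row subset.

```python
import pandas as pd

# Assumption: a two-row subset of the article's technologies data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30day', '40days'],
    'Discount': [1000, 2300],
})

# Columns to place first, in this order (an illustrative choice)
first = ['Discount', 'Duration']
# Every remaining column keeps its original relative order
rest = [col for col in df.columns if col not in first]
df2 = df[first + rest]
print(df2.columns.tolist())  # → ['Discount', 'Duration', 'Courses', 'Fee']

# Position-based variant: move the second-to-last column to the front
# and select the DataFrame with the resulting list
col = df.columns[-2]
df3 = df[[col] + [c for c in df.columns if c != col]]
print(df3.columns.tolist())  # → ['Duration', 'Courses', 'Fee', 'Discount']
```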

8. Complete Example For Reference


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30day', '40days' ,'35days', '40days', '60days', '50days', '55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
}
df = pd.DataFrame(technologies)
print(df)

# Using double brackets to change the column order
df = pd.DataFrame(technologies)
df2 = df[['Discount',"Fee","Courses","Duration"]]
print(df2)

# Build a DataFrame from zipped lists, then set the column names
c1, c2, c3, c4 = (technologies[key] for key in ["Courses","Fee","Duration","Discount"])
df = pd.DataFrame(list(zip(c1, c2, c3, c4)))
df.columns = ["Courses","Fee","Duration","Discount"]
# Reorder the columns by selecting them in the new order
df2 = df[["Courses","Fee","Discount","Duration"]]
print(df2)

# Using DataFrame.reindex() to change columns order
change_column = ['Courses','Duration','Fee','Discount']
df = df.reindex(columns=change_column)
print(df)

# Alternatively, pass the list positionally with axis=1
df = df.reindex(['Courses','Duration','Fee','Discount'], axis=1)
print(df)

# Sort columns alphabetically using axis=1
df = df.reindex(sorted(df.columns), axis=1)
print(df)

# Reorder DataFrame columns in sorted order
df = df.reindex(columns=sorted(df.columns))
print(df)

# Using DataFrame constructor
df = pd.DataFrame(df, columns=['Courses','Discount','Duration','Fee'])
print(df)

# Rearrange the list of column names, then select with it
df = pd.DataFrame(technologies)
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df2 = df[cols]
print(df2)

# Using desired order to change column
df2 = df[['Duration'] + [col for col in df.columns if col != 'Duration']]
print(df2)

Conclusion

In this article, you have learned how to change the order of DataFrame columns in pandas using DataFrame.reindex(), the DataFrame constructor, and column selection with a list of names. You also learned how to sort DataFrame columns alphabetically with examples.

Happy Learning !!


Naveen (NNK)

SparkByExamples.com is a Big Data and Spark examples community page; all examples are simple, easy to understand, and tested in our development environment.
