Pandas – Change the Order of DataFrame Columns
Post last modified: March 27, 2024

You can use DataFrame.reindex() to change the order of pandas DataFrame columns. In this article, I will explain how to change the order of DataFrame columns in pandas and how to sort columns in alphabetical order. One easy way to rearrange columns is to select them from the DataFrame in the desired order and assign the result back to the same variable or to a new DataFrame.

Key Points –

  • The order of DataFrame columns can significantly impact data analysis and visualization.
  • Pandas offers multiple methods to change the order of DataFrame columns, such as direct indexing and the DataFrame.reindex() method.
  • Reordering columns can enhance readability and facilitate downstream data processing tasks.
  • Ensure consistency in column ordering across operations and analyses to maintain clarity and reproducibility.
  • Choosing a logical and intuitive order for columns can streamline data exploration and manipulation workflows.

1. Create a DataFrame with a Dictionary of Lists

Now, let’s create a DataFrame with a few rows and columns to explain changing the column order with examples. Our DataFrame contains column names Courses, Fee, Duration, and Discount.


# Create a DataFrame with a Dictionary of Lists
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days', '40days' ,'35days', '40days', '60days', '50days', '55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
                }
df = pd.DataFrame(technologies)
print(df)

This yields the output below.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Hadoop  26000   35days      1500
3   Python  22000   40days      1200
4   pandas  24000   60days      2500
5   Oracle  21000   50days      2100
6     Java  22000   55days      2000

2. Change Order of Columns in Pandas DataFrame

You can rearrange the DataFrame columns in any order you want by passing a list of column names to df[], for example df[['Discount',"Fee","Courses","Duration"]]. Every column you want to keep must appear in the list.


# Using double brackets to change the column order
df = pd.DataFrame(technologies)
df2 = df[['Discount',"Fee","Courses","Duration"]]
print(df2)  

This yields the output below.


# Output:
   Discount    Fee  Courses Duration
0      1000  20000    Spark   30days
1      2300  25000  PySpark   40days
2      1500  26000   Hadoop   35days
3      1200  22000   Python   40days
4      2500  24000   pandas   60days
5      2100  21000   Oracle   50days
6      2000  22000     Java   55days

This code will output the DataFrame with the columns reordered according to your specification. You can replace the column names with the order you desire.
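
Two behaviors of list selection are worth checking before relying on it. The sketch below uses a shortened two-row version of the DataFrame, and "Price" is a hypothetical label that does not exist in the data. It suggests that columns left out of the list are dropped from the result and that an unknown label raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Columns left out of the list are dropped from the result
subset = df[["Discount", "Courses"]]
print(subset.columns.tolist())

# A label that is not in the DataFrame raises KeyError
try:
    df[["Discount", "Price"]]  # "Price" is a hypothetical, non-existent column
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```

If you only want to reorder, it helps to build the list from df.columns so that no column is lost by accident.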

3. Change Columns Order Using DataFrame.reindex()

Alternatively, you can also change the order of columns in a pandas DataFrame using the DataFrame.reindex() method. Use df.reindex(columns=change_column) with a list of columns in the desired order as change_column to reorder the columns.


# Using DataFrame.reindex() to change columns order
change_column = ['Courses','Duration','Fee','Discount']
df = df.reindex(columns=change_column)
print(df)

# You can also try
df = df.reindex(['Courses','Duration','Fee','Discount'], axis=1)
print(df)

This program will output the DataFrame with the columns reordered according to your specification using the reindex() method. You can replace the change_column list with the order you desire.


# Output:
   Courses Duration    Fee  Discount
0    Spark   30days  20000      1000
1  PySpark   40days  25000      2300
2   Hadoop   35days  26000      1500
3   Python   40days  22000      1200
4   pandas   60days  24000      2500
5   Oracle   50days  21000      2100
6     Java   55days  22000      2000
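
One difference from list selection is how reindex() treats labels that are not in the DataFrame. The sketch below uses a shortened two-row version of the data and a hypothetical "Price" label. Instead of raising an error, reindex() adds the unknown label as a column filled with missing values, so a typo in the list can go unnoticed.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# "Price" is a hypothetical label that does not exist in df
reordered = df.reindex(columns=["Discount", "Price", "Courses"])

print(reordered.columns.tolist())
# The unknown column is created and filled with NaN
print(reordered["Price"].isna().all())
```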

4. Reorder DataFrame Columns in Sorted Order

You can get the pandas DataFrame column names using df.columns, sort them with Python's built-in sorted() function, and pass the sorted list to the DataFrame.reindex() method to get a DataFrame with its columns in sorted order.


# Reorder columns in sorted order (axis=1 targets the columns)
df = df.reindex(sorted(df.columns), axis=1)
print(df)

# Equivalent form using the columns parameter
df = df.reindex(columns=sorted(df.columns))
print(df)

This yields the output below.


# Output:
   Courses  Discount Duration    Fee
0    Spark      1000   30days  20000
1  PySpark      2300   40days  25000
2   Hadoop      1500   35days  26000
3   Python      1200   40days  22000
4   pandas      2500   60days  24000
5   Oracle      2100   50days  21000
6     Java      2000   55days  22000
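
As an alternative sketch, not taken from the examples above, DataFrame.sort_index() with axis=1 sorts the column labels directly, and ascending=False reverses the order. The data here is a shortened two-row version of the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Sort column labels alphabetically
alphabetical = df.sort_index(axis=1)
print(alphabetical.columns.tolist())

# Sort column labels in reverse alphabetical order
reverse = df.sort_index(axis=1, ascending=False)
print(reverse.columns.tolist())
```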

5. Using DataFrame Constructor

You can also use pd.DataFrame(df,columns=['Courses','Discount','Duration','Fee']) to rearrange the columns of an existing DataFrame. Pass the existing DataFrame df as the data and list the columns in the desired order to create a new DataFrame.


# Using DataFrame constructor
df = pd.DataFrame(df, columns=['Courses','Discount','Duration','Fee'])
print(df)

In our case, this yields the same output as above.

6. Pandas Reorder the Columns

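Use cols=df.columns.tolist() to get the column names as a Python list, then rearrange the list any way you want. For instance, cols[-1:]+cols[:-1] moves the last column name to the front. Keep the list in its own variable (cols here) rather than assigning it back to df, which would replace the DataFrame with a plain list.


# Start again from the original column order
df = pd.DataFrame(technologies)
cols = df.columns.tolist()
# Rearrange the list any way you want
cols = cols[-1:] + cols[:-1]
print(cols)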

This yields the output below.


# Output:
['Discount', 'Courses', 'Fee', 'Duration']
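
A rearranged list of names does not reorder anything until it is used to select from the DataFrame. The sketch below, on a shortened two-row version of the data, applies the rotated list with df[] and then shows one possible in-place alternative that moves a single column with pop() and insert(). The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Rotate the list of names, then select with it to get a reordered DataFrame
cols = df.columns.tolist()
rotated = cols[-1:] + cols[:-1]  # last column name moves to the front
df_rotated = df[rotated]
print(df_rotated.columns.tolist())

# Alternative: move one column in place on a copy with pop() and insert()
df_inplace = df.copy()
df_inplace.insert(0, "Discount", df_inplace.pop("Discount"))
print(df_inplace.columns.tolist())
```

Both approaches end with Discount as the first column. The pop() and insert() form modifies the DataFrame it is called on, which is why the sketch works on a copy.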

7. Create a New List of Columns in the Desired Order

Create a new list of your columns in the desired order and pass it to df[]. For example, df[['Duration']+[col for col in df.columns if col!='Duration']] moves Duration to the front and keeps the remaining columns in their existing order.


# Using desired order to change column
df2 = df[ ['Duration'] + [ col for col in df.columns if col != 'Duration']]
print(df2)

This yields the output below.


# Output:
  Duration  Courses    Fee  Discount
0   30days    Spark  20000      1000
1   40days  PySpark  25000      2300
2   35days   Hadoop  26000      1500
3   40days   Python  22000      1200
4   60days   pandas  24000      2500
5   50days   Oracle  21000      2100
6   55days     Java  22000      2000

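You can also use [df.columns[-2]]+[col for col in df if col!=df.columns[-2]] to build a list in which the second-to-last column (indicated by -2) comes first. This expression returns a list of column names, not a DataFrame; pass it to df[] to get the reordered DataFrame.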


df2 = [df.columns[-2]] + [col for col in df if col != df.columns[-2]]
print(df2)

This yields the output below.


# Output:
['Duration', 'Courses', 'Fee', 'Discount']
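
The same list-building pattern can be wrapped in a small helper. The function below is a hypothetical sketch (move_to_front is not a pandas API) that places any number of columns first and keeps the rest in their existing order. It is shown on a shortened two-row version of the data.

```python
import pandas as pd

def move_to_front(frame, first):
    """Return a new DataFrame with the columns in `first` placed before the rest."""
    rest = [col for col in frame.columns if col not in first]
    return frame[list(first) + rest]

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

df2 = move_to_front(df, ["Duration", "Discount"])
print(df2.columns.tolist())
```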

8. Complete Example


import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Hadoop","Python","pandas","Oracle","Java"],
    'Fee' :[20000,25000,26000,22000,24000,21000,22000],
    'Duration':['30days', '40days' ,'35days', '40days', '60days', '50days', '55days'],
    'Discount':[1000,2300,1500,1200,2500,2100,2000]
                }
df = pd.DataFrame(technologies)
print(df)

# Using double brackets to change columns
df = pd.DataFrame(technologies)
df2 = df[['Discount',"Fee","Courses","Duration"]]
print(df2)  

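# Using pd.DataFrame(list(zip(...))) to build the DataFrame from separate lists
# (c1..c4 are assumed to be the four column lists from technologies)
c1, c2, c3, c4 = technologies.values()
df = pd.DataFrame(list(zip(c1, c2, c3, c4)))
df.columns = ["Courses","Fee","Duration","Discount"]
# Altering the DataFrame
df2 = df[["Courses","Fee","Discount","Duration"]]
print(df2)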

# Using DataFrame.reindex() to change columns order
change_column = ['Courses','Duration','Fee','Discount']
df = df.reindex(columns=change_column)
print(df)

# Change order of columns
df = df.reindex(['Courses','Duration','Fee','Discount'], axis=1)
print(df) 

# Change sorted order columns
df = df.reindex(sorted(df.columns), axis=1)
print(df)

# Reorder DataFrame column in sorted order
df = df.reindex(columns=sorted(df.columns))
print(df)

# Using DataFrame constructor
df = pd.DataFrame(df, columns=['Courses','Discount','Duration','Fee'])
print(df)

# Get the column names as a list and rearrange it any way you want
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
print(cols)

# Using desired order to change column
df2 = df[ ['Duration'] + [ col for col in df.columns if col != 'Duration']]
print(df2)

Frequently Asked Questions on Change the Order of DataFrame Columns

Can I change the order of DataFrame columns without modifying the original DataFrame?

You can create a new DataFrame with the desired column order without modifying the original DataFrame. Both selection with df[] and the reindex() method return a new DataFrame, so assign the result to a new variable instead of reassigning it to the original one.
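
A quick way to confirm this is to print the original column order after reordering. This is a minimal sketch on a shortened two-row version of the data, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Both calls return new DataFrames
by_indexing = df[["Discount", "Fee", "Courses", "Duration"]]
by_reindex = df.reindex(columns=["Duration", "Courses", "Fee", "Discount"])

# The original keeps its column order
print(df.columns.tolist())
print(by_indexing.columns.tolist())
print(by_reindex.columns.tolist())
```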

What is the best way to reorder columns in a DataFrame?

The best way depends on personal preference and specific requirements. Both direct indexing and the reindex() method are commonly used and efficient ways to reorder columns in a DataFrame. Choose the method that best fits your coding style and workflow.

Will changing the order of columns affect the data in my DataFrame?

Changing the order of columns does not affect the data itself. It only changes the way the data is presented within the DataFrame. The values in each row remain associated with their respective column labels, regardless of the column order.

Can I reorder columns based on specific criteria, such as alphabetical order or data type?

You can reorder columns based on specific criteria. For example, you can pass sorted(df.columns) to reindex() to order columns alphabetically, or build the column list from the column data types to group columns by type.
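
As one possible sketch of ordering by data type, the example below places the numeric columns first and the remaining columns after them, using select_dtypes() to find the numeric ones. The data is a shortened two-row version of the DataFrame, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark"],
    "Fee": [20000, 25000],
    "Duration": ["30days", "40days"],
    "Discount": [1000, 2300],
})

# Names of the numeric columns, in their existing order
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else, in its existing order
other_cols = [col for col in df.columns if col not in numeric_cols]

by_dtype = df[numeric_cols + other_cols]
print(by_dtype.columns.tolist())
```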

Is there a limit to the number of columns I can reorder in a DataFrame?

There is no inherent limit to the number of columns you can reorder in a DataFrame. You can reorder as many columns as needed based on your data analysis requirements. However, keep in mind memory limitations and computational efficiency when working with large datasets.

Conclusion

In this article, you have learned how to change the order of DataFrame columns in pandas using DataFrame.reindex(), the DataFrame constructor, and direct indexing with a list of column names. You have also learned how to sort DataFrame columns, with examples.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
