  • Post last modified:December 10, 2024
Pandas Create New DataFrame By Selecting Specific Columns

To create a new DataFrame by selecting specific columns from an existing pandas DataFrame, you can use DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), or DataFrame.assign(). The DataFrame.iloc[] and DataFrame.loc[] indexers can also select columns. In this article, I will explain how to select a single column or multiple columns to create a new pandas DataFrame, with detailed examples.

Key Points –

  • Selecting specific columns from a DataFrame can help reduce data size, improving performance and readability.
  • Use double square brackets (df[['col1', 'col2']]) to select multiple columns, resulting in a new DataFrame.
  • Use single square brackets (df['col1']) to select a single column, which returns a Series instead of a DataFrame.
  • The .loc indexer allows you to select columns by label, supporting both single and multiple column selections.
  • The .iloc indexer enables column selection by position, useful when column names are unknown or dynamically generated.
  • Column selection with .filter() can match column names dynamically by substring (like=) or by regular expression (regex=).
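
The difference between single and double square brackets in the key points above can be checked with a minimal sketch. It assumes a small sample DataFrame with the same column names used later in this article.

```python
import pandas as pd

# Small sample frame; column names mirror the article's example data
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

single = df['Courses']              # one label -> Series
one_col = df[['Courses']]           # list with one label -> DataFrame
two_cols = df[['Courses', 'Fee']]   # list of labels -> DataFrame

print(type(single).__name__)
print(type(one_col).__name__)
print(list(two_cols.columns))
```

Passing a list, even a one-element list, is what keeps the result two-dimensional.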

Quick Examples to Create New DataFrame by Selecting Specific Columns

Following are quick examples of creating a new DataFrame by selecting specific columns.


# Quick examples to create new dataframe

# Using DataFrame.copy() to create a new DataFrame
df2 = df[['Courses', 'Fee']].copy()

# Using DataFrame.filter() method
df2 = df.filter(['Courses','Fee'], axis=1)

# Using DataFrame.transpose() method
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()

# Using DataFrame.iloc[] 
# Create new DataFrame by df.copy()
df2 = df.iloc[: , [1, 2]].copy()

# Using DataFrame.loc[] to create a new DataFrame from the remaining columns
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]

# Create new dataframe of Specific column by DataFrame.assign() method
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])

# Create new pandas DataFrame
df2 = df[['Courses','Fee']]

To run some examples of creating a new Pandas DataFrame by selecting specific columns, let’s create a Pandas DataFrame using data from a dictionary.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000

Using DataFrame.copy() Create New DataFrame

pandas.DataFrame.copy() returns a copy of the DataFrame (a deep copy by default, so changes to the copy do not affect the original). Select the columns you need from the original DataFrame and call copy() on the result to create a new, independent DataFrame.


# Using DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
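
To see why copy() is worth the extra call, here is a minimal sketch, assuming the same sample data, that changes a value in the new DataFrame and checks that the original is untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# copy() gives df2 its own data, independent of df
df2 = df[['Courses', 'Fee']].copy()

# Modify the copy only
df2.loc[0, 'Fee'] = 0

print(df.loc[0, 'Fee'])    # original value is unchanged
print(df2.loc[0, 'Fee'])   # copy holds the new value
```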

Alternatively, you can use the DataFrame.filter() method, which returns a new DataFrame containing only the columns you list by name.


# Using DataFrame.filter() method
df2 = df.filter(['Courses','Fee'], axis=1)
print(df2)

Yields output same as above.
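
Besides a list of names, filter() also accepts like= (substring match) and regex= (regular expression match). The sketch below assumes the same sample data; the patterns are only illustrations. The items, like, and regex arguments are mutually exclusive, so pass just one of them per call.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Keep columns whose name contains the substring 'D'
with_d = df.filter(like='D', axis=1)

# Keep columns whose name matches the regular expression
by_regex = df.filter(regex='^(Fee|Dis)', axis=1)

print(list(with_d.columns))
print(list(by_regex.columns))
```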

Using DataFrame.transpose() Method

The DataFrame.transpose() method swaps the rows and columns of a DataFrame. Here, select each column as a Series using df.columnname, pass the list of Series to the DataFrame constructor (each Series becomes a row), and then transpose the result so that the Series become columns again.


# Using DataFrame.transpose() method
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
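
One thing to watch with the transpose approach: because each Series is first stored as a row next to values of another type, the numeric column comes back with the generic object dtype. The following sketch, assuming the same sample data, shows this and uses pd.concat() as one possible alternative that keeps the original dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
})

# Transpose approach: 'Fee' is no longer an integer column
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2['Fee'].dtype)   # object

# Alternative: concatenate the Series side by side, dtypes are preserved
df3 = pd.concat([df['Courses'], df['Fee']], axis=1)
print(df3['Fee'].dtype)   # integer dtype, same as in df
```

If the new DataFrame will be used for arithmetic, the concat (or plain df[['Courses', 'Fee']]) form avoids a later type conversion.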

Using DataFrame.iloc[] Create New DataFrame by DataFrame.copy()

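DataFrame.iloc[] is an integer-position-based indexer for getting or setting values. It takes a row selector and a column selector: in df.iloc[:, [1, 2]], the : selects all rows and [1, 2] selects the second and third columns (positions are zero-based).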


# Using DataFrame.iloc[] 
# Create new DataFrame by df.copy().
df2 = df.iloc[: , [1, 2]].copy()
print(df2)

Yields below output.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
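
iloc[] also accepts slices, and positions can be looked up from names when needed. This is a minimal sketch under the same sample data; Index.get_indexer() is used here only as one way to translate labels into positions.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Slice of positions: the end position (3) is excluded
by_slice = df.iloc[:, 1:3].copy()

# Translate labels to integer positions, then select by position
positions = df.columns.get_indexer(['Fee', 'Discount'])
by_lookup = df.iloc[:, positions].copy()

print(list(by_slice.columns))
print(list(positions))
print(list(by_lookup.columns))
```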

Using DataFrame.loc[] Create New DataFrame by Specific Column

The DataFrame.loc[] indexer accesses a group of rows and columns by label(s) or by a boolean array. In the example below, df.columns.drop() returns the column labels without the unwanted names, and .loc[] then selects the remaining columns.


# Using DataFrame.loc[] to create a new DataFrame from the remaining columns.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

Yields below output.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
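
For comparison, loc[] can also take the wanted labels directly, either as a list or as a label slice. The sketch below assumes the same sample data. Note that, unlike iloc[], a label slice includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Explicit list of labels
by_list = df.loc[:, ['Fee', 'Duration']]

# Label slice: both 'Fee' and 'Discount' are included
by_slice = df.loc[:, 'Fee':'Discount']

# Equivalent to the drop-based selection shown above
by_drop = df.drop(columns=['Courses', 'Discount'])

print(list(by_list.columns))
print(list(by_slice.columns))
print(list(by_drop.columns))
```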

Create New DataFrame of Specific Column by DataFrame.assign()

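You can also create a new DataFrame from specific columns by using the DataFrame.assign() method. The assign() method assigns new columns to a DataFrame and returns a new object (a copy) with the new columns added to the existing ones. Calling it on an empty DataFrame therefore yields a DataFrame that contains only the columns you pass.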


# Create New DataFrame of Specific column by DataFrame.assign() method
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

Yields below output.


# Output:
   Courses Duration
0    Spark   30days
1  PySpark   40days
2   Python   35days
3   pandas   50days

Other Example

Another simple way to create a new pandas DataFrame from selected columns is to index the original DataFrame with a list of column names.


# Create new pandas DataFrame.
df2 = df[['Courses','Fee']]
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000

Complete Examples To Create New Pandas DataFrame of Specified Column

Below is the complete example of creating a new pandas DataFrame by selecting specific columns.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.copy() to create a new DataFrame
df2 = df[['Courses', 'Fee']].copy()
print(df2)

# Using DataFrame.filter() method
df2 = df.filter(['Courses','Fee'], axis=1)
print(df2)

# Using DataFrame.transpose() method
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

# Using DataFrame.iloc[] 
# Create new DataFrame by df.copy()
df2 = df.iloc[: , [1, 2]].copy()
print(df2)

# Using DataFrame.loc[] 
# Create new DataFrame from the remaining columns
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

# Create New DataFrame of Specific column by DataFrame.assign() method
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

# Create new pandas DataFrame
df2 = df[['Courses','Fee']]
print(df2)

FAQ on Pandas Create New DataFrame By Selecting Specific Columns

How do I select specific columns from a DataFrame?

You can use square brackets ([]) to select specific columns. Pass a list of column names to create a new DataFrame.

What happens if I select a single column without using a list?

If you select a single column from a Pandas DataFrame without using a list, you will get a Pandas Series instead of a DataFrame.

How do I handle columns that may not exist?

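Use DataFrame.filter(items=[...]), which keeps only the listed labels that exist and silently skips the rest. Note that filter() has no errors parameter; errors='ignore' belongs to DataFrame.drop(). Selecting with df[[...]] raises a KeyError if any label is missing.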
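
A minimal sketch of handling a possibly missing column, assuming the same sample data. The 'Rating' column is hypothetical and deliberately absent from the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

wanted = ['Courses', 'Fee', 'Rating']   # 'Rating' does not exist in df

# filter() skips labels that are not present
safe = df.filter(items=wanted)

# Plain bracket selection raises KeyError for a missing label
raised = False
try:
    df[wanted]
except KeyError:
    raised = True

# reindex() keeps the missing label as a column filled with NaN
padded = df.reindex(columns=wanted)

print(list(safe.columns))
print(raised)
print(list(padded.columns))
```

Which behavior is preferable depends on whether a missing column should be ignored, reported, or kept as a placeholder.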

Can I use column indexes instead of names?

Yes. Use the .iloc[] indexer, which selects by integer position rather than by label; for example, df.iloc[:, [0, 2]] selects the first and third columns.

How do I reorder columns when creating a new DataFrame?

To reorder columns when creating a new DataFrame, pass the list of column names in the order you want; the new DataFrame keeps the order of that list, not the order of the original DataFrame.
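
As a minimal sketch, assuming the same sample data, the first selection below reverses two columns and the second moves one column to the front while keeping the rest in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Column order follows the list that is passed in
reordered = df[['Fee', 'Courses']].copy()

# Move 'Discount' to the front, keep the other columns in place
front = ['Discount'] + [c for c in df.columns if c != 'Discount']
discount_first = df[front].copy()

print(list(reordered.columns))
print(list(discount_first.columns))
```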

Is there a way to select columns based on a condition?

Yes. You can select columns based on a condition such as a name pattern (filter() with like= or regex=), a data type (select_dtypes()), or a boolean mask over the columns passed to .loc[].
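
The sketch below illustrates three condition-based selections, assuming the same sample data. The threshold of 5000 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# By data type: keep only numeric columns
numeric = df.select_dtypes(include='number')

# By name: boolean mask built from the column labels
starts_with_d = df.loc[:, df.columns.str.startswith('D')]

# By values: numeric columns whose mean exceeds the threshold
high_mean = numeric.loc[:, numeric.mean() > 5000]

print(list(numeric.columns))
print(list(starts_with_d.columns))
print(list(high_mean.columns))
```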

Conclusion

In this article, I have explained how to create a new pandas DataFrame by selecting specific columns using functions such as DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), and DataFrame.assign(). Additionally, we explored using the DataFrame.iloc[] and DataFrame.loc[] indexers to select single or multiple columns from a pandas DataFrame.
