Pandas Create New DataFrame By Selecting Specific Columns

  • Post author: Naveen
  • Post category: Pandas
  • Post last modified: October 5, 2023

You can create a new pandas DataFrame by selecting specific columns with DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), and DataFrame.assign(). The DataFrame.iloc[] and DataFrame.loc[] properties can also be used to select columns. In this article, I will explain how to select a single column or multiple columns to create a new pandas DataFrame, with detailed examples.

1. Quick Examples to Create New DataFrame by Selecting Specific Columns

If you are in a hurry, below are some quick examples of creating a new DataFrame by selecting specific columns.


# Below are some quick examples.

# Using DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()

# Using DataFrame.filter() method.
df2 = df.filter(['Courses','Fee'], axis=1)

# Using DataFrame.transpose() method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()

# Using DataFrame.iloc[] and copy() to create a new DataFrame.
df2 = df.iloc[:, [1, 2]].copy()

# Using DataFrame.loc[] to create a new DataFrame from specific columns.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]

# Create a new DataFrame of specific columns with DataFrame.assign().
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])

# Create a new pandas DataFrame with plain column selection.
df2 = df[['Courses','Fee']]

Now, let's create a pandas DataFrame with a few rows and columns, then execute these examples and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields the output below.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000

2. Using DataFrame.copy() to Create a New DataFrame

The DataFrame.copy() method returns a copy of the DataFrame. Select the columns you want from the original DataFrame, then call copy() on the result to create a new, independent DataFrame, so that later changes to it do not affect the original.


# Using DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()
print(df2)

Yields the output below.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
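To illustrate what the explicit copy() gives you, here is a minimal sketch (the +500 fee increase is purely illustrative data, not part of the original example): because df2 owns its own data, changing it leaves the original df untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Select two columns and take an explicit, independent copy
df2 = df[['Courses', 'Fee']].copy()

# Modify only the copy (illustrative +500 increase)
df2['Fee'] = df2['Fee'] + 500

print(df['Fee'].tolist())   # [20000, 25000, 22000, 30000]
print(df2['Fee'].tolist())  # [20500, 25500, 22500, 30500]
```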

Alternatively, you can use the DataFrame.filter() method to create a new DataFrame by selecting specific columns.


# Using DataFrame.filter() method.
df2 = df.filter(['Courses','Fee'], axis=1)
print(df2)

Yields the same output as above.
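One practical difference between filter() and plain [] indexing is how they treat labels that do not exist. The sketch below uses a hypothetical column name, Rating, that is not in df: filter() silently skips it, while df[[...]] raises a KeyError. It also shows the regex= parameter of filter(), which selects columns whose names match a pattern.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# filter() keeps only the listed labels that exist; 'Rating' is a
# hypothetical column that is not in df, so it is silently ignored
df2 = df.filter(['Courses', 'Fee', 'Rating'], axis=1)
print(df2.columns.tolist())  # ['Courses', 'Fee']

# regex= keeps columns whose names match the pattern (here: names starting with 'D')
df3 = df.filter(regex='^D', axis=1)
print(df3.columns.tolist())  # ['Duration', 'Discount']

# Plain [] indexing, by contrast, raises KeyError for a missing label
try:
    df[['Courses', 'Fee', 'Rating']]
    missing_raised = False
except KeyError:
    missing_raised = True
print(missing_raised)  # True
```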

3. Using DataFrame.transpose() Method

The DataFrame.transpose() method swaps the index and columns, reflecting the DataFrame over its main diagonal by writing rows as columns and vice versa. Use df.columnname to select each column you want as a Series and pass the list of Series to the DataFrame constructor, which places each Series as a row labeled by its name; transposing that result turns those rows back into columns.


# Using DataFrame.transpose() Method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

Yields the output below.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
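A side effect worth knowing: in the intermediate DataFrame each row mixes strings and integers, so after transpose() both columns come back with object dtype rather than the original integer type for Fee. The sketch below shows this and two ways around it: infer_objects() to re-infer column types, or pd.concat() along axis=1, a different technique that keeps the original dtypes without transposing.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Series names become row labels; transpose() turns them into columns
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2.dtypes)  # Fee is object here, not an integer dtype

# infer_objects() re-infers better dtypes for object columns
df3 = df2.infer_objects()
print(df3.dtypes)

# Alternative technique: concatenate the Series side by side
df4 = pd.concat([df.Courses, df.Fee], axis=1)
print(df4.dtypes)
```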

4. Using DataFrame.iloc[] and DataFrame.copy() to Create a New DataFrame

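The DataFrame.iloc[] property gets or sets values by integer position. Inside df.iloc[], the first argument selects rows and the second selects columns, so df.iloc[:, [1, 2]] takes all rows and the columns at zero-based positions 1 and 2 (Fee and Duration). Calling copy() on the result makes it an independent DataFrame.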


# Using DataFrame.iloc[] and copy() to create a new DataFrame.
df2 = df.iloc[:, [1, 2]].copy()
print(df2)

Yields the output below.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
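Hard-coding positions such as [1, 2] breaks if the column order changes. A minimal sketch of two related options: look the positions up from the labels with Index.get_indexer(), or use a positional slice, where the stop position is exclusive.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Map the wanted labels to their integer positions
positions = df.columns.get_indexer(['Fee', 'Duration'])
print(positions.tolist())  # [1, 2]

# Select by the looked-up positions, and by an equivalent slice (1:3 -> positions 1 and 2)
df2 = df.iloc[:, positions].copy()
df3 = df.iloc[:, 1:3].copy()
print(df2.equals(df3))  # True
```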

5. Using DataFrame.loc[] to Create a New DataFrame from Specific Columns

The DataFrame.loc[] property accesses a group of rows and columns by label(s) or by a boolean array. In the example below, df.columns.drop() removes the unwanted labels (Courses and Discount) from the column Index, and passing the remaining labels to loc[] selects the Fee and Duration columns.


# Using DataFrame.loc[] to create a new DataFrame from specific columns.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

Yields the output below.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
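Since loc[] accepts either labels or a boolean array, the same selection can be written several ways. This sketch compares an explicit label list, a boolean mask built with Index.isin(), and DataFrame.drop(columns=...), which is a related method rather than loc[] itself; all three produce the same DataFrame here.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# 1) Explicit list of the labels to keep
df2 = df.loc[:, ['Fee', 'Duration']]

# 2) Boolean mask over the column labels: True for every column not in the unwanted list
mask = ~df.columns.isin(['Courses', 'Discount'])
df3 = df.loc[:, mask]

# 3) DataFrame.drop() with columns= removes the unwanted columns directly
df4 = df.drop(columns=['Courses', 'Discount'])

print(df2.equals(df3), df3.equals(df4))  # True True
```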

6. Create a New DataFrame of Specific Columns with DataFrame.assign()

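You can also create a new DataFrame of specific columns with the DataFrame.assign() method. assign() adds new columns to a DataFrame and returns a new object (a copy) with the new columns added to the original ones. Calling it on an empty DataFrame, pd.DataFrame(), therefore produces a DataFrame that contains only the columns you pass.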


# Create a new DataFrame of specific columns with DataFrame.assign().
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

Yields the output below.


# Output:
   Courses Duration
0    Spark   30days
1  PySpark   40days
2   Python   35days
3   pandas   50days
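Because assign() accepts any Series expression, you can select and derive columns in one step. In this sketch, NetFee is a hypothetical column name chosen for illustration; it holds Fee minus Discount next to the selected Courses column.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

# Keep Courses and add a derived column (NetFee is a hypothetical name)
df2 = df[['Courses']].assign(NetFee=df['Fee'] - df['Discount'])
print(df2)

# The original df is unchanged: assign() returns a new DataFrame
print('NetFee' in df.columns)  # False
```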

7. Another Example

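Another simple way to create a new pandas DataFrame of selected columns is to index the original DataFrame with a list of column names. Unlike the earlier examples, this does not call copy() explicitly, so add .copy() if you plan to modify df2 independently of df.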


# Create new pandas DataFrame.
df2 = df[['Courses','Fee']]
print(df2)

Yields the output below.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
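Note the double brackets: the inner list is what makes the result a DataFrame. This short sketch contrasts a single label, which returns a Series, with a one-element list, which returns a one-column DataFrame; Series.to_frame() converts the former into the latter.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
    'Duration': ['30days', '40days', '35days', '50days'],
    'Discount': [1000, 2300, 1200, 2000],
})

s = df['Courses']             # single label -> Series
d = df[['Courses']]           # list with one label -> DataFrame
f = df['Courses'].to_frame()  # Series converted to a one-column DataFrame

print(type(s).__name__, type(d).__name__)  # Series DataFrame
print(d.equals(f))  # True
```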

8. Complete Examples to Create a New Pandas DataFrame from Specific Columns

Below is the complete example of creating a new pandas DataFrame by selecting specific columns.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

# Using DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()
print(df2)

# Using DataFrame.filter() method.
df2 = df.filter(['Courses','Fee'], axis=1)
print(df2)

# Using DataFrame.transpose() method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

# Using DataFrame.iloc[] and copy() to create a new DataFrame.
df2 = df.iloc[:, [1, 2]].copy()
print(df2)

# Using DataFrame.loc[] to create a new DataFrame from specific columns.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

# Create a new DataFrame of specific columns with DataFrame.assign().
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

# Create a new pandas DataFrame with plain column selection.
df2 = df[['Courses','Fee']]
print(df2)

Conclusion

In this article, you have learned how to create a new pandas DataFrame by selecting specific columns with DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), and DataFrame.assign(). The DataFrame.iloc[] and DataFrame.loc[] properties can also be used to select a single column or multiple columns from a pandas DataFrame.


Naveen

I am a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, I have honed my expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. My journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. I started SparkByExamples.com to share my experiences with data as I come across them. You can learn more about me at LinkedIn.
