Pandas Create New DataFrame By Selecting Specific Columns

You can create a new pandas DataFrame by selecting specific columns using the DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), and DataFrame.assign() methods. DataFrame.iloc[] and DataFrame.loc[] can also be used to select columns. In this article, I will explain how to select a single column or multiple columns to create a new pandas DataFrame, with detailed examples.

1. Quick Examples to Create New DataFrame by Selecting Specific Columns

If you are in a hurry, below are some quick examples of creating a new DataFrame by selecting specific columns.


# Below are some quick examples.

# Use DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()

# Use the DataFrame.filter() method.
df2 = df.filter(['Courses', 'Fee'], axis=1)

# Use the DataFrame.transpose() method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()

# Use DataFrame.iloc[] to select columns by position, then copy().
df2 = df.iloc[:, [1, 2]].copy()

# Use DataFrame.loc[] to keep every column except the dropped labels.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]

# Use DataFrame.assign() on an empty DataFrame to add specific columns.
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])

# Select a list of columns with the [] operator.
df2 = df[['Courses', 'Fee']]

Now, let’s create a pandas DataFrame with a few rows and columns, run these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, Duration, and Discount.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
   Courses    Fee Duration  Discount
0    Spark  20000   30days      1000
1  PySpark  25000   40days      2300
2   Python  22000   35days      1200
3   pandas  30000   50days      2000

2. Using DataFrame.copy() to Create a New DataFrame

The pandas.DataFrame.copy() method returns a copy of the DataFrame. Select the columns you want from the original DataFrame, then call copy() on the result to create a new DataFrame that is independent of the original.


# Use DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
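
To illustrate why copy() is useful here, the following minimal sketch changes a value in the new DataFrame and then checks that the original is untouched. The trimmed two-row data is an assumption made to keep the example short.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Discount': [1000, 2300],
})

# Select two columns and take an explicit copy.
df2 = df[['Courses', 'Fee']].copy()

# Change a value in the new DataFrame only.
df2.loc[0, 'Fee'] = 0

print(df2.loc[0, 'Fee'])  # value in the copy
print(df.loc[0, 'Fee'])   # value in the original
```

Because df2 was created with copy(), the assignment affects only df2 and the original df keeps its Fee value.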

Alternatively, you can use the DataFrame.filter() method, which returns a new DataFrame containing only the specified columns.


# Using DataFrame.filter() method.
df2 = df.filter(['Courses','Fee'], axis=1)
print(df2)

Yields the same output as above.
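
Besides a list of labels, filter() also accepts the like and regex parameters. The sketch below shows all three on a trimmed copy of the DataFrame; the label 'Missing' is a made-up name used only to show that unknown labels are skipped rather than raising an error.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
})

# items: exact column labels; labels that do not exist are skipped silently.
by_items = df.filter(items=['Courses', 'Fee', 'Missing'])

# like: keep columns whose name contains the substring (case-sensitive).
by_like = df.filter(like='D', axis=1)

# regex: keep columns whose name matches the pattern.
by_regex = df.filter(regex='^(Courses|Fee)$', axis=1)

print(list(by_items.columns))
print(list(by_like.columns))
print(list(by_regex.columns))
```

Note that only one of items, like, or regex can be passed in a single call.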

3. Using DataFrame.transpose() Method

The DataFrame.transpose() method swaps the index and the columns, so rows become columns and vice versa. Use df.columnname to select each column as a Series and pass a list of these Series to the DataFrame constructor, where each Series becomes a row. Transposing the result turns each Series back into a column.


# Using DataFrame.transpose() Method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
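
One thing to watch with this approach is column dtypes. The sketch below is an illustration rather than part of the original examples: it compares the transpose route with pd.concat(..., axis=1), which is a swapped-in alternative for combining Series column-wise. The names df_t and df_c are hypothetical.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two columns only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Python", "pandas"],
    'Fee': [20000, 25000, 22000, 30000],
})

# Route 1: build rows from the Series, then transpose.
# Each intermediate column mixes strings and integers, so the
# numeric dtype of 'Fee' is not preserved.
df_t = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df_t.dtypes)

# Route 2 (alternative): concatenate the Series column-wise,
# which keeps each column's own dtype.
df_c = pd.concat([df['Courses'], df['Fee']], axis=1)
print(df_c.dtypes)
```

If you use the transpose route and need numeric columns afterwards, calling infer_objects() on the result is one way to recover them.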

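4. Using DataFrame.iloc[] and copy() to Create a New DataFrame

The DataFrame.iloc[] property gets or sets values by integer position. It takes a row selector followed by a column selector. In the example below, : selects all rows and [1, 2] selects the second and third columns, since positions are zero-based.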


# Use DataFrame.iloc[] to select columns by position, then copy().
df2 = df.iloc[: , [1, 2]].copy()
print(df2)

Yields below output.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
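
The same columns can be picked with a position slice, or by looking up the positions from the column names first. This is a small sketch under the assumption that the column order matches the article's DataFrame; the variable names are illustrative.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
})

# List of positions, as in the example above.
by_list = df.iloc[:, [1, 2]].copy()

# Position slice: the end position (3) is excluded.
by_slice = df.iloc[:, 1:3].copy()

# Look up positions from labels so the code does not hard-code numbers.
positions = df.columns.get_indexer(['Fee', 'Duration'])
by_lookup = df.iloc[:, positions].copy()

print(by_list.equals(by_slice))
print(by_list.equals(by_lookup))
```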

5. Using DataFrame.loc[] to Create a New DataFrame from Specific Columns

The DataFrame.loc[] property is used to access a group of rows and columns by label(s) or a boolean array. In the example below, df.columns.drop() returns the column labels without the unwanted ones, and .loc[] then selects the remaining columns.


# Use DataFrame.loc[] to keep every column except the dropped labels.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

Yields below output.


# Output:
     Fee Duration
0  20000   30days
1  25000   40days
2  22000   35days
3  30000   50days
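
Since .loc[] accepts labels, label slices, and boolean arrays, the same result can be reached in several ways. The following sketch shows three of them on a trimmed copy of the data; the variable names are illustrative only.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
    'Discount': [1000, 2300],
})

# Explicit list of labels to keep.
by_labels = df.loc[:, ['Fee', 'Duration']]

# Label slice: unlike position slices, both end labels are included.
by_slice = df.loc[:, 'Fee':'Duration']

# Boolean array: True for every column that is not in the unwanted list.
mask = ~df.columns.isin(['Courses', 'Discount'])
by_mask = df.loc[:, mask]

print(list(by_labels.columns))
print(by_labels.equals(by_slice))
print(by_labels.equals(by_mask))
```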

6. Using DataFrame.assign() to Create a New DataFrame from Specific Columns

You can also create a new DataFrame from specific columns by using the DataFrame.assign() method. The assign() method assigns new columns to a DataFrame and returns a new object (a copy) with the new columns added to the existing ones. In the example below, assign() is called on an empty DataFrame, so the result contains only the assigned columns.


# Use DataFrame.assign() on an empty DataFrame to add specific columns.
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

Yields below output.


# Output:
   Courses Duration
0    Spark   30days
1  PySpark   40days
2   Python   35days
3   pandas   50days
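
Because the keyword names passed to assign() become the column names, this approach can select and rename columns in one step. The sketch below assumes the new names Course and Length, which are hypothetical and not part of the original DataFrame.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
    'Duration': ['30days', '40days'],
})

# Keyword names (hypothetical) become the new column labels;
# the values are taken from the original columns.
df2 = pd.DataFrame().assign(Course=df['Courses'], Length=df['Duration'])

print(df2)
print(list(df.columns))  # the original DataFrame keeps its own labels
```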

7. Another Example

Another simple way to create a new pandas DataFrame from selected columns is to pass a list of column names to the [] operator.


# Select a list of columns with the [] operator.
df2 = df[['Courses','Fee']]
print(df2)

Yields below output.


# Output:
   Courses    Fee
0    Spark  20000
1  PySpark  25000
2   Python  22000
3   pandas  30000
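
When selecting a single column, the number of brackets matters. The short sketch below, using a trimmed copy of the data, shows that single brackets return a Series while a one-element list returns a DataFrame; the variable names are illustrative.

```python
import pandas as pd

# Trimmed version of the article's DataFrame (assumption: two rows only).
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark"],
    'Fee': [20000, 25000],
})

# Single brackets with one label return a Series.
as_series = df['Courses']

# A list with one label returns a one-column DataFrame.
as_frame = df[['Courses']]

# A Series can also be converted to a DataFrame explicitly.
via_to_frame = df['Courses'].to_frame()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.equals(via_to_frame))
```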

8. Complete Example of Creating a New Pandas DataFrame from Specific Columns

Below is the complete example of creating a new pandas DataFrame by selecting specific columns.


# Create a Pandas DataFrame.
import pandas as pd
technologies = {
    'Courses':["Spark","PySpark","Python","pandas"],
    'Fee' :[20000,25000,22000,30000],
    'Duration':['30days','40days','35days','50days'],
    'Discount':[1000,2300,1200,2000]
              }
df = pd.DataFrame(technologies)
print(df)

# Use DataFrame.copy() to create a new DataFrame.
df2 = df[['Courses', 'Fee']].copy()
print(df2)

# Use the DataFrame.filter() method.
df2 = df.filter(['Courses', 'Fee'], axis=1)
print(df2)

# Use the DataFrame.transpose() method.
df2 = pd.DataFrame([df.Courses, df.Fee]).transpose()
print(df2)

# Use DataFrame.iloc[] to select columns by position, then copy().
df2 = df.iloc[:, [1, 2]].copy()
print(df2)

# Use DataFrame.loc[] to keep every column except the dropped labels.
df2 = df.loc[:, df.columns.drop(['Courses', 'Discount'])]
print(df2)

# Use DataFrame.assign() on an empty DataFrame to add specific columns.
df2 = pd.DataFrame().assign(Courses=df['Courses'], Duration=df['Duration'])
print(df2)

# Select a list of columns with the [] operator.
df2 = df[['Courses', 'Fee']]
print(df2)

Conclusion

In this article, you have learned how to create a new pandas DataFrame by selecting specific columns using the DataFrame.copy(), DataFrame.filter(), DataFrame.transpose(), and DataFrame.assign() methods. The DataFrame.iloc[] and DataFrame.loc[] properties can also be used to select a single column or multiple columns from a pandas DataFrame.
