A Pandas DataFrame can be split into smaller DataFrames based on the values of one or more columns. Pandas provides several ways to do this, such as filtering rows with a condition on a column (boolean indexing) and grouping rows with groupby().
In this article, I will explain how to split a Pandas DataFrame by a column value condition, and how to use the df.groupby() function to split a DataFrame based on single-column or multiple-column values.
Quick Examples of Split DataFrame by Column Value
If you are in a hurry, below are some quick examples of splitting Pandas DataFrame by column value.
# Below are the quick examples.
# Example 1: Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]
# Example 2: Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days']
# Example 3: Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(df.Duration)
df1 = grouped.get_group("35days")
# Example 4: Split Dataframe using groupby() &
# Grouping by multiple columns
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))
To run some examples of splitting a Pandas DataFrame by column value, let's create a Pandas DataFrame using data from a dictionary.
import pandas as pd
import numpy as np
technologies= {
'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
'Fee' :[22000, 25000, 23000, 24000, 26000],
'Discount':[1000, 2300, 1000, 1200, 2500],
'Duration':['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
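print("Create DataFrame:\n", df)
Yields below output.
# Output:
# Create DataFrame:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
2 Hadoop 23000 1000 40days
3 Python 24000 1200 30days
4 Pandas 26000 2500 25days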
Split DataFrame Based on Column Value Condition
We can create a smaller DataFrame from a given DataFrame by filtering its rows with a condition on a specified column. The comparison returns a boolean Series, and indexing the DataFrame with that Series keeps only the rows where the condition is True.
# Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]
print("After splitting by column value:\n", df1)
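Yields below output.
# Output:
# After splitting by column value:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
2 Hadoop 23000 1000 40days
3 Python 24000 1200 30days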
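As a sketch (reusing the article's sample data), negating the boolean mask with ~ selects exactly the rows the condition excludes, so a single condition can split a DataFrame into two non-overlapping parts. The names mask, df_low, and df_high are illustrative, not part of the original example.

```python
import pandas as pd

# Same sample data as the article's technologies DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)

# The comparison produces a boolean Series aligned with df's index
mask = df['Fee'] <= 25000

# Rows where the condition holds, and rows where it does not (~ negates the mask)
df_low = df[mask]
df_high = df[~mask]

print(df_low)
print(df_high)
```

Together, df_low and df_high contain every row of df exactly once, with the original index labels preserved.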
In another example, I will apply the condition 'Duration' == '35days' to the given DataFrame. It filters the DataFrame on this condition and returns a smaller DataFrame containing only the matching rows.
# Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days']
print("After splitting by column value:\n", df1)
Yields below output.
# Output:
# After splitting by column value:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
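To match several values at once, or to combine conditions on more than one column, a minimal sketch could use Series.isin() and the & operator. The variable names below are hypothetical.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)

# isin() keeps rows whose Duration matches any value in the list
df_35_or_40 = df[df['Duration'].isin(['35days', '40days'])]

# Combine conditions with & (and) or | (or); each comparison needs its own parentheses
df_35_expensive = df[(df['Duration'] == '35days') & (df['Fee'] > 22000)]

print(df_35_or_40)
print(df_35_expensive)
```

The parentheses matter because & binds more tightly than == in Python.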
Split DataFrame by Unique Column Value
The Pandas groupby() function partitions a DataFrame according to the values in one or more columns. First, we use groupby() to group the DataFrame by the specified column. Then, we extract a specific group with the get_group() method. This approach is most useful when you want a separate DataFrame for each unique value in a column.
# Split DataFrame using groupby() &
# grouping by a particular DataFrame column
# Pass a single column name (not a list) so group keys are scalars like "35days"
grouped = df.groupby('Duration')
df1 = grouped.get_group("35days")
print("After splitting by column value:\n", df1)
Yields below output.
# Output:
# After splitting by column value:
Courses Fee Discount Duration
0 Spark 22000 1000 35days
1 PySpark 25000 2300 35days
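If you need every group rather than a single one, a common pattern is to iterate over the groupby object, which yields (key, group) pairs, and collect them into a dictionary. This is a sketch; the name splits is illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)

# Each iteration yields one unique Duration value and the sub-DataFrame for it
splits = {duration: group for duration, group in df.groupby('Duration')}

for duration, group in splits.items():
    print(duration)
    print(group)
```

Because groupby() sorts keys by default, the dictionary keys come out in sorted order, and splits['35days'] holds the same rows as grouped.get_group("35days").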
Pandas Split by Multiple Column Values
Similarly, we can use the groupby() function to split a DataFrame based on the values of more than one column. To do so, pass a list of column names to groupby() and select a specific group by passing get_group() a tuple of values, in the same order as the columns. For example:
# Split DataFrame by applying groupby() on multiple columns
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))
print("After splitting by column value:\n", df1)
Yields below output.
# Output:
# After splitting by column value:
Courses Fee Discount Duration
2 Hadoop 23000 1000 40days
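The same dictionary approach works with multiple grouping columns; here each key is a tuple of values in the order the columns were listed. A sketch, with hypothetical variable names:

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)

grouped = df.groupby(['Discount', 'Fee'])

# With two grouping columns, each key is a (Discount, Fee) tuple
splits = {key: group for key, group in grouped}

# Same rows as grouped.get_group((1000, 23000))
hadoop = splits[(1000, 23000)]

print(hadoop)
print("Number of groups:", grouped.ngroups)
```

In this sample data every (Discount, Fee) combination is unique, so each group holds a single row; with real data a group can contain many rows.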
Conclusion
In this article, you have learned how to split a Pandas DataFrame based on a column value condition, and how to use the df.groupby() function to split a DataFrame based on single-column or multiple-column values.
Happy learning!!