Split Pandas DataFrame by Column Value

A Pandas DataFrame can be split into smaller DataFrames based on the values in one or more columns. Pandas provides several ways to do this, including filtering rows with a condition on a column and grouping rows with the groupby() function.

In this article, I will explain how to split a Pandas DataFrame by a column value condition, and how to use the df.groupby() function to split a DataFrame based on the values of a single column or of multiple columns.

Quick Examples of Split DataFrame by Column Value

If you are in a hurry, below are some quick examples of splitting Pandas DataFrame by column value.


# Below are the quick examples.

# Example 1: Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]

# Example 2: Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days'] 

# Example 3: Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(df.Duration)
df1 = grouped.get_group("35days")

# Example 4: Split Dataframe using groupby() &
# Grouping by multiple columns 
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))

To run the examples of splitting a Pandas DataFrame by column value, let's create a Pandas DataFrame using data from a dictionary.


import pandas as pd
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days']
}

df = pd.DataFrame(technologies)
print("Create DataFrame:\n", df)

This prints a DataFrame with five rows and the columns Courses, Fee, Discount, and Duration.

Split DataFrame Based on Column Value Condition

We can create a smaller DataFrame from a given DataFrame by applying a condition to one of its columns. The condition produces a boolean mask with one True/False value per row, and indexing the DataFrame with that mask returns only the rows where the condition is true.


# Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]
print("After splitting by column value:\n", df1)

This returns the four rows whose Fee is 25000 or less: Spark, PySpark, Hadoop, and Python.
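The condition above keeps only the matching rows. If both halves of the split are needed, one option is to store the mask and negate it with the ~ operator. The following is a minimal sketch; the names mask, df_low, and df_high are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

# Build the boolean mask once so both parts use the same condition
mask = df['Fee'] <= 25000

df_low = df[mask]     # rows where the condition is True
df_high = df[~mask]   # all remaining rows (condition is False)

print("Fee <= 25000:\n", df_low)
print("Fee > 25000:\n", df_high)
```

Because the two masks are complements of each other, every row of the original DataFrame ends up in exactly one of the two parts.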

In another example, I will apply the condition df['Duration'] == '35days' to the given DataFrame. It returns a smaller DataFrame containing only the rows that satisfy the condition.


# Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days']
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
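Conditions on more than one column can be combined into a single mask. The sketch below is an illustration under assumed values: the Fee threshold is chosen only for demonstration. Each comparison needs its own parentheses because & binds more tightly than == and <.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

# Combine two column conditions with & (element-wise AND);
# use | for element-wise OR
mask = (df['Duration'] == '35days') & (df['Fee'] < 25000)
df1 = df[mask]

print("After splitting by two conditions:\n", df1)
```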

Split DataFrame by Unique Column Value

The Pandas groupby() function partitions a DataFrame according to the values in one or more columns. First, we call groupby() to group the rows by the specified column. Then, we extract a specific group with the get_group() function, passing the column value that identifies it. This approach is most useful when we want one smaller DataFrame for each unique value in a column.


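# Split DataFrame using groupby(),
# grouping by a single column.
# The column name is passed as a string (not a one-element list),
# so get_group() takes a plain value as the key
grouped = df.groupby('Duration')
df1 = grouped.get_group("35days")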
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days
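get_group() returns one group at a time. To get every group at once, a GroupBy object can be iterated, yielding each group key together with its sub-DataFrame. The following sketch collects them in a dictionary; the name parts is an assumption made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs,
# one pair per unique value in the grouping column
parts = {key: group for key, group in df.groupby('Duration')}

for key, part in parts.items():
    print(f"Duration == {key}:\n{part}\n")
```

Each value in parts is a regular DataFrame, so parts['35days'] holds the same rows as grouped.get_group("35days").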

Pandas Split by Multiple Column Values

Similarly, we can use the groupby() function to split a DataFrame by the values of more than one column. To do so, pass a list of column names to groupby(), then select a group with get_group() by passing a tuple containing one value per grouping column, in the same order. For example:


# Split DataFrame using groupby() on multiple columns
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
  Courses    Fee  Discount Duration
2  Hadoop  23000      1000   40days
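When the available key combinations are not known in advance, iterating over the multi-column GroupBy lists them. In this minimal sketch, each key is a (Discount, Fee) tuple; the name parts and the conversion of the key values to plain Python integers are choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

grouped = df.groupby(['Discount', 'Fee'])

# Each key is a (Discount, Fee) tuple; convert to plain ints for readable keys
parts = {(int(discount), int(fee)): group
         for (discount, fee), group in grouped}

print("Available keys:", list(parts))
print("Group (1000, 23000):\n", parts[(1000, 23000)])
```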

Conclusion

In this article, you have learned how to split a Pandas DataFrame based on a column value condition, and how to use the df.groupby() function with get_group() to split a DataFrame based on the values of a single column or of multiple columns.

Happy learning!!
