Split Pandas DataFrame by Column Value

  • Post author:
  • Post category:Pandas / Python
  • Post last modified:February 16, 2023

We can split the Pandas DataFrame into smaller DataFrames by using single/multiple column values. Pandas provide various features and functions for splitting DataFrame into smaller ones by using index/value of column index, row index.

In this article, I will explain how to split a Pandas DataFrame based on column value condition and also I explain using the df.groupby() function how we can split the DataFrame based on single column value/multiple column values.

1. Quick Examples of Split DataFrame by Column Value

Following are the quick examples of split Pandas DataFrame by column value.


# Below are the quick examples.

# Example 1: Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]

# Example 2: Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days'] 

# Example 3: Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(df.Duration)
df1 = grouped.get_group("35days")

# Example 4: Split Dataframe using groupby() &
# Grouping by multiple columns 
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))

Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are 'Courses', 'Fee', 'Discount', and 'Duration'.


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
          }

df = pd.DataFrame(technologies)
print(df)

Yields below output.

Pandas DataFrame split column

2. Split DataFrame based on Column Value condition

We can create smaller DataFrames from a given DataFrame based on a specified column value by using the condition. Using the below syntax we can split the given DataFrame into smaller DataFrame using conditions based on specified column value.


# Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]
print(df1)

Yield below output.

Pandas DataFrame split column

In another example, I will apply condition 'Duration' == '35days' on the given DataFrame. It splits the DataFrame based on the condition and returns the smaller DataFrame.


# Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days']
print(df1)

Yields below output.


# Output:
   Courses    Fee  Discount Duration
 0    Spark  22000      1000   35days
 1  PySpark  25000      2300   35days

3. Split Dataframe by unique Column Value

The Pandas.groupby() function is used to split the DataFrame based on column values. First, we can group the DataFrame on column values using the groupby() function after that we can select specified groups using the get_group() function. This is the best function when we want to split a DataFrame based on the specified column that has unique values.


# Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(['Duration'])
df1 = grouped.get_group("35days")
print(df1)

Yields below output.


# Output:
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days

The above syntax has returned a new DataFrame consisting of grouped data where 'Duration' is '35days'.

4. Pandas Split by Multiple Column Values

We can also use the groupby() function to perform the splitting of more than one column of a given Dataframe. For that, we need to specify more than one column that we want to group using the groupby() function and select specified groups using the get_group() function. For example.


# Split Dataframe apply groupby() on multiple columns 
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))
print(df1)

Yields below output.


# Output:
  Courses    Fee  Discount Duration
2  Hadoop  23000      1000   40days
  

5. Conclusion

In this article, I have explained how to split a Pandas DataFrame based on column value condition and also I explain using the df.groupby() function how we can split the DataFrame based on single column value/multiple column values.

Happy learning!!

Related Articles

References

Leave a Reply

You are currently viewing Split Pandas DataFrame by Column Value