• Post author:
  • Post category:Pandas
  • Post last modified:March 27, 2024
  • Reading time:7 mins read
You are currently viewing Split Pandas DataFrame by Column Value

We can split the Pandas DataFrame into smaller DataFrames by using single/multiple column values. Pandas provide various features and functions for splitting DataFrame into smaller ones by using the index/value of column index, and row index.

In this article, I will explain how to split a Pandas DataFrame based on column value condition and also I explain using the df.groupby() function how we can split the DataFrame based on single column value/multiple column values.

1. Quick Examples of Split DataFrame by Column Value

Following are the quick examples of splitting Pandas DataFrame by column value.


# Below are the quick examples.

# Example 1: Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]

# Example 2: Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days'] 

# Example 3: Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(df.Duration)
df1 = grouped.get_group("35days")

# Example 4: Split Dataframe using groupby() &
# Grouping by multiple columns 
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))

Let’s create Pandas DataFrame using data from a Python dictionary, where the columns are 'Courses', 'Fee', 'Discount', and 'Duration'.


import pandas as pd
import numpy as np
technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1000, 1200, 2500],
    'Duration':['35days', '35days', '40days', '30days', '25days']
          }

df = pd.DataFrame(technologies)
print("Create DateFrame:\n", df)

Yields below output.

Pandas DataFrame split column

2. Split DataFrame based on Column Value condition

We can create smaller DataFrames from a given DataFrame based on a specified column value by using the condition. Using the below syntax we can split the given DataFrame into smaller DataFrame using conditions based on specified column value.


# Split DataFrame based on column value condition
df1 = df[df['Fee'] <= 25000]
print("After splitting by column value:\n", df1)

Yield below output.

Pandas DataFrame split column

In another example, I will apply the condition 'Duration' == '35days' on the given DataFrame. It splits the DataFrame based on the condition and returns the smaller DataFrame.


# Split DataFrame based on Duration == 35days
df1 = df[df['Duration'] == '35days']
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
   Courses    Fee  Discount Duration
 0    Spark  22000      1000   35days
 1  PySpark  25000      2300   35days

3. Split Dataframe by unique Column Value

The Pandas.groupby() function is used to split the DataFrame based on column values. First, we can group the DataFrame on column values using the groupby() function after that we can select specified groups using the get_group() function. This is the best function when we want to split a DataFrame based on the specified column that has unique values.


# Split Dataframe using groupby() &
# Grouping by particular dataframe column
grouped = df.groupby(['Duration'])
df1 = grouped.get_group("35days")
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
   Courses    Fee  Discount Duration
0    Spark  22000      1000   35days
1  PySpark  25000      2300   35days

The above syntax has returned a new DataFrame consisting of grouped data where 'Duration' is '35days'.

4. Pandas Split by Multiple Column Values

We can also use the groupby() function to perform the splitting of more than one column of a given DataFrame. For that, we need to specify more than one column that we want to group using the groupby() function and select specified groups using the get_group() function. For example.


# Split Dataframe apply groupby() on multiple columns 
grouped = df.groupby(['Discount', 'Fee'])
df1 = grouped.get_group((1000, 23000))
print("After splitting by column value:\n", df1)

Yields below output.


# Output:
# After splitting by column value:
  Courses    Fee  Discount Duration
2  Hadoop  23000      1000   40days
  

5. Conclusion

In this article, I have explained how to split a Pandas DataFrame based on column value condition and also I explain using the df.groupby() function how we can split the DataFrame based on single column value/multiple column values.

Happy learning!!

Related Articles

References

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, He has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen journey in the field of data engineering has been a continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with the data as he come across. Follow Naveen @ LinkedIn and Medium