Pandas Split Column into Two Columns

The Pandas Series.str.split() function splits the string values of one column into two columns based on a specified separator or delimiter. It works like Python's built-in str.split() method, but str.split() operates on a single string, whereas Series.str.split() operates on every value of the Series (DataFrame column) it is called on.
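To make the difference concrete, here is a minimal sketch comparing the two calls. The variable names (single, names, parts) are only illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

# Python's built-in str.split() acts on one string and returns a list.
single = "Pramodh_Roy".split("_")
print(single)  # ['Pramodh', 'Roy']

# Series.str.split() applies the same split to every value in a Series.
names = pd.Series(["Pramodh_Roy", "Leena_Singh"])
parts = names.str.split("_")
print(parts)
```

Each element of parts holds the pieces of the corresponding input string.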

In this article, I will explain Series.str.split(), its syntax and parameters, and how to use it to split a column into multiple columns in Pandas, with examples.

1. Quick Examples of Split Column into Two Columns

Following are quick examples of splitting a string column into two columns.


# Below are the quick examples
# Example 1: Split string column into two new columns using '_' delimiter
df[['First Name', 'Last Name']] = df.Student_details.str.split("_", expand = True)

# Example 2: Split single column into two columns using ',' delimiter
df[['First Name', 'Last Name']] = df.Student_details.str.split(",", expand = True)

# Example 3: Split single column into two columns using apply() and ',' delimiter
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))

# Example 4: Split single column into two columns using apply() and '_' delimiter
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))

2. Syntax of Series.str.split()

Following is the syntax of Series.str.split().


# Syntax of Series.str.split()
Series.str.split(pat=None, n=-1, expand=False)

2.1 Parameters of Series.str.split()

  • pat: (str type) The delimiter on which to split the column values. If not specified, it splits on whitespace.
  • n: (int type) The maximum number of splits to perform. Default is -1, which means split on every occurrence of the delimiter.
  • expand: (bool type) Default is False, which returns a Series of lists. If it is set to True, the function returns a DataFrame with each split part in its own column.
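The effect of the n parameter is easiest to see on a value that contains the delimiter more than once. The following is a small sketch; the sample name "James_William_Jr" is made up for illustration and is not part of this article's dataset.

```python
import pandas as pd

s = pd.Series(["James_William_Jr", "Addem_Smith"])

# n=-1 (the default): split on every '_' found, giving three columns here.
all_parts = s.str.split("_", expand=True)

# n=1: split on the first '_' only, so the rest of the string stays together.
first_only = s.str.split("_", n=1, expand=True)

print(all_parts)
print(first_only)
```

Rows with fewer pieces than the widest row are padded with missing values, which is why "Addem_Smith" has an empty third column in all_parts.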

2.2 Return Value

It returns a Series of lists when expand=False, or a DataFrame when expand=True.
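When the result is left as a Series of lists, individual pieces can still be pulled out with the .str indexer. This is a minimal sketch of that alternative, with illustrative variable names, and not a replacement for the expand=True approach used in the rest of this article.

```python
import pandas as pd

s = pd.Series(["Pramodh_Roy", "Leena_Singh"])

# expand=False (the default) keeps all pieces of a row together in one list.
parts = s.str.split("_")

# .str[i] picks the i-th element of each list.
first = parts.str[0]
last = parts.str[1]

print(first.tolist())  # ['Pramodh', 'Leena']
print(last.tolist())   # ['Roy', 'Singh']
```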

3. Usage of Series.str.split()

Pandas provides the Series.str.split() function to split a string column into two or more columns on a specified delimiter. Delimited string values are multiple values stored in a single column and separated by a character such as a dash, whitespace, or comma. This function returns a Pandas Series or DataFrame.

Let’s create a Pandas DataFrame from a Python dictionary. The DataFrame has one string column named 'Student_details', which I would like to split into two string columns named 'First Name' and 'Last Name'.


import pandas as pd

# Create a DataFrame from a dictionary of column values
technologies = {
    'Student_details':["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
    'Courses':["Spark", "PySpark", "Pandas",  "Hadoop"],
    'Fee' :[25000, 20000, 22000, 25000]
}
df = pd.DataFrame(technologies)
print(df)

Yields below output.


# Output:
  Student_details  Courses    Fee
0     Pramodh_Roy    Spark  25000
1     Leena_Singh  PySpark  20000
2   James_William   Pandas  22000
3     Addem_Smith   Hadoop  25000

4. Split String Column into Two Columns in Pandas

Apply Pandas Series.str.split() to a DataFrame column that holds delimited string values to split it into multiple columns. Here, the values of the column we want to split are separated by the '_' (underscore) delimiter, so we pass '_' as the first argument to Series.str.split() and set expand=True to get the pieces back as separate columns.

Let’s apply the function and split the column into two columns.


# Split string column into two new columns
df[['First Name', 'Last Name']] = df.Student_details.str.split("_", expand = True)
print(df)

Yields below output.


# Output:
  Student_details  Courses    Fee First Name Last Name
0     Pramodh_Roy    Spark  25000    Pramodh       Roy
1     Leena_Singh  PySpark  20000      Leena     Singh
2   James_William   Pandas  22000      James   William
3     Addem_Smith   Hadoop  25000      Addem     Smith
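The assignment above expects the split to produce exactly two columns. As a hedged sketch of how messier data could be handled, passing n=1 caps the result at two pieces per row. The values "James_William_Jr" and "Leena" below are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Student_details": ["Pramodh_Roy", "James_William_Jr", "Leena"]}
)

# n=1 splits on the first '_' only, so no row yields more than two pieces.
df[["First Name", "Last Name"]] = df["Student_details"].str.split(
    "_", n=1, expand=True
)
print(df)
```

A row without any delimiter, such as "Leena", ends up with a missing value in 'Last Name'.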

5. Use ‘,’ Delimiter & Split Column

In this example, the values of the column we want to split are separated by the ',' (comma) delimiter, so we pass ',' to Series.str.split().


# Replace the column with values that use the ',' delimiter
df['Student_details'] = ["Pramodh, Roy", "Leena, Singh", "James, William", "Addem, Smith"]

# Split single column into two columns using ',' delimiter
# (each last name keeps the leading space that followed the comma)
df[['First Name', 'Last Name']] = df.Student_details.str.split(",", expand = True)
print(df)

Yields below output.


# Output:
  Student_details  Courses    Fee First Name Last Name
0    Pramodh, Roy    Spark  25000    Pramodh       Roy
1    Leena, Singh  PySpark  20000      Leena     Singh
2  James, William   Pandas  22000      James   William
3    Addem, Smith   Hadoop  25000      Addem     Smith
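Because the sample values have a space after each comma, splitting on ',' alone leaves that space at the start of the last name, even though the right-aligned printout hides it. The sketch below shows two possible ways to get clean values. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Student_details": ["Pramodh, Roy", "Leena, Singh"]})

# Splitting on ',' keeps the space that followed the comma.
raw = df["Student_details"].str.split(",", expand=True)
print(repr(raw.iloc[0, 1]))  # ' Roy'

# Option 1: include the space in the delimiter.
clean = df["Student_details"].str.split(", ", expand=True)

# Option 2: strip surrounding whitespace from every resulting column.
stripped = raw.apply(lambda col: col.str.strip())

print(repr(clean.iloc[0, 1]))     # 'Roy'
print(repr(stripped.iloc[0, 1]))  # 'Roy'
```

Option 2 is the more forgiving choice when the spacing after the comma is inconsistent.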

6. Use apply() Function to Split Column into Two Columns in Pandas

In Pandas, the apply() function executes a function on each value of a column, which can be used to split one column's values into multiple columns. To do so, call apply() on the DataFrame column we want to split and pass it a lambda function that splits each value with Python's str.split() method and wraps the result in a pd.Series.


# Split single column into two columns use apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))
print(df)

Yields below output.


# Output:
  Student_details  Courses    Fee First Name Last Name
0    Pramodh, Roy    Spark  25000    Pramodh       Roy
1    Leena, Singh  PySpark  20000      Leena     Singh
2  James, William   Pandas  22000      James   William
3    Addem, Smith   Hadoop  25000      Addem     Smith

6.1 Using Underscore(_)

In this example, the values of the column are separated by the '_' (underscore) delimiter. We pass '_' to the split() method inside the lambda function given to apply().


# Replace the column with values that use the '_' delimiter
df['Student_details'] = ["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"]

# Split single column into two columns using apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))
print(df)

Yields below output.


# Output:
  Student_details  Courses    Fee First Name Last Name
0     Pramodh_Roy    Spark  25000    Pramodh       Roy
1     Leena_Singh  PySpark  20000      Leena     Singh
2   James_William   Pandas  22000      James   William
3     Addem_Smith   Hadoop  25000      Addem     Smith
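One difference between the two approaches shows up with missing values. In the apply() version, str(x) turns a missing value into the literal text "nan", while Series.str.split() leaves it missing. The sketch below uses an invented two-row Series to illustrate this.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Pramodh_Roy", np.nan])

# apply() with str(x): the missing value becomes the string "nan".
via_apply = s.apply(lambda x: pd.Series(str(x).split("_")))

# Series.str.split(): the missing value stays missing in every column.
via_str = s.str.split("_", expand=True)

print(via_apply)
print(via_str)
```

If the column may contain missing values, the Series.str.split() form is usually the safer of the two.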

7. Conclusion

In this article, I have explained the Series.str.split() function, its syntax and parameters, and how to use it to split a Pandas DataFrame string column into multiple columns. I have also used the apply() function in some examples to split one string column into two columns.
