Pandas Series.str.split() function is used to split the string values of a column into two or more columns based on a specified separator or delimiter. It works like Python's built-in str.split() method, except that str.split() splits a single string, whereas Series.str.split() applies the split to every string value in a Series (a DataFrame column).
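As a minimal sketch of this difference (the short Series below is illustrative sample data), Python's str.split() returns one list for one string, while Series.str.split() returns one list per value in the column:

```python
import pandas as pd

# Python's built-in str.split() works on one string at a time
name = "Pramodh_Roy"
parts = name.split("_")
print(parts)  # ['Pramodh', 'Roy']

# Series.str.split() applies the same split to every string in a Series (column)
s = pd.Series(["Pramodh_Roy", "Leena_Singh"])
split_lists = s.str.split("_")
print(split_lists.tolist())
```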
In this article, I will explain the syntax and parameters of Series.str.split() and show, with examples, how to use it to split a column into multiple columns in Pandas.
1. Quick Examples of Split Column into Two Columns
Following are quick examples of splitting a string column into two columns.
# Below are the quick examples
import pandas as pd
# Example 1: Split string column into two new columns using '_' delimiter
df[['First Name', 'Last Name']] = df.Student_details.str.split("_", expand = True)
# Example 2: Split single column into two columns using ',' delimiter
df[['First Name', 'Last Name']] = df.Student_details.str.split(",", expand = True)
# Example 3: Split single column into two columns using apply() and ',' delimiter
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))
# Example 4: Split single column into two columns using apply() and '_' delimiter
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))
2. Syntax of Series.str.split()
Following is the syntax of Series.str.split().
# Syntax of Series.str.split() (pandas 2.x; n, expand and regex are keyword-only)
Series.str.split(pat=None, *, n=-1, expand=False, regex=None)
2.1 Parameters of Series.str.split()
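pat: (str or compiled regex, optional) The delimiter to split on, such as '_' or ','. If not specified, the string is split on whitespace.
n: (int, default -1) Limits the number of splits. -1, 0 and None all mean "split at every delimiter".
expand: (bool, default False) If True, returns a DataFrame with one column per split piece. If False, returns a Series in which each value is a list of the pieces.
regex: (bool, optional, pandas 1.4+) Whether pat is treated as a regular expression. If None, a pat of length 1 is treated as a literal string and a longer pat as a regular expression.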
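A short sketch of the n parameter, assuming a hypothetical three-part value 'Pramodh_Kumar_Roy' that is not part of this article's data: with the default n=-1 every '_' is a split point, while n=1 splits only at the first '_' and keeps the rest together.

```python
import pandas as pd

# Hypothetical sample values: one with two '_' delimiters, one with a single '_'
s = pd.Series(["Pramodh_Kumar_Roy", "Leena_Singh"])

# n=-1 (default): split at every delimiter -> up to three columns
all_splits = s.str.split("_", expand=True)
print(all_splits)

# n=1: split only at the first delimiter -> at most two columns
first_only = s.str.split("_", n=1, expand=True)
print(first_only)
```

Setting n=1 is a simple way to guarantee at most two columns when some values may contain the delimiter more than once.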
2.2 Return Value
It returns a Series of lists when expand=False, or a DataFrame when expand=True.
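To see both return types side by side, here is a small sketch on two sample values: without expand the result is a Series of lists, and with expand=True it is a DataFrame whose columns are labeled 0, 1, and so on.

```python
import pandas as pd

s = pd.Series(["Pramodh_Roy", "Leena_Singh"])

as_series = s.str.split("_")              # expand=False (default): Series of lists
as_frame = s.str.split("_", expand=True)  # expand=True: DataFrame, one column per piece

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(as_frame.columns.tolist())  # [0, 1]
```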
3. Usage of Series.str.split()
Pandas provides the Series.str.split() function to split string column values into two or more columns on a specified delimiter. Delimited string values are multiple values stored in a single column and separated by a character such as a dash, whitespace, or comma. The function returns a Pandas Series or DataFrame, depending on the expand parameter.
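For example, leaving pat unset splits on whitespace, and any other single character, such as a dash, can be passed as pat. This is a sketch on made-up sample values:

```python
import pandas as pd

# pat=None (the default) splits on runs of whitespace, like Python's str.split()
spaced = pd.Series(["Pramodh Roy", "Leena  Singh"])
by_space = spaced.str.split(expand=True)
print(by_space)

# Any other delimiter, such as a dash, is passed as pat
dashed = pd.Series(["Pramodh-Roy", "Leena-Singh"])
by_dash = dashed.str.split("-", expand=True)
print(by_dash)
```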
Let’s create a Pandas DataFrame from a Python dictionary. It has a string column named 'Student_details', which I would like to split into two string columns named 'First Name' and 'Last Name'.
import pandas as pd
import numpy as np
technologies = {
'Student_details':["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"],
'Courses':["Spark", "PySpark", "Pandas", "Hadoop"],
'Fee' :[25000, 20000, 22000, 25000]
}
df = pd.DataFrame(technologies)
print(df)
Yields below output.
# Output:
Student_details Courses Fee
0 Pramodh_Roy Spark 25000
1 Leena_Singh PySpark 20000
2 James_William Pandas 22000
3 Addem_Smith Hadoop 25000
4. Split String Column into Two Columns in Pandas
Apply Pandas Series.str.split() to a DataFrame column whose values are delimited strings to split it into multiple columns. Here, the values in the 'Student_details' column are separated by the '_' (underscore) delimiter, so we pass '_' as the first argument (pat) to Series.str.split() and set expand=True so the result is a DataFrame with one column per piece.
Let’s apply it and split the column into two columns:
# Split string column into two new columns
df[['First Name', 'Last Name']] = df.Student_details.str.split("_", expand = True)
print(df)
Yields below output
# Output:
Student_details Courses Fee First Name Last Name
0 Pramodh_Roy Spark 25000 Pramodh Roy
1 Leena_Singh PySpark 20000 Leena Singh
2 James_William Pandas 22000 James William
3 Addem_Smith Hadoop 25000 Addem Smith
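Real data is often less regular than this sample. The sketch below uses hypothetical values (one name with no '_' and one with two) to show that expand=True pads shorter rows with missing values, and that n=1 keeps the split to at most two columns so the two-column assignment still works:

```python
import pandas as pd

# Hypothetical ragged data: 0, 1 and 2 underscores per value
df = pd.DataFrame({"Student_details": ["Pramodh_Roy", "Leena", "James_Kumar_William"]})

# expand=True pads shorter rows with missing values
parts = df["Student_details"].str.split("_", expand=True)
print(parts.shape)  # (3, 3)

# Cap the split at the first delimiter so there are at most two columns
df[["First Name", "Last Name"]] = df["Student_details"].str.split("_", n=1, expand=True)
print(df)
```

Without n=1, the third value would produce three columns, and assigning them to ['First Name', 'Last Name'] would raise a ValueError because the number of columns does not match.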
5. Use ‘,’ Delimiter & Split Column
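In this example, the values in the 'Student_details' column are separated by the ',' (comma) delimiter, so we pass ',' to Series.str.split(). Because each comma in this data is followed by a space, the values in 'Last Name' keep a leading space (for example, ' Roy'), which is easy to miss in the printed output.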
# Recreate the DataFrame so the 'Student_details' column
# contains ',' delimiter values
technologies['Student_details'] = ["Pramodh, Roy", "Leena, Singh", "James, William", "Addem, Smith"]
df = pd.DataFrame(technologies)
# Split single column into two columns use ',' delimiter
df[['First Name', 'Last Name']] = df.Student_details.str.split(",", expand = True)
print(df)
Yields below output.
# Output:
Student_details Courses Fee First Name Last Name
0 Pramodh, Roy Spark 25000 Pramodh Roy
1 Leena, Singh PySpark 20000 Leena Singh
2 James, William Pandas 22000 James William
3 Addem, Smith Hadoop 25000 Addem Smith
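Since ',' alone leaves a space at the start of each 'Last Name' value, one way to clean it up is to strip whitespace from each new column after splitting. A minimal sketch on two sample rows:

```python
import pandas as pd

df = pd.DataFrame({"Student_details": ["Pramodh, Roy", "Leena, Singh"]})

# Splitting on "," alone keeps the space that follows each comma
raw = df["Student_details"].str.split(",", expand=True)
print(repr(raw.iloc[0, 1]))  # ' Roy'

# Strip surrounding whitespace from each new column after splitting
df[["First Name", "Last Name"]] = raw.apply(lambda col: col.str.strip())
print(df["Last Name"].tolist())  # ['Roy', 'Singh']
```

Passing ', ' (comma followed by a space) as pat would also work for this data, but only if every value uses exactly that spacing; stripping after the split is more forgiving.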
6. Use apply() Function to Split Column into Two Columns in Pandas
In Pandas, Series.apply() calls a function on each value of a column, so it can also be used to split one column into multiple columns. Here, we pass a lambda function to apply() on the 'Student_details' column. The lambda splits each value with Python's built-in str.split() and wraps the resulting list in a pd.Series, so apply() returns a DataFrame with one column per piece, which we assign to the two new columns.
# Split single column into two columns use apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split(",")))
print(df)
Yields below output.
# Output:
Student_details Courses Fee First Name Last Name
0 Pramodh, Roy Spark 25000 Pramodh Roy
1 Leena, Singh PySpark 20000 Leena Singh
2 James, William Pandas 22000 James William
3 Addem, Smith Hadoop 25000 Addem Smith
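The apply() approach and Series.str.split() with expand=True produce the same pieces for this data. A quick sketch that checks this on two sample values:

```python
import pandas as pd

s = pd.Series(["Pramodh, Roy", "Leena, Singh"])

# apply(): call Python's str.split() per value and wrap each list in a Series
via_apply = s.apply(lambda x: pd.Series(str(x).split(",")))

# Alternative: Series.str.split() with expand=True in a single call
via_str = s.str.split(",", expand=True)

print(via_apply.values.tolist() == via_str.values.tolist())  # True
```

Series.str.split() does the same job in a single call without building a new Series for every row, so it is usually the simpler choice; apply() is useful when each value needs custom logic beyond a plain split.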
6.1 Using Underscore (_)
In this example, the values of the 'Student_details' column are separated by the '_' (underscore) delimiter, so we pass '_' to str.split() inside the lambda function given to apply().
# Recreate the DataFrame so the 'Student_details' column
# contains '_' delimiter values
technologies['Student_details'] = ["Pramodh_Roy", "Leena_Singh", "James_William", "Addem_Smith"]
df = pd.DataFrame(technologies)
# Split single column into two columns use apply()
df[['First Name', 'Last Name']] = df["Student_details"].apply(lambda x: pd.Series(str(x).split("_")))
print(df)
Yields below output.
# Output:
Student_details Courses Fee First Name Last Name
0 Pramodh_Roy Spark 25000 Pramodh Roy
1 Leena_Singh PySpark 20000 Leena Singh
2 James_William Pandas 22000 James William
3 Addem_Smith Hadoop 25000 Addem Smith
7. Conclusion
In this article, I have explained the syntax and parameters of the Series.str.split() function and how to use it to split a Pandas DataFrame string column into multiple columns. I have also used the apply() function in some examples to split one string column into two columns.
Thank you!