Using the pandas.concat() method, you can combine two or more Series into a DataFrame, with each Series becoming a column. You can also use pandas.merge() or DataFrame.join() to build a DataFrame from multiple Series. Series.append() stacks Series as rows rather than columns, and it was removed in pandas 2.0.
In pandas, a Series is a one-dimensional labeled array that can hold any data type, such as integers, strings, floating-point numbers, or Python objects. It organizes data sequentially and resembles a single column in an Excel sheet or SQL table. Combining two pandas Series column-wise results in a DataFrame with two columns. In this article, I will explain how to combine two or more Series into a pandas DataFrame.
Key Points –
- Two Series can be combined to create a DataFrame by treating each Series as a separate column.
- The pd.DataFrame() constructor allows for direct conversion of two Series into columns in a DataFrame.
- pd.concat() can be used to combine two Series along either axis, giving more flexibility in arrangement.
- A dictionary with Series as values and column names as keys can be passed to pd.DataFrame() for a straightforward combination.
- pandas.concat()
- pandas.merge()
- DataFrame.join()
- Series.append() – This appends rows instead of combining as columns (removed in pandas 2.0; use pandas.concat() instead)
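The dictionary approach from the key points is not shown elsewhere in this article, so here is a minimal sketch of it. The variable name df_from_dict is only illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# Dictionary keys become column names; each Series becomes one column.
df_from_dict = pd.DataFrame({"courses": courses, "fees": fees})
print(df_from_dict)
```

Because the column names come from the dictionary keys, the Series themselves do not need names here.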
Using pandas.concat() to Combine Two Series
The pandas.concat() method combines pandas objects along a particular axis (row-wise or column-wise). For example, you can create multiple Series and pass them to concat() to build a DataFrame.
import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
print("First Series:\n", courses)
print("---------------------------------")
fees = pd.Series([22000,25000,23000])
print("Second Series:\n", fees)
print("---------------------------------")
discount = pd.Series([1000,2300,1000])
print("Third Series:\n", discount)
For our scenario, we can use the concat() method, which takes several parameters. To merge Series as columns instead of rows, set the axis parameter to 1. Note that using axis=0 appends the Series as rows instead of columns.
# Combine two series.
df = pd.concat([courses,fees], axis=1)
print("After combining two Series\n", df)
Note that if the Series have no names and you don't provide column names while merging, pandas assigns integer labels (0, 1, ...) to the columns.
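To make the axis contrast and the default column numbering concrete, the following short sketch combines the same two unnamed Series both ways. The names by_column and by_row are illustrative only.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=1 places each Series in its own column; unnamed Series
# receive integer column labels.
by_column = pd.concat([courses, fees], axis=1)
print(list(by_column.columns))

# axis=0 stacks the values into one longer Series instead of a DataFrame.
by_row = pd.concat([courses, fees], axis=0)
print(type(by_row).__name__, len(by_row))
```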
# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
discount = pd.Series([1000,2300,1000],name='discount')
df = pd.concat([courses,fees,discount],axis=1)
print("After combining multiple Series:\n", df)
Yields below output.
# Output:
# After combining multiple Series:
courses fees discount
0 Spark 22000 1000
1 PySpark 25000 2300
2 Hadoop 23000 1000
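As an alternative to naming each Series, concat() also accepts a keys parameter that supplies the column labels when combining along axis=1. This is a small sketch with unnamed Series, and df_keys is an illustrative name.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# keys provides one label per Series, used as the column names.
df_keys = pd.concat(
    [courses, fees, discount],
    axis=1,
    keys=["courses", "fees", "discount"],
)
print(df_keys)
```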
If your Series have a custom index, the concat() method carries the same index over to the created DataFrame. To concatenate Series while providing custom column names, you can pass a dictionary specifying the column names to the pd.concat() function.
In the example below, we assign a custom index to the three pandas Series courses, fees, and discount. Then, we concatenate these Series into a DataFrame (df) using the pd.concat() method. We provide custom column names by passing a dictionary where the keys are the desired column names and the values are the Series. This ensures that the resulting DataFrame has the specified column names.
# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels
# Concat Series by Changing names
df=pd.concat({'Courses': courses,
'Course_Fee': fees,
'Course_Discount': discount},axis=1)
print("After combining multiple Series:\n", df)
Yields below output.
# Output:
# After combining multiple Series:
Courses Course_Fee Course_Discount
r1 Spark 22000 1000
r2 PySpark 25000 2300
r3 Hadoop 23000 1000
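The example above uses identical index labels on every Series. When the labels only partly overlap, concat() aligns rows by label. The sketch below uses made-up fee values with a deliberately shifted index to show the default outer join and the join="inner" option.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="courses")
# Hypothetical data: the fees index is shifted by one label on purpose.
fees = pd.Series([25000, 23000, 24000],
                 index=["r2", "r3", "r4"], name="fees")

# Default join="outer": keeps every label, filling gaps with NaN.
outer = pd.concat([courses, fees], axis=1)
print(outer)

# join="inner": keeps only labels present in both Series.
inner = pd.concat([courses, fees], axis=1, join="inner")
print(inner)
```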
The reset_index() method in pandas is used to reset the index of a DataFrame. This operation moves the current index to a column and adds a new default integer index.
# change the index to a column & create new index
df = df.reset_index()
print("After combining multiple Series:\n", df)
Yields below output.
# Output:
# After combining multiple Series:
index Courses Course_Fee Course_Discount
0 r1 Spark 22000 1000
1 r2 PySpark 25000 2300
2 r3 Hadoop 23000 1000
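If you do not want the old index kept as a column, or want a more descriptive column name than "index", the following sketch shows two common variations. The column name row_id is an arbitrary choice for illustration.

```python
import pandas as pd

labels = ["r1", "r2", "r3"]
courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=labels)
fees = pd.Series([22000, 25000, 23000], index=labels)
df = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1)

# drop=True discards the old index instead of moving it into a column.
dropped = df.reset_index(drop=True)
print(dropped)

# Keep the old index as a column, then give it a descriptive name.
renamed = df.reset_index().rename(columns={"index": "row_id"})
print(renamed)
```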
Combine Two Series Using pandas.merge()
The pandas.merge() method performs SQL-style, column-wise joins. It supports all database join operations between DataFrame or named Series objects. Each Series must have a name in this case, so pass the name parameter when creating it. For instance, pd.merge(S1, S2, right_index=True, left_index=True) joins two named Series on their index.
# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
# Using pandas series merge()
df = pd.merge(courses, fees, right_index = True,
left_index = True)
print("After combining two Series:\n", df)
Yields below output.
# Output:
# After combining two Series:
courses fees
0 Spark 22000
1 PySpark 25000
2 Hadoop 23000
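Since merge() follows join semantics, the how parameter decides which rows survive when the indexes do not match exactly. The sketch below uses made-up fee values with a shifted index to compare the default inner join with an outer join.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=[0, 1, 2], name="courses")
# Hypothetical data: index 3 has no matching course, index 0 has no fee.
fees = pd.Series([25000, 23000, 24000],
                 index=[1, 2, 3], name="fees")

# Default how="inner": only index labels found in both Series.
inner = pd.merge(courses, fees, left_index=True, right_index=True)
print(inner)

# how="outer": all labels from both sides, with NaN where data is missing.
outer = pd.merge(courses, fees, left_index=True, right_index=True,
                 how="outer")
print(outer)
```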
Combine Two Series Using DataFrame.join()
You can also use DataFrame.join() to join two Series. Because join() is a DataFrame method, you first need a DataFrame object. One way to get this is to create a DataFrame from the first Series and then join the second Series to it. The Series being joined must have a name, which becomes its column name.
# Using DataFrame.join()
df=pd.DataFrame(courses).join(fees)
print("After combining two Series:\n", df)
Yields the same output as above.
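A slightly different way to write the same idea, shown here as a sketch, is to convert the first Series with Series.to_frame() and chain join() calls to add more than one Series. The name df_join is illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000, 23000], name="fees")
discount = pd.Series([1000, 2300, 1000], name="discount")

# to_frame() turns the named Series into a one-column DataFrame;
# each join() then adds another named Series as a column, aligned on index.
df_join = courses.to_frame().join(fees).join(discount)
print(df_join)
```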
Using Series.append() to Combine Two Series
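In older pandas versions you could use pandas.DataFrame(Series.append(Series,ignore_index=True)) to create a DataFrame by appending one Series to another. Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the example below uses pandas.concat() with ignore_index=True, which gives the same result. Note that this example doesn't create multiple columns; instead, it stacks the values of the second Series below the first as additional rows.
# Stack two Series as rows (replacement for the removed Series.append())
courses_am=pd.Series(["Spark","PySpark","Hadoop"])
courses_pm=pd.Series(["Pandas","Python","Scala"])
df = pd.DataFrame(pd.concat([courses_am, courses_pm],
ignore_index = True),columns=['all_courses'])
print("After combining two Series:\n", df)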
Yields below output.
# Output:
# After combining two Series:
all_courses
0 Spark
1 PySpark
2 Hadoop
3 Pandas
4 Python
5 Scala
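To see why ignore_index=True matters when stacking rows, the sketch below compares the resulting index with and without it. The names kept and renumbered are illustrative.

```python
import pandas as pd

courses_am = pd.Series(["Spark", "PySpark", "Hadoop"])
courses_pm = pd.Series(["Pandas", "Python", "Scala"])

# Without ignore_index, each Series keeps its own labels, so 0, 1, 2 repeat.
kept = pd.concat([courses_am, courses_pm])
print(list(kept.index), kept.index.is_unique)

# With ignore_index=True, the result gets a fresh 0..n-1 index.
renumbered = pd.concat([courses_am, courses_pm], ignore_index=True)
print(list(renumbered.index), renumbered.index.is_unique)
```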
FAQ on Combine Two Series into Pandas DataFrame
A Pandas Series is a one-dimensional array-like structure that can hold data of any type. It is similar to a column in a spreadsheet or a single array in Python.
To combine two Pandas Series into a DataFrame, you can use the pd.DataFrame() constructor or pd.concat().
To combine two Pandas Series horizontally (side-by-side), you can use the pd.concat() function with axis=1 or pass the Series into the pd.DataFrame() constructor as a dictionary.
To combine two Pandas Series vertically (stacked), you can use pd.concat(). Series.append() did the same in older versions but was removed in pandas 2.0.
Pandas allows duplicate index values, but they may cause issues when performing operations like set_index(). You can reset the index before combining.
Use pd.merge() for more complex merging logic; for a simple index-based combination, pd.concat() or pd.DataFrame() suffices.
Conclusion
In this article, I have explained how to combine two Series into a DataFrame in pandas using pandas.concat(), pandas.merge(), and DataFrame.join(), and how to stack Series as rows the way Series.append() did before it was removed in pandas 2.0. If you just want to combine all Series as columns into a DataFrame, use pandas.concat(), as it is the simplest and most straightforward option.
Happy Learning !!