How to Combine Two Series into a Pandas DataFrame

  • Post last modified: January 9, 2024

Using the pandas.concat() method you can combine two or more Series into a DataFrame (that is, create a DataFrame from multiple Series). You can also use pandas.merge() or DataFrame.join() to build a DataFrame from multiple Series. Series.append(), which is available only in pandas versions before 2.0, stacks one Series below another instead of adding a column.

In pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series stores its values in order, each paired with an index label. It holds a single column of data, similar to one column in an Excel sheet or SQL table.

When you combine two pandas Series column-wise, the result is a DataFrame with two columns. In this article, I will explain different ways to combine two or more Series into a DataFrame.

1. Using pandas.concat() to Combine Two Series

By using the pandas.concat() method you can combine pandas objects along a particular axis. For example, you can create several Series and pass them to concat() to combine them column-wise or row-wise into a single object. First, let's create the Series.


import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
print("First Series:\n", courses)
print("---------------------------------")
fees = pd.Series([22000,25000,23000])
print("Second Series:\n", fees)
print("---------------------------------")
discount  = pd.Series([1000,2300,1000])
print("Third Series:\n", discount)

This prints the three Series, each with the default integer index 0, 1, 2.

For our scenario, we can use the concat() method, which takes several parameters. To combine Series as columns instead of rows, set the axis parameter to 1. Note that axis=0 (the default) stacks the Series on top of each other as rows, which yields one longer Series rather than a DataFrame with multiple columns.


# Combine two series.
df = pd.concat([courses,fees], axis=1)
print("After combining two Series\n", df)

This yields a DataFrame with three rows and two columns.

Note that if the Series have no names and you don't provide column names while combining, pandas labels the columns with integers (0 and 1 here).
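Column labels can also be supplied at concatenation time instead of on each Series. The following is a minimal sketch, assuming the keys argument of pd.concat() is used for the labels; the variable names by_keys and stacked are illustrative only. It also contrasts axis=1 with axis=0.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=1 places each Series in its own column; keys supplies the column labels
by_keys = pd.concat([courses, fees], axis=1, keys=["courses", "fees"])
print(by_keys.columns.tolist())   # ['courses', 'fees']
print(by_keys.shape)              # (3, 2)

# axis=0 (the default) stacks the values into one longer Series instead
stacked = pd.concat([courses, fees], axis=0)
print(type(stacked).__name__, stacked.shape)   # Series (6,)
```

Using keys leaves the original Series untouched, which can be convenient when the same Series is reused elsewhere under a different label.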


# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
discount  = pd.Series([1000,2300,1000],name='discount')

df = pd.concat([courses,fees,discount],axis=1)
print("After combining multiple Series:\n", df)

Yields below output.


# Output:
# After combining multiple Series:
   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000

If the Series have a custom index, the concat() method carries the same index over to the resulting DataFrame. Now let's see how to assign an index to each Series and provide custom column names to the DataFrame.


# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concat Series by Changing Names
df=pd.concat({'Courses': courses,
              'Course_Fee': fees,
              'Course_Discount': discount},axis=1)
print("After combining multiple Series:\n", df)

Yields below output.


# Output:
# After combining multiple Series:
    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
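In the example above all three Series share the same index labels. As a rough sketch of what happens when they do not, the snippet below uses a hypothetical fees Series whose labels only partly overlap with courses. It assumes the join parameter of pd.concat() is what you want to control the alignment.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="Courses")
# Hypothetical Series whose index only partly overlaps with courses
fees = pd.Series([25000, 23000, 24000],
                 index=["r2", "r3", "r4"], name="Course_Fee")

# Default join="outer": the union of labels is kept, gaps are filled with NaN
outer = pd.concat([courses, fees], axis=1)
print(sorted(outer.index))                    # ['r1', 'r2', 'r3', 'r4']
print(pd.isna(outer.loc["r1", "Course_Fee"])) # True

# join="inner": only labels present in every Series are kept
inner = pd.concat([courses, fees], axis=1, join="inner")
print(inner.index.tolist())                   # ['r2', 'r3']
```

Rows are matched by index label, not by position, so it is worth checking the indexes before combining Series that come from different sources.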

Finally, let’s see how to reset the index using the reset_index() method. This moves the current index into a column named index and adds a new default integer index to the combined DataFrame.


# change the index to a column & create new index
df = df.reset_index()
print("After combining multiple Series:\n", df)

Yields below output.


# Output:
# After combining multiple Series:
  index  Courses  Course_Fee  Course_Discount
0    r1    Spark       22000             1000
1    r2  PySpark       25000             2300
2    r3   Hadoop       23000             1000
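Two common variations on reset_index() are sketched below, under the assumption that you either don't need the old labels at all or want the new column to have a more descriptive name than index. The name row_id is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"],
     "Course_Fee": [22000, 25000, 23000]},
    index=["r1", "r2", "r3"],
)

# drop=True discards the old labels instead of turning them into a column
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())   # ['Courses', 'Course_Fee']
print(dropped.index.tolist())     # [0, 1, 2]

# Naming the index first gives the new column that name rather than "index"
named = df.rename_axis("row_id").reset_index()
print(named.columns.tolist())     # ['row_id', 'Courses', 'Course_Fee']
```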

2. Combine Two Series Using pandas.merge()

The pandas.merge() method combines objects column-wise in a SQL-like way. merge() supports all the database-style join operations between DataFrame objects or named Series objects. Each Series must therefore have a name, which you can set with the name parameter when creating it. To join two Series on their index, use pd.merge(S1, S2, right_index=True, left_index=True).


# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')

# Using pandas series merge()
df = pd.merge(courses, fees, right_index = True,
               left_index = True)
print("After combining two Series:\n", df)

Yields below output.


# Output:
# After combining two Series:
   courses   fees
0    Spark  22000
1  PySpark  25000
2   Hadoop  23000
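Because merge() performs a database-style join, its how parameter decides which index labels survive. The sketch below is an illustration only: it uses a hypothetical fees Series with a partly different index, and it also shows that an unnamed Series is rejected.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="courses")
# Hypothetical fees Series whose index only partly overlaps with courses
fees = pd.Series([25000, 23000, 24000],
                 index=["r2", "r3", "r4"], name="fees")

# Default how="inner": only labels found in both Series are kept
inner = pd.merge(courses, fees, left_index=True, right_index=True)
print(inner.index.tolist())   # ['r2', 'r3']

# how="outer": every label is kept and missing values become NaN
outer = pd.merge(courses, fees, left_index=True, right_index=True, how="outer")
print(sorted(outer.index))    # ['r1', 'r2', 'r3', 'r4']

# An unnamed Series is rejected, since merge() needs a column name for it
try:
    pd.merge(courses, pd.Series([1, 2, 3]), left_index=True, right_index=True)
    unnamed_ok = True
except ValueError:
    unnamed_ok = False
print(unnamed_ok)             # False
```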

3. Combine Two Series Using DataFrame.join()

You can also use DataFrame.join() to join two Series. Since join() is a DataFrame method, you first need a DataFrame object. One way to get it is to create a DataFrame from one Series and then join the other Series to it. The Series passed to join() must have a name, which becomes its column name in the result.


# Using DataFrame.join()
df=pd.DataFrame(courses).join(fees)
print("After combining two Series:\n", df)

Yields the same output as above.
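The same idea extends to more than two Series. Here is a small sketch, assuming Series.to_frame() is used to build the starting DataFrame and the join() calls are chained; it reuses the named Series from earlier in the article.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000, 23000], name="fees")
discount = pd.Series([1000, 2300, 1000], name="discount")

# to_frame() turns the first Series into a one-column DataFrame;
# each join() then adds another named Series, matched on the index
df = courses.to_frame().join(fees).join(discount)
print(df.columns.tolist())   # ['courses', 'fees', 'discount']
print(df.shape)              # (3, 3)
```

By default join() performs a left join on the index, so the rows of the first Series determine which rows appear in the result.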

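4. Using Series.append() to Combine Two Series (Removed in pandas 2.0)

In pandas versions before 2.0, you can use pandas.DataFrame(Series.append(Series,ignore_index=True)) to create a DataFrame by appending one Series to another. Series.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat([Series, Series], ignore_index=True) instead, as the example below does. Note that this example doesn’t create multiple columns; it just appends the values as rows.


# Series.append() was removed in pandas 2.0; pd.concat() is its replacement
courses_am=pd.Series(["Spark","PySpark","Hadoop"])
courses_pm=pd.Series(["Pandas","Python","Scala"])
# pandas < 2.0 equivalent: courses_am.append(courses_pm, ignore_index = True)
df = pd.DataFrame(pd.concat([courses_am, courses_pm],
                  ignore_index = True),columns=['all_courses'])
print("After combining two Series:\n", df)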

Yields below output.


# Output:
# After combining two Series:
  all_courses
0       Spark
1     PySpark
2      Hadoop
3      Pandas
4      Python
5       Scala
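The ignore_index=True argument is what produces the continuous 0 to 5 index shown above. The short sketch below illustrates the difference; it uses pd.concat(), which accepts the same ignore_index argument, so that it runs on current pandas versions. The names kept and renumbered are illustrative.

```python
import pandas as pd

courses_am = pd.Series(["Spark", "PySpark", "Hadoop"])
courses_pm = pd.Series(["Pandas", "Python", "Scala"])

# Without ignore_index each Series keeps its own labels, so they repeat
kept = pd.concat([courses_am, courses_pm])
print(kept.index.tolist())        # [0, 1, 2, 0, 1, 2]

# With ignore_index=True the result gets a fresh 0..n-1 index
renumbered = pd.concat([courses_am, courses_pm], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

Duplicate index labels are legal in pandas, but label-based lookups such as kept.loc[0] then return more than one row, which is usually not what you want.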

Conclusion

In this article, you have learned how to combine two Series into a DataFrame in pandas using pandas.concat(), pandas.merge(), and DataFrame.join(), and how to stack one Series below another with Series.append() (removed in pandas 2.0; use pandas.concat() instead). If you just want to combine all Series as columns into a DataFrame, use pandas.concat(), as it is the simplest and most straightforward option.

Happy Learning !!

Naveen (NNK)

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
