Using the pandas.concat() function, you can combine two or more Series into a DataFrame (that is, create a DataFrame from multiple Series). You can also use pandas.merge() and DataFrame.join() to combine Series as columns, or Series.append() to stack one Series under another.
In pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series stores its data in sequential order. It holds one column of information, similar to a column in an Excel sheet or SQL table.
When you combine two pandas Series into a DataFrame, the result is a DataFrame with two columns. In this article, I will explain different ways to combine two or more Series into a DataFrame.
- pandas.concat()
- pandas.merge()
- DataFrame.join()
- Series.append() – This appends rows instead of combining the Series as columns
1. Using pandas.concat() to Combine Two Series
By using pandas.concat(), you can combine pandas objects, for example multiple Series, along a particular axis (column-wise or row-wise) to create a DataFrame.
concat() takes several parameters. For this scenario, pass a list of the Series to combine and axis=1 to place them side by side as columns. Note that axis=0 (the default) stacks the Series as rows instead of columns.
import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
fees = pd.Series([22000,25000,23000])
discount = pd.Series([1000,2300,1000])
# Combine two series.
df=pd.concat([courses,fees],axis=1)
# It also supports to combine multiple series.
df=pd.concat([courses,fees,discount],axis=1)
print(df)
Yields below output.
# Output:
         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000
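To make the difference between the two axis values concrete, here is a minimal sketch that reuses the Series from the example above. The variable names as_columns and as_rows are only for illustration.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=1: each Series becomes a column, giving a DataFrame with 3 rows and 2 columns
as_columns = pd.concat([courses, fees], axis=1)

# axis=0: the values are stacked end to end, giving a single Series with 6 entries
as_rows = pd.concat([courses, fees], axis=0)

print(as_columns.shape)
print(type(as_rows).__name__, len(as_rows))
```

With axis=0 the original index labels are kept, so the stacked Series repeats the labels 0, 1, 2.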
Note that if the Series have no names and you don't provide column names when combining them, pandas assigns numbers to the columns.
# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
discount = pd.Series([1000,2300,1000],name='discount')
df=pd.concat([courses,fees,discount],axis=1)
print(df)
Yields below output.
# Output:
   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
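If the Series are already created without names, one option is to pass the column names through the keys parameter of concat(). This is a small sketch of that alternative, not the only way to do it; the variable name named is illustrative.

```python
import pandas as pd

# Unnamed Series, as in the first example
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# With axis=1, the entries in `keys` are used as the column labels
named = pd.concat([courses, fees], axis=1, keys=["courses", "fees"])

print(named.columns.tolist())
```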
If the Series have a custom index, concat() carries the same index over to the resulting DataFrame. Now let’s see how to assign an index to each Series and provide custom column names for the DataFrame.
# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels
# Concat Series by Changing Names
df=pd.concat({'Courses': courses,
'Course_Fee': fees,
'Course_Discount': discount},axis=1)
print(df)
Yields below output.
# Output:
    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
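Because concat() aligns the Series on their index labels, it is worth seeing what happens when the labels do not match exactly. The sketch below assumes a hypothetical fees Series that covers only r1 and an extra label r4; the data is made up for illustration.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="Courses")
# Hypothetical Series whose index only partly overlaps with `courses`
fees = pd.Series([22000, 25000], index=["r1", "r4"], name="Course_Fee")

# Default join="outer": keep every label, fill the gaps with NaN
outer = pd.concat([courses, fees], axis=1)

# join="inner": keep only the labels present in every Series
inner = pd.concat([courses, fees], axis=1, join="inner")

print(outer)
print(inner)
```

The outer result has four rows with missing values where a label exists in only one Series, while the inner result keeps just r1.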
Finally, let’s see how to reset the index using the reset_index() method. This moves the current index into a column and adds a new default index to the combined DataFrame.
#change the index to a column & create new index
df = df.reset_index()
print(df)
Yields below output.
# Output:
  index  Courses  Course_Fee  Course_Discount
0    r1    Spark       22000             1000
1    r2  PySpark       25000             2300
2    r3   Hadoop       23000             1000
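If you don't need the old labels at all, reset_index() also accepts drop=True, which discards them instead of moving them into a column. A short sketch, building a small DataFrame directly so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"],
     "Course_Fee": [22000, 25000, 23000]},
    index=["r1", "r2", "r3"],
)

# Default: the old labels are kept in a new column named 'index'
kept = df.reset_index()

# drop=True: the old labels are discarded and only the new 0..n-1 index remains
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())
print(dropped.columns.tolist())
```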
2. Combine Two Series Using pandas.merge()
pandas.merge() combines DataFrames column-wise in a SQL-like way, and it supports all the database-style join operations between DataFrame or named Series objects. Each Series must have a name in this case, which you set with the name parameter when creating it. For instance, pd.merge(S1, S2, right_index=True, left_index=True) joins two named Series on their index.
# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
# Using pandas series merge()
df = pd.merge(courses, fees, right_index = True,
left_index = True)
print(df)
Yields below output.
# Output:
   courses   fees
0    Spark  22000
1  PySpark  25000
2   Hadoop  23000
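Since merge() performs a database-style join, the how parameter decides which index labels survive when the two Series don't cover the same rows. The sketch below uses a hypothetical fees Series that is missing a value for r3.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="courses")
# Hypothetical Series with no fee for r3
fees = pd.Series([22000, 25000], index=["r1", "r2"], name="fees")

# Default how="inner": only labels found in both Series are kept
inner = pd.merge(courses, fees, left_index=True, right_index=True)

# how="left": every label from the left Series is kept; missing fees become NaN
left = pd.merge(courses, fees, left_index=True, right_index=True, how="left")

print(inner)
print(left)
```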
3. Combine Two Series Using DataFrame.join()
You can also use DataFrame.join() to join two Series. Since join() is a DataFrame method, you first need a DataFrame object. One way to get one is to create a DataFrame from a Series and then join it with the other Series.
# Using DataFrame.join()
df=pd.DataFrame(courses).join(fees)
print(df)
Yields same output as above.
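The same idea extends to more than two Series by chaining join() calls. This sketch uses Series.to_frame() as an alternative to pd.DataFrame(...) for the first step; both give a one-column DataFrame named after the Series.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000, 23000], name="fees")
discount = pd.Series([1000, 2300, 1000], name="discount")

# Turn the first Series into a DataFrame, then join the others one at a time.
# Each joined Series needs a name, which becomes its column label.
df = courses.to_frame().join(fees).join(discount)

print(df)
```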
4. Using Series.append() to Combine Two Series
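In pandas versions before 2.0, you can use pandas.DataFrame(Series.append(Series,ignore_index=True)) to create a DataFrame by appending one Series to another. Series.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat([s1, s2], ignore_index=True) gives the same result. Note that this doesn’t create multiple columns; it appends the values as rows instead.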
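# Using Series.append() (only available in pandas versions before 2.0)
courses_am=pd.Series(["Spark","PySpark","Hadoop"])
courses_pm=pd.Series(["Pandas","Python","Scala"])
# df = pd.DataFrame(courses_am.append(courses_pm,
#                   ignore_index = True),columns=['all_courses'])
# Equivalent with pd.concat(), which works in current pandas versions too
df = pd.DataFrame(pd.concat([courses_am, courses_pm],
                  ignore_index = True),columns=['all_courses'])
print(df)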
Yields below output.
# Output:
  all_courses
0       Spark
1     PySpark
2      Hadoop
3      Pandas
4      Python
5       Scala
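To see what ignore_index changes, compare the index with and without it. The sketch uses pd.concat(), which takes the same ignore_index flag and is available in all current pandas versions; the variable names kept and renumbered are illustrative.

```python
import pandas as pd

courses_am = pd.Series(["Spark", "PySpark", "Hadoop"])
courses_pm = pd.Series(["Pandas", "Python", "Scala"])

# Without ignore_index, each Series keeps its own labels, so 0, 1, 2 repeat
kept = pd.concat([courses_am, courses_pm])

# With ignore_index=True, the result is relabeled 0 through 5
renumbered = pd.concat([courses_am, courses_pm], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```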
Conclusion
In this article, you have learned how to combine two Series into a DataFrame in pandas using pandas.concat(), pandas.merge(), Series.append(), and DataFrame.join(). If you just want to combine all Series as columns of a DataFrame, use pandas.concat(), as it is the simplest and most straightforward option.
Happy Learning !!