If you have multiple Series and want to create a pandas DataFrame by adding each Series as a column, you can use the concat() method.
In pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series stores its values in order, each paired with an index label. It holds a single column of data, similar to a column in an Excel sheet or SQL table.
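As a quick illustration, the sketch below builds a small labeled Series and inspects its dimension, index, and dtype. The labels r1 to r3 and the name "fees" are example values chosen for this sketch.

```python
import pandas as pd

# A Series is one-dimensional: a sequence of values plus an index of labels
fees = pd.Series([22000, 25000, 23000], index=["r1", "r2", "r3"], name="fees")
print(fees.ndim)          # 1
print(list(fees.index))   # ['r1', 'r2', 'r3']
print(fees["r2"])         # 25000

# A Series can also hold mixed Python objects; pandas then uses the object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)        # object
```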
When you combine multiple pandas Series into a DataFrame this way, the resulting DataFrame has as many columns as the number of Series you are combining.
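A minimal check of this, assuming the Series share the same default index (0, 1, 2): the column count follows the number of Series passed in, while the row count follows the shared index.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# One column per Series; the three rows come from the shared index 0, 1, 2
two = pd.concat([courses, fees], axis=1)
three = pd.concat([courses, fees, discount], axis=1)
print(two.shape)    # (3, 2)
print(three.shape)  # (3, 3)
```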
1. Create pandas DataFrame From Multiple Series
You can create a DataFrame from multiple Series objects by adding each Series as a column.
By using the concat() method, you can combine multiple Series into a DataFrame. It takes several parameters; for this scenario, we pass a list of the Series to combine and axis=1 to place each Series as a column. Note that axis=0 (the default) appends the Series one after another along the rows, producing a single longer Series instead of a DataFrame with one column per Series.
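To make the axis difference concrete, here is a small standalone sketch that concatenates the same two unnamed Series both ways and inspects the result type and shape.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=0 (the default) stacks the Series end to end into one longer Series
stacked = pd.concat([courses, fees])
print(type(stacked).__name__)  # Series
print(len(stacked))            # 6
print(list(stacked.index))     # [0, 1, 2, 0, 1, 2]

# axis=1 places the Series side by side as the columns of a DataFrame
side_by_side = pd.concat([courses, fees], axis=1)
print(type(side_by_side).__name__)  # DataFrame
print(side_by_side.shape)           # (3, 2)
```

Note that the stacked result keeps each Series' original labels, so the index repeats; passing ignore_index=True to concat() gives it a fresh 0 to 5 index instead.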
import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])
# Combine two Series as columns
df = pd.concat([courses, fees], axis=1)
# concat() also accepts more than two Series
df = pd.concat([courses, fees, discount], axis=1)
print(df)
This yields the output below.
# Output:
         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000
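By default, concat() labels the columns 0, 1, 2 because the Series have no names. To get meaningful column names, assign a name to each Series; concat() uses each Series name as its column label.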
# Create Series by assigning names
courses = pd.Series(["Spark", "PySpark", "Hadoop"], name='courses')
fees = pd.Series([22000, 25000, 23000], name='fees')
discount = pd.Series([1000, 2300, 1000], name='discount')
df = pd.concat([courses, fees, discount], axis=1)
print(df)
This yields the output below.
# Output:
   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
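If you would rather not rename the Series themselves, concat() also accepts a keys parameter; when concatenating Series with axis=1, the keys are used as the column labels, one per Series in order. A sketch with unnamed Series:

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# With axis=1, each key becomes the column label of the matching Series
df = pd.concat([courses, fees, discount], axis=1,
               keys=["courses", "fees", "discount"])
print(list(df.columns))  # ['courses', 'fees', 'discount']
```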
Next, let’s assign a custom index to each Series and give the DataFrame custom column names.
# Assign the same index labels to every Series
index_labels = ['r1', 'r2', 'r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels
# Pass a dict to concat(): its keys become the column names
df = pd.concat({'Courses': courses,
                'Course_Fee': fees,
                'Course_Discount': discount}, axis=1)
print(df)
This yields the output below.
# Output:
    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
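Because concat() with axis=1 lines rows up by index label, it matters that every Series carries the same labels, as they do above. The sketch below uses a hypothetical fees Series whose index lacks r3 and adds r4, to show what happens otherwise: the default join='outer' keeps the union of labels and fills the gaps with NaN, while join='inner' keeps only the labels every Series shares.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=["r1", "r2", "r3"])
# Hypothetical Series: no 'r3' label, plus an extra 'r4' label
fees = pd.Series([22000, 25000, 24000], index=["r1", "r2", "r4"])

# Default join='outer': rows for r1..r4, with NaN where a Series has no value
outer = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1)
print(outer)

# join='inner': only the labels present in every Series survive
inner = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1, join="inner")
print(list(inner.index))  # ['r1', 'r2']
```

Because the outer result contains NaN, the Course_Fee column there is stored as float64 rather than integers.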
Finally, let’s reset the index using the reset_index() method. This moves the current index into a regular column (named index, because the index itself has no name) and gives the combined DataFrame a new default index of 0, 1, 2.
# Change the index to a column & create new index
df = df.reset_index()
print(df)
This yields the output below.
# Output:
   index  Courses  Course_Fee  Course_Discount
0     r1    Spark       22000             1000
1     r2  PySpark       25000             2300
2     r3   Hadoop       23000             1000
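Two common variations, sketched below on a smaller version of the same DataFrame: drop=True discards the old labels instead of keeping them as a column, and naming the index first (via rename_axis()) gives the new column a meaningful name instead of index. The name row_id is a hypothetical choice for this sketch.

```python
import pandas as pd

index_labels = ["r1", "r2", "r3"]
courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=index_labels)
fees = pd.Series([22000, 25000, 23000], index=index_labels)
df = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1)

# drop=True throws the old r1..r3 labels away instead of adding a column
dropped = df.reset_index(drop=True)
print(list(dropped.columns))  # ['Courses', 'Course_Fee']

# Name the index first so the new column is called 'row_id' instead of 'index'
named = df.rename_axis("row_id").reset_index()
print(list(named.columns))    # ['row_id', 'Courses', 'Course_Fee']
```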
Happy Learning !!
Conclusion
In this article, you have learned how to create a DataFrame from multiple pandas Series objects using concat() with axis=1, where each Series becomes a column. You have also learned how to set the column names while creating the DataFrame and how to reset its index.