Pandas – Create DataFrame From Multiple Series

If you have multiple Series and want to create a pandas DataFrame in which each Series becomes a column, you can use the concat() function.

In pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series stores its values in order, each paired with an index label. It is comparable to a single column in an Excel sheet or SQL table.

When you combine multiple pandas Series into a DataFrame as columns, the resulting DataFrame has as many columns as the number of Series you combine.

1. Create pandas DataFrame From Multiple Series

You can create a DataFrame from multiple Series objects by adding each Series as a column.

The concat() function merges multiple Series into a DataFrame. It takes several parameters; for this scenario we pass a list of the Series to combine and axis=1 to place each Series in its own column. Note that axis=0 (the default) instead stacks the Series one below the other as rows.


import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
fees = pd.Series([22000,25000,23000])
discount  = pd.Series([1000,2300,1000])

# Combine two Series as columns
df = pd.concat([courses, fees], axis=1)

# concat() accepts any number of Series
df = pd.concat([courses, fees, discount], axis=1)
print(df)

This yields the following output.


         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000
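For contrast, here is a minimal sketch of what axis=0 does with the same kind of data. The variable names stacked and renumbered are illustrative only. With axis=0 the values are stacked end to end, so the result is a longer Series rather than a DataFrame, and the original index labels repeat unless ignore_index=True is passed.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=0 stacks the Series one below the other; the result is a Series
stacked = pd.concat([courses, fees], axis=0)
print(type(stacked).__name__)
print(list(stacked.index))

# ignore_index=True replaces the repeated labels with a fresh 0..n-1 index
renumbered = pd.concat([courses, fees], axis=0, ignore_index=True)
print(list(renumbered.index))
```

This is why axis=1 is the option to use when each Series should become its own column.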

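Because the Series above have no names, concat() labels the columns 0, 1 and 2. To get meaningful column names, assign a name to each Series; concat() then uses those names as the column labels.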


# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
discount  = pd.Series([1000,2300,1000],name='discount')

df=pd.concat([courses,fees,discount],axis=1)
print(df)

This yields the following output.


   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
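If the Series were created without names, the keys parameter of concat() is another way to label the columns. The sketch below assumes unnamed Series and uses illustrative column labels.

```python
import pandas as pd

# Unnamed Series, as in the first example
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# With axis=1, each entry in keys becomes the label of the matching column
df = pd.concat([courses, fees], axis=1, keys=["courses", "fees"])
print(list(df.columns))
```

The labels in keys are matched to the Series by position, so the two lists should be in the same order.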

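Let’s see how to assign an index to each Series and provide custom column names to the DataFrame. When you pass a dictionary to concat(), its keys are used as the column names in place of the Series names.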


# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concatenate the Series, using the dictionary keys as column names
df=pd.concat({'Courses': courses,
              'Course_Fee': fees,
              'Course_Discount': discount},axis=1)
print(df)

This yields the following output.


    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
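The example above works cleanly because all three Series share the same index labels. With axis=1, concat() aligns rows by index label rather than by position, so mismatched labels produce missing values. The following sketch uses a hypothetical fees Series with a different index to show this, along with join="inner" to keep only the labels common to all Series.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"],
                    index=["r1", "r2", "r3"], name="courses")
# Hypothetical Series whose index only partly overlaps with courses
fees = pd.Series([22000, 25000], index=["r1", "r4"], name="fees")

# Default join="outer": the union of all labels is kept, gaps become NaN
outer = pd.concat([courses, fees], axis=1)
print(outer)

# join="inner": only labels present in every Series are kept
inner = pd.concat([courses, fees], axis=1, join="inner")
print(inner)
```

If the Series are meant to line up by position instead, give them the same index first, as done with index_labels above.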

Finally, let’s see how to reset the index using the reset_index() method. This moves the current index into a column named index and gives the combined DataFrame a new default integer index.


# Move the index into a column and create a new default index
df = df.reset_index()
print(df)

This yields the following output.


  index  Courses  Course_Fee  Course_Discount
0    r1    Spark       22000             1000
1    r2  PySpark       25000             2300
2    r3   Hadoop       23000             1000
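If the old index labels are not needed, reset_index() also accepts drop=True, which discards them instead of turning them into a column. A small sketch with illustrative variable names:

```python
import pandas as pd

index_labels = ["r1", "r2", "r3"]
courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=index_labels)
fees = pd.Series([22000, 25000, 23000], index=index_labels)
df = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1)

# Default: the old labels are kept in a new column named "index"
kept = df.reset_index()
print(list(kept.columns))

# drop=True: the old labels are discarded
dropped = df.reset_index(drop=True)
print(list(dropped.columns))
```

In both cases the original df is left unchanged, because reset_index() returns a new DataFrame.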

Happy Learning !!

Conclusion

In this article, you have learned how to create a DataFrame from multiple pandas Series objects, where each Series becomes a column of the DataFrame. You have also learned how to change the column names while creating the DataFrame and how to reset its index.

NNK