Pandas – Create DataFrame From Multiple Series

If you have multiple Series and want to create a pandas DataFrame in which each Series becomes a column, you can use the concat() function.

In pandas, a Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). A Series stores its data in sequential order and holds a single column of information, similar to a column in an Excel sheet or SQL table.
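As a quick illustration of the description above, here is a minimal sketch of a single Series and the attributes that matter later when it becomes a DataFrame column. The variable name and values are only examples.

```python
import pandas as pd

# A Series pairs each value with an index label (0, 1, 2 by default)
fees = pd.Series([22000, 25000, 23000], name="fees")

print(fees.index.tolist())  # the default integer labels
print(fees.dtype)           # dtype inferred from the values
print(fees.name)            # optional name, used later as a column label
```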

When you combine multiple pandas Series into a DataFrame this way, the resulting DataFrame has as many columns as the number of Series you combine.

1. Create pandas DataFrame From Multiple Series

You can create a DataFrame from multiple Series objects by adding each Series as a column.

Using the concat() function, you can merge multiple Series into a DataFrame. It takes several parameters; for our scenario we pass a list of the Series to combine and axis=1 to place them side by side as columns. Note that axis=0, the default, stacks the Series one after another as rows instead, which yields a single longer Series rather than a DataFrame.
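To make the effect of the axis parameter concrete, the following sketch compares the shape of the result for axis=1 and axis=0. The names as_columns and stacked are illustrative only.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=1: each Series becomes a column -> DataFrame with 3 rows, 2 columns
as_columns = pd.concat([courses, fees], axis=1)

# axis=0 (the default): values are stacked end to end -> one Series of 6 values
stacked = pd.concat([courses, fees], axis=0)

print(as_columns.shape)  # (3, 2)
print(stacked.shape)     # (6,)
```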

import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
fees = pd.Series([22000,25000,23000])
discount  = pd.Series([1000,2300,1000])

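# Combine two Series as columns
df = pd.concat([courses, fees], axis=1)

# concat() also accepts more than two Series
df = pd.concat([courses, fees, discount], axis=1)
print(df)

This yields the output below.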

         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000

Because the Series have no names, concat() labels the columns with numbers. You can assign a name to each Series, and that name is then used as the column label.

# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
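discount = pd.Series([1000,2300,1000], name='discount')

# Combine the named Series; each name becomes a column label
df = pd.concat([courses, fees, discount], axis=1)
print(df)

This yields the output below.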

   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000

Let’s see how to assign an index to Series and provide custom column names to the DataFrame.

# Assign Index to Series
index_labels = ['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concat Series while setting custom column names (the dict keys)
df = pd.concat({'Courses': courses,
                'Course_Fee': fees,
                'Course_Discount': discount}, axis=1)
print(df)

This yields the output below.

    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
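The index matters here because concat() with axis=1 lines rows up by index label, not by position. The sketch below uses hypothetical data in which the two Series share only some labels, to show what happens when the indexes differ.

```python
import pandas as pd

# Hypothetical data: the two Series share only the labels r2 and r3
fees = pd.Series([22000, 25000, 23000], index=["r1", "r2", "r3"])
discount = pd.Series([2300, 1000, 500], index=["r2", "r3", "r4"])

# Rows are matched by index label; by default labels from both Series are kept
df = pd.concat({"Course_Fee": fees, "Course_Discount": discount}, axis=1)
print(df)

# A label missing from one Series shows up as NaN in that column
print(df.isna().sum())
```

Giving all Series the same index, as in the example above, avoids these gaps.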

Finally, let’s see how to reset the index using the reset_index() method. This moves the current index into a column and adds a new default integer index to the combined DataFrame.

# Move the index into a column and create a new default index
df = df.reset_index()
print(df)

This yields the output below.

  index  Courses  Course_Fee  Course_Discount
0    r1    Spark       22000             1000
1    r2  PySpark       25000             2300
2    r3   Hadoop       23000             1000
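If the old labels are not needed as a column, reset_index() also accepts drop=True. The following sketch rebuilds a small DataFrame (a shortened, illustrative version of the one above) and compares the two behaviours.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Course_Fee": [22000, 25000, 23000]},
    index=["r1", "r2", "r3"],
)

# Default: the old labels are kept in a new column named "index"
kept = df.reset_index()

# drop=True: the old labels are discarded instead of becoming a column
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())     # ['index', 'Courses', 'Course_Fee']
print(dropped.columns.tolist())  # ['Courses', 'Course_Fee']
```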

In this article, you have learned how to create a DataFrame from multiple pandas Series objects, where each Series becomes a column of the DataFrame. You have also learned how to change the column names while creating the DataFrame and how to reset its index.

Happy Learning !!

NNK is a Big Data and Spark examples community page; all examples are simple, easy to understand, and well tested in our development environment.
