• Post category: Pandas
• Post last modified: March 27, 2024
Pandas – Create DataFrame From Multiple Series

If you have multiple Series and want to combine them into a pandas DataFrame, with each Series becoming a column, you can use the pandas concat() function.

In pandas, a Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It stores its values in order, each paired with an index label, and is similar to a single column in an Excel sheet or SQL table.
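To make the "labeled array" idea concrete, here is a minimal sketch (the course names and fees are only sample values) that builds a Series with custom index labels and reads values back both by label and by position.

```python
import pandas as pd

# A Series pairs each value with an index label; here the labels are course names
fees = pd.Series([22000, 25000, 23000], index=["Spark", "PySpark", "Hadoop"], name="fees")

print(fees.ndim)            # 1: a Series has a single axis
print(fees.index.tolist())  # ['Spark', 'PySpark', 'Hadoop']
print(fees["PySpark"])      # 25000: look up a value by its label
print(fees.iloc[0])         # 22000: or by its position in the sequence
```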

When you combine multiple pandas Series into a DataFrame, the resulting DataFrame has one column per Series you combine.

1. Create pandas DataFrame From Multiple Series

You can create a DataFrame from multiple Series objects by adding each Series as a column.

The concat() function merges multiple Series into a single DataFrame. It accepts several parameters; for this scenario we pass a list of the Series to combine and axis=1 so that each Series is added as a column. With axis=0 (the default), concat() stacks the Series one after another along the rows instead, producing a longer Series rather than a DataFrame.


import pandas as pd

# Create pandas Series
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# Combine two Series as columns
df = pd.concat([courses, fees], axis=1)

# concat() also accepts more than two Series
df = pd.concat([courses, fees, discount], axis=1)
print(df)

Yields below output.


# Output:
         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000
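The effect of axis is easiest to see side by side. This sketch, using the same sample data, compares axis=1 with axis=0 and also shows ignore_index=True, a concat() parameter that rebuilds a fresh 0..n-1 index when stacking. The variable names are only illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=1: each Series becomes a column -> DataFrame with 3 rows and 2 columns
side_by_side = pd.concat([courses, fees], axis=1)
print(type(side_by_side).__name__, side_by_side.shape)  # DataFrame (3, 2)

# axis=0 (the default): the Series are stacked end to end -> one Series of 6 values
stacked = pd.concat([courses, fees], axis=0)
print(type(stacked).__name__, stacked.shape)  # Series (6,)

# The original index labels are kept, so 0, 1, 2 now appear twice
print(stacked.index.tolist())  # [0, 1, 2, 0, 1, 2]

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([courses, fees], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```

In practice axis=1 is what you want for building a table; axis=0 is useful when the Series hold the same kind of data and you want one long column.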

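Because these Series have no names, concat() labels the columns with integers (0, 1, 2). To get meaningful column names, assign a name to each Series; concat() uses each Series' name as its column label.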


# Create Series by assigning names
courses = pd.Series(["Spark", "PySpark", "Hadoop"], name='courses')
fees = pd.Series([22000, 25000, 23000], name='fees')
discount = pd.Series([1000, 2300, 1000], name='discount')

df = pd.concat([courses, fees, discount], axis=1)
print(df)

Yields below output.


# Output:
   courses   fees  discount
0    Spark  22000      1000
1  PySpark  25000      2300
2   Hadoop  23000      1000
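If your Series were created without names, you don't have to rebuild them. This is a minimal sketch of two alternatives, assuming the unnamed Series from the first example: rename each Series with Series.rename(), or pass the column labels through concat()'s keys parameter.

```python
import pandas as pd

# Unnamed Series, as in the first example
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# Option 1: Series.rename() with a scalar sets the Series name
df1 = pd.concat([courses.rename("courses"), fees.rename("fees")], axis=1)

# Option 2: the keys parameter supplies one column label per Series
df2 = pd.concat([courses, fees], axis=1, keys=["courses", "fees"])

print(df1.columns.tolist())  # ['courses', 'fees']
print(df2.columns.tolist())  # ['courses', 'fees']
```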

Next, let’s assign a custom index to the Series and give the DataFrame custom column names. When you pass a dictionary to concat() instead of a list, its keys are used as the column labels.


# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concat Series by Changing Names
df=pd.concat({'Courses': courses,
              'Course_Fee': fees,
              'Course_Discount': discount},axis=1)
print(df)

Yields below output.


# Output:
    Courses  Course_Fee  Course_Discount
r1    Spark       22000             1000
r2  PySpark       25000             2300
r3   Hadoop       23000             1000
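concat() with axis=1 lines rows up by index label, not by position, so the Series should share the same labels; that is why the same index_labels list is assigned to all three Series above. The sketch below uses hypothetical Series whose labels only partly overlap: the default join='outer' keeps every label and fills the gaps with NaN, while join='inner' keeps only the labels common to all Series.

```python
import pandas as pd

# Hypothetical Series: 'fees' has no entry for label 'r3'
courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=["r1", "r2", "r3"], name="Courses")
fees = pd.Series([22000, 25000], index=["r1", "r2"], name="Course_Fee")

# Default join="outer": every label is kept; the missing fee becomes NaN
outer = pd.concat([courses, fees], axis=1)
print(outer)

# join="inner": only labels present in every Series are kept
inner = pd.concat([courses, fees], axis=1, join="inner")
print(inner.index.tolist())  # ['r1', 'r2']
```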

Finally, let’s reset the index with the reset_index() method. It moves the current index into a new column (named index, because the index itself has no name) and replaces it with a default integer index 0, 1, 2.


# Change the index to a column & create new index
df = df.reset_index()
print(df)

Yields below output.


# Output:
   index  Courses  Course_Fee  Course_Discount
0     r1    Spark       22000             1000
1     r2  PySpark       25000             2300
2     r3   Hadoop       23000             1000
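If you want a more descriptive name than index for that column, one option is to name the index with rename_axis() before resetting it; if you don't need the old labels at all, pass drop=True. A minimal sketch on the same sample data (Row_ID is only an illustrative name):

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=["r1", "r2", "r3"])
fees = pd.Series([22000, 25000, 23000], index=["r1", "r2", "r3"])
df = pd.concat([courses, fees], axis=1, keys=["Courses", "Course_Fee"])

# Name the index first so reset_index() uses that name for the new column
named = df.rename_axis("Row_ID").reset_index()
print(named.columns.tolist())  # ['Row_ID', 'Courses', 'Course_Fee']

# drop=True throws the old labels away instead of keeping them as a column
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Courses', 'Course_Fee']
print(dropped.index.tolist())    # [0, 1, 2]
```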

Frequently Asked Questions on Creating a DataFrame From Multiple Series

What is a DataFrame in Pandas?

In Pandas, a DataFrame is a two-dimensional, tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table and is a powerful tool for data manipulation and analysis.

How can I create a DataFrame from multiple Series in Pandas?

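To create a DataFrame from multiple Series in Pandas, you can use the pd.DataFrame constructor with a dictionary that maps each column name to a Series, or use pd.concat() with axis=1 as shown earlier in this article.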
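A minimal sketch of the constructor approach, using the same sample Series as earlier; the column names are illustrative. Each dictionary key becomes a column label, in the order the keys are written.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# Each key becomes a column label and each Series becomes that column's data
df = pd.DataFrame({"Courses": courses, "Fee": fees, "Discount": discount})
print(df)
print(df.shape)  # (3, 3)
```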

Can the Series have different lengths when creating a DataFrame?

Yes. Both pd.DataFrame and pd.concat(axis=1) align Series on their index labels, not on their positions. If one Series is shorter, the rows it lacks are filled with NaN and no error is raised. A ValueError is raised only when you pass plain lists or arrays of different lengths, because those have no index to align on.
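The following sketch (sample values only) shows both cases: unequal Series are aligned and padded with NaN, while unequal plain lists raise ValueError. The exact error message can differ between pandas versions, so it is printed rather than relied upon.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000])  # one value short

# Series are aligned on index labels: label 2 has no fee, so it becomes NaN
df = pd.DataFrame({"Courses": courses, "Fee": fees})
print(df)

# pd.concat() with axis=1 aligns the same way
df2 = pd.concat([courses, fees], axis=1, keys=["Courses", "Fee"])
print(df2.shape)  # (3, 2)

# Plain lists have no index to align on, so unequal lengths raise ValueError
raised = False
try:
    pd.DataFrame({"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [22000, 25000]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```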

Can I add more Series to an existing DataFrame?

Yes. Assign the Series to a new column label with square bracket notation, for example df['Column3'] = new_series. If df already has the columns 'Column1' and 'Column2', it then has three columns: 'Column1', 'Column2', and 'Column3'. The assignment aligns on index labels, so any row label that is missing from new_series gets NaN in the new column.
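Here is a minimal sketch of that assignment; Column1 through Column4 are placeholder names. The last step shows index alignment with a Series whose labels are out of order and incomplete.

```python
import pandas as pd

# Placeholder DataFrame with two columns
df = pd.DataFrame({"Column1": pd.Series([1, 2, 3]), "Column2": pd.Series([4, 5, 6])})

# Assigning a Series to a new label adds it as a column
new_series = pd.Series([7, 8, 9])
df["Column3"] = new_series
print(df.columns.tolist())  # ['Column1', 'Column2', 'Column3']

# Values go to matching index labels, not positions; label 1 is missing, so it gets NaN
df["Column4"] = pd.Series([100, 200], index=[2, 0])
print(df["Column4"].tolist())  # [200.0, nan, 100.0]
```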

Can I specify custom column names when creating a DataFrame from multiple Series?

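Yes. When you pass a dictionary to the pd.DataFrame constructor, its keys become the column names and take precedence over any names already set on the Series. With pd.concat(), pass a dictionary (as shown earlier) or a list of names through the keys parameter. You can also rename columns afterwards with df.rename(columns={'old_name': 'new_name'}).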
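A minimal sketch, assuming Series that already carry their own names: the dictionary keys override those names, and DataFrame.rename(columns=...) changes labels after the DataFrame exists. The column names here are illustrative.

```python
import pandas as pd

# Sample Series that already carry their own names
courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000, 23000], name="fees")

# The dictionary keys, not the Series names, become the column labels
df = pd.DataFrame({"Course_Name": courses, "Course_Fee": fees})
print(df.columns.tolist())  # ['Course_Name', 'Course_Fee']

# Columns can also be renamed after the DataFrame exists
df = df.rename(columns={"Course_Fee": "Fee"})
print(df.columns.tolist())  # ['Course_Name', 'Fee']
```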

How can I set the index for the DataFrame when creating it from multiple Series?

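You can pass the index parameter to the pd.DataFrame constructor, but be careful when the values are Series. In that case pandas does not simply relabel the rows; it selects the given labels from each Series' existing index. Labels such as ['row1', 'row2', 'row3'] do not exist in a default 0, 1, 2 index, so every value comes back as NaN. Either give the Series those index labels before building the DataFrame (as in the concat() example above), or create the DataFrame first and then assign df.index = ['row1', 'row2', 'row3']. When the values are plain lists, the index parameter labels the rows directly.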
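The sketch below (sample data, illustrative variable names) reproduces the all-NaN pitfall and then shows two ways around it.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])  # default index 0, 1, 2
fees = pd.Series([22000, 25000, 23000])
labels = ["row1", "row2", "row3"]

# Pitfall: with Series input, index= selects labels from each Series' own index.
# None of 'row1'..'row3' exist in 0..2, so every value becomes NaN.
bad = pd.DataFrame({"Courses": courses, "Fee": fees}, index=labels)
print(bad["Fee"].isna().all())  # True

# Option 1: build the DataFrame first, then replace the row labels
good = pd.DataFrame({"Courses": courses, "Fee": fees})
good.index = labels
print(good)

# Option 2: pass plain lists, which have no index, so index= just labels the rows
from_lists = pd.DataFrame({"Courses": courses.tolist(), "Fee": fees.tolist()}, index=labels)
```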

Conclusion

In this article, you have learned how to create a DataFrame from multiple pandas Series objects, where each Series becomes a column. You also learned how to set the column names while creating the DataFrame and how to reset its index.

Happy Learning !!


Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them. Follow Naveen @ LinkedIn and Medium
