  • Post last modified:May 17, 2024
Pandas – Create DataFrame From Multiple Series

If you have multiple Series and want to create a pandas DataFrame in which each Series becomes a column, you can use the concat() function.

In pandas, a Series acts as a one-dimensional labeled array, capable of accommodating various data types like integers, strings, floating-point numbers, Python objects, and more. It organizes data sequentially, representing a single column of information, much like a column in an Excel sheet or an SQL table.

Combining multiple pandas Series into a DataFrame produces one column per Series.

1. Create pandas DataFrame From Multiple Series

You can create a DataFrame from multiple Series objects by adding each Series as a column.

The pandas concat() function merges multiple Series into a DataFrame. It takes several parameters; for this scenario, pass a list of the Series to combine and axis=1 so that each Series becomes a column. With axis=0 (the default), the Series are instead stacked end to end along the rows.


import pandas as pd

# Create pandas Series
courses = pd.Series(["Spark","PySpark","Hadoop"])
fees = pd.Series([22000,25000,23000])
discount  = pd.Series([1000,2300,1000])

# Combine two Series.
df = pd.concat([courses, fees], axis=1)

# concat() can also combine more than two Series.
df = pd.concat([courses, fees, discount], axis=1)
print("Create DataFrame:\n", df)

This yields a DataFrame with three rows and three columns.
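
To make the difference between axis=0 and axis=1 concrete, here is a minimal sketch that runs both on the same two Series. The variable names stacked and side_by_side are only illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=0 stacks the Series end to end and returns a single, longer Series
stacked = pd.concat([courses, fees], axis=0)

# axis=1 places each Series in its own column and returns a DataFrame
side_by_side = pd.concat([courses, fees], axis=1)

print(type(stacked).__name__, stacked.shape)            # Series (6,)
print(type(side_by_side).__name__, side_by_side.shape)  # DataFrame (3, 2)
```

Note that with axis=0 the original index labels are kept, so the stacked result repeats 0, 1, 2 unless you also pass ignore_index=True.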

Because the Series have no names, pandas labels the columns with numbers (0, 1, 2). To get meaningful column names, assign a name to each Series.


# Create Series by assigning names
courses = pd.Series(["Spark","PySpark","Hadoop"], name='courses')
fees = pd.Series([22000,25000,23000], name='fees')
discount  = pd.Series([1000,2300,1000],name='discount')

df=pd.concat([courses,fees,discount],axis=1)
print(df)

# Output:
#    courses   fees  discount
# 0    Spark  22000      1000
# 1  PySpark  25000      2300
# 2   Hadoop  23000      1000

Let’s explore how to assign an index to a Series and how to specify custom column names for a DataFrame.


# Assign Index to Series
index_labels=['r1','r2','r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concat Series by Changing Names
df=pd.concat({'Courses': courses,
              'Course_Fee': fees,
              'Course_Discount': discount},axis=1)
print(df)

# Output:
#     Courses  Course_Fee  Course_Discount
# r1    Spark       22000             1000
# r2  PySpark       25000             2300
# r3   Hadoop       23000             1000

The reset_index() method moves the current index of a DataFrame into a regular column and replaces it with the default integer index.


# Change the index to a column & create new index
df = df.reset_index()
print(df)

# Output:
#   index  Courses  Course_Fee  Course_Discount
# 0    r1    Spark       22000             1000
# 1    r2  PySpark       25000             2300
# 2    r3   Hadoop       23000             1000
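
If you do not want to keep the old index as a column, reset_index() also accepts drop=True. The sketch below rebuilds a small DataFrame with the same r1..r3 labels so that it runs on its own; the name dropped is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"],
     "Course_Fee": [22000, 25000, 23000]},
    index=["r1", "r2", "r3"],
)

# drop=True discards the old labels instead of moving them into a column
dropped = df.reset_index(drop=True)
print(dropped)
```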

FAQ on Create DataFrame From Multiple Series

How can I create a DataFrame from multiple Series in Pandas?

To create a DataFrame from multiple Series in pandas, pass a dictionary that maps column names to Series to the pd.DataFrame constructor, or pass a list of Series to pd.concat() with axis=1.
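
As a minimal sketch of the constructor route, reusing the sample data from earlier in the article (the column names Courses and Fee are chosen here only for illustration):

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# Each dictionary key becomes a column name, each Series a column
df = pd.DataFrame({"Courses": courses, "Fee": fees})
print(df)
```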

Can the Series have different lengths when creating a DataFrame?

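Yes. The Series do not need to have the same length. Both pd.concat() with axis=1 and the pd.DataFrame constructor align the Series on their index labels, and any row label that is missing from a Series is filled with NaN in that column. Each Series is still treated as one column of the DataFrame. Keep in mind that a column that receives NaN may change type; for example, an integer column becomes float.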
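
The behavior with unequal lengths is easy to check directly. In the sketch below, fees deliberately has one value fewer than courses; pd.concat() aligns the two on their index and fills the gap with NaN.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000], name="fees")  # one value shorter on purpose

df = pd.concat([courses, fees], axis=1)
print(df)

# Row 2 has no matching label in fees, so that cell is NaN
print(df["fees"].isna().tolist())  # [False, False, True]
```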

Can I add more Series to an existing DataFrame?

You can add a Series to an existing DataFrame by assigning it to a new column name with square-bracket notation. For example, df['Column3'] = new_series adds the data from new_series as a column labeled 'Column3'; if df already had 'Column1' and 'Column2', it now has three columns. The assignment aligns on the index, so rows of df whose labels are missing from new_series get NaN.
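
A minimal sketch of that assignment follows. The DataFrame contents are made-up sample values, and the names df, new_series, and Column1 through Column3 are simply the ones used in the answer above.

```python
import pandas as pd

# An existing DataFrame with two columns (sample values)
df = pd.DataFrame({"Column1": [1, 2, 3], "Column2": ["a", "b", "c"]})

# A new Series with the same default index 0..2
new_series = pd.Series([10.5, 20.5, 30.5])

# Square-bracket assignment adds the Series as a new column
df["Column3"] = new_series
print(df)
```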

Can I specify custom column names when creating a DataFrame from multiple Series?

You can specify custom column names when creating a DataFrame from multiple Series. The keys of the dictionary passed to the pd.DataFrame constructor become the column names, replacing the default numbers or any names the Series already carry.
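
Here is a short sketch of two ways to supply column names: dictionary keys with the constructor, and the keys parameter of pd.concat(). The names Course_Name and Course_Fee are chosen only for this example.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"], name="courses")
fees = pd.Series([22000, 25000, 23000], name="fees")

# Dictionary keys become the column names, overriding each Series' own name
df_dict = pd.DataFrame({"Course_Name": courses, "Course_Fee": fees})

# pd.concat() takes the same names through its keys parameter
df_keys = pd.concat([courses, fees], axis=1, keys=["Course_Name", "Course_Fee"])

print(list(df_dict.columns))
print(list(df_keys.columns))
```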

How can I set the index for the DataFrame when creating it from multiple Series?

You can set the index through the index parameter of the pd.DataFrame constructor, for example a list of labels such as ['row1', 'row2', 'row3']. Be aware that when the data is a dictionary of Series, pandas uses index to select matching labels from each Series rather than to relabel the rows, so labels that do not exist in a Series produce NaN. To relabel the rows, assign the labels to each Series' index first (as shown earlier), or set df.index after the DataFrame is created.
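
The sketch below illustrates how the index parameter interacts with Series that still carry the default integer index, and one way to get the intended labels. The labels list and the name mismatched are illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
labels = ["row1", "row2", "row3"]

# The Series are indexed 0..2, so none of the requested labels match
# and every cell comes back as NaN
mismatched = pd.DataFrame({"Courses": courses, "Fee": fees}, index=labels)
print(mismatched)

# Assign the labels to the Series first, then build the DataFrame
courses.index = labels
fees.index = labels
df = pd.DataFrame({"Courses": courses, "Fee": fees})
print(df)
```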

Conclusion

In this article, I have explained how to create a DataFrame from multiple pandas Series objects, where each Series becomes a column of the DataFrame. You have also learned how to change the column names while creating the DataFrame and how to reset its index.

Happy Learning !!
