  • Post category:Pandas
  • Post last modified:March 27, 2024
Pandas – What is a Series Explained With Examples

What is a Pandas Series

A pandas Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It stores data in sequential order and is comparable to a single column of a table. Every Series has a single dtype: when all values share a type, pandas uses that type, and when the values are mixed, pandas falls back to a common type such as object, so it is best to keep the values consistent. You can create a Series by calling pandas.Series(). In this article, we’ll explain how to create a pandas Series, how to access its values by index and by label, and how to use some common attributes and methods, with examples.
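As a minimal sketch of the "one dtype per Series" point above, the snippet below builds three small Series from made-up values (the variable names are illustrative only) and inspects the dtype pandas picks for each.

```python
import pandas as pd

# All integers: pandas stores them with an integer dtype.
ints = pd.Series([1, 2, 3])

# Integers mixed with floats are upcast to a common float dtype.
mixed_numbers = pd.Series([1, 2.5])

# Numbers mixed with strings fall back to the generic object dtype.
mixed = pd.Series([1, "a", 2.5])

print(ints.dtype, mixed_numbers.dtype, mixed.dtype)
```

An object Series still works, but numeric operations on it are slower and less predictable, which is why consistent values are preferable.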


Pandas Series Methods

The following are some of the most commonly used pandas Series methods and attributes.

FUNCTION | DESCRIPTION
Series() | Constructor that creates a Series. It accepts a variety of inputs, such as lists, dictionaries, and scalar values.
count() | Returns the number of non-NA/null observations in the Series.
size | Attribute (not a method) that returns the number of elements in the underlying data.
name | Attribute that gets or sets the name of a Series object, i.e. of the column.
head() | Returns a specified number of rows from the beginning of a Series, as a new Series.
tail() | Returns a specified number of rows from the end of a Series, as a new Series.
unique() | Returns the unique values of the Series.
nunique() | Returns the count of unique values in the Series.
map() | Maps each value of the Series to another value, using a dictionary, another Series, or a function.
combine_first() | Combines two Series into one by filling missing values in the first with values from the second.
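The entries listed above can be tried together in one short sketch. The sample data, the fee values, and the backup Series below are hypothetical and exist only to make each call visible.

```python
import pandas as pd

# Hypothetical sample data; None is treated as a missing value.
courses = pd.Series(["Spark", "PySpark", "Spark", None, "pandas"], name="course")

non_null = courses.count()      # missing values are not counted
total = courses.size            # attribute: counts every element
first_two = courses.head(2)     # new Series with the first two rows
last_two = courses.tail(2)      # new Series with the last two rows
distinct = courses.unique()     # distinct values in order of appearance
n_distinct = courses.nunique()  # missing values are ignored by default

# map() translates each value through a dict; unmatched values become NaN.
fees = courses.map({"Spark": 20000, "PySpark": 25000, "pandas": 15000})

# combine_first() fills the caller's missing values from another Series,
# matching rows by index label.
backup = pd.Series(["Hadoop"] * 5)
filled = courses.combine_first(backup)

print(non_null, total, n_distinct)
print(filled)
```

Note that size and name are read without parentheses, while the others are called as methods.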

1. Create Pandas Series From a Python List

A pandas Series can be created in several ways, for example from Python lists and dictionaries. The example below creates a Series from a list. To use pandas, first import it with import pandas as pd.


# Create Pandas Series
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses)

By default, pandas adds a sequence number, starting at 0, as the index of the Series.


# Output:
0      Spark
1    PySpark
2     Hadoop
3     Python
4     pandas
5     Oracle
dtype: object

2. Accessing Pandas Series Value by Using Index

Each value in a Series is labeled with its index, and this label can be used to access a specific value. By default, the first value has index 0, the second value has index 1, and so on.

Example 1:


# Accessing Series values by using index
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses[3])

Yields below output.


# Output:
Python

Example 2:

To access the first four elements of the Series, use the slice operation courses[:4]. Slicing retrieves multiple elements from a pandas Series at once.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses[:4])

Yields below output.


# Output:
0      Spark
1    PySpark
2     Hadoop
3     Python
dtype: object

Example 3:

Similarly, use the syntax courses[-4:] to retrieve the last four elements.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses[-4:])

Yields below output.


# Output:
2    Hadoop
3    Python
4    pandas
5    Oracle
dtype: object
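The three examples above rely on the default 0-based index, where the label and the position happen to coincide. As a hedged alternative, the sketch below uses the .iloc accessor, which always selects by position regardless of what the index labels are; the variable names are illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])

# .iloc selects strictly by integer position.
fourth = courses.iloc[3]        # single value at position 3
first_four = courses.iloc[:4]   # positions 0, 1, 2, 3
last_four = courses.iloc[-4:]   # positions 2, 3, 4, 5

print(fourth)
print(first_four)
print(last_four)
```

Using .iloc keeps the code unambiguous if the Series is later given custom labels.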

3. Accessing Pandas Series Values by Using Labels

You can provide your own labels by passing them to the index argument.


# Accessing Pandas Series values by Using Labels
import pandas as pd
c = ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"]
courses = pd.Series(c, index=["subject0", "subject1", "subject2", "subject3", "subject4", "subject5"])
print(courses)

Yields below output.


# Output:
subject0      Spark
subject1    PySpark
subject2     Hadoop
subject3     Python
subject4     pandas
subject5     Oracle
dtype: object
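Once custom labels exist, values can be read back through them. The following sketch reuses the same labels and shows lookup with square brackets and with the .loc accessor; the variable names one, same, and span are illustrative.

```python
import pandas as pd

labels = ["subject0", "subject1", "subject2", "subject3", "subject4", "subject5"]
courses = pd.Series(
    ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"], index=labels
)

one = courses["subject3"]        # lookup by label with square brackets
same = courses.loc["subject3"]   # explicit label-based lookup

# Unlike positional slices, label slices include both endpoints.
span = courses.loc["subject1":"subject3"]

print(one, same)
print(span)
```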

4. Create a Series From Scalar

If the data is a scalar value, provide an index. The scalar is repeated to match the length of the index.


# Create a Series From Scalar
import pandas as pd
scalar = pd.Series(56, index=[13, 26, 53, 74, 53, 69])
print(scalar)

Yields below output.


# Output:
13    56
26    56
53    56
74    56
53    56
69    56
dtype: int64
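The index in the example above contains the label 53 twice. A small sketch of what that means for lookups, assuming the same data: a unique label returns a single value, while a repeated label returns every matching row.

```python
import pandas as pd

scalar = pd.Series(56, index=[13, 26, 53, 74, 53, 69])

single = scalar.loc[13]   # unique label: one value
dupes = scalar.loc[53]    # repeated label: a Series with both matching rows

print(single)
print(dupes)
```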

5. Create a Pandas Series From Python Dictionary

If a dictionary is passed as input and the index is not specified, the dictionary keys become the index. In recent pandas versions they keep the order in which they appear in the dictionary, as the output below shows. If an index is passed, the values corresponding to the labels in the index are extracted from the dictionary.


# Create a Pandas Series From Python Dictionary
import pandas as pd
population_dict = {'India': 1366417754,
                   'China': 1397715000,
                   'USA': 328239523,
                   'England': 55977200,
                   'Russia': 143666931,
                   'Japan': 126264931}
population = pd.Series(population_dict)
print(population)

Yields below output.


# Output:
India      1366417754
China      1397715000
USA         328239523
England      55977200
Russia      143666931
Japan       126264931
dtype: int64
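The case where an index is passed along with the dictionary can be sketched as follows. The label 'Brazil' is a hypothetical key that is deliberately absent from the dictionary, to show what happens to labels without a match.

```python
import pandas as pd

population_dict = {"India": 1366417754, "China": 1397715000, "USA": 328239523}

# Only the requested labels are kept, in the order given by index.
# 'Brazil' (hypothetical) has no entry in the dict, so its value is NaN.
subset = pd.Series(population_dict, index=["China", "India", "Brazil"])

print(subset)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.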

Index labels in a Series need not be unique. A Python dictionary, however, cannot hold duplicate keys: if you repeat the key 'India', the later value overrides the earlier one before pandas receives the dictionary.


import pandas as pd
population_dict = {'India': 1366417754,
                   'India': 1466428893,
                   'China': 1397715000,
                   'USA': 328239523,
                   'England': 55977200,
                   'Russia': 143666931,
                   'Japan': 126264931}
population = pd.Series(population_dict)
print(population)

Yields below output.


# Output:
India      1466428893
China      1397715000
USA         328239523
England      55977200
Russia      143666931
Japan       126264931
dtype: int64
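To actually get repeated labels into a Series, one option is to pass the values as a list together with an explicit index. This is a sketch under that assumption, reusing the two 'India' figures from the example above.

```python
import pandas as pd

# A dict literal with a repeated key keeps only the last value.
collapsed = {"India": 1366417754, "India": 1466428893}

# A list of values plus an explicit index keeps both rows.
population = pd.Series(
    [1366417754, 1466428893, 1397715000],
    index=["India", "India", "China"],
)

india = population.loc["India"]            # Series with both 'India' rows
has_unique_labels = population.index.is_unique

print(collapsed)
print(india)
print(has_unique_labels)
```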

6. Series Attributes

Series attributes return information about the object; they do not modify or manipulate it.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses)

Yields below output.


# Output:
0      Spark
1    PySpark
2     Hadoop
3     Python
4     pandas
5     Oracle
dtype: object

6.1 values:

The values attribute returns a NumPy representation of the Series. For instance, courses.values.


# Get Numpy representation using values attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.values)

Yields below output.


# Output:
['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'Oracle']

6.2 index:

The index attribute returns the index (the row labels) of the Series. For example, courses.index.


# Get the index range using index attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.index)

Yields below output.


# Output:
RangeIndex(start=0, stop=6, step=1)

6.3 dtype:

Use the dtype attribute to get the data type of a Series. For instance, courses.dtype.


# Get datatype of series using dtype attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.dtype)

Yields below output.


# Output:
object

6.4 shape:

The shape attribute returns a tuple with the dimensions of the Series. For instance, courses.shape.


# Get shape of Series using shape attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.shape)

Yields below output.


# Output:
(6,)

6.5 size:

The size attribute returns the number of elements in the Series. For instance, courses.size.


# Get the size of Series using size attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.size)

Yields below output.


# Output:
6

6.6 array:

The array attribute returns the pandas array that backs the Series. For instance, courses.array.


# Get array from Series using array attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.array)

Yields below output.


# Output:
<PandasArray>
['Spark', 'PySpark', 'Hadoop', 'Python', 'pandas', 'Oracle']
Length: 6, dtype: object

6.7 ndim:

The ndim attribute returns the number of dimensions, which is always 1 for a Series. For instance, courses.ndim.


# Get dimensions of Series using ndim attribute
import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.ndim)

Yields below output.


# Output:
1
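The attributes in this section are related to each other. The sketch below, using the same courses Series, checks those relationships and also shows to_numpy(), the method counterpart of the values attribute; the variable names are illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])

# Attributes are read without parentheses and leave the Series unchanged.
n_elements = courses.size
dimensions = courses.ndim
shape = courses.shape

# For a one-dimensional Series, shape, size, and len() describe the same length.
consistent = (shape == (n_elements,)) and (len(courses) == n_elements)

# to_numpy() returns the data as a NumPy array, like the values attribute.
as_numpy = courses.to_numpy()

print(n_elements, dimensions, shape, consistent, type(as_numpy).__name__)
```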

7. Series Methods

Methods represent the behavior of an object. The Series methods shown in this section compute a result from the data and return it; they do not change the Series itself.


# Create Series from list of values
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers)

Yields below output.


# Output:
0     43
1    728
2    355
3    121
4     45
5    642
6    522
dtype: int64

7.1 sum():

The sum() method returns the sum of the values in the Series. For example, numbers.sum().


# Get sum of Series values using sum()
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.sum())

Yields below output.


# Output:
2456

7.2 median():

The median() method returns the median of the values in the Series. For instance, numbers.median().


# Get median of Series Values
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.median())

Yields below output.


# Output:
355.0

7.3 product():

The product() method multiplies all the elements of the Series together and returns the result. For instance, numbers.product().


# Get product of Series values
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.product())

Yields below output.


# Output:
20278302770325600

7.4 mean():

The mean() method returns the mean of the values in the Series. For instance, numbers.mean().


# Get mean value of Series values
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.mean())

Yields below output.


# Output:
350.85714285714283

7.5 count():

The count() method returns the number of non-NA/null observations in the Series. For example, numbers.count().


# Get number of non-null elements in a Series
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.count())

Yields below output.


# Output:
7
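In the example above count() equals the number of elements because nothing is missing. The following sketch replaces one value with NaN (an assumption made purely for illustration) to show how count() differs from the size attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
numbers = pd.Series([43, 728, np.nan, 121, 45])

non_null = numbers.count()        # missing values are excluded
total = numbers.size              # every element is included
missing = numbers.isna().sum()    # number of missing values
total_sum = numbers.sum()         # NaN is skipped by default

print(non_null, total, missing, total_sum)
```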

7.6 describe():

The describe() method returns basic statistical details such as the count, mean, standard deviation, and percentiles. For instance, numbers.describe().


# Describe the Series using describe()
import pandas as pd
values = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(values)
print(numbers.describe())

Yields below output.


# Output:
count      7.000000
mean     350.857143
std      287.942951
min       43.000000
25%       83.000000
50%      355.000000
75%      582.000000
max      728.000000
dtype: float64
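The result of describe() is itself a Series indexed by statistic name, so individual entries can be read by label. The sketch below compares a few of them with the dedicated methods and also passes the percentiles parameter; the chosen percentiles are arbitrary.

```python
import pandas as pd

numbers = pd.Series([43, 728, 355, 121, 45, 642, 522])

summary = numbers.describe()

# Entries of the summary match the dedicated methods.
mean_matches = abs(summary["mean"] - numbers.mean()) < 1e-9
median_matches = summary["50%"] == numbers.median()
q1_matches = summary["25%"] == numbers.quantile(0.25)

# Request other percentiles instead of the default 25%, 50%, and 75%.
custom = numbers.describe(percentiles=[0.1, 0.9])

print(mean_matches, median_matches, q1_matches)
print(custom)
```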

Reference

https://www.w3schools.com/python/pandas/pandas_series.asp

Naveen Nelamali

Naveen Nelamali (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.