Pandas – What is a Series Explained With Examples

What is a Pandas Series

A pandas Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It stores values in sequential order and can be thought of as a single column of data. Every Series has exactly one dtype; when the values are of mixed types, pandas stores them with the generic object dtype, so it is best to keep the values of a Series to one type. You create a Series by calling pandas.Series(). In this article, we'll explain how to create a pandas Series, how to access its elements by index and by label, and how to use some common attributes and methods, with examples.
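To illustrate how the data type of a Series is determined, the minimal sketch below builds one Series from integers and one from mixed values and prints the dtype of each. The variable names are illustrative only.

```python
import pandas as pd

# A Series built only from integers gets an integer dtype
ints = pd.Series([10, 20, 30])
print(ints.dtype)

# Mixing types makes pandas fall back to the generic object dtype
mixed = pd.Series([10, "twenty", 30.0])
print(mixed.dtype)
```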

Pandas Series Methods

The following are the most commonly used pandas Series methods and attributes.

FUNCTION / ATTRIBUTE    DESCRIPTION
Series()                Constructor that creates a Series. It accepts a variety of inputs, such as lists, dictionaries, and scalar values.
count()                 Returns the number of non-NA/null observations in the Series.
size                    Attribute that returns the number of elements in the underlying data.
name                    Attribute that gets or sets the name of the Series, i.e. the column name.
head()                  Returns a specified number of rows from the beginning of the Series as a new Series.
tail()                  Returns a specified number of rows from the end of the Series as a new Series.
unique()                Returns the unique values in the Series.
nunique()               Returns the number of unique values in the Series.
map()                   Maps each value of the Series to another value using a function, a dictionary, or another Series.
combine_first()         Combines two Series into one, filling null values in the first with values from the second.
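The entries listed above can be tried out on a small Series. The following is a minimal sketch; the sample data, the Series name "course", and the variable names are made up for illustration.

```python
import pandas as pd

# Sample data with one missing value (None)
langs = pd.Series(["Spark", "Python", "Spark", None, "Hadoop"], name="course")

print(langs.name)        # name of the Series
print(langs.size)        # total number of elements, including the missing one
print(langs.count())     # number of non-null elements
print(langs.head(2))     # first two rows as a new Series
print(langs.tail(2))     # last two rows as a new Series
print(langs.unique())    # distinct values (the missing value is included)
print(langs.nunique())   # number of distinct non-null values

# map() translates each value through a dictionary; unmatched values become NaN
topics = {"Spark": "big data", "Hadoop": "big data", "Python": "language"}
print(langs.map(topics))

# combine_first() fills missing entries with values from another Series
fallback = pd.Series(["unknown"] * 5)
filled = langs.combine_first(fallback)
print(filled)
```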

Create Pandas Series From a Python List

A pandas Series can be created in several ways, for example from a Python list or a dictionary. The example below creates a Series from a list. To use pandas, first import it with import pandas as pd.


import pandas as pd
# Create a Series from a Python list
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses)

By default, pandas assigns an integer index starting at 0 to the Series.


0      Spark
1    PySpark
2     Hadoop
3     Python
4     pandas
5     Oracle
dtype: object

Accessing Pandas Series by Using Index

Each value is labeled with its index, and this label can be used to access a specific value. With the default index, the first value has index 0, the second value has index 1, and so on.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
# Access the value whose index is 3
print(courses[3])

Yields below output


Python

Example 2:

To retrieve multiple elements from a pandas Series, use the slice operation. For example, the index operator [:4] returns the first four elements of the Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
# Slice the first four elements
print(courses[:4])

Yields below output


0      Spark
1    PySpark
2     Hadoop
3     Python
dtype: object

Example 3:

Similarly, the syntax courses[-4:] retrieves the last four elements.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
# Slice the last four elements
print(courses[-4:])

Yields below output


2    Hadoop
3    Python
4    pandas
5    Oracle
dtype: object
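The bracket lookups above rely on the default integer index. As an aside, pandas also provides the .iloc accessor, which always selects by position regardless of the index labels. The sketch below repeats the three lookups with it, assuming the same courses data.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])

print(courses.iloc[3])     # single element at position 3
print(courses.iloc[:4])    # first four elements
print(courses.iloc[-4:])   # last four elements
```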

Accessing Pandas Series by Using Labels

You can assign your own labels to the values by passing the index argument.


import pandas as pd
c = ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"]
# Pass custom labels through the index argument
courses = pd.Series(c, index=["subject0", "subject1", "subject2", "subject3", "subject4", "subject5"])
print(courses)

Yields below output


subject0      Spark
subject1    PySpark
subject2     Hadoop
subject3     Python
subject4     pandas
subject5     Oracle
dtype: object
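Once a Series has custom labels, its values can be looked up by those labels. The following is a minimal sketch using the same labels as above; the .loc accessor shown here is the label-based counterpart of positional access.

```python
import pandas as pd

labels = ["subject0", "subject1", "subject2", "subject3", "subject4", "subject5"]
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"], index=labels)

print(courses["subject3"])                     # single value by label
print(courses.loc[["subject0", "subject5"]])   # several labels at once
print(courses.loc["subject1":"subject3"])      # label slice, end label included
```

Note that, unlike a positional slice, a label slice includes its end label.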

Create a Series From Scalar

If the data is a scalar value, provide an index. The value is repeated to match the length of the index.


import pandas as pd
Scalar= pd.Series(56, index= [13, 26, 53, 74, 53, 69])
print(Scalar)

Yields below output


13    56
26    56
53    56
74    56
53    56
69    56
dtype: int64

Create a Pandas Series From a Python Dictionary

If a dictionary is passed as input and the index is not specified, the dictionary keys become the index. Recent pandas versions keep the keys in the dictionary's insertion order, as the output below shows. If an index is passed, the values corresponding to the labels in the index are extracted from the dictionary.


import pandas as pd
population_dict = {'India': 1366417754,
                   'China': 1397715000,
                   'USA': 328239523,
                   'England': 55977200,
                   'Russia': 143666931,
                   'Japan':126264931}
population = pd.Series(population_dict)
print(population)

Yields below output


India      1366417754
China      1397715000
USA         328239523
England      55977200
Russia      143666931
Japan       126264931
dtype: int64
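The case where an index is passed together with a dictionary can be sketched as follows. The label "Brazil" is a made-up example of a label that is not a key of the dictionary.

```python
import pandas as pd

population_dict = {"India": 1366417754, "China": 1397715000, "USA": 328239523}

# Only the requested labels are taken, in the order given.
# "Brazil" is not a key of the dictionary, so its value is NaN.
selected = pd.Series(population_dict, index=["China", "India", "Brazil"])
print(selected)
```

Because NaN is a floating-point value, the resulting Series has a float dtype rather than an integer one.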

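Series index labels need not be unique, but Python dictionary keys must be. If you repeat the key 'India' in a dictionary literal, the last value silently overrides the earlier one, so the resulting Series contains a single 'India' entry.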


import pandas as pd
population_dict = {'India': 1366417754,
                   'India': 1466428893,
                   'China': 1397715000,
                   'USA': 328239523,
                   'England': 55977200,
                   'Russia': 143666931,
                   'Japan':126264931}
population = pd.Series(population_dict)
print(population)

Yields below output


India      1466428893
China      1397715000
USA         328239523
England      55977200
Russia      143666931
Japan       126264931
dtype: int64
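To actually obtain a Series with a repeated label, the data can be passed as a list together with an explicit index instead of a dictionary. A minimal sketch, reusing the figures from the example above:

```python
import pandas as pd

# Build the Series from parallel lists so that the label "India" can appear twice
population = pd.Series(
    [1366417754, 1466428893, 1397715000],
    index=["India", "India", "China"],
)

print(population.index.is_unique)   # False
print(population["India"])          # a Series holding both "India" rows
```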

Series Attributes

Series attributes return information about the object; they do not modify or manipulate it.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses)

Yields below output


0      Spark
1    PySpark
2     Hadoop
3     Python
4     pandas
5     Oracle
dtype: object

values

The values attribute returns a NumPy representation of the Series. For instance, courses.values.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.values)

Yields below output


['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'Oracle']
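As a side note, pandas also offers the to_numpy() method as an alternative way to obtain the data as a NumPy array. The short sketch below uses a shortened version of the courses data.

```python
import numpy as np
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])

# Convert the Series data to a NumPy array
arr = courses.to_numpy()
print(type(arr))
print(arr)
```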

index

The index attribute returns the index (the row labels) of the Series. For example, courses.index.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.index)

Yields below output


RangeIndex(start=0, stop=6, step=1)

dtype

The dtype attribute returns the data type of the Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.dtype)

Yields below output


object

shape

The shape attribute returns a tuple with the dimensions of the Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.shape)

Yields below output.


(6,)

size

The size attribute returns the number of elements in the Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.size)

Yields below output


6

array

The array attribute returns the pandas array that backs the Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.array)

Yields below output


<PandasArray>
['Spark', 'PySpark', 'Hadoop', 'Python', 'pandas', 'Oracle']
Length: 6, dtype: object

ndim

The ndim attribute returns the number of dimensions, which is always 1 for a Series.


import pandas as pd
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])
print(courses.ndim)

Yields below output


1
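The attributes above are related to one another. The following small sketch, using the same courses data, checks those relationships; the variable names are illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])

# For a one-dimensional Series, shape is a 1-tuple holding the size
shape_matches_size = courses.shape == (courses.size,)
# ndim equals the number of entries in the shape tuple
ndim_matches_shape = courses.ndim == len(courses.shape)
# The index has one label per element
index_matches_size = len(courses.index) == courses.size

print(shape_matches_size, ndim_matches_shape, index_matches_size)
```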

Series Methods

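A method represents the behavior of an object. It can compute a result from the object or modify it.


import pandas as pd
# Avoid naming the variable "list", which would shadow the Python built-in
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers)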

Yields below output


0     43
1    728
2    355
3    121
4     45
5    642
6    522
dtype: int64

sum()

The sum() method returns the sum of the values in the Series. For example, numbers.sum().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.sum())

Yields below output.


2456

median()

The median() method returns the median of the values in the Series. For instance, numbers.median().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.median())

Yields below output


355.0

product()

The product() method multiplies all the elements of the Series together and returns the result. For instance, numbers.product().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.product())

Yields below output


20278302770325600

mean()

The mean() method returns the mean of the values in the Series. For instance, numbers.mean().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.mean())

Yields below output


350.85714285714283

count()

The count() method returns the number of non-NA/null observations in the Series. For example, numbers.count().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.count())

Yields below output.


7
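The difference between count() and the size attribute only shows up when the Series contains missing values. A minimal sketch with made-up readings:

```python
import numpy as np
import pandas as pd

# Two of the five entries are missing (np.nan and None)
readings = pd.Series([43, np.nan, 355, None, 45])

print(readings.size)     # 5, every element is counted
print(readings.count())  # 3, missing values are skipped
```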

describe()

The describe() method returns basic summary statistics such as the count, mean, standard deviation, and percentiles. For instance, numbers.describe().


import pandas as pd
data = [43, 728, 355, 121, 45, 642, 522]
numbers = pd.Series(data)
print(numbers.describe())

Yields below output


count      7.000000
mean     350.857143
std      287.942951
min       43.000000
25%       83.000000
50%      355.000000
75%      582.000000
max      728.000000
dtype: float64
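The percentiles reported by describe() can be changed through its percentiles parameter. The sketch below, which reuses the numbers data, asks for the 10th, 50th, and 90th percentiles instead of the default quartiles.

```python
import pandas as pd

numbers = pd.Series([43, 728, 355, 121, 45, 642, 522])

# Request the 10th, 50th, and 90th percentiles instead of the defaults
summary = numbers.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The result is itself a Series, so individual statistics can be read by label, for example summary["mean"].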

