What is a Pandas Series
A pandas Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It stores values in sequential order and is comparable to a single column of a table. A Series has a single dtype for all of its values; when the values are of mixed types, pandas falls back to the generic object dtype. You can create a Series by calling pandas.Series(). In this article, we'll explain how to create a pandas Series, how to access its values by position and by label, and how to use some common attributes and methods, with examples.
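As a minimal sketch of the single-dtype behavior described above, the snippet below builds three small Series and inspects their dtype. The variable names and sample values are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# All integers: pandas infers an integer dtype
ints = pd.Series([1, 2, 3])

# Integers mixed with a float: every value is stored as a float
floats = pd.Series([1, 2.5, 3])

# Numbers mixed with a string: pandas falls back to the generic object dtype
mixed = pd.Series([1, "two", 3.0])

print(ints.dtype, floats.dtype, mixed.dtype)
```

The dtype is chosen once for the whole Series, which is why a single string among numbers changes how every value is stored.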
Pandas Series Methods
The following are the most commonly used pandas Series methods and attributes.
METHOD / ATTRIBUTE | DESCRIPTION |
---|---|
Series() | Constructor that creates a Series. It accepts a variety of inputs, such as lists, dictionaries, and scalar values. |
count() | Returns the number of non-NA/null observations in the Series. |
size | Attribute that returns the number of elements in the underlying data. |
name | Attribute that gets or sets the name of the Series, i.e. the column name. |
head() | Returns a specified number of rows from the beginning of a Series, as a new Series. |
tail() | Returns a specified number of rows from the end of a Series, as a new Series. |
unique() | Returns the unique values in the Series. |
nunique() | Returns the number of unique values in the Series. |
map() | Maps each value of the Series to another value using a dictionary, Series, or function. |
combine_first() | Combines two Series into one, filling null values in the first with values from the second. |
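Several entries in the table above are not demonstrated later in the article, so here is a short sketch of how they might be used. The sample data, the name "course", and the category mapping are assumptions made up for illustration.

```python
import pandas as pd

# Sample data with one repeated value; "course" is an illustrative name
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Spark", "pandas"], name="course")

first_two = courses.head(2)      # new Series with the first 2 rows
last_two = courses.tail(2)       # new Series with the last 2 rows
distinct = courses.unique()      # distinct values, in order of appearance
n_distinct = courses.nunique()   # number of distinct values

# map() looks up each value in the dict; values without a key become NaN
categories = courses.map({"Spark": "engine", "PySpark": "engine", "Hadoop": "platform"})

# combine_first() keeps values from the first Series and fills its
# missing entries from the second, matching on index labels
primary = pd.Series([1.0, None, 3.0])
backup = pd.Series([10.0, 20.0, 30.0])
combined = primary.combine_first(backup)

print(courses.name, n_distinct)
print(first_two.tolist(), last_two.tolist(), list(distinct))
print(categories.tolist())
print(combined.tolist())
```

Note that head(), tail(), map(), and combine_first() all return new objects and leave the original Series unchanged.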
Create Pandas Series From a Python List
A pandas Series can be created in several ways, for example from a Python list or a dictionary. The example below creates a Series from a list. To use pandas, first import it with import pandas as pd.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses)
By default, pandas assigns an integer index starting at 0 to the Series.
0 Spark
1 PySpark
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
Accessing Pandas Series by Using Index
By default, the values are labeled with their index number: the first value has index 0, the second has index 1, and so on. This label can be used to access a specific value.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[3])
Yields below output
Python
Example 2:
You can retrieve multiple elements from a Series with a slice. The expression courses[:4] returns the first four elements.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[:4])
Yields below output
0 Spark
1 PySpark
2 Hadoop
3 Python
dtype: object
Example 3:
Similarly, courses[-4:] returns the last four elements.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[-4:])
Yields below output
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
Accessing Pandas Series by Using Labels
You can supply your own labels through the index argument.
import pandas as pd
c = ( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
courses = pd.Series(c, index= ["subject0","subject1","subject2","subject3","subject4","subject5"] )
print(courses)
Yields below output
subject0 Spark
subject1 PySpark
subject2 Hadoop
subject3 Python
subject4 pandas
subject5 Oracle
dtype: object
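Once a Series has custom labels, individual values can be looked up by label as well as by position. The sketch below uses a shortened version of the Series above and assumes the standard .loc (label-based) and .iloc (position-based) accessors.

```python
import pandas as pd

courses = pd.Series(
    ["Spark", "PySpark", "Hadoop"],
    index=["subject0", "subject1", "subject2"],
)

by_label = courses["subject1"]        # lookup by label
by_loc = courses.loc["subject2"]      # explicit label-based lookup
by_position = courses.iloc[0]         # explicit position-based lookup

# Unlike positional slices, label slices include the end label
label_slice = courses.loc["subject0":"subject1"]

print(by_label, by_loc, by_position)
print(label_slice.tolist())
```

Using .loc and .iloc makes it explicit whether a key is meant as a label or as a position, which avoids ambiguity when the labels themselves are integers.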
Create a Series From Scalar
If the data is a scalar value, an index must be provided. The value is repeated to match the length of the index.
import pandas as pd
scalar_series = pd.Series(56, index=[13, 26, 53, 74, 53, 69])
print(scalar_series)
Yields below output
13 56
26 56
53 56
74 56
53 56
69 56
dtype: int64
Create a Pandas Series From Python Dictionary
If a dictionary is passed as input and no index is specified, the dictionary keys become the index, in the order in which they were inserted (pandas versions before 0.23 sorted the keys instead). If an index is passed, the values corresponding to the labels in the index are extracted from the dictionary.
import pandas as pd
population_dict = {'India': 1366417754,
'China': 1397715000,
'USA': 328239523,
'England': 55977200,
'Russia': 143666931,
'Japan':126264931}
population = pd.Series(population_dict)
print(population)
Yields below output
India 1366417754
China 1397715000
USA 328239523
England 55977200
Russia 143666931
Japan 126264931
dtype: int64
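The case where an index is passed together with a dictionary can be sketched as follows. The label "Brazil" is a hypothetical addition that is deliberately missing from the dictionary, to show what happens to labels without a matching key.

```python
import pandas as pd

population_dict = {"India": 1366417754, "China": 1397715000, "USA": 328239523}

# Only the requested labels are kept, in the requested order.
# "Brazil" (hypothetical, not a key in the dictionary) becomes NaN.
selected = pd.Series(population_dict, index=["USA", "India", "Brazil"])
print(selected)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64, and "China" is dropped because it was not requested in the index.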
The index labels of a Series need not be unique. A Python dictionary, however, cannot hold duplicate keys: if you repeat the key 'India', the later value overwrites the earlier one before pandas sees the data.
import pandas as pd
population_dict = {'India': 1366417754,
'India': 1466428893,
'China': 1397715000,
'USA': 328239523,
'England': 55977200,
'Russia': 143666931,
'Japan':126264931}
population = pd.Series(population_dict)
print(population)
Yields below output
India 1466428893
China 1397715000
USA 328239523
England 55977200
Russia 143666931
Japan 126264931
dtype: int64
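To actually get duplicate labels in a Series, the labels have to be passed through the index argument rather than as dictionary keys. A minimal sketch, with made-up values and labels:

```python
import pandas as pd

# Values and labels are illustrative only; the label "a" appears twice
s = pd.Series([10, 20, 30], index=["a", "a", "b"])

dupes = s["a"]                  # a Series holding both rows labeled "a"
single = s["b"]                 # a single value
is_unique = s.index.is_unique   # False, because "a" is repeated

print(dupes.tolist(), single, is_unique)
```

A lookup on a repeated label returns a Series instead of a single value, so code that expects a scalar may want to check index.is_unique first.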
Series Attributes
Series attributes return information about the object; they do not modify or manipulate it.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses)
Yields below output
0 Spark
1 PySpark
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
values
The values attribute returns a NumPy representation of the Series. For instance, courses.values.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.values)
Yields below output
['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'Oracle']
index
The index attribute returns the index (the row labels) of the Series. For example, courses.index.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.index)
Yields below output
RangeIndex(start=0, stop=6, step=1)
dtype
The dtype attribute returns the data type of the Series.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.dtype)
Yields below output
object
shape
The shape attribute returns a tuple with the dimensions of the Series.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.shape)
Yields below output.
(6,)
size
The size attribute returns the number of elements in the Series.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.size)
Yields below output
6
array
The array attribute returns the pandas ExtensionArray that backs the Series.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.array)
Yields below output
<PandasArray>
['Spark', 'PySpark', 'Hadoop', 'Python', 'pandas', 'Oracle']
Length: 6, dtype: object
ndim
The ndim attribute returns the number of dimensions of the Series, which is always 1.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.ndim)
Yields below output
1
Series Methods
Methods represent the behavior of an object; they compute a result from it or manipulate it.
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers)
Yields below output
0 43
1 728
2 355
3 121
4 45
5 642
6 522
dtype: int64
sum()
The sum() method returns the sum of the values in the Series. For example, numbers.sum().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.sum())
Yields below output.
2456
median()
The median() method returns the median of the values in the Series. For instance, numbers.median().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.median())
Yields below output
355.0
product()
The product() method multiplies all the elements of the Series together and returns the result. For instance, numbers.product().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.product())
Yields below output
20278302770325600
mean()
The mean() method returns the mean of the values in the Series. For instance, numbers.mean().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.mean())
Yields below output
350.85714285714283
count()
The count() method returns the number of non-NA/null observations in the Series. For example, numbers.count().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.count())
Yields below output.
7
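The example above contains no missing values, so count() equals the length of the Series. The following sketch, using made-up readings with two missing entries, shows how count() differs from the size attribute.

```python
import numpy as np
import pandas as pd

# Illustrative data: np.nan and None are both treated as missing values
readings = pd.Series([43, np.nan, 355, None, 45])

non_null = readings.count()   # counts only the non-missing values
total = readings.size         # counts every element, missing or not

print(non_null, total)
```

Here count() reports 3 while size reports 5, so the difference between the two gives the number of missing values.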
describe()
The describe() method returns basic summary statistics such as the count, mean, standard deviation, and percentiles. For instance, numbers.describe().
import pandas as pd
data = [43,728,355,121,45,642,522]
numbers = pd.Series(data)
print(numbers.describe())
Yields below output
count 7.000000
mean 350.857143
std 287.942951
min 43.000000
25% 83.000000
50% 355.000000
75% 582.000000
max 728.000000
dtype: float64