What is a Pandas Series
A Pandas Series is a one-dimensional labeled array that can hold any data type (integers, strings, floating-point numbers, Python objects, etc.). It stores values in sequential order and is comparable to a single column of a table. Every Series has a single dtype; if the values are of mixed types, Pandas stores them under the general-purpose object dtype. You can create a Series by calling pandas.Series(). In this article, we'll explain how to create a Pandas Series, how to access its values by index and by label, and how to use some common functions, with examples.
Pandas Series Methods
The following are some of the most commonly used Pandas Series methods and attributes.
FUNCTIONS | DESCRIPTION |
---|---|
Series() | The Series() constructor creates a Series. It accepts a variety of inputs, such as lists, dictionaries, and scalar values. |
count() | Returns the number of non-NA/null observations in the Series. |
size | Attribute (not a method) that returns the number of elements in the underlying data, including NA values. |
name | Attribute (not a method) that gets or sets the name of a Series object, i.e. the column name. |
head() | Returns a specified number of rows from the beginning of a Series. The method returns a brand new Series. |
tail() | Returns a specified number of rows from the end of a Series. The method returns a brand new Series. |
unique() | Returns the unique values in a Series. |
nunique() | Returns the number of unique values in a Series. |
map() | Maps each value of a Series to another value using a dictionary, a Series, or a function. |
combine_first() | Combines two Series into one, filling null values in the first with values from the second. |
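Several methods in the table are not demonstrated later in the article, so here is a minimal sketch of head(), tail(), unique(), nunique(), map() and combine_first(). The sample data and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
langs = pd.Series(["Spark", "PySpark", "Spark", "Python", "Python", "Hadoop"])

first_two = langs.head(2)      # first 2 rows, returned as a new Series
last_two = langs.tail(2)       # last 2 rows, returned as a new Series
distinct = langs.unique()      # distinct values in order of first appearance
n_distinct = langs.nunique()   # number of distinct non-null values

# map() looks each value up in the dict; values without a key become NaN
categories = langs.map({"Spark": "engine", "PySpark": "engine", "Hadoop": "platform"})

# combine_first() fills missing values in the caller from the other Series
primary = pd.Series([1.0, None, 3.0])
fallback = pd.Series([10.0, 20.0, 30.0])
filled = primary.combine_first(fallback)

print(first_two, last_two, distinct, n_distinct, categories, filled, sep="\n\n")
```

Note that "Python" has no entry in the mapping dictionary, so map() leaves NaN in those two positions rather than raising an error.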
1. Create Pandas Series From a Python List
A Pandas Series can be created in several ways, for example from a Python list or a dictionary. The example below creates a Series from a list. To use Pandas, first import it using import pandas as pd.
# Create Pandas Series
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses)
By default, Pandas adds a sequential integer index (0, 1, 2, ...) to the Series.
# Output:
0 Spark
1 PySpark
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
2. Accessing Pandas Series Value by Using Index
The index label can be used to access a specific value. By default, the values are labeled with their position number: the first value has index 0, the second value has index 1, and so on.
Example 1:
# Accessing Series values by using index
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[3])
Yields below output.
# Output:
Python
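With the default index, the label 3 and the position 3 happen to point at the same row. As a small sketch of how to make the intent explicit, the .loc accessor looks values up by label and the .iloc accessor looks them up by position. The variable names below are only illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"])

by_label = courses.loc[3]      # label-based: the row whose index label is 3
by_position = courses.iloc[3]  # position-based: the fourth row
last = courses.iloc[-1]        # negative positions count from the end

print(by_label, by_position, last)
```

Using .loc and .iloc avoids ambiguity once the index labels are no longer the default 0, 1, 2, ... sequence.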
Example 2:
Access the first four elements of the Series. Passing a slice such as [:4] to the index operator retrieves multiple elements at once.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[:4])
Yields below output.
# Output:
0 Spark
1 PySpark
2 Hadoop
3 Python
dtype: object
Example 3:
Similarly, the slice courses[-4:] retrieves the last four elements.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses[-4:])
Yields below output.
# Output:
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
3. Accessing Pandas Series Values by Using Labels
You can supply your own labels through the index argument instead of relying on the default numbering.
# Accessing Pandas Series values by Using Labels
import pandas as pd
c = ["Spark","PySpark","Hadoop","Python","pandas","Oracle"]
courses = pd.Series(c, index=["subject0","subject1","subject2","subject3","subject4","subject5"])
print(courses)
Yields below output.
# Output:
subject0 Spark
subject1 PySpark
subject2 Hadoop
subject3 Python
subject4 pandas
subject5 Oracle
dtype: object
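Once custom labels are in place, values can be retrieved by those labels. The following is a minimal sketch using the same labels as the example above; the variable names one, several and span are only illustrative.

```python
import pandas as pd

labels = ["subject0", "subject1", "subject2", "subject3", "subject4", "subject5"]
courses = pd.Series(["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"], index=labels)

one = courses["subject3"]                    # single label -> scalar value
several = courses[["subject0", "subject5"]]  # list of labels -> new Series
span = courses.loc["subject1":"subject3"]    # label slice includes BOTH endpoints

print(one)
print(several)
print(span)
```

Unlike a positional slice, a label slice with .loc includes its end label, so span contains three rows.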
4. Create a Series From Scalar
If the data is a scalar value, an index must be provided. The value is repeated to match the length of the index.
# Create a Series From Scalar
import pandas as pd
Scalar= pd.Series(56, index= [13, 26, 53, 74, 53, 69])
print(Scalar)
Yields below output.
# Output:
13 56
26 56
53 56
74 56
53 56
69 56
dtype: int64
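The index in the example above contains the label 53 twice. As a brief sketch of what that means for lookups, a unique label returns a single value while a repeated label returns a Series with every match. The variable names are only illustrative.

```python
import pandas as pd

scalar = pd.Series(56, index=[13, 26, 53, 74, 53, 69])

single = scalar.loc[13]    # unique label -> one scalar value
repeated = scalar.loc[53]  # duplicated label -> Series holding both matches

print(single)
print(repeated)
print(scalar.index.is_unique)  # False, because 53 appears twice
```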
5. Create a Pandas Series From Python Dictionary
If a dictionary is passed as input and no index is specified, the dictionary keys become the index labels, in the dictionary's insertion order. If an index is passed, the values corresponding to the labels in the index are extracted from the dictionary, and labels without a matching key get NaN.
# Create a Pandas Series From Python Dictionary
import pandas as pd
population_dict = {'India': 1366417754,
'China': 1397715000,
'USA': 328239523,
'England': 55977200,
'Russia': 143666931,
'Japan':126264931}
population = pd.Series(population_dict)
print(population)
Yields below output.
# Output:
India 1366417754
China 1397715000
USA 328239523
England 55977200
Russia 143666931
Japan 126264931
dtype: int64
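The case where an index is passed together with a dictionary can be sketched as follows. The label "Brazil" is a hypothetical key that is deliberately missing from the dictionary, to show how unmatched labels behave.

```python
import pandas as pd

population_dict = {"India": 1366417754, "China": 1397715000, "USA": 328239523}

# Only the listed labels are kept, in the order given.
# "Brazil" is not a key in the dict, so its value becomes NaN.
selected = pd.Series(population_dict, index=["USA", "India", "Brazil"])

print(selected)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.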
Index labels in a Series need not be unique, but Python dictionary keys are. If you repeat a key such as 'India' in the dictionary, the later value automatically overrides the earlier one before Pandas receives the data.
import pandas as pd
population_dict = {'India': 1366417754,
'India': 1466428893,
'China': 1397715000,
'USA': 328239523,
'England': 55977200,
'Russia': 143666931,
'Japan':126264931}
population = pd.Series(population_dict)
print(population)
Yields below output
# Output:
India 1466428893
China 1397715000
USA 328239523
England 55977200
Russia 143666931
Japan 126264931
dtype: int64
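To actually keep both values under a repeated label, the data and the index can be passed as separate lists instead of a dictionary. This is a minimal sketch reusing the figures from the example above; the variable names are only illustrative.

```python
import pandas as pd

# Passing values and labels separately lets a label repeat
dup = pd.Series(
    [1366417754, 1466428893, 1397715000],
    index=["India", "India", "China"],
)

india = dup.loc["India"]  # repeated label -> Series with both rows

print(dup)
print(india)
```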
6. Series Attributes
Series attributes return information about the object; they do not modify or manipulate it.
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses)
Yields below output.
# Output:
0 Spark
1 PySpark
2 Hadoop
3 Python
4 pandas
5 Oracle
dtype: object
6.1 values:
The values attribute returns a NumPy representation of the Series. For instance, courses.values.
# Get Numpy representation using values attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.values)
Yields below output.
# Output:
['Spark' 'PySpark' 'Hadoop' 'Python' 'pandas' 'Oracle']
6.2 index:
The index attribute returns the index (the row labels) of the Series. For example, courses.index.
# Get the index range using index attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.index)
Yields below output.
# Output:
RangeIndex(start=0, stop=6, step=1)
6.3 dtype:
The dtype attribute returns the data type of the Series. For example, courses.dtype.
# Get datatype of series using dtype attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.dtype)
Yields below output.
# Output:
object
6.4 shape:
# Get shape of Series using shape attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.shape)
Yields below output.
# Output:
(6,)
6.5 size:
# Get the size of Series using size attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.size)
Yields below output.
# Output:
6
6.6 array:
# Get array from Series using array attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.array)
Yields below output.
# Output:
<PandasArray>
['Spark', 'PySpark', 'Hadoop', 'Python', 'pandas', 'Oracle']
Length: 6, dtype: object
6.7 ndim:
# Get dimensions of Series using ndim attribute
import pandas as pd
courses = pd.Series( ["Spark","PySpark","Hadoop","Python","pandas","Oracle"] )
print(courses.ndim)
Yields below output.
# Output:
1
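The name attribute listed in the methods table is not shown above, so here is a small sketch of it. The names "Courses" and "Technologies" are hypothetical values picked for illustration.

```python
import pandas as pd

courses = pd.Series(
    ["Spark", "PySpark", "Hadoop", "Python", "pandas", "Oracle"],
    name="Courses",  # hypothetical name, set at construction time
)

current_name = courses.name               # read the name
renamed = courses.rename("Technologies")  # returns a new Series with another name
frame = courses.to_frame()                # the name becomes the column label

print(current_name)
print(renamed.name)
print(list(frame.columns))
```

The original Series keeps its name after rename(), since the method returns a new object.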
7. Series Methods
A method represents the behavior of an object: it computes a result from the object or manipulates it.
# Create Series from list of values
import pandas as pd
values = [43,728,355,121,45,642,522]  # avoid the name "list", which shadows the Python built-in
numbers = pd.Series(values)
print(numbers)
Yields below output.
# Output:
0 43
1 728
2 355
3 121
4 45
5 642
6 522
dtype: int64
7.1 sum():
The sum() method returns the sum of the values in the Series. For example, numbers.sum().
# Get sum of Series values using sum()
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.sum())
Yields below output.
# Output:
2456
7.2 median():
The median() method returns the median of the values in the Series. For instance, numbers.median().
# Get median of Series Values
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.median())
Yields below output.
# Output:
355.0
7.3 product():
The product() method multiplies all the elements of the Series together and returns the result. For instance, numbers.product().
# Get product of Series values
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.product())
Yields below output.
# Output:
20278302770325600
7.4 mean():
The mean() method returns the mean of the values in the Series. For instance, numbers.mean().
# Get mean value of Series values
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.mean())
Yields below output.
# Output:
350.85714285714283
7.5 count():
The count() method returns the number of non-NA/null observations in the Series. For example, numbers.count().
# Get number of non-null elements in a Series
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.count())
Yields below output.
# Output:
7
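In the example above, count() equals the number of elements only because no value is missing. The difference from the size attribute shows up once the data has gaps, as in this minimal sketch with hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing entries (np.nan and None)
with_gaps = pd.Series([43, np.nan, 355, None, 45])

non_null = with_gaps.count()  # counts only non-NA values
total = with_gaps.size        # counts every element, missing or not

print(non_null, total)
```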
7.6 describe():
The describe() method returns basic summary statistics such as count, mean, standard deviation, minimum, maximum, and percentiles. For instance, numbers.describe().
# Describe the Series using describe()
import pandas as pd
values = [43,728,355,121,45,642,522]
numbers = pd.Series(values)
print(numbers.describe())
Yields below output.
# Output:
count 7.000000
mean 350.857143
std 287.942951
min 43.000000
25% 83.000000
50% 355.000000
75% 582.000000
max 728.000000
dtype: float64
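describe() also works on a Series of strings, where it reports count, unique, top (the most frequent value) and freq instead of numeric statistics. The following is a brief sketch with hypothetical data in which "Spark" appears twice.

```python
import pandas as pd

# Hypothetical data; "Spark" is repeated so that there is a clear most-frequent value
langs = pd.Series(["Spark", "PySpark", "Spark", "Hadoop"])

summary = langs.describe()

print(summary)
print(summary["top"], summary["freq"])
```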