How to Compute Standard Deviation in NumPy

  • Post last modified:November 14, 2023

In NumPy, you can compute the standard deviation of a set of values using the numpy.std() function. The standard deviation is a measure of the amount of variation or dispersion in a set of values. By default, it is calculated for the flattened array but you can change this by specifying the axis param.

To calculate the standard deviation by hand, first compute the mean of the NumPy array with arr.sum()/N, where N=len(arr). Then take the square root of the mean of the squared deviations from that mean: std = sqrt(mean(x)), where x = abs(arr - arr.mean())**2.
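
As a minimal sketch, the steps above can be written out by hand and compared with np.std(). The variable names (n, mean, squared_dev, manual_std) are only illustrative.

```python
import numpy as np

arr = np.array([5, 6, 4])

# Step 1: mean of the values
n = len(arr)
mean = arr.sum() / n

# Step 2: squared absolute deviations from the mean
squared_dev = np.abs(arr - mean) ** 2

# Step 3: square root of the mean of the squared deviations
manual_std = np.sqrt(squared_dev.mean())

print(manual_std)
print(np.isclose(manual_std, np.std(arr)))
```

The second print compares the hand-written result with np.std(), so the two should agree up to floating-point rounding.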

1. Quick Examples of Standard Deviation Function

If you are in a hurry, below are some quick examples of computing the standard deviation of a NumPy array.


# Quick examples of standard deviation
import numpy as np

# Example 1: Compute the standard deviation
# of a 1-dimensional array
arr = np.array([5,6,4])
arr1 = np.std(arr)

# Example 2: Get the standard deviation
# of a 2D array with no axis
arr = np.array([[2, 3],[2, 5]])
arr1 = np.std(arr)

# Example 3: Get the standard deviation
# of an array column-wise
arr = np.array([[2, 3],[2, 5]])
arr1 = np.std(arr, axis=0)

# Example 4: Get the standard deviation
# of an array row-wise
arr = np.array([[2, 3],[2, 5]])
arr1 = np.std(arr, axis=1)

# Example 5: Get the standard deviation value
# with float32 data
arr = np.array([5, 6, 4])
arr1 = np.std(arr, dtype = np.float32)

# Example 6: Calculate the standard deviation
# with a specified data type (e.g., np.float64)
arr2 = np.std(arr, dtype=np.float64)

2. Syntax of std()

Following is the syntax of std().


#  Syntax of numpy.std() 
numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)

2.1 Parameters of std()

Following are the most commonly used parameters of std().

  • a – The input array for which you want to calculate the standard deviation (named arr in the examples in this article).
  • axis – None, int, or tuple of ints. The axis or axes along which the standard deviation is computed. The default (None) computes it over the flattened array. For a 2D array, axis=0 returns one value per column and axis=1 returns one value per row.
  • dtype – Optional. The data type used in computing the standard deviation. For integer input the default is float64; for floating-point input it is the same as the input array's data type.
  • out – Optional. Alternative output array in which to place the result. It must have the same shape as the expected output.
  • ddof – Optional int, default 0. Delta degrees of freedom: the divisor used in the calculation is N - ddof, where N is the number of elements.
  • keepdims – Optional bool. If True, the reduced axes are kept in the result as dimensions of size one.
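
The sketch below shows two of the less common options from the parameter list: a tuple passed as axis, and a preallocated array passed as out. The names both_axes and result are illustrative.

```python
import numpy as np

arr = np.array([[2, 3], [2, 5]])

# axis as a tuple: reduce over both axes at once
# (for a 2D array this matches the default, axis=None)
both_axes = np.std(arr, axis=(0, 1))

# out: write the per-column result into a preallocated float array
result = np.empty(2, dtype=np.float64)
np.std(arr, axis=0, out=result)

print(both_axes)
print(result)
```

Using out avoids allocating a new array for the result, which can matter when the same calculation runs repeatedly.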

2.2 Return Value of std()

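It returns the standard deviation of the array elements: a single value when axis is None, or an array when an axis is given. For integer input the result has the float64 data type. You can change this by specifying the dtype param.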
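
To make the return value concrete, here is a small sketch that inspects the type and dtype of what np.std() returns. The array names are illustrative, and the float32 case is included on the assumption that you may also pass floating-point input.

```python
import numpy as np

int_arr = np.array([5, 6, 4])                     # integer input
f32_arr = np.array([5, 6, 4], dtype=np.float32)   # float32 input
arr_2d = np.array([[2, 3], [2, 5]])

whole = np.std(int_arr)          # axis=None: a single scalar value
per_col = np.std(arr_2d, axis=0) # axis given: an ndarray

print(type(whole), whole.dtype)
print(type(per_col), per_col.shape)
print(np.std(f32_arr).dtype)
```

For the integer array the result is float64, while the float32 array keeps its own data type.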

3. Usage of NumPy std()

The numpy.std() function is a statistical function in the NumPy library that computes the standard deviation of arrays. It supports both single-dimensional and multi-dimensional arrays, and you can specify the axis along which the standard deviation is calculated, as well as the data type used for the result.

You can use numpy.std() to calculate the standard deviation of a 1-dimensional NumPy array. For instance, import the NumPy library as np, create a 1-dimensional NumPy array called arr with elements [5, 6, 4], and pass it to numpy.std().


# Import NumPy Module
import numpy as np

# Create NumPy array
arr = np.array([5,6,4])
print("Original array:",arr)

# Compute the standard deviation
# Using 1-dimensional array
arr1 = np.std(arr)
print("Standard Deviation:",arr1)

Yields the output below.

# Output:
# Original array: [5 6 4]
# Standard Deviation: 0.816496580927726

Following is the mathematical calculation of the Standard Deviation of the 1-D Array.


# Mathematical calculation of standard deviation
Standard Deviation is std =  sqrt(mean(x)), where x = abs(arr - arr.mean())**2
Mean = (5 + 6 + 4) / 3
     = 5

Standard Deviation = sqrt( ((5-5)**2 + (6-5)**2 + (4-5)**2)/3 )
                   = sqrt( (0 + 1 + 1)/3 )
                   = sqrt(2/3)
                   = sqrt(0.6666...)
                   = 0.816496580927726

4. Get the Standard Deviation of 2D Array

You can also use numpy.std() on a 2D NumPy array without specifying the axis. For instance, import the NumPy library as np and create a 2D NumPy array called arr with elements [[2, 3], [2, 5]]. Calling numpy.std() with no axis uses all the values in the array and returns a single standard deviation value.


# Create a 2D numpy array
arr = np.array([[2, 3],[2, 5]])
print("Original array:\n",arr)

# Get the standard deviation with no axis
arr1 = np.std(arr)
print("Standard Deviation of the entire array:\n",arr1)

Yields the output below.

# Output:
# Original array:
#  [[2 3]
#  [2 5]]
# Standard Deviation of the entire array:
#  1.224744871391589

Following is the mathematical calculation of the Standard Deviation of the 2-D Array.


# Mathematical calculation of standard deviation
Mean = (2 + 3 + 2 + 5) / 4
     = 3

Standard Deviation = sqrt( ((2-3)**2 + (3-3)**2 + (2-3)**2 + (5-3)**2)/4 )
                   = sqrt( (1+ 0+ 1+ 4)/4 )
                   = sqrt(6/4)
                   = sqrt(1.5)
                   = 1.224744871391589
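
A quick way to check the "entire array" behaviour is to flatten the array yourself and compare the results. This is a minimal sketch; std_default and std_flat are illustrative names.

```python
import numpy as np

arr = np.array([[2, 3], [2, 5]])

# With no axis, np.std() behaves as if the array were flattened first
std_default = np.std(arr)
std_flat = np.std(arr.ravel())

print(std_default, std_flat)
```

Both calls use the same four values, so they should give the same standard deviation.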

5. Get the Standard Deviation using axis Param

When you pass axis=0 to numpy.std(), the calculation runs down the rows (the vertical axis), so the rows are collapsed and you get one standard deviation per column.

The program below passes axis=0 and prints the standard deviation of each column. In NumPy, axis=0 generally means operating along the vertical axis, which for a 2D array produces a column-wise result.


# Create a 2D numpy array
arr = np.array([[2, 3],[2, 5]])
print("Original array:\n",arr)

# Get the standard deviation of each column (axis=0)
arr1 = np.std(arr, axis=0)
print("Standard Deviation of each column:\n",arr1)

# Output:
# Standard Deviation of each column:
#  [0. 1.]

Below is the equivalent manual calculation.


# Mathematical calculation of standard deviation
1st column values are 2, 2
mean = (2+2)/2 = 2

Standard Deviation = sqrt( ( (2-2)**2 + (2-2)**2 )/2 )
                   = sqrt( (0 + 0)/2 )
                   = sqrt(0/2)
                   = 0.

2nd column values are 3, 5
mean = (3+5)/2 = 4

Standard Deviation = sqrt( ( (3-4)**2 + (5-4)**2 )/2 )
                   = sqrt( (1 + 1)/2 )
                   = sqrt(2/2)
                   = 1.
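
The same column-by-column calculation can be sketched as a loop and checked against np.std(arr, axis=0). The loop and the name manual are only for illustration; np.std() does not need them.

```python
import numpy as np

arr = np.array([[2, 3], [2, 5]])

# Compute the standard deviation of each column by hand
manual = []
for j in range(arr.shape[1]):
    col = arr[:, j]
    manual.append(np.sqrt(np.mean((col - col.mean()) ** 2)))

print(manual)
print(np.allclose(manual, np.std(arr, axis=0)))
```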

If you want to calculate the standard deviation row-wise for a 2D NumPy array, use np.std(arr, axis=1). The calculation then runs across the columns (the horizontal axis), giving one standard deviation per row.


# Get the standard deviation of each row (axis=1)
arr1 = np.std(arr, axis=1)
print("Standard Deviation of each row:\n",arr1)

# Output:
# Standard Deviation of each row:
#  [0.5 1.5]

The calculation follows the same steps as above, so I will leave it for you to explore. The result contains one standard deviation per row. With axis=1 the operation runs along the horizontal axis, so each row of the 2D array is reduced to a single value.
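
Because the example array is square, the two axis settings are easy to mix up. The sketch below uses a hypothetical 2x3 array (not taken from the examples above) so that the shape of the result shows which axis was collapsed.

```python
import numpy as np

# Hypothetical array with 2 rows and 3 columns
arr = np.array([[1, 2, 3], [4, 6, 8]])

col_std = np.std(arr, axis=0)   # collapses the rows: one value per column
row_std = np.std(arr, axis=1)   # collapses the columns: one value per row

print(col_std.shape)
print(row_std.shape)
```

With axis=0 the result has three entries (one per column), and with axis=1 it has two (one per row).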

6. Using dtype Param

If you want to control the data type used when computing the standard deviation, use the dtype parameter. For integer input the default is float64. Passing dtype=np.float32 instead of float64 gives a result with lower precision.


# Create a 1D array
arr = np.array([5, 6, 4])
print("Original array:\n",arr)

# Get the standard deviation value with float32 data 
arr1 = np.std(arr, dtype = np.float32)
print("Standard Deviation with Custom Data Type:",arr1)

# Output:
# Standard Deviation with Custom Data Type: 0.8164966

Similarly, you can calculate the standard deviation of the 1D array arr with dtype=np.float64 so that the result has the np.float64 data type. You can replace np.float64 with the data type that suits your use case.

The dtype parameter is particularly useful when you want the result to have a specific data type that differs from the default for the input array.


# Calculate the standard deviation 
# With a specified data type (e.g., np.float64)
arr2 = np.std(arr, dtype=np.float64)
print("Standard Deviation with Custom Data Type:", arr2 )

# Output:
# Standard Deviation with Custom Data Type: 0.816496580927726
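
The precision difference is easier to see on a large float32 array. The sketch below assumes a hypothetical array in which half the values are 1.0 and half are 0.1; the exact digits printed may vary, so no output is listed.

```python
import numpy as np

# Hypothetical large float32 array: half the values are 1.0, half are 0.1
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

std32 = np.std(a)                     # computed in float32
std64 = np.std(a, dtype=np.float64)   # computed in float64

print(std32.dtype, std32)
print(std64.dtype, std64)
```

The float64 calculation should land very close to 0.45, while the float32 result can differ slightly in the last digits.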

Frequently Asked Questions

How do I calculate the standard deviation of a 1D array in NumPy?

To calculate the standard deviation of a 1D array in NumPy, pass the array to the numpy.std() function. For example, if the 1D array is named data, then std_dev = np.std(data) stores the standard deviation of its values in the variable std_dev.
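
A minimal sketch of that answer, using the names data and std_dev; the array values are only for illustration.

```python
import numpy as np

# Example values chosen for illustration
data = np.array([5, 6, 4])

# Standard deviation of all values in the 1D array
std_dev = np.std(data)
print(std_dev)
```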

How can I calculate the standard deviation along a specific axis for a 2D array?

To calculate the standard deviation along a specific axis for a 2D array in NumPy, you can use the numpy.std() function with the axis parameter.

What is the default behavior of numpy.std() regarding the axis parameter?

The default behavior of numpy.std() regarding the axis parameter is to calculate the standard deviation for the flattened array. In other words, if you don’t specify the axis parameter, the function will treat the input array as if it were flattened into a 1D array, and it will compute the standard deviation for the entire flattened array.

How do I specify the data type for the result of numpy.std()?

You can specify the data type for the result of numpy.std() using the dtype parameter, for example np.std(arr, dtype=np.float32).

How do I calculate the standard deviation for a sample rather than the entire population?

To calculate the standard deviation for a sample rather than the entire population, you need to use the ddof (degrees of freedom) parameter in the numpy.std() function. The default value of ddof is 0, which corresponds to calculating the standard deviation for the entire population. To calculate the sample standard deviation, set ddof to 1.
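
As a short sketch, here are both versions side by side on the array used earlier in this article. The names population_std and sample_std are illustrative.

```python
import numpy as np

arr = np.array([5, 6, 4])

population_std = np.std(arr)       # ddof=0 (default): divide by N
sample_std = np.std(arr, ddof=1)   # ddof=1: divide by N - 1

print(population_std)
print(sample_std)
```

The sample value is larger because the same sum of squared deviations is divided by 2 instead of 3.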

What is the purpose of the ddof parameter in numpy.std()?

The ddof parameter in the numpy.std() function stands for “degrees of freedom.” It is used to adjust the divisor in the calculation of the standard deviation. The purpose of the ddof parameter is to provide flexibility in calculating the standard deviation for different scenarios, particularly when dealing with samples rather than the entire population.
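
To show how ddof changes the divisor, the sketch below uses a hypothetical helper, std_with_ddof, that divides the sum of squared deviations by N - ddof. It is not part of NumPy and is written only to compare against np.std().

```python
import numpy as np

def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    values = np.asarray(values, dtype=np.float64)
    squared_dev = (values - values.mean()) ** 2
    return np.sqrt(squared_dev.sum() / (len(values) - ddof))

arr = np.array([2, 3, 2, 5])
for ddof in (0, 1):
    # Print the hand-written result next to NumPy's result
    print(ddof, std_with_ddof(arr, ddof), np.std(arr, ddof=ddof))
```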

Conclusion

In this article, I have explained how to compute the standard deviation of single-dimensional and multi-dimensional NumPy arrays using the std() function, with detailed examples.

Happy Learning!!

Vijetha

Vijetha is an experienced technical writer with a strong command of various programming languages. She has had the opportunity to work extensively with a diverse range of technologies, including Python, Pandas, NumPy, and R. Throughout her career, Vijetha has consistently exhibited a remarkable ability to comprehend intricate technical details and adeptly translate them into accessible and understandable materials.
