How to Plot the Boxplot from DataFrame?

The pandas DataFrame.boxplot() function is used to make a box plot from the given DataFrame columns. A box plot, also called a box-and-whisker plot, summarizes the range of values in your data set and highlights any outliers in a format that is easier to understand than the raw data.

In the boxplot graph, the x-axis shows the columns (or groups) being plotted and the y-axis shows their values. In this article, I will explain how to plot the boxplot from DataFrame. A boxplot function is also available directly in the Matplotlib library, which pandas uses by default to draw the plot.

1. Quick Examples of Creating a Boxplot from DataFrame

If you are in a hurry, below are some quick examples of how to create a box plot using boxplot().


# Below are the quick examples
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np

# Create DataFrame
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# Example 1: Plot the box plot of a single column of DataFrame
b_plot = df.boxplot(column='Num1')
plot.show()

# Example 2: Create box plots for multiple columns
b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
plot.show()

# Example 3: Customize the boxplot color
b_plot = df.boxplot(column='Num1', color='orange')
plot.show()

# Example 4: Add a title to the boxplot
b_plot = df.boxplot(column='Num1')
plot.title('Random Numbers')
plot.show()

# Example 5: Customize the font size of the boxplot labels
b_plot = df.boxplot(column='Num1', fontsize=15)
plot.show()
 

2. Syntax of Pandas boxplot()

Following is the syntax of the boxplot().


# Syntax of boxplot()
DataFrame.boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)

2.1 Parameters of the boxplot()

Following are the parameters of the boxplot().

  • column: (string or list of strings) Column name or names to plot.
  • by: (string or array-like) Column in the DataFrame to group by.
  • ax: (matplotlib.axes.Axes object) The Matplotlib axes to be used by the boxplot.
  • fontsize: (int or float) The font size of the tick labels.
  • rot: (int or float) The angle in degrees by which the labels should be rotated.
  • grid: (bool) Whether or not to show the grid.
  • figsize: (tuple of (width, height)) The size of the figure in inches.
  • layout: (tuple of (rows, columns)) The layout of the subplots.
  • return_type: ('axes', 'dict', 'both', or None) The kind of object to return, described under Return Value below.
  • backend: (string) The plotting backend to use instead of the default one.
  • **kwargs: All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().
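As a rough sketch of how several of these parameters combine, the example below groups two columns by a label column. The Group column and its values are hypothetical, added only for illustration, and the Agg backend line is an assumption that lets the snippet run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove it to open a window
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 2), columns=['Num1', 'Num2'])
# Hypothetical grouping column, added only for this illustration
df['Group'] = ['A', 'B'] * 5

# One subplot per column (laid out as 1 row x 2 columns),
# and inside each subplot one box per value of 'Group'
axes = df.boxplot(column=['Num1', 'Num2'], by='Group',
                  rot=45, grid=False, fontsize=12,
                  figsize=(8, 4), layout=(1, 2))

# With by= and the default return_type, an array of axes comes back
for ax in np.ravel(axes):
    print(ax.get_title())
```

In an interactive session you would import matplotlib.pyplot and call its show() function afterwards to display the figure.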

2.2 Return Value

The returned object depends on return_type:

  • 'axes' : Returns the Matplotlib axes that the boxplot is drawn on. This is what you get by default when by is not used.
  • 'dict' : Returns a dictionary whose values are the Matplotlib lines of the boxplot.
  • 'both' : Returns a named tuple with the axes and the dict.
  • Grouping with by : A Series mapping columns to return_type is returned.
  • None : When grouping with by, a NumPy array of axes with the same shape as layout is returned.
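The return types listed above can be compared with a short sketch like the following. It assumes the same random DataFrame used elsewhere in this article, and the Agg backend line is only there so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove it to open a window
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# 'axes': the Matplotlib Axes the boxplot was drawn on
plot.figure()
ax = df.boxplot(column='Num1', return_type='axes')
print(type(ax).__name__)

# 'dict': lists of Matplotlib line objects, keyed by the part of the plot
plot.figure()
lines = df.boxplot(column='Num1', return_type='dict')
print(sorted(lines))

# 'both': a named tuple that unpacks into the axes and the dict
plot.figure()
ax_both, lines_both = df.boxplot(column='Num1', return_type='both')
```

The dictionary form is handy when you want to restyle or inspect individual parts of the plot, such as the median line or the whiskers.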

3. Usage of boxplot()

A box plot is a popular way to visualize numerical data in pandas. It is built from the quartiles of a data set, which divide the sorted values into four parts that each hold about a quarter of the observations. Following are the values that a box plot shows.

  • Median : The value in the middle of the sorted data (the 50th percentile).
  • Lower quartile : The value below which 25% of the data falls.
  • Upper quartile : The value below which 75% of the data falls.
  • Lower boundary : The end of the lower whisker. By default this is the lowest value that lies within 1.5 times the interquartile range below the lower quartile.
  • Higher boundary : The end of the upper whisker. By default this is the highest value that lies within 1.5 times the interquartile range above the upper quartile. Values beyond either boundary are drawn as separate outlier points.
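To make these definitions concrete, here is a minimal sketch that computes the same quantities by hand with pandas. The sample values are made up for illustration, and the 1.5 factor is Matplotlib's default whisker setting.

```python
import pandas as pd

# Made-up sample with one unusually large value
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 30])

# Quartiles (pandas interpolates linearly between data points by default)
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range: the height of the box

# Whiskers end at the most extreme values still within 1.5 * IQR of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
lower_whisker, upper_whisker = inside.min(), inside.max()

# Anything beyond the fences would be drawn as an outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)                # 3.0 5.0 7.0
print(lower_whisker, upper_whisker)  # 1 8
print(outliers.tolist())             # [30]
```

Note that the upper whisker stops at 8 rather than at the maximum of 30, because 30 lies above the upper fence of 13.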

4. Pandas Boxplot Single Column

We can visualize a DataFrame as a box plot by using the boxplot() function, which draws a summary of the given data. Let’s create a pandas DataFrame with columns of randomly generated numbers using the np.random.rand() function. To get the same random numbers on every run, we set a seed with the np.random.seed() function.


# Imports
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np

# Create DataFrame
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3' ])
print(df)

Yields below output.


# Output
       Num1      Num2      Num3
0  0.771321  0.020752  0.633648
1  0.748804  0.498507  0.224797
2  0.198063  0.760531  0.169111
3  0.088340  0.685360  0.953393
4  0.003948  0.512192  0.812621
5  0.612526  0.721755  0.291876
6  0.917774  0.714576  0.542544
7  0.142170  0.373341  0.674134
8  0.441833  0.434014  0.617767
9  0.513138  0.650397  0.601039

Using the above DataFrame, let’s plot a boxplot of the random numbers. In the boxplot, the bottom line (the end of the lower whisker) indicates the minimum and the top line (the end of the upper whisker) indicates the maximum, excluding any outliers, which are drawn as separate points. Between the bottom and top, the three lines of the box indicate the 1st quartile, median, and 3rd quartile respectively.

Let’s create a boxplot for a single column of the DataFrame using the boxplot() function. It will generate a boxplot from the 'Num1' column.


# Plot the box plot of a single column of DataFrame
b_plot = df.boxplot(column='Num1')
plot.show()

Yields below output.

Boxplot of the 'Num1' column
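If you want the numbers behind this picture, one option is the describe() method, which reports the same quartiles that the box is drawn from. This is a small sketch that rebuilds the DataFrame from this article so it can run on its own.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Five-number summary of the column shown in the boxplot
summary = df['Num1'].describe()
print(summary[['min', '25%', '50%', '75%', 'max']])
```

For this particular data, no value of Num1 falls outside the whisker limits, so the whisker ends coincide with the min and max rows.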

5. Pandas Boxplot Multiple Columns

Let’s call boxplot() with multiple column names; it creates one box for each column. Boxplots are not limited to depicting single columns: a major use case for boxplots is to compare related distributions. For example, the following generates boxplots from the columns 'Num1', 'Num2', and 'Num3'.


# Create box plots for multiple columns
b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
plot.show()

Yields below output.

Boxplot of multiple columns

From the above, you can see the distribution of the random numbers in each column and how each column’s numbers compare with the others. You can also notice an outlier in the “Num2” distribution, denoted by the circle outside the whiskers.
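The outlier can also be confirmed numerically. The sketch below applies the usual 1.5 times interquartile range rule (Matplotlib's default) to every column at once; it rebuilds the article's DataFrame so that it is self-contained.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Per-column quartiles and interquartile range
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# True where a value lies more than 1.5 * IQR outside the box
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

# Number of outliers per column, then the outlying values themselves
print(is_outlier.sum())
for col in df.columns:
    print(col, df.loc[is_outlier[col], col].tolist())
```

Only Num2 reports an outlier here: the value in row 0 (about 0.0208), which sits well below the rest of that column.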

6. Pandas Boxplot Customizations

The pandas library provides multiple keyword arguments for customizing boxplots. Let’s see some of them and how they work with boxplots.

6.1 Customize the Color of Boxplot

We can make the boxplot easier to read by customizing its color. To do that, pass the color argument to boxplot(), which draws the boxplot in the given color.


# Customize the boxplot color
b_plot = df.boxplot(column='Num1', color='orange')
plot.show()

Yields below output.

Boxplot of Num1 drawn in orange
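To check which parts of the plot the color argument affects, one option is to request the drawn lines with return_type='dict' and inspect their colors. This is only a sketch: the exact behavior may vary between pandas versions, and the Agg backend line is an assumption that lets it run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove it to open a window
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Ask for the Matplotlib line objects instead of the axes
lines = df.boxplot(column='Num1', color='orange', return_type='dict')

# Collect the color of every line in each part of the boxplot
part_colors = {}
for part in ['boxes', 'whiskers', 'medians', 'caps']:
    part_colors[part] = {str(line.get_color()) for line in lines[part]}
    print(part, part_colors[part])
```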

6.2 Pandas Boxplot Title

A title helps users quickly understand what they are seeing. You can add a title to your boxplot by using the title() function of matplotlib.pyplot.


# Add a title to the boxplot
b_plot = df.boxplot(column='Num1')
plot.title('Random Numbers')
plot.show()

Yields below output.

Boxplot of Num1
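As an alternative sketch, the title can also be set on the Axes object that boxplot() returns, which is convenient when a figure holds several plots. The 'Value' axis label is a hypothetical choice for illustration, and the Agg backend line is an assumption that lets the snippet run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove it to open a window
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# boxplot() returns the Matplotlib Axes by default
ax = df.boxplot(column='Num1')

# Set the title and a y-axis label directly on that Axes
ax.set_title('Random Numbers')
ax.set_ylabel('Value')  # hypothetical label, not from the article

print(ax.get_title())  # Random Numbers
```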

6.3 Pandas Boxplot Label Font Size

We can change the default font size by providing a customized size, which can make the boxplot clearer and easier to read. To do that, pass the fontsize argument to the boxplot() function.


# Customize the font size of the boxplot labels
b_plot = df.boxplot(column='Num1', fontsize=15)
plot.show()

Yields below output.

Boxplot of a single column with customized font size

7. Conclusion

In this article, I have explained the boxplot() function and how to use it to present the data in a DataFrame as a boxplot. I have also explained how to customize the boxplot using various keyword arguments.

Happy learning !!
