  • Post category:Pandas
  • Post last modified:November 25, 2024
How to Plot the Boxplot from DataFrame?

The Pandas DataFrame boxplot() function draws a box plot from the given DataFrame columns. A box plot, also called a box-and-whisker plot, summarizes the range of values in a dataset and highlights any outliers in a format that is easier to read than the raw data.

In the box plot that pandas draws by default, the x-axis shows the columns being plotted and the y-axis shows their values. In this article, I will explain how to plot the boxplot from a DataFrame. A boxplot() function is also available in the Matplotlib library, which pandas uses to draw the plot.

Key Points –

  • Boxplots in Pandas can be created using the .boxplot() method on a DataFrame.
  • The .boxplot() method generates boxplots for one or more columns in the DataFrame.
  • The column parameter specifies the column or columns for which the boxplot should be generated.
  • Boxplots provide a visual summary of data distribution, including the median, quartiles, and potential outliers.
  • You can customize the appearance of the boxplot by adjusting parameters such as color, fontsize, and width.

Quick Examples of Create Boxplot of DataFrame

If you are in a hurry, below are some quick examples of how to create a box plot using boxplot().


# Quick examples of creating a boxplot from a DataFrame

# Imports
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np

# Create DataFrame
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# Example 1: Plot the box plot of a single column of the DataFrame
b_plot = df.boxplot(column='Num1')
plot.show()

# Example 2: Create box plots for multiple columns
b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
plot.show()

# Example 3: Customize the boxplot color
b_plot = df.boxplot(column='Num1', color='orange')
plot.show()

# Example 4: Add a title to the boxplot
b_plot = df.boxplot(column='Num1')
plot.title('Random Numbers')
plot.show()

# Example 5: Customize the font size of the boxplot labels
b_plot = df.boxplot(column='Num1', fontsize=15)
plot.show()
 

Syntax of Pandas boxplot()

Following is the syntax of the boxplot().


# Syntax of boxplot()
DataFrame.boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)

Parameters of the boxplot()

Following are the parameters of the boxplot().

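  • column: (str or list of str) Column name or names to plot.
  • by: (str or array-like) Column in the DataFrame to group by.
  • ax: (matplotlib.axes.Axes) The matplotlib axes to be used by the boxplot.
  • fontsize: (int or float) The font size of the tick labels.
  • rot: (int or float) The angle in degrees by which the labels should be rotated.
  • grid: (bool) Whether or not to show the grid.
  • figsize: (tuple (width, height)) The size of the figure in inches.
  • layout: (tuple (rows, columns)) The layout of the subplots, for example (3, 5) for 3 rows and 5 columns.
  • return_type: ('axes', 'dict', 'both', or None) The kind of object to return.
  • backend: (str) The plotting backend to use instead of the one set in the plotting.backend option.
  • **kwargs: All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().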
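
As a rough illustration of how several of these parameters combine, the sketch below draws two columns onto an existing axes with rotated, larger tick labels and no grid. The Agg backend line is an assumption made so the sketch runs without a display. Drop it and call plot.show() at the end in an interactive session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Create the figure and axes ourselves, then hand the axes to pandas via ax=
fig, ax = plot.subplots(figsize=(6, 4))

# rot rotates the tick labels, grid=False hides the grid, fontsize sets the tick label size
result = df.boxplot(column=['Num1', 'Num2'], ax=ax, rot=45, grid=False, fontsize=12)

# With the default return_type and no grouping, the same axes object comes back
print(result is ax)
```

Passing ax is mostly useful when the boxplot has to share a figure with other plots.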

Return Value

The return value depends on return_type:

  • 'axes' : Returns the matplotlib axes that the boxplot is drawn on. This is also the result when return_type is left as None and by is not used.
  • 'dict' : Returns a dictionary whose values are the matplotlib Lines of the boxplot.
  • 'both' : Returns a named tuple with the axes and dict.
  • Grouping with by : A Series mapping columns to return_type is returned. If return_type is None, a NumPy array of axes with the same shape as layout is returned instead.
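
The ungrouped return types can be compared side by side with a short sketch like the one below. The variable names are only illustrative, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Default: the matplotlib axes the boxplot is drawn on
axes_result = df.boxplot(column='Num1')

# 'dict': the matplotlib line objects, keyed by element name
dict_result = df.boxplot(column='Num1', return_type='dict')

# 'both': a named tuple holding the axes (.ax) and the dictionary (.lines)
both_result = df.boxplot(column='Num1', return_type='both')

print(type(axes_result).__name__)
print(sorted(dict_result.keys()))
print(both_result._fields)
```

The dictionary form is handy when individual elements, such as the median lines, need to be restyled after plotting.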

Usage of boxplot()

A box plot is a popular way to visualize numerical data in pandas. It is built from the quartiles of a dataset, which split the sorted values into four parts that each hold about a quarter of the observations. The plot is based on the following values.

  • Median : The middle value of the distribution (the 50th percentile).
  • Lower quartile : The value below which 25% of the data falls.
  • Upper quartile : The value below which 75% of the data falls.
  • Lower boundary : The end of the lower whisker. By default this is the lowest value that lies within 1.5 times the interquartile range of the lower quartile, which is the minimum when there are no low outliers.
  • Higher boundary : The end of the upper whisker. By default this is the highest value that lies within 1.5 times the interquartile range of the upper quartile, which is the maximum when there are no high outliers.
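
The values above can be computed directly with pandas. The following is a minimal sketch that assumes the default whisker rule of 1.5 times the interquartile range, using the same random DataFrame as the rest of this article. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])
num1 = df['Num1']

# Quartiles and median of the column
q1, median, q3 = num1.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range: the height of the box

# Whiskers end at the most extreme values that are still inside the 1.5 * IQR fences
lower_whisker = num1[num1 >= q1 - 1.5 * iqr].min()
upper_whisker = num1[num1 <= q3 + 1.5 * iqr].max()

print('Lower boundary:', lower_whisker)
print('Lower quartile:', q1)
print('Median:', median)
print('Upper quartile:', q3)
print('Higher boundary:', upper_whisker)
```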

Pandas Boxplot Single Column

We can visualize a DataFrame as a box plot by using the boxplot() function, which draws a summary of the given data in the form of a boxplot. Let’s create a Pandas DataFrame with columns of randomly generated numbers using the np.random.rand() function. To get the same random numbers on every run, we first set the seed with np.random.seed().


# Imports
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np

# Create DataFrame
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3' ])
print(df)

This prints a DataFrame with 10 rows and the three columns Num1, Num2, and Num3, holding random values between 0 and 1.

Using the above DataFrame, let’s plot a boxplot of the random numbers. In a boxplot, the bottom and top whisker lines mark the minimum and maximum of the data, excluding any points drawn separately as outliers. Between them, the three lines of the box mark the 1st quartile, the median, and the 3rd quartile, respectively.

Let’s create a boxplot for a single column of the DataFrame using the boxplot() function. It will generate a boxplot from the 'Num1' column.


# Plot the box plot of a single column of the DataFrame
b_plot = df.boxplot(column='Num1')
plot.show()

Yields below output.

Boxplot of the ‘Num1’ column

Pandas Boxplot Multiple Columns

Boxplots are not limited to single columns. A major use case for boxplots is comparing related distributions. If you pass multiple column names to boxplot(), it draws one box per column. The example below generates boxplots from the columns 'Num1', 'Num2', and 'Num3'.


# Create box plots for multiple columns
b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
plot.show()

Yields below output.

Boxplot of multiple columns

From the above, you can see the distribution of the random numbers in each column and how the columns compare with one another. You can also see an outlier in the “Num2” distribution, drawn as a separate point outside the whiskers.
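
To check outliers numerically rather than by eye, one option is to apply the same 1.5 times interquartile range rule to every column. This is a sketch under that assumption, not the exact code pandas or Matplotlib runs internally, and the names used are only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Per-column quartiles and interquartile range
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1

# True wherever a value lies outside the whisker fences of its own column
outside = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)

# Number of flagged values per column, then the flagged values of one column
print(outside.sum())
print(df.loc[outside['Num2'], 'Num2'])
```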

Pandas Boxplot Customizations

The pandas library provides several keyword arguments for customizing boxplots. Let’s see some of them and how they work.

Customize the Color of Boxplot

We can make a boxplot easier to read by giving it a custom color. To do that, pass the color argument to boxplot(), and the box, whiskers, median, and caps are drawn in that color.


# Customize the boxplot color
b_plot = df.boxplot(column='Num1', color='orange')
plot.show()

Yields below output.

Boxplot of Num1 with the color set to orange
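
In recent pandas versions the color argument can also be a dictionary with the keys boxes, whiskers, medians, and caps, so each element gets its own color. The sketch below assumes such a version is installed. The color choices are arbitrary, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# One color per boxplot element (arbitrary example colors)
colors = {'boxes': 'orange', 'whiskers': 'gray', 'medians': 'red', 'caps': 'black'}

# return_type='dict' gives back the line objects so the result can be inspected
lines = df.boxplot(column=['Num1', 'Num2', 'Num3'], color=colors, return_type='dict')

print(lines['medians'][0].get_color())
```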

Pandas Boxplot Title

A title helps readers quickly understand what they are looking at. You can add a title to your boxplot by using the title() function from matplotlib.pyplot.


# Add a title to the boxplot
b_plot = df.boxplot(column='Num1')
plot.title('Random Numbers')
plot.show()

Yields below output.

Boxplot of Num1 with a title
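
Since boxplot() returns a matplotlib axes by default, the title and axis labels can also be set on that object directly. The following sketch shows this alternative. The label texts are only examples, and the Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# boxplot() returns the axes, so title and labels can be set on it
ax = df.boxplot(column='Num1')
ax.set_title('Random Numbers')
ax.set_xlabel('Column')   # example label text
ax.set_ylabel('Value')    # example label text

print(ax.get_title())
```

Setting the title on the axes keeps working when a figure holds several plots, where plot.title() would only affect the current one.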

Pandas Boxplot Label Font Size

We can change the default font size of the tick labels by providing a custom size. This can make the boxplot clearer and easier to read. To do that, pass the fontsize argument to boxplot().


# Customize the font size of the boxplot labels
b_plot = df.boxplot(column='Num1', fontsize=15)
plot.show()

Yields below output.

Box plot of a single column with customized fontsize

FAQ on Plot the Boxplot from DataFrame

How do I install the required libraries (Matplotlib and Seaborn)?

To install the required libraries, run pip install matplotlib seaborn in your terminal or command prompt. The examples in this article use only pandas, NumPy, and Matplotlib.

What is a boxplot, and what information does it provide?

A boxplot (box-and-whisker plot) is a graphical representation that displays the distribution of a dataset. It shows the median, quartiles, and potential outliers. The box represents the interquartile range (IQR), the line inside the box is the median, and the whiskers extend to show the range of the data. Outliers may be shown as individual points beyond the whiskers.

Can I plot a specific column from the DataFrame as a boxplot?

You can specify the column(s) you want to include in the boxplot with the column parameter. For example, if you have a DataFrame df and want to plot only the ‘Category1’ column, call df.boxplot(column='Category1').

How do I interpret outliers in a boxplot?

In a boxplot, outliers are individual points beyond the whiskers. They represent values that are significantly different from the majority of the data. Outliers can indicate potential errors in the data or interesting observations that merit further investigation.

Can I combine boxplots for different categories side by side?

You can create side-by-side boxplots for different categories by using the boxplot() function with the by parameter.
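
As a minimal sketch of grouping with by, the example below adds a hypothetical 'Group' column (not part of the DataFrame used earlier in this article) and draws one box per group for 'Num1'. The Agg backend line is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

# Hypothetical category column, added only for this illustration
df['Group'] = ['A', 'B'] * 5

# One box per group, drawn side by side on the same axes
fig, ax = plot.subplots()
df.boxplot(column='Num1', by='Group', ax=ax)

# pandas adds its own figure-level title when grouping; clear it if it is not wanted
fig.suptitle('')

print([t.get_text() for t in ax.get_xticklabels()])
```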

Conclusion

In this article, I have explained the boxplot() function and how to use it to plot the data in a DataFrame as a boxplot. I have also explained how to customize the boxplot using various keyword arguments.

Happy learning!!
