The Pandas DataFrame boxplot() function is used to make a box plot from the given DataFrame columns. A boxplot, also called a box-and-whisker plot, shows the range of values in your data set and identifies any outliers in a format that is easier to understand than the raw data.
In a boxplot drawn by Pandas, the x-axis shows the columns being plotted and the y-axis shows their values. In this article, I will explain how to plot a boxplot from a DataFrame. A boxplot function is also available in the Matplotlib library.
Key Points –
- Boxplots in Pandas can be created using the .boxplot() method on a DataFrame.
- The .boxplot() method generates boxplots for one or more columns in the DataFrame.
- The column parameter specifies the column or columns for which the boxplot should be generated.
- Boxplots provide a visual summary of data distribution, including the median, quartiles, and potential outliers.
- You can customize the appearance of the boxplot by adjusting parameters such as color, fontsize, and widths (the last one is passed through to Matplotlib).
Quick Examples of Creating a Boxplot from a DataFrame
If you are in a hurry, below are some quick examples of how to create a box plot using boxplot().
# Quick examples of creating a boxplot from a DataFrame
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np
# Create DataFrame (a fixed seed keeps the random numbers reproducible)
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])
# Example 1: Plot the box plot of a single column of the DataFrame
b_plot = df.boxplot(column = 'Num1')
plot.show()
# Example 2: Create box plots for multiple columns
b_plot = df.boxplot(column = ['Num1', 'Num2', 'Num3'])
plot.show()
# Example 3: Customize the boxplot color
b_plot = df.boxplot(column = 'Num1', color = 'orange')
plot.show()
# Example 4: Add a title to the boxplot
b_plot = df.boxplot(column = 'Num1')
plot.title('Random Numbers')
plot.show()
# Example 5: Customize the font size of the boxplot
b_plot = df.boxplot(column = 'Num1', fontsize = 15)
plot.show()
Syntax of Pandas boxplot()
Following is the syntax of boxplot().
# Syntax of boxplot()
DataFrame.boxplot(column=None, by=None, ax=None, fontsize=None, rot=0, grid=True, figsize=None, layout=None, return_type=None, backend=None, **kwargs)
Parameters of boxplot()
Following are the parameters of boxplot().
- column : (str or list of str) Column name or names.
- by : (str or array-like) Column in the DataFrame to group by.
- ax : (matplotlib.axes.Axes) The Matplotlib axes to be used by the boxplot.
- fontsize : (int, float, or str) The font size of the tick labels.
- rot : (int or float) The angle in degrees by which the labels should be rotated.
- grid : (bool) Whether or not to show the grid.
- figsize : tuple (width, height) The size of the figure in inches.
- layout : tuple (rows, columns) The arrangement of the subplots, for example (3, 5) for 3 rows and 5 columns.
- **kwargs : All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().
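As a rough illustration of several of these parameters working together, the sketch below draws the boxplot onto an existing Matplotlib axes through ax and adjusts rot, grid, and fontsize. The figure size, rotation angle, and font size are arbitrary choices for this example.

```python
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

# Same kind of sample data as used throughout this article
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# Create the figure and axes ourselves (size chosen arbitrarily)
fig, ax = plot.subplots(figsize=(6, 4))

# Draw onto our own axes, rotate the tick labels, and hide the grid
result = df.boxplot(column=['Num1', 'Num2'], ax=ax,
                    rot=45, grid=False, fontsize=12)

# With the default return_type, the axes that was drawn on is returned
print(result is ax)
```

Calling plot.show() afterwards displays the figure.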
Return Value
The returned object depends on return_type:
- 'axes' : Returns the Matplotlib axes that the boxplot is drawn on. This is the default.
- 'dict' : Returns a dictionary whose values are the Matplotlib Lines of the boxplot.
- 'both' : Returns a named tuple with the axes and the dict.
- Grouping with by : A Series mapping columns to return_type is returned.
- None (when grouping with by) : A NumPy array of axes with the same shape as layout is returned.
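To make the return types more concrete, here is a minimal sketch that calls boxplot() once with each of the return_type values 'axes', 'dict', and 'both' and inspects what comes back. The variable names are illustrative only, and a new figure is opened before each call so the plots do not overlap.

```python
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# 'axes' (the default): the Matplotlib Axes the boxplot is drawn on
plot.figure()
ax = df.boxplot(column='Num1', return_type='axes')
print(isinstance(ax, Axes))

# 'dict': a dictionary of the Matplotlib line objects that make up the plot
plot.figure()
lines = df.boxplot(column='Num1', return_type='dict')
print(sorted(lines.keys()))

# 'both': a named tuple holding the axes and the dictionary
plot.figure()
both = df.boxplot(column='Num1', return_type='both')
print(type(both.ax).__name__, type(both.lines).__name__)
```

The dictionary form is handy when you want to restyle individual parts, such as the median lines, after the plot has been drawn.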
Usage of boxplot()
A box plot is a popular way to visualize numerical data in Pandas. It is built from the quartiles of a data set, which divide the sorted values into four parts, each holding roughly a quarter of the observations. Following are the basic terms used in a box plot.
- Median : The value in the middle of the distribution.
- Lower quartile : The median of the lower half of the data (the 25th percentile).
- Upper quartile : The median of the upper half of the data (the 75th percentile).
- Lower boundary : The lowest value in the distribution, excluding outliers.
- Higher boundary : The highest value in the distribution, excluding outliers.
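The quantities above can also be computed directly, which may help when reading a boxplot. The following is a small sketch using Series.quantile() on sample data like the data used in this article. The 1.5 multiplier is Matplotlib's default whisker setting, and the variable names are chosen for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# Lower quartile, median, and upper quartile of one column
q1, median, q3 = df['Num1'].quantile([0.25, 0.5, 0.75])

# Interquartile range: the height of the box
iqr = q3 - q1

# Values beyond these limits are drawn as outliers by default
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

print('Q1:', q1, 'Median:', median, 'Q3:', q3)
print('IQR:', iqr)
print('Outlier limits:', lower_limit, upper_limit)
```

The whiskers themselves end at the most extreme data points that still lie inside these limits, so they do not necessarily reach the limits.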
Pandas Boxplot Single Column
We can visualize a DataFrame as a box plot by using the boxplot() function, which summarizes the given data in the form of a boxplot. Let’s create a Pandas DataFrame with columns of randomly generated numbers using the np.random.rand() function. To get the same random numbers on every run, we first set the seed with the np.random.seed() function.
# Imports
import matplotlib.pyplot as plot
import pandas as pd
import numpy as np
# Create DataFrame
np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
columns=['Num1', 'Num2', 'Num3' ])
print(df)
Yields below output.
Using the above DataFrame, let’s plot a boxplot of the random numbers. In the boxplot, the bottom line indicates the minimum of the random numbers and the top line indicates the maximum, with outliers excluded. Between the bottom and top, the three lines of the box indicate the 1st quartile, median, and 3rd quartile respectively.
Let’s create a boxplot for a single column of the given DataFrame using the boxplot() function. It will generate a boxplot from the 'Num1' column.
# Plot the box plot of a single column of the DataFrame
b_plot = df.boxplot(column = 'Num1')
# Display the figure
plot.show()
Yields below output.
Pandas Boxplot Multiple Columns
Let’s call boxplot() with multiple column names; it creates a boxplot for each column. It will generate multiple boxplots from the columns 'Num1', 'Num2', and 'Num3'. Boxplots are not limited to depicting single columns. A major use case for boxplots is to compare related distributions. For example,
# Create box plots for multiple columns
b_plot = df.boxplot(column = ['Num1', 'Num2', 'Num3'])
# Display the figure
plot.show()
Yields below output.
From the above, you can see the distribution of the random numbers in each column and how each column’s numbers compare with the others. You can also notice an outlier in the “Num2” distribution, as denoted by the circle outside the whiskers.
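To check which values a boxplot would mark as outliers, one option is to apply the common 1.5 × IQR rule by hand. The helper below, iqr_outliers, is a hypothetical function written for this sketch and is not part of Pandas. It assumes the default whisker setting of 1.5.

```python
import numpy as np
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying more than whis * IQR outside the box."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - whis * iqr) | (values > q3 + whis * iqr)
    return values[mask]


np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# List the flagged values for every column
for name in df.columns:
    print(name, iqr_outliers(df[name]).tolist())
```

A column with an empty list has no points drawn beyond its whiskers.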
Pandas Boxplot Customizations
Pandas provides multiple keyword arguments for customizing boxplots. Let’s see some of them and how they work with boxplots.
Customize the Color of Boxplot
We can make the boxplot easier to read by using custom colors. To do so, pass the color argument to boxplot(), which draws the boxplot in the desired color.
# Customize the boxplot color
b_plot = df.boxplot(column = 'Num1', color = 'orange')
# Display the figure
plot.show()
Yields below output.
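Beyond a single color name, the color argument can also be given as a dictionary with the keys boxes, whiskers, medians, and caps, so that each part of the plot gets its own color. The sketch below assumes that your installed Pandas version supports the dictionary form; the specific colors are arbitrary.

```python
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

# One color per plot element (assumes dict support in your Pandas version)
part_colors = {'boxes': 'orange', 'whiskers': 'gray',
               'medians': 'blue', 'caps': 'black'}

plot.figure()
result = df.boxplot(column=['Num1', 'Num2', 'Num3'],
                    color=part_colors, return_type='both')

# result.lines holds the drawn line objects, grouped by element name
print(result.lines['boxes'][0].get_color())
```

Calling plot.show() afterwards displays the figure.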
Pandas Boxplot Title
By providing a title for the boxplot, users can quickly understand what they are seeing. You can add a title to your boxplot by using the Matplotlib title() function.
# Add a title to the boxplot
b_plot = df.boxplot(column = 'Num1')
plot.title('Random Numbers')
# Display the figure (note the parentheses)
plot.show()
Yields below output.
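Because boxplot() returns a Matplotlib Axes by default, the title can alternatively be set on the returned object rather than through the pyplot module. This is a minimal sketch of that approach; the y-axis label text is an illustrative choice.

```python
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3),
                  columns=['Num1', 'Num2', 'Num3'])

plot.figure()
ax = df.boxplot(column='Num1')

# Set the title and a y-axis label directly on the returned axes
ax.set_title('Random Numbers')
ax.set_ylabel('Value')

print(ax.get_title())
```

Working with the axes object is convenient when a figure contains several subplots, since each one can be titled separately.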
Pandas Boxplot Label Font Size
We can change the default font size by providing a customized size. This can make the boxplot clearer and easier to read. For that, we need to pass the fontsize argument to this function.
# Customize the font size of the boxplot
b_plot = df.boxplot(column = 'Num1', fontsize = 15)
# Display the figure (note the parentheses)
plot.show()
Yields below output.
FAQ on Plot the Boxplot from DataFrame
To install the required libraries, Matplotlib and Seaborn, you can run pip install matplotlib seaborn in your terminal or command prompt.
A boxplot (box-and-whisker plot) is a graphical representation that displays the distribution of a dataset. It shows the median, quartiles, and potential outliers. The box represents the interquartile range (IQR), the line inside the box is the median, and the whiskers extend to show the range of the data. Outliers may be shown as individual points beyond the whiskers.
You can specify the column(s) you want to include in the boxplot with the column parameter. For example, if you have a DataFrame df and want to plot only the ‘Category1’ column, call df.boxplot(column='Category1').
In a boxplot, outliers are individual points beyond the whiskers. They represent values that are significantly different from the majority of the data. Outliers can indicate potential errors in the data or interesting observations that merit further investigation.
You can create side-by-side boxplots for different categories by using the boxplot() function with the by parameter.
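A minimal sketch of grouping with by is shown below. The DataFrame and its 'Group' and 'Value' columns are hypothetical and exist only for this example. Passing return_type='axes' makes the call return a Series that maps each plotted column to its axes.

```python
import matplotlib.pyplot as plot
import numpy as np
import pandas as pd

# Hypothetical data: ten values split across two groups, A and B
np.random.seed(10)
df_groups = pd.DataFrame({'Group': ['A', 'B'] * 5,
                          'Value': np.random.rand(10)})

# One box per group, drawn side by side
result = df_groups.boxplot(column='Value', by='Group',
                           return_type='axes')

# The Series index lists the plotted columns
print(list(result.index))
```

Calling plot.show() afterwards displays the figure.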
Conclusion
In this article, I have explained the boxplot() function and how to use it to plot the data in a DataFrame as a boxplot. I also explained how to customize the boxplot using various keyword arguments.
Happy learning !!