A histogram is one of the visualization plots available in Pandas and is used to represent the frequency distribution of numeric data. It divides the values of a numerical variable into bins and counts how many values fall into each bin. Plotting a histogram is a good way to explore the distribution of your data. Plotting several columns together is most useful when the DataFrame's Series are on a similar scale.
In this article, I will explain the concept of the histogram and show how to plot one from a given DataFrame using the different histogram functions.
Key Points –
- Use DataFrame.hist() to generate histograms for all numeric columns in a DataFrame.
- Use the bins parameter to control the number of bins (intervals) in the histogram.
- You can specify which column to plot using the column parameter or by selecting the column from the DataFrame.
- Histograms can also be created using DataFrame.plot(kind='hist') for a more flexible plotting method.
- Use the subplots=True parameter to create individual histograms for each column in a DataFrame.
- You can add titles to your histograms by using the title parameter or adding titles to individual subplots.
- Customize the histogram appearance (e.g., color, grid, and labels) by passing additional keyword arguments supported by matplotlib.
- Use the density=True parameter to normalize the histogram, turning it into a probability distribution.
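As a rough illustration of the density=True key point, the sketch below plots a normalized histogram and checks that the bar areas add up to 1. It assumes the extra keyword is forwarded to matplotlib's hist, which is how DataFrame.hist() treats additional keyword arguments; the sample values and the choice of three bins are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import pandas as pd

# Illustrative data: the same values as the Maths column used in this article
df = pd.DataFrame({"Maths": [80.4, 50.6, 70.4, 50.2, 80.9]})

# density=True rescales the bar heights so the total bar area equals 1
axes = df.hist(column="Maths", bins=3, density=True)
ax = axes.ravel()[0]

# Area of the histogram = sum of (bar height * bar width)
area = sum(bar.get_height() * bar.get_width() for bar in ax.patches)
print(round(area, 6))
```

With density=True the y-axis no longer shows counts, so it is best used when comparing the shapes of distributions rather than their sizes.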
1. Quick Examples of Pandas Histogram
If you are in a hurry, below are some quick examples of how to plot a histogram using pandas.
# Quick examples of pandas histogram
# Example 1: Plot the histogram from DataFrame
df.hist()
# Example 2: Customize the bins of histogram
df.hist(bins = 3)
# Example 3: create histogram of specified column
df.hist(column = 'Maths')
# Example 4: Plot the histogram
# Using plot()
df.plot(kind = 'hist')
# Example 5: create histogram with title
df.plot(kind = 'hist', title = 'Students Marks')
# Example 6: Create multiple titles of histogram
df.plot(kind='hist', subplots=True, title=['Maths', 'Physics', 'Chemistry'])
# Example 7: Create histogram
# Using plot.hist()
df.plot.hist()
2. How to Plot a Pandas Histogram
In Pandas, a histogram is a graphical representation of data points organized into bins. The following are the main ways to make a histogram plot in pandas.
pd.DataFrame.hist(column)
pd.DataFrame.plot(kind='hist')
pd.DataFrame.plot.hist()
3. Plot a Histogram Using hist() in Pandas
The pandas hist() method is the default way to create a histogram. First, we need to create a Pandas DataFrame from a Python dictionary. Let's create the DataFrame.
# Create Pandas DataFrame
import pandas as pd

# Marks per subject, plus a non-numeric Students column used later for grouping
df = pd.DataFrame({
    'Maths': [80.4, 50.6, 70.4, 50.2, 80.9],
    'Physics': [70.4, 50.4, 60.4, 90.1, 90.1],
    'Chemistry': [40, 60.5, 70.8, 90.88, 40],
    'Students': ['Student1', 'Student1', 'Student1', 'Student2', 'Student2']
})
print("Create DataFrame:\n", df)
To plot a histogram in pandas, call the hist() function on the DataFrame. It draws a histogram for each numeric column in the pandas DataFrame.
# Plot the histogram from DataFrame
df.hist()
3.1 Bins of a Histogram
In a histogram, bins are the class intervals into which the data is grouped. The plot shows the number of values that fall in each interval. By default, the hist() function uses 10 bins. We can customize this by passing the number of bins we want through the bins parameter.
# Customize the bins of histogram
df.hist(bins = 3)
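To make the binning concrete, the following sketch counts the Maths values per bin with numpy.histogram instead of drawing them. It assumes equal-width bins, which is what an integer bins value produces; the use of NumPy here is only for illustration and is not required for plotting.

```python
import numpy as np
import pandas as pd

# The Maths column from the DataFrame above
maths = pd.Series([80.4, 50.6, 70.4, 50.2, 80.9], name="Maths")

# Split the range [min, max] into 3 equal-width bins and count the values in each
counts, edges = np.histogram(maths, bins=3)

print(counts)  # number of values that fall into each of the 3 bins
print(edges)   # the 4 boundaries that delimit the 3 bins
```

Three bins need four boundaries, running from the smallest value (50.2) to the largest (80.9). The bar heights in the plotted histogram correspond to these counts.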
3.2 Plot a Histogram for a Specific Column
As shown above, hist() by default draws a histogram for each numeric column of the given DataFrame. To plot a histogram of one specific column, pass that column's name to the column parameter of the hist() function, and only that column's histogram is plotted.
# Create histogram of specified column
df.hist(column = 'Maths')
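An alternative, mentioned in the key points, is to select the column from the DataFrame first and call hist() on the resulting Series. The sketch below assumes the same sample marks; the three-bin setting and the title text are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import pandas as pd

df = pd.DataFrame({
    "Maths": [80.4, 50.6, 70.4, 50.2, 80.9],
    "Physics": [70.4, 50.4, 60.4, 90.1, 90.1],
})

# Selecting one column gives a Series; Series.hist() returns a single Axes
ax = df["Maths"].hist(bins=3)

# A Series histogram has no automatic title, so set one on the Axes
ax.set_title("Maths")
```

Because a single Axes object is returned, further matplotlib calls such as ax.set_xlabel() can be chained on it directly.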
3.3 Plot Histograms for Different Groups Along Specified Columns
Using the by parameter of the hist() function, we can plot separate histograms for different groups of data. Pass the column whose values define the groups, and hist() returns one histogram per group.
For example, grouping the Maths column by the Students column creates two histograms, one for Student1 and one for Student2.
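A minimal sketch of grouped histograms, assuming the sample DataFrame from earlier in this article: the Maths column is split by the values in the Students column, giving one subplot per student.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Maths": [80.4, 50.6, 70.4, 50.2, 80.9],
    "Students": ["Student1", "Student1", "Student1", "Student2", "Student2"],
})

# One histogram of Maths for every distinct value in Students
axes = df.hist(column="Maths", by="Students")

# Each subplot is titled with the group it represents
titles = [ax.get_title() for ax in np.ravel(axes)]
print(titles)
```

Each subplot only contains the rows belonging to its group, so the bar heights across the subplots add up to the total number of rows.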
4. Plot a Histogram Using the plot() Function
A histogram can also be created by using the plot() function on a pandas DataFrame. The main difference between the two is that hist() draws a separate histogram (subplot) for each numeric column within one figure, whereas plot() by default draws all numeric columns on a single set of axes; no separate plots are made unless you ask for them.
The plot() function also accepts the bins and by parameters, the same as hist(); note that pandas releases before 1.4 ignore by here. The plot() function can be used for histogram plotting in two ways: plot(kind='hist') and plot.hist().
4.1 Syntax of plot()
Following is the syntax of the plot() function.
# Syntax of plot()
df.plot(kind='hist')
kind : It takes in the kind of plot to be created. For a histogram, you need to pass the value 'hist'.
# Plot the histogram using plot()
df.plot(kind = 'hist')
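Because plot(kind='hist') draws every column on the same axes, overlapping bars can hide each other. The sketch below assumes the numeric columns of the sample DataFrame and uses the alpha keyword (a matplotlib transparency setting, chosen here for illustration) to keep all columns visible.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import pandas as pd

df = pd.DataFrame({
    "Maths": [80.4, 50.6, 70.4, 50.2, 80.9],
    "Physics": [70.4, 50.4, 60.4, 90.1, 90.1],
    "Chemistry": [40, 60.5, 70.8, 90.88, 40],
})

# All three columns share one Axes; alpha makes overlapping bars see-through
ax = df.plot(kind="hist", bins=5, alpha=0.5)

# The legend tells which color belongs to which column
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```

The columns share the same bin edges, which makes the overlaid bars directly comparable.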
4.2 Add a Title to the Histogram
By default, the plot() function does not construct separate histograms for the individual columns of the DataFrame; all columns share one plot. Use the title parameter to give that plot a title.
# Create histogram with title
df.plot(kind = 'hist', title = 'Students Marks')
4.3 Create Multiple Titles for Individual Subplots
The following code shows how to create individual titles for subplots in pandas:
# Create multiple titles of histogram
df.plot(kind='hist', subplots=True, title=['Maths', 'Physics', 'Chemistry'])
5. Create a Histogram Using the plot.hist() Function
Using the plot.hist() function, we can also plot the histogram of a DataFrame. The hist method is accessed directly from the plot accessor: just add .hist() after .plot.
5.1 Syntax of Pandas plot.hist()
Following is the syntax of plot.hist().
# Syntax of plot.hist()
DataFrame.plot.hist(by=None, bins=10, **kwargs)
5.2 Parameters of the plot.hist()
Following are the parameters of plot.hist().
- by : (str or sequence, optional) Column in the DataFrame to group by.
- bins : (int, default 10) Number of histogram bins to be used.
- **kwargs : Additional keyword arguments, as documented in DataFrame.plot().
# Create histogram using plot.hist()
df.plot.hist()
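As a small sketch of the parameters listed above, the example below passes bins together with one extra keyword argument (alpha) to plot.hist(). The two-column selection and the axis label text are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import pandas as pd

df = pd.DataFrame({
    "Maths": [80.4, 50.6, 70.4, 50.2, 80.9],
    "Physics": [70.4, 50.4, 60.4, 90.1, 90.1],
})

# bins sets the number of intervals; alpha is forwarded as an extra keyword argument
ax = df[["Maths", "Physics"]].plot.hist(bins=4, alpha=0.6)

# plot.hist() returns a matplotlib Axes, so it can be labeled afterwards
ax.set_xlabel("Marks")
```

This produces the same kind of overlaid plot as df.plot(kind='hist'); choosing between the two spellings is a matter of style.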
Frequently Asked Questions on Plot a Histogram Using Pandas
To install Pandas and Matplotlib, you can run pip install pandas matplotlib in your terminal or command prompt.
Add the import statements import pandas as pd and import matplotlib.pyplot as plt at the beginning of your script or Jupyter Notebook.
The bins parameter of the hist function in Pandas specifies the number of bins, or intervals, in the histogram. Bins are the ranges of values that your data is divided into. Each bin represents a specific range of values, and the histogram displays how many data points fall into each bin.
To add labels and a title to the histogram plot, you can use Matplotlib functions.
plt.xlabel('Values'): Adds a label to the X-axis.
plt.ylabel('Frequency'): Adds a label to the Y-axis.
plt.title('Histogram of Values'): Sets the title of the histogram plot.
To display the histogram in a Python script or Jupyter Notebook, you can use the plt.show() function from Matplotlib.
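Putting these calls together might look like the sketch below. It assumes a single Maths column with the sample values from this article; the label and title strings are the ones quoted above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to see the window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Maths": [80.4, 50.6, 70.4, 50.2, 80.9]})

# Draw the histogram; pandas returns the matplotlib Axes it drew on
ax = df["Maths"].plot(kind="hist", bins=3)

# These pyplot functions act on the current Axes, i.e. the histogram just drawn
plt.xlabel("Values")
plt.ylabel("Frequency")
plt.title("Histogram of Values")

# Display the figure (this has no visible effect under the headless backend)
plt.show()
```

In a Jupyter Notebook the figure is usually rendered automatically at the end of the cell, so plt.show() mainly matters in standalone scripts.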
You can customize the appearance of the histogram in Pandas by providing additional parameters to the hist function, for example to change the color or turn off the grid.
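A short sketch of such customization is shown below. grid and figsize are parameters of DataFrame.hist() itself, while color and edgecolor are assumed to be forwarded to matplotlib as extra keyword arguments; the specific values are arbitrary illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove it to view the plot
import pandas as pd

df = pd.DataFrame({"Maths": [80.4, 50.6, 70.4, 50.2, 80.9]})

# Bar color and outline go to matplotlib; grid and figure size are handled by pandas
axes = df.hist(
    column="Maths",
    bins=3,
    color="steelblue",
    edgecolor="black",
    grid=False,
    figsize=(6, 4),
)
ax = axes.ravel()[0]
```

Any other styling that matplotlib supports can be applied afterwards through the returned Axes objects.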
Conclusion
In this article, I have explained the concept of a histogram and how to plot one from a DataFrame using the various hist() functions.
Happy learning !!