Plot Distribution of Column Values in Pandas

  • Post category: Pandas
  • Post last modified: December 3, 2024

The DataFrame.plot() function lets us plot the distribution of column values in a Pandas DataFrame. It is a built-in function for data visualization that can draw the same data in several ways. In this article, I will explain plot() and show how to use it to visualize the distribution of column values with different kinds of plots.


Key Points –

  • KDE Plot (Kernel Density Estimate) provides a smooth estimate of the distribution of a column’s values, useful for visualizing the probability density function.
  • Histogram displays the frequency of data points in different bins, allowing for easy visualization of the distribution’s shape (e.g., normal, skewed).
  • plot() method in Pandas allows for quick and flexible plotting of distributions, with options for different kinds of plots (e.g., 'kde', 'hist').
  • Edge color in histograms (edgecolor parameter) is often used to differentiate bars visually, making plots clearer.
  • Grouping by categories (using groupby()) allows for the comparison of distributions across different groups or categories.
  • Adjusting plot appearance (e.g., title, axis labels, color, line style) enhances the readability and interpretability of the distribution plots.

Quick Examples of Plot Distribution of Column Values

Following are quick examples of plotting the distribution of column values in Pandas.


# Quick examples of plotting the distribution of column values

# Example 1: Plot distribution of values in Marks column
# Using a KDE curve
df['Marks'].plot(kind='kde')

# Example 2: Plot distribution of values in Marks column
# Using histogram
df['Marks'].plot(kind='hist', edgecolor='black')

# Example 3: Plot distribution of Marks by Students
df.groupby('Students')['Marks'].plot(kind='kde')

# Example 4: Plot distribution of Marks by Students
# Using histogram
df.groupby('Students')['Marks'].plot(kind='hist')

Let’s create a Pandas DataFrame from a Python dictionary with the columns ‘Students’ and ‘Marks’. We will then call the plot() function on it to visualize the distribution of its column values in different kinds of plots.


# Create Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'Students': ['Student1', 'Student1', 'Student1', 'Student2', 'Student2',
                 'Student1', 'Student1', 'Student1', 'Student2', 'Student2'],
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})
print("Create DataFrame:\n", df)

Yields below output.


# Output:
# Create DataFrame:
   Students  Marks
0  Student1   80.4
1  Student1   50.6
2  Student1   70.4
3  Student2   50.2
4  Student2   80.5
5  Student1   70.4
6  Student1   50.4
7  Student1   60.4
8  Student2   90.1
9  Student2   90.5

Plot Distribution of Column Values in Pandas

Using the plot() function we can plot the distribution of a specific column as a chosen kind of plot. Set the kind parameter to 'kde' (kernel density estimation) to draw the distribution of the column values as a smooth curve. Note that Pandas relies on SciPy to compute the KDE, so SciPy must be installed.


# Plot distribution of values in Marks column
import matplotlib.pyplot as plt

df['Marks'].plot(kind='kde')
plt.show()

Yields below output.

Plot distribution of column values of a Pandas DataFrame
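To make the idea of a "smooth estimate" more concrete, here is a minimal sketch of a Gaussian KDE written with NumPy only. It is an illustration, not the code Pandas runs: Pandas delegates to SciPy, and the helper name gaussian_kde_sketch and the choice of Scott's rule for the bandwidth are assumptions made for this example.

```python
import numpy as np
import pandas as pd

marks = pd.Series([80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5],
                  name='Marks')

def gaussian_kde_sketch(values, grid):
    """Hypothetical helper: average one Gaussian bump per data point."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Assumed bandwidth: Scott's rule for 1-D data, std * n^(-1/5)
    bandwidth = values.std(ddof=1) * n ** (-1 / 5)
    # z[i, j] is the scaled distance from grid point i to data point j
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density at each grid point is the mean of all kernels there
    return kernels.mean(axis=1)

# Evaluate the density on a grid that extends well beyond the data range
grid = np.linspace(marks.min() - 60, marks.max() + 60, 1000)
density = gaussian_kde_sketch(marks, grid)

# A density should integrate to roughly 1 (simple Riemann sum)
area = (density * (grid[1] - grid[0])).sum()
print(f"Area under the curve: {area:.3f}")
```

Each data point contributes a small bell curve, and the KDE is their average. A wider bandwidth gives a smoother curve, and a narrower one follows the individual points more closely.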

Plot Distribution of Columns in Pandas using Histogram

A histogram is one of the plots Pandas provides for representing the frequency distribution of numeric data. It divides the range of a numeric variable into bins and counts how many values fall into each bin. Plotting a histogram is a good way to explore the distribution of our data. When several Series are drawn together, it works best if they are on a similar scale.

Pass kind='hist' to the plot() function to draw the column values of the given DataFrame as a histogram. This plot uses bars to represent the distribution of values in the 'Marks' column.


# Plot distribution of values in Marks column using histogram
import matplotlib.pyplot as plt

df['Marks'].plot(kind='hist', edgecolor='black')
plt.show()

Yields below output.

Plot distribution of values in Marks column using histogram
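The binning and counting that a histogram performs can also be inspected as plain numbers. The sketch below uses NumPy's np.histogram on the same marks. The choice of 4 bins is an assumption made to keep the output short and is not the default used by plot().

```python
import numpy as np
import pandas as pd

marks = pd.Series([80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5],
                  name='Marks')

# Split the range of the data into 4 equal-width bins and count the values in each
counts, edges = np.histogram(marks, bins=4)

# Print each bin as "left edge to right edge: count"
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```

The heights of the bars in the histogram plot correspond to these counts, and every value lands in exactly one bin.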

Plot Distribution of Column Values Grouped by Another Column

Combining the groupby() and plot() functions lets us plot the distribution of one column grouped by the values of another column. The following code plots the distribution of the 'Marks' column for each group in the 'Students' column on the same axes. We can add a legend with a title using the plt.legend() function and label the x-axis using the plt.xlabel() function. Both functions are provided by the Matplotlib library.


import matplotlib.pyplot as plt

# Plot distribution of Marks by Students
df.groupby('Students')['Marks'].plot(kind='kde')

# Add legend to plot (groupby sorts the keys, so Student1 is drawn first)
plt.legend(['Student1', 'Student2'], title='Students')

# Add x-axis label
plt.xlabel('Marks')
plt.show()

Yields below output.

Plot distribution of one column, grouped by another column

Plot Distribution of Column Values Grouped by Another Column using Histogram

The following syntax will show a plot distribution of values in the 'Marks' column, grouped by the 'Students' column in the form of a histogram. For example,


# Plot distribution of Marks by Students using histogram
df.groupby('Students')['Marks'].plot(kind='hist')

# Add legend to plot (groupby sorts the keys, so Student1 is drawn first)
plt.legend(['Student1', 'Student2'], title='Students')

# Add x-axis label
plt.xlabel('Marks')
plt.show()

Yields below output.

Plot distribution of one column, grouped by another column using histogram
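As an alternative, the groups can be drawn one at a time in a loop, so that each histogram gets its legend label straight from the group key instead of from a hand-typed list. This is a sketch under a few assumptions that are not in the examples above: the shared bin edges, the alpha transparency, and the explicit Axes object are choices made here for readability.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Students': ['Student1', 'Student1', 'Student1', 'Student2', 'Student2',
                 'Student1', 'Student1', 'Student1', 'Student2', 'Student2'],
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})

# Use the same 5 bins for every group so the bars are directly comparable
bins = np.linspace(df['Marks'].min(), df['Marks'].max(), 6)

fig, ax = plt.subplots()
for name, group in df.groupby('Students'):
    # label=name ties each histogram to its group key in the legend
    group['Marks'].plot(kind='hist', bins=bins, alpha=0.5,
                        edgecolor='black', ax=ax, label=name)

ax.set_xlabel('Marks')
ax.legend(title='Students')
```

Because the labels come from the group keys, the legend stays correct even if the groups change. Call plt.show() afterwards when running this as a script.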

FAQ on Plot Distribution of Column Values in Pandas

How do I plot the distribution of a column in a Pandas DataFrame?

You can use the .plot() function in Pandas along with kind='hist' to create a histogram, which visualizes the distribution of a numeric column.

How can I change the number of bins in the histogram?

The bins parameter controls how many bins are used to divide the range of the data. Adjust it to get a more granular or broader view of the distribution.
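For example, a minimal sketch that draws the 'Marks' histogram with 5 bins (the value 5 is an arbitrary choice for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})

# bins=5 splits the range of Marks into 5 equal-width intervals
ax = df['Marks'].plot(kind='hist', bins=5, edgecolor='black')
ax.set_xlabel('Marks')
```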

Can I plot the distribution of multiple columns?

You can plot the distribution of multiple columns at once by selecting those columns and calling .plot(kind='hist').
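A short sketch of this, assuming a DataFrame with two numeric columns. The 'Math' and 'Science' columns and their values are hypothetical and are used only for illustration:

```python
import pandas as pd

# Hypothetical data: two numeric columns on a similar scale
scores = pd.DataFrame({
    'Math':    [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5],
    'Science': [65.0, 72.5, 58.0, 81.0, 77.5, 69.0, 88.5, 61.0, 74.0, 79.5],
})

# alpha makes the overlapping bars of the two columns visible
ax = scores[['Math', 'Science']].plot(kind='hist', bins=5, alpha=0.5,
                                      edgecolor='black')
ax.set_xlabel('Score')
```

Pandas draws one histogram per selected column on the same axes and adds a legend with the column names.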

How do I plot a Kernel Density Estimate (KDE) instead of a histogram?

You can use kind='kde' to plot the density estimation, which is a smooth curve representing the data distribution.

Can I customize the appearance of the plot?

You can customize your plots using various Matplotlib features such as changing colors, adding titles, labels, or even adjusting the plot style.
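For instance, the sketch below sets a title, a bar color, and axis labels on the 'Marks' histogram. The specific title, color, and label text are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})

# title, color, and edgecolor are passed through plot() to Matplotlib
ax = df['Marks'].plot(kind='hist', bins=5, color='steelblue',
                      edgecolor='black', title='Distribution of Marks')

# plot() returns a Matplotlib Axes, so its methods are available for labels
ax.set_xlabel('Marks')
ax.set_ylabel('Number of records')
```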

Conclusion

In this article, I have explained the Pandas plot() function and how to use it to plot the distribution of column values of a given DataFrame as KDE curves and histograms, both for a single column and grouped by another column.
