  • Post last modified:March 27, 2024
Plot Distribution of Column Values in Pandas

We can use the DataFrame.plot() function to plot the distribution of column values in a Pandas DataFrame. It is a built-in function for data visualization, and it can draw the given DataFrame in several different ways. In this article, I will explain plot() and how to use it to visualize the distribution of a DataFrame's column values with different kinds of plots.


1. Quick Examples of Plot Distribution of Column Values

Following are quick examples of plotting the distribution of column values in Pandas.


# Below are the quick examples.

# Example 1: Plot distribution of values in Marks column
df['Marks'].plot(kind='kde')

# Example 2: Plot distribution of values in Marks column using histogram
df['Marks'].plot(kind='hist', edgecolor='black')

# Example 3: Plot distribution of Marks by Students
df.groupby('Students')['Marks'].plot(kind='kde')

# Example 4: Plot distribution of Marks by Students using histogram
df.groupby('Students')['Marks'].plot(kind='hist')

Let's create a Pandas DataFrame from a Python dictionary with the columns 'Students' and 'Marks'. We will then call the plot() function on this DataFrame to visualize the distribution of its column values with different types of plots.


# Create Pandas DataFrame
import pandas as pd

df = pd.DataFrame({
    'Students': ['Student1', 'Student1', 'Student1', 'Student2', 'Student2',
                 'Student1', 'Student1', 'Student1', 'Student2', 'Student2'],
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})
print("Create DataFrame:\n", df)

This yields the output below.


# Output:
# Create DataFrame:
   Students  Marks
0  Student1   80.4
1  Student1   50.6
2  Student1   70.4
3  Student2   50.2
4  Student2   80.5
5  Student1   70.4
6  Student1   50.4
7  Student1   60.4
8  Student2   90.1
9  Student2   90.5

2. Plot Distribution of Column Values in Pandas

Using the plot() function, we can draw the distribution of a specific column's values in a chosen plot type. To do this, select the column and pass kind='kde' (kernel density estimation) to plot(); the distribution of the column values is then drawn as a smooth curve. Note that kind='kde' requires SciPy to be installed.


# Plot distribution of values in Marks column
import matplotlib.pyplot as plt
df['Marks'].plot(kind='kde')
plt.show()

This yields the plot below.

Figure: Plot distribution of column values of Pandas DataFrame
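To make the smooth curve less of a black box, here is a minimal sketch of a Gaussian kernel density estimate computed by hand with NumPy. It assumes Scott's rule for the bandwidth, which is the SciPy default that kind='kde' relies on when no bw_method is passed. The names bandwidth, grid, and density are illustrative only and are not part of the pandas API.

```python
import numpy as np
import pandas as pd

# Same Marks values as in the DataFrame above
marks = pd.Series([80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5],
                  name='Marks')

# Assumption: Scott's rule for 1-D data, bandwidth = n^(-1/5) * sample std
n = len(marks)
bandwidth = n ** (-1 / 5) * marks.std(ddof=1)

# Evenly spaced grid that extends well past the smallest and largest value
grid = np.linspace(marks.min() - 5 * bandwidth,
                   marks.max() + 5 * bandwidth, 1000)

# One Gaussian bump per observation, averaged over all observations
z = (grid[:, None] - marks.to_numpy()[None, :]) / bandwidth
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# The area under a density curve should be close to 1
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Plotting density against grid should give a curve of the same general shape as the KDE plot. A smaller bandwidth makes the curve bumpier, and a larger one makes it smoother.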

4. Plot Distribution of Columns in Pandas using Histogram

One of the plots available in Pandas is the histogram, which represents the frequency distribution of numeric data. It divides the range of a numerical variable into bins and counts how many values fall into each bin. Plotting a histogram is a good way to explore the distribution of our data. It works best when the Series being plotted together are on a similar scale.

Pass kind='hist' to the plot() function to draw the column values of the given DataFrame as a histogram. This plot uses bars to represent the distribution of values in the 'Marks' column.


# Plot distribution of values in Marks column using histogram
import matplotlib.pyplot as plt
df['Marks'].plot(kind='hist', edgecolor='black')
plt.show()

This yields the plot below.

Figure: Plot distribution of values in Marks column using histogram
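The binning step described above can be made explicit with a short sketch. It counts the Marks values per bin with np.histogram and then draws the same number of bins through the bins parameter of plot(). The choice of 5 bins is an arbitrary assumption for illustration; when bins is not given, plot() uses 10.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; not needed in a notebook
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})

# Split the range of Marks into 5 equal-width bins and count the values in each
counts, edges = np.histogram(df['Marks'], bins=5)
print(counts)  # number of values per bin
print(edges)   # 6 bin edges for 5 bins

# Draw the histogram with the same number of bins
ax = df['Marks'].plot(kind='hist', bins=5, edgecolor='black')
ax.set_xlabel('Marks')
```

Fewer bins give a coarser picture of the distribution and more bins give a finer one, so it is worth trying a few values.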

5. Plot Distribution of Column Values Grouped by Another Column

Using the plot() and groupby() functions together, we can plot the distribution of one column's values grouped by the values of another column. The following code plots the distribution of the values in the 'Marks' column, grouped by the 'Students' column. We can add a legend with a title to the plot using the plt.legend() function, and label the x-axis using the plt.xlabel() function. Both functions are provided by the matplotlib library.


import matplotlib.pyplot as plt

# Plot distribution of Marks by Students
df.groupby('Students')['Marks'].plot(kind='kde')

# Add legend to plot (groups are drawn in sorted order)
plt.legend(['Student1', 'Student2'], title='Students')

# Add x-axis label
plt.xlabel('Marks')
plt.show()

This yields the plot below.

Figure: Plot distribution of one column, grouped by another column

6. Plot Distribution of Column Values Grouped by Another Column using Histogram

The following code plots the distribution of the values in the 'Marks' column, grouped by the 'Students' column, as a histogram.


# Plot distribution of Marks by Students using histogram
df.groupby('Students')['Marks'].plot(kind='hist')

# Add legend to plot (groups are drawn in sorted order)
plt.legend(['Student1', 'Student2'], title='Students')

# Add x-axis label
plt.xlabel('Marks')
plt.show()

This yields the plot below.

Figure: Plot distribution of one column, grouped by another column using histogram
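Typing the legend entries by hand can go wrong if the order or spelling of the groups changes. As a possible alternative, the sketch below loops over the groups and passes each group name as the label, so the legend is built from the data itself. The alpha value and the variable names fig, ax, name, and marks are illustrative choices, not something the code above requires.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Students': ['Student1', 'Student1', 'Student1', 'Student2', 'Student2',
                 'Student1', 'Student1', 'Student1', 'Student2', 'Student2'],
    'Marks': [80.4, 50.6, 70.4, 50.2, 80.5, 70.4, 50.4, 60.4, 90.1, 90.5]
})

fig, ax = plt.subplots()

# Draw one semi-transparent histogram per student on the same axes,
# labelling each with its group name
for name, marks in df.groupby('Students')['Marks']:
    marks.plot(kind='hist', ax=ax, alpha=0.5, edgecolor='black', label=name)

# The legend entries come from the labels set in the loop
ax.legend(title='Students')
ax.set_xlabel('Marks')
```

The transparency keeps overlapping bars from hiding each other. Replacing kind='hist' with kind='kde' in the loop should give labelled density curves instead, provided SciPy is installed.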

7. Conclusion

In this article, I have explained the Pandas DataFrame plot() function and how to use it to visualize the distribution of a DataFrame's column values with KDE curves and histograms, both for a single column and for one column grouped by another.
