  • Post category: Pandas
  • Post last modified: July 8, 2024

In pandas, the info() function is used to quickly get a summary of the DataFrame. It provides essential details such as the number of non-null entries in each column, the data type of each column, and the memory usage of the DataFrame. This function is very helpful for understanding the structure of your data at a glance, especially when dealing with large datasets.


In this article, I will explain the Pandas DataFrame info() function, covering its syntax, parameters, usage, and the details it reports: the index dtype, column dtypes, non-null counts, and memory usage.

Key Points –

  • The info() function provides a concise summary of the DataFrame, including the index dtype and column types.
  • It displays the number of non-null (non-missing) entries for each column, helping to quickly identify missing data.
  • The function lists the data type (dtype) of each column, aiding in data type verification and ensuring consistency in the dataset.
  • It provides an estimate of the memory usage of the DataFrame, with an option for deep introspection (memory_usage='deep') that measures the actual memory used by object-dtype values instead of estimating it.
  • The function includes parameters such as verbose, buf, max_cols, memory_usage, and show_counts to customize the output according to specific needs.

Syntax of Pandas DataFrame info() Function

Following is the syntax of the Pandas DataFrame info().


# Syntax of Pandas DataFrame info()
DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)

Parameters of the DataFrame info()

Following are the parameters of the DataFrame info() function.

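  • verbose (bool, optional) – Whether to print the full summary. True always prints the full per-column table, and False always prints the short form. If None (the default), the full summary is printed only when the number of columns does not exceed max_cols.
  • buf (writable buffer, optional) – Where to send the output. By default, it is printed to sys.stdout. Pass a writable buffer, such as an io.StringIO object or an open file, to capture the output as a string or write it to a file.
  • max_cols (int, optional) – The column count at which info() switches from the verbose to the short output: if the DataFrame has more than max_cols columns, the short output is used. If None, it defaults to pandas’ display.max_info_columns option.
  • memory_usage (bool, str, optional) – Whether to display the total memory usage of the DataFrame, including the index. True always shows it, False never shows it, and 'deep' shows it with deep introspection, meaning pandas measures the actual memory used by object-dtype values rather than estimating it from the dtypes. If None (the default), the display.memory_usage option is used.
  • show_counts (bool, optional) – Whether to show the non-null counts. True always shows them and False never does. If None (the default), they are shown only when the DataFrame is smaller than the display.max_info_rows and display.max_info_columns limits. This parameter replaced null_counts, which was deprecated in pandas 1.2.0 and removed in pandas 2.0.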
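A minimal sketch of show_counts, assuming pandas 1.2 or later (where show_counts is available). The two-row DataFrame named sample is hypothetical and exists only for illustration; both summaries are captured with buf so the layouts can be compared.

```python
import io
import pandas as pd

# Hypothetical two-row DataFrame, used only to illustrate show_counts
sample = pd.DataFrame({'Courses': ["Spark", "PySpark"], 'Fee': [22000, 25000]})

# Capture both summaries as strings so they can be compared
with_counts = io.StringIO()
sample.info(show_counts=True, buf=with_counts)
without_counts = io.StringIO()
sample.info(show_counts=False, buf=without_counts)

print(with_counts.getvalue())     # table includes a "Non-Null Count" column
print(without_counts.getvalue())  # table shows only "#", "Column", and "Dtype"
```

Turning the counts off can be worthwhile on very large DataFrames, since counting non-null values requires scanning every column.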

Return Value

The info() function in pandas does not return any value. Instead, it prints a summary of the DataFrame to the console or the specified buffer. This summary includes details such as the number of non-null entries, data types of the columns, and memory usage.
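Because info() returns None, you cannot assign its summary to a variable directly. A minimal sketch, assuming a trimmed two-column copy of the sample data used later in this article: pass an io.StringIO object as buf and read the text back from it.

```python
import io
import pandas as pd

# Trimmed copy of the article's sample data (two columns only)
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
})

# Send the summary to an in-memory text buffer instead of sys.stdout
buffer = io.StringIO()
result = df.info(buf=buffer)

summary = buffer.getvalue()  # the text info() would otherwise have printed
print(result)                # None, because info() has no return value
print(summary)
```

Any object with a write() method works as buf, so an open text file can be passed the same way to save the summary.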

Usage of Pandas DataFrame info() Function

The info() function in pandas is commonly used to gain a quick understanding of the structure and content of a DataFrame.

To run some examples of the Pandas DataFrame info() function, let’s create a Pandas DataFrame using data from a dictionary.


# Create Pandas DataFrame
import pandas as pd
import numpy as np

technologies= {
    'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee' :[22000, 25000, 23000, 24000, 26000],
    'Discount':[1000, 2300, 1500, 1200, 2500],
    'Duration':['35days', '50days', '40days', '30days', '25days']
          }

df = pd.DataFrame(technologies)
print("Original DataFrame:\n", df)

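Yields below output.

# Output:
# Original DataFrame:
#     Courses    Fee  Discount Duration
# 0    Spark  22000      1000   35days
# 1  PySpark  25000      2300   50days
# 2   Hadoop  23000      1500   40days
# 3   Python  24000      1200   30days
# 4   Pandas  26000      2500   25days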

To see the basic usage of info(), call the method on the DataFrame you just created. It prints a concise summary that includes the number of entries, the column names, the non-null counts, and the data types.


# Basic usage of info() function
print("DataFrame Info:\n")
df.info()

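# Output:
# DataFrame Info:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype 
# ---  ------    --------------  ----- 
#  0   Courses   5 non-null      object
#  1   Fee       5 non-null      int64 
#  2   Discount  5 non-null      int64 
#  3   Duration  5 non-null      object
# dtypes: int64(2), object(2)
# memory usage: 288.0+ bytes

Here,

  • <class 'pandas.core.frame.DataFrame'> indicates the type of the object.
  • RangeIndex: 5 entries, 0 to 4 shows the index type, the number of entries (rows), and the first and last index labels.
  • Data columns (total 4 columns) is followed by one line per column showing its position (#), name, number of non-null (non-missing) entries, and data type.
  • dtypes: int64(2), object(2) summarizes how many columns have each data type.
  • memory usage: 288.0+ bytes gives an estimate of the memory used by the DataFrame. The + means the true figure may be higher, because the contents of object columns (here, the Python strings) are not measured without deep introspection.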
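Since info() only prints, the same facts are easier to use in code through other DataFrame attributes and methods. A minimal sketch, with the sample DataFrame recreated here so the snippet runs on its own: len(df) gives the entry count, df.dtypes the Dtype column, df.count() the Non-Null Count column, and df.memory_usage() the per-column bytes behind the memory usage line.

```python
import pandas as pd

# The article's sample data, recreated so this snippet is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', '25days'],
})

print(len(df))                   # number of entries (rows)
print(df.dtypes)                 # per-column dtype, as in the Dtype column
print(df.dtypes.value_counts())  # counts behind the "dtypes:" line
print(df.count())                # per-column non-null counts
print(df.memory_usage())         # shallow bytes per column, index first
```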

Memory Usage (Deep Analysis)

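To measure memory usage more accurately, pass memory_usage='deep' to info(). By default, info() estimates memory from each column's dtype and the number of rows, so an object column is counted only by its 8-byte references and the figure is marked with a + to show it may be an underestimate. With deep introspection, pandas also measures the Python objects (here, strings) stored in object columns, at some extra computational cost. The result is still a single total rather than a per-column breakdown, but it reflects actual consumption: 838.0 bytes here, versus the 288.0+ byte estimate from the default call. This is especially useful for large datasets or when you need to optimize memory.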


# Memory usage analysis with deep introspection
print("DataFrame Info with Deep Memory Usage:\n")
df.info(memory_usage='deep')

# Output:
# DataFrame Info with Deep Memory Usage:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype 
# ---  ------    --------------  ----- 
#  0   Courses   5 non-null      object
#  1   Fee       5 non-null      int64 
#  2   Discount  5 non-null      int64 
#  3   Duration  5 non-null      object
# dtypes: int64(2), object(2)
# memory usage: 838.0 bytes
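If you need the per-column breakdown that info() does not provide, DataFrame.memory_usage() returns one value per column. A minimal sketch, assuming the same sample data recreated here: comparing deep=False and deep=True shows which columns the + in the shallow estimate was hiding. The exact byte counts can vary with the pandas and Python versions, so none are claimed here.

```python
import pandas as pd

# The article's sample data, recreated so this snippet is self-contained
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', '25days'],
})

# Shallow: values in object columns are counted as 8-byte references only
shallow = df.memory_usage()
# Deep: pandas also measures the Python objects those references point to
deep = df.memory_usage(deep=True)

# Side-by-side, per-column bytes (the first row is the index)
print(pd.DataFrame({'shallow': shallow, 'deep': deep}))
print("Deep total in bytes:", deep.sum())
```

Numeric columns such as Fee report the same size either way; only object columns grow under deep introspection.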

Using info() Function with Verbose Option

The verbose option in the info() function controls whether to print the full per-column table or a short summary. With verbose=True, every column is listed along with its non-null count and dtype. With verbose=False, the per-column table is replaced by a single line giving the number of columns and the first and last column names. With the default verbose=None, pandas prints the full table only when the number of columns does not exceed max_cols (which defaults to the display.max_info_columns option).

When you call df.info(verbose=True), the output lists every column with its non-null count and dtype, followed by the dtypes summary and the memory usage.


# Using info() function with verbose=True
print("DataFrame Info with Verbose=True:\n")
df.info(verbose=True)

# Output:
# DataFrame Info with Verbose=True:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype 
# ---  ------    --------------  ----- 
#  0   Courses   5 non-null      object
#  1   Fee       5 non-null      int64 
#  2   Discount  5 non-null      int64 
#  3   Duration  5 non-null      object
# dtypes: int64(2), object(2)
# memory usage: 288.0+ bytes

When you call df.info(verbose=False), the per-column table is omitted even for this small DataFrame. Instead, a single Columns line reports the number of columns and the first and last column names, which keeps the summary compact for DataFrames with hundreds of columns.


# Using info() function with verbose=False
print("DataFrame Info with Verbose=False:\n")
df.info(verbose=False)

# Output:
# DataFrame Info with Verbose=False:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Columns: 4 entries, Courses to Duration
# dtypes: int64(2), object(2)
# memory usage: 288.0+ bytes
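A minimal sketch of how verbose and max_cols interact, assuming a hypothetical 5-column DataFrame named wide and a small helper, info_text, written here only to capture the summary as a string through buf. Because wide has more columns than max_cols=3, info() falls back to the short summary unless verbose=True is passed.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical wide DataFrame: 3 rows x 5 float columns named 0..4
wide = pd.DataFrame(np.zeros((3, 5)))

def info_text(frame, **kwargs):
    """Hypothetical helper: return the text info() would print."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()

# 5 columns > max_cols=3, so the short summary is used
short = info_text(wide, max_cols=3)
# verbose=True overrides the threshold and lists every column
full = info_text(wide, max_cols=3, verbose=True)

print(short)
print(full)
```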

Using info() Function to Identify Missing Values

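The info() function is also helpful when a DataFrame contains missing values. Any column whose non-null count is lower than the number of entries has missing data, and the difference tells you how many values are missing. In the output below, each column has 4 non-null values out of 5 entries, so each is missing one value. Note also that Fee and Discount become float64: NaN is a float, so integer columns that contain missing values are upcast.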


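# Using the info() function to identify missing values
# Illustrative data: one value in each column is replaced with NaN/None
df2 = pd.DataFrame({
    'Courses':["Spark", "PySpark", "Hadoop", None, "Pandas"],
    'Fee' :[22000, 25000, np.nan, 24000, 26000],
    'Discount':[1000, np.nan, 1500, 1200, 2500],
    'Duration':['35days', '50days', '40days', '30days', None]
    })
print("DataFrame info with missing values:\n")
df2.info()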

# Output:
# DataFrame info with missing values:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype  
# ---  ------    --------------  -----  
#  0   Courses   4 non-null      object 
#  1   Fee       4 non-null      float64
#  2   Discount  4 non-null      float64
#  3   Duration  4 non-null      object 
# dtypes: float64(2), object(2)
# memory usage: 288.0+ bytes
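To turn the non-null counts into missing-value counts in code, subtract them from the number of entries, or count the missing cells directly with isna(). A minimal sketch, assuming a small hypothetical DataFrame named df2 with one missing value per column (matching the non-null counts shown above):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one NaN/None per column
df2 = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", None, "Pandas"],
    'Fee': [22000, 25000, np.nan, 24000, 26000],
    'Discount': [1000, np.nan, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', None],
})

# Missing per column, derived the way you would read it from info()
missing_from_counts = len(df2) - df2.count()
# The direct way: count NaN/None cells in each column
missing = df2.isna().sum()

# Names of the columns that contain at least one missing value
columns_with_missing = missing[missing > 0].index.tolist()
print(missing)
print(columns_with_missing)
```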

FAQ on Pandas DataFrame info() Function

What does the info() function do?

The info() function provides a concise summary of a DataFrame, including the number of entries, column names, non-null counts, data types, and memory usage.

What does the memory_usage parameter do?

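The memory_usage parameter controls whether the memory usage line appears in the summary: True shows it, False hides it, and 'deep' shows it with deep introspection, which measures the actual memory used by object-dtype values instead of estimating it. In every case, info() reports a single total; use DataFrame.memory_usage() for a per-column breakdown.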

What is the purpose of the verbose parameter?

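The verbose parameter controls how much detail is printed. When set to True, info() lists every column with its non-null count and dtype. When set to False, it prints a short summary with only the column count and the first and last column names. When left as None, pandas decides based on the max_cols threshold.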

Can info() be used to check for missing values?

The info() function shows the number of non-null (non-missing) entries for each column, which helps in identifying columns with missing values.

What does max_cols parameter do?

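The max_cols parameter sets the threshold for switching from the verbose to the short summary: if the DataFrame has more than max_cols columns, the per-column table is omitted. It does not print a subset of the columns. If not specified, it defaults to pandas’ display.max_info_columns option.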

Conclusion

In this article, you have explored the Pandas DataFrame info() function, including its syntax, parameters, and usage. The info() method does not return a value; instead, it prints a summary to the console or a specified buffer.

Happy Learning!!
