In pandas, the info() function prints a quick summary of a DataFrame: the index dtype and range, the number of non-null entries in each column, the data type of each column, and the memory usage of the DataFrame. It is very helpful for understanding the structure of your data at a glance, especially when dealing with large datasets.
In this article, I will explain the Pandas DataFrame info() function, covering its syntax, parameters, usage, and the details it reports.
Key Points –
- The info() function provides a concise summary of the DataFrame, including the index dtype and column types.
- It displays the number of non-null (non-missing) entries for each column, helping to quickly identify missing data.
- The function lists the data type (dtype) of each column, aiding in data type verification and ensuring consistency in the dataset.
- It provides an estimate of the memory usage of the DataFrame, with an option for a deep introspection mode to give a detailed analysis.
- The function includes parameters such as verbose, buf, max_cols, and memory_usage to customize the output according to specific needs.
Syntax of Pandas DataFrame info() Function
Following is the syntax of the Pandas DataFrame info() function.
# Syntax of Pandas DataFrame info()
DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)
Parameters of the DataFrame info()
Following are the parameters of the DataFrame info() function.
- verbose (bool, optional) – Whether to print the full summary. By default (None), the output switches to the short form when the DataFrame has more columns than max_cols. When set to True, it shows the full summary.
- buf (writable buffer, optional) – Where to send the output. By default, it prints to sys.stdout. Pass a writable buffer if you want to capture the output in a string or file.
- max_cols (int, optional) – The column count above which the output switches from verbose to truncated. If None, it defaults to the pandas option display.max_info_columns.
- memory_usage (bool or str, optional) – Specifies whether to include memory usage in the output. If True, the memory usage of the DataFrame is displayed. If set to 'deep', pandas inspects the contents of object columns to report a more accurate figure, at the cost of extra computation. If None, it follows the pandas option display.memory_usage.
- show_counts (bool, optional) – Whether to show the non-null counts. If None, the counts are shown only when the DataFrame is smaller than the pandas options display.max_info_rows and display.max_info_columns. This parameter replaced null_counts, which was deprecated in pandas 1.2 and removed in pandas 2.0.
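As a rough illustration of how these parameters combine, the sketch below sends the summary to an in-memory buffer and switches off the non-null counts and the memory line. It assumes pandas 1.2 or later, where the counts flag is named show_counts; the small DataFrame and the variable names are made up for the example.

```python
import io
import pandas as pd

# Small illustrative DataFrame (not the one used later in the article)
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [22000, 25000, 23000],
})

# Send the summary to a string buffer instead of sys.stdout
buffer = io.StringIO()

# show_counts=False drops the "Non-Null Count" column (pandas >= 1.2);
# memory_usage=False drops the final "memory usage" line
df.info(buf=buffer, show_counts=False, memory_usage=False)

summary = buffer.getvalue()
print(summary)
```

Writing to a buffer like this is also a simple way to log the summary or save it to a file.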
Return Value
The info() function in pandas does not return any value (it returns None). Instead, it prints a summary of the DataFrame to the console or the specified buffer. This summary includes details such as the number of non-null entries, data types of the columns, and memory usage.
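A quick way to confirm this is to assign the result of the call to a variable. The snippet below is a minimal sketch with a made-up two-column DataFrame; the summary still goes to the console, and the variable ends up holding None.

```python
import pandas as pd

# Illustrative DataFrame for the check
df = pd.DataFrame({"Fee": [22000, 25000], "Discount": [1000, 2300]})

# The summary is printed as a side effect; nothing useful is returned
result = df.info()

print(result)
```

This is why writing summary = df.info() and then trying to use summary as text does not work.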
Usage of Pandas DataFrame info() Function
The info() function in pandas is commonly used to gain a quick understanding of the structure and content of a DataFrame.
To run some examples of the Pandas DataFrame info() function, let’s create a Pandas DataFrame using data from a dictionary.
# Create Pandas DataFrame
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', '25days']
}
df = pd.DataFrame(technologies)
print("Original DataFrame:\n", df)
Now call the info() method on this DataFrame to see its basic usage. It prints a concise summary of the DataFrame, including the number of entries, the column names, non-null counts, and data types.
# Basic usage of info() function
print("DataFrame Info:\n")
df.info()
In the output,
- <class 'pandas.core.frame.DataFrame'> indicates the type of the object.
- RangeIndex: 5 entries, 0 to 4 shows the number of entries (rows) and their index range.
- Column Details: Lists the column names, shows the number of non-null (non-missing) entries for each column, and indicates the data type of each column.
- dtypes: int64(2), object(2) provides a summary of the data types present in the DataFrame.
- memory usage: 288.0+ bytes gives an estimate of the memory usage of the DataFrame. The + sign means the real figure may be higher, because the contents of object columns are not measured by default.
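Since info() only prints, it can help to know where the same pieces of information live as ordinary pandas objects. The following is a sketch, not a replacement for info(); it rebuilds the article's DataFrame and reads the row count, non-null counts, dtype summary, and shallow memory usage through standard attributes and methods. The variable names are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', '25days'],
})

# Number of entries (rows) and number of columns
n_rows, n_cols = df.shape

# Non-null entries per column, as a Series
non_null = df.count()

# How many columns share each dtype, similar to the "dtypes:" line
dtype_counts = df.dtypes.astype(str).value_counts()

# Shallow memory usage in bytes, similar to the "memory usage" line
shallow_bytes = df.memory_usage().sum()

print(n_rows, n_cols)
print(non_null)
print(dtype_counts)
print(shallow_bytes)
```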
Memory Usage (Deep Analysis)
To perform a deep memory usage analysis using the info() function in pandas, pass memory_usage='deep'. pandas then inspects the actual contents of object columns (such as strings) instead of estimating from the column dtype and row count, so the reported total is more accurate. This can be especially useful for large datasets or when you need to optimize memory consumption.
# Memory usage analysis with deep introspection
print("DataFrame Info with Deep Memory Usage:\n")
df.info(memory_usage='deep')
# Output:
# DataFrame Info with Deep Memory Usage:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype
# ---  ------    --------------  -----
#  0   Courses   5 non-null      object
#  1   Fee       5 non-null      int64
#  2   Discount  5 non-null      int64
#  3   Duration  5 non-null      object
# dtypes: int64(2), object(2)
# memory usage: 838.0 bytes
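The info() output reports a single total. If you want a per-column breakdown, one option is the separate DataFrame.memory_usage() method, which accepts deep=True. The sketch below uses a cut-down DataFrame with made-up variable names; exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [22000, 25000, 23000],
})

# Bytes per column, including the index; deep=True inspects string contents
per_column = df.memory_usage(deep=True)

# Shallow figures for comparison (what info() uses by default)
shallow = df.memory_usage(deep=False)

print(per_column)
print("Deep total:", per_column.sum())
print("Shallow total:", shallow.sum())
```

Numeric columns report the same size either way; the difference comes from columns that hold Python objects such as strings.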
Using info() Function with Verbose Option
The verbose option in the info() function controls whether to print a full summary or a truncated one. When verbose=True, the function prints detailed information about the DataFrame, including a row for every column. When verbose=False, the per-column table is replaced by a single line giving the number of columns and the first and last column names.
When you call df.info(verbose=True), the output lists all columns with their non-null counts and data types, followed by the dtype summary and memory usage.
# Using info() function with verbose=True
print("DataFrame Info with Verbose=True:\n")
df.info(verbose=True)
# Output:
# DataFrame Info with Verbose=True:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype
# ---  ------    --------------  -----
#  0   Courses   5 non-null      object
#  1   Fee       5 non-null      int64
#  2   Discount  5 non-null      int64
#  3   Duration  5 non-null      object
# dtypes: int64(2), object(2)
# memory usage: 288.0+ bytes
When you call df.info(verbose=False), the output is shorter. Even for this small DataFrame, the per-column table is replaced by a one-line column summary, while the dtype summary and memory usage are still shown.
# Using info() function with verbose=False
print("DataFrame Info with Verbose=False:\n")
df.info(verbose=False)
# Output:
# DataFrame Info with Verbose=False:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Columns: 4 entries, Courses to Duration
# dtypes: int64(2), object(2)
# memory usage: 288.0+ bytes
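The same truncated form can also be triggered through max_cols when verbose is left at its default. As a small sketch, the example below rebuilds the four-column DataFrame and sets max_cols=2, so the column count exceeds the threshold; the output is captured in a buffer only so it can be inspected as a string.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, 1200, 2500],
    'Duration': ['35days', '50days', '40days', '30days', '25days'],
})

buffer = io.StringIO()

# 4 columns > max_cols=2 and verbose is not set,
# so info() falls back to the truncated summary
df.info(max_cols=2, buf=buffer)

truncated = buffer.getvalue()
print(truncated)
```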
Using info() Function to Identify Missing Values
The info() function is particularly helpful when dealing with DataFrames that contain missing values. Comparing each column's non-null count with the total number of entries quickly shows which columns have missing data and how many entries are missing. Note that integer columns containing NaN are stored as float64.
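# Create a DataFrame with one missing value in each column (illustrative data)
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", None, "Python", "Pandas"],
    'Fee': [22000, np.nan, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1500, np.nan, 2500],
    'Duration': ['35days', '50days', '40days', '30days', None]
}
df = pd.DataFrame(technologies)

# Using the info() function to identify missing values
print("DataFrame info with missing values:\n")
df.info()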
# Output:
# DataFrame info with missing values:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 4 columns):
#  #   Column    Non-Null Count  Dtype
# ---  ------    --------------  -----
#  0   Courses   4 non-null      object
#  1   Fee       4 non-null      float64
#  2   Discount  4 non-null      float64
#  3   Duration  4 non-null      object
# dtypes: float64(2), object(2)
# memory usage: 288.0+ bytes
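The summary shows non-null counts, so the number of missing entries has to be worked out by subtracting from the total. If you need those numbers in code, a minimal sketch (with a made-up two-column DataFrame) is to compute them from count() or isna():

```python
import numpy as np
import pandas as pd

# Illustrative DataFrame with one missing value per column
df = pd.DataFrame({
    'Courses': ["Spark", None, "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, np.nan, 24000, 26000],
})

# Non-null entries per column, the same figure info() prints
non_null = df.count()

# Missing entries per column: total rows minus non-null entries
missing = len(df) - non_null

# Equivalent count obtained directly from the null mask
missing_direct = df.isna().sum()

print(missing)
```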
FAQ on Pandas DataFrame info() Function
The info() function provides a concise summary of a DataFrame, including the number of entries, column names, non-null counts, data types, and memory usage.
The memory_usage parameter, when set to True or 'deep', includes memory usage information in the summary. Setting it to 'deep' inspects the contents of object columns for a more accurate figure.
The verbose parameter controls the verbosity of the output. When set to True, it provides a full summary of the DataFrame. When set to False, it provides a truncated summary.
The info() function shows the number of non-null (non-missing) entries for each column, which helps in identifying columns with missing values.
The max_cols parameter sets the column count above which info() switches to the truncated summary. If not specified, it defaults to pandas’ global settings.
Conclusion
In this article, you have explored the Pandas DataFrame info() function, including its syntax, parameters, and usage. The info() method does not return a value; instead, it prints a summary to the console or a specified buffer.
Happy Learning!!