The pandas head() function returns the top N rows of a DataFrame or the top N elements of a Series. When N is negative, it returns all rows except the last |N|. This function is mainly used for a quick check that the object contains the right type of data.
When you want only the top N rows after all your filtering and transformations, you can use the head() method, which is defined on both DataFrame and Series.
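As a small illustration of taking the top N rows at the end of a filter-and-sort chain, here is a minimal sketch. It reuses a cut-down version of this article's course data; the Fee threshold and the names `courses_df` and `top_two` are assumptions chosen only for this example.

```python
import pandas as pd

# Cut-down copy of the course data used in this article
courses_df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
})

# Filter, sort by Fee (highest first), then keep only the top 2 rows
top_two = (
    courses_df[courses_df['Fee'] > 22000]
    .sort_values('Fee', ascending=False)
    .head(2)
)
print(top_two['Courses'].tolist())

# Output:
# ['AWS', 'C#']
```

Calling head() last keeps the chain readable: the filter and sort decide which rows come first, and head() only trims the result.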
pandas head() Key Points –
- Returns the top N elements.
- head() exists on both Series and DataFrame.
- When called without an argument, head() returns the top 5 rows.
- With a negative N, it returns all rows except the last |N|.
- It returns a new object of the same type as the caller.
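The key points above can be checked with a short sketch. The single-column frame and the variable names below are assumptions made for this illustration, not part of the article's main example.

```python
import pandas as pd

# Small frame with 8 rows (illustrative data)
fees_df = pd.DataFrame({'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000]})

default_rows = fees_df.head()        # no argument: first 5 rows
all_but_last_two = fees_df.head(-2)  # negative N: everything except the last 2 rows
fee_head = fees_df['Fee'].head(3)    # head() on a Series

print(len(default_rows))
print(len(all_but_last_two))
print(type(default_rows).__name__)
print(type(fee_head).__name__)

# Output:
# 5
# 6
# DataFrame
# Series
```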
1. head() Syntax
Following is the syntax of the head() method of the DataFrame and Series.
# Syntax of head() method
DataFrame.head(n=5)
Series.head(n=5)
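To see how the n parameter behaves at the boundaries, here is a minimal sketch with a three-row frame. The frame and the name `sample` are assumptions for illustration only.

```python
import pandas as pd

# Three-row frame used only to probe the n parameter
sample = pd.DataFrame({'Fee': [22000, 25000, 23000]})

print(sample.head(0).shape)    # n=0: no rows, columns are kept
print(sample.head(10).shape)   # n larger than the row count: all rows
print(sample.head(-10).shape)  # dropping more rows than exist: no rows

# Output:
# (0, 1)
# (3, 1)
# (0, 1)
```

None of these calls raises an error, so head() is safe to use even when the number of rows is not known in advance.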
Let’s create a DataFrame from a dictionary to use in the examples.
# Create a pandas DataFrame.
import pandas as pd
df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
    'Duration': ['30days', '50days', '30days', '35days', '40days', '45days', '50days', '60days']
})
print(df)
# Output:
# Courses Fee Duration
# 0 Spark 22000 30days
# 1 Python 25000 50days
# 2 Java 23000 30days
# 3 C++ 22000 35days
# 4 Hadoop 30000 40days
# 5 R 22000 45days
# 6 C# 32000 50days
# 7 AWS 40000 60days
2. DataFrame.head() Example
The pandas DataFrame.head() method returns the top N rows of the DataFrame when N is positive.
For a negative N, it returns all rows except the last |N|.
This function is mostly used for a quick check that the DataFrame contains the right data.
By default, when N is not passed, it returns the top 5 rows.
# By default, returns 5 rows
print(df.head())
# Output:
# Courses Fee Duration
# 0 Spark 22000 30days
# 1 Python 25000 50days
# 2 Java 23000 30days
# 3 C++ 22000 35days
# 4 Hadoop 30000 40days
To get the top 3 rows, pass 3 as N.
# Top N rows
print(df.head(3))
# Output:
# Courses Fee Duration
# 0 Spark 22000 30days
# 1 Python 25000 50days
# 2 Java 23000 30days
Use a negative N to get all rows except the last |N|. You can achieve the same result by slicing with df[:n], for example df[:-3]. The example below drops the last 3 records and returns the rest.
# Except last N rows
print(df.head(-3))
# Output:
# Courses Fee Duration
# 0 Spark 22000 30days
# 1 Python 25000 50days
# 2 Java 23000 30days
# 3 C++ 22000 35days
# 4 Hadoop 30000 40days
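The equivalence between head() with a negative N and slicing can be verified with a minimal sketch. The frame below is a cut-down copy of the article's data, and the names `courses_df`, `via_head`, `via_slice` and `via_iloc` are assumptions for illustration; `iloc` is shown as an extra, explicitly positional way to write the same slice.

```python
import pandas as pd

# Cut-down copy of the 8-row course data
courses_df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
})

n = -3
via_head = courses_df.head(n)    # all rows except the last 3
via_slice = courses_df[:n]       # plain slice
via_iloc = courses_df.iloc[:n]   # explicit positional slice

print(len(via_head))
print(via_head.equals(via_slice))
print(via_head.equals(via_iloc))

# Output:
# 5
# True
# True
```

With 8 rows and N = -3, the result has 5 rows (index 0 through 4).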
3. pandas Series.head() Example
The pandas Series.head() function returns the top N elements. With a negative integer, it returns all elements except the last |N|. Since each column of a DataFrame is a Series, I will use one column from the above DataFrame to explain.
By default, when N is not passed, it returns the top 5 elements.
# head() example
print(df['Fee'].head())
# Output:
# 0 22000
# 1 25000
# 2 23000
# 3 22000
# 4 30000
# Name: Fee, dtype: int64
To get the top 3 elements from the Series, pass 3 as N.
# Top N elements from Series
print(df['Fee'].head(3))
# Output:
# 0 22000
# 1 25000
# 2 23000
# Name: Fee, dtype: int64
To get all elements except the last N, use a negative value for N.
# Except last N elements
print(df['Fee'].head(-3))
# Output:
# 0 22000
# 1 25000
# 2 23000
# 3 22000
# 4 30000
# Name: Fee, dtype: int64
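head() selects by position, not by index label, which matters when a Series does not use the default integer index. The following is a minimal sketch; the Series `fees` and its string labels are assumptions made for this illustration.

```python
import pandas as pd

# Series with string labels instead of the default 0..N-1 index
fees = pd.Series(
    [22000, 25000, 23000, 22000],
    index=['Spark', 'Python', 'Java', 'C++'],
    name='Fee',
)

first_two = fees.head(2)         # first 2 elements by position
all_but_last = fees.head(-1)     # everything except the last element

print(first_two.index.tolist())
print(all_but_last.tolist())

# Output:
# ['Spark', 'Python']
# [22000, 25000, 23000]
```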
Conclusion
In this article, you have learned the syntax and usage of the head() function with examples. You have also learned that head() returns the top N elements and is available on both DataFrame and Series. It returns a new object of the same type as the caller.