The pandas head() function returns the top N rows of a DataFrame or the top N elements of a Series. When N is negative, it returns all rows except the last N. It is mainly used for quick inspection, to check that the object contains the right type of data.
When you want to extract only the top N rows after all your filtering and transformations, you can use the head() method, which is defined on both DataFrame and Series in the pandas library.
pandas head() Key Points –
- Returns the top N elements.
- head() is available on both Series and DataFrame.
- When called without a parameter, head() returns the top 5 rows by default.
- With a negative number, it returns all rows except the last N.
- It returns an object of the same type as the caller.
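The key points above can be checked with a minimal sketch. The small frame and the names `top_df` and `top_s` are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Small illustrative frame with 8 rows (hypothetical data)
df = pd.DataFrame({"Courses": list("ABCDEFGH"), "Fee": range(8)})

top_df = df.head()        # no argument: first 5 rows of the DataFrame
top_s = df["Fee"].head()  # same default applies to a Series

print(type(top_df).__name__)    # DataFrame
print(type(top_s).__name__)     # Series
print(len(top_df), len(top_s))  # 5 5
print(len(df.head(-3)))         # 5 (8 rows minus the last 3)
print(top_df is df)             # False: the result is a new object of the caller's type
```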
1. head() Syntax
Following is the syntax of the head() method of the DataFrame and Series.
# Syntax of head() method
DataFrame.head(n=5)
Series.head(n=5)
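As a rough illustration of how the n parameter behaves at the edges, the sketch below uses a hypothetical three-element Series named `s`.

```python
import pandas as pd

s = pd.Series([10, 20, 30], name="demo")  # hypothetical data

print(len(s.head(n=2)))  # 2: keyword form, same as s.head(2)
print(len(s.head(0)))    # 0: n=0 gives an empty Series
print(len(s.head(10)))   # 3: n larger than the length returns everything
print(len(s.head(-10)))  # 0: dropping more elements than exist leaves nothing
```

No error is raised when n is larger than the object, which makes head() safe to call on data of unknown length.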
Let’s create a DataFrame from a dictionary.
# Create a pandas DataFrame.
import pandas as pd
df = pd.DataFrame({
'Courses' :['Spark','Python','Java','C++','Hadoop','R','C#','AWS'],
'Fee' :[22000,25000,23000,22000,30000,22000,32000,40000],
'Duration':['30days','50days','30days','35days','40days','45days','50days','60days']
})
print(df)
# Outputs
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
# 5       R  22000   45days
# 6      C#  32000   50days
# 7     AWS  40000   60days
2. DataFrame.head() Example
The pandas DataFrame.head() method returns the top N rows of the DataFrame. When N is positive, it returns the first N rows.
When N is negative, it returns all rows except the last N.
It is mostly used for a quick check that the DataFrame contains the right data.
By default, when N is not specified, it returns the top 5 rows.
# By default, returns the top 5 rows
print(df.head())
# Outputs
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
To get the top 3 rows, pass 3 as N.
# Top N rows
print(df.head(3))
# Outputs
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
Use a negative N to get all rows except the last N. You can achieve the same result with slicing, for example df[:-3]. The example below ignores the last 3 records and returns the remaining rows.
# Except last N rows
print(df.head(-3))
# Outputs
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
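The negative-N behavior can be compared against plain slicing. The following is a minimal sketch that rebuilds a two-column version of the DataFrame; the variable names `n` and `without_last` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "Python", "Java", "C++", "Hadoop", "R", "C#", "AWS"],
    "Fee": [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
})

n = 3
without_last = df.head(-n)  # all rows except the last 3

# Both slicing forms select the same rows as head(-n)
print(without_last.equals(df[:-n]))       # True
print(without_last.equals(df.iloc[:-n]))  # True
print(without_last["Courses"].tolist())   # ['Spark', 'Python', 'Java', 'C++', 'Hadoop']
```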
3. pandas Series.head() Example
The pandas Series.head() function returns the top N elements. When N is a negative integer, it returns all elements except the last N. Since each column of a DataFrame is a Series, I will use one column from the above DataFrame to explain.
By default, when N is not specified, it returns the top 5 elements.
# head() example
print(df['Fee'].head())
# Outputs
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64
To get the top 3 elements from the Series, pass 3 as N.
# Top N elements from Series
print(df['Fee'].head(3))
# Outputs
# 0    22000
# 1    25000
# 2    23000
# Name: Fee, dtype: int64
To get all elements except the last N, use a negative N value.
# Except last n rows
print(df['Fee'].head(-3))
# Outputs
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64
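Since head() is typically applied after filtering or sorting, here is a small sketch of that pattern on a Series. The names `fees`, `top3`, and `first_two_above` are hypothetical; the values mirror the Fee column used above.

```python
import pandas as pd

fees = pd.Series(
    [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000], name="Fee"
)

# Sort in descending order, then keep the first 3 elements
top3 = fees.sort_values(ascending=False).head(3)
print(top3.tolist())        # [40000, 32000, 30000]
print(top3.index.tolist())  # [7, 6, 4]

# Filter first, then take the first 2 matching elements in original order
first_two_above = fees[fees > 22000].head(2)
print(first_two_above.tolist())  # [25000, 23000]
```

Note that head() keeps the original index labels, so the result still shows where each element came from.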
Conclusion
In this article, you have learned the syntax and usage of the head() function with examples. You have also learned that head() returns the top N elements and is available on both DataFrame and Series. It returns an object of the same type as the caller.