  • Post last modified:March 27, 2024
pandas head() – Returns Top N Rows

The pandas head() function returns the top N rows of a DataFrame or the top N elements of a Series. When called with a negative number, it returns all rows except the last N. It is mainly used for quick inspection, to check that the object contains the right kind of data.

When you want to extract only the top N rows after all your filtering and transformations, you can use the head() method, which is defined in the pandas library.
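
As a rough sketch of that workflow, the snippet below filters and sorts a small DataFrame and then keeps only the first two rows with head(). The sample data, the variable names, and the 23000 threshold are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
courses = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'Hadoop'],
    'Fee': [22000, 25000, 23000, 30000],
})

# Filter, sort, then keep only the first 2 rows of the result
top2 = (
    courses[courses['Fee'] >= 23000]
    .sort_values('Fee', ascending=False)
    .head(2)
)
print(top2)
```

Because head() comes last in the chain, it limits the already filtered and sorted result rather than the original data.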

pandas head() Key Points –

  • Returns the top N elements.
  • head() exists on both Series and DataFrame.
  • When called without a parameter, head() returns the top 5 rows by default.
  • When a negative number is used, it returns all rows except the last N.
  • It returns an object of the same type as the caller.
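
The key points above can be checked with a minimal sketch like the following. The df_demo name and its values are assumptions chosen only for this illustration.

```python
import pandas as pd

# Hypothetical single-column DataFrame with 8 rows
df_demo = pd.DataFrame(
    {'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000]}
)

first_rows = df_demo.head()            # called on a DataFrame
first_fees = df_demo['Fee'].head()     # called on a Series
all_but_last_two = df_demo.head(-2)    # negative N drops the last 2 rows

print(type(first_rows).__name__)       # DataFrame
print(type(first_fees).__name__)       # Series
print(len(first_rows), len(first_fees))  # 5 5 (the default N is 5)
print(len(all_but_last_two))           # 6
```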

1. head() Syntax

Following is the syntax of the head() method of the DataFrame and Series.


# Syntax of head() method
DataFrame.head(n=5)
Series.head(n=5)

Let’s create a DataFrame from a dictionary.


# Create a pandas DataFrame from a dictionary of columns
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
    'Duration': ['30days', '50days', '30days', '35days', '40days', '45days', '50days', '60days']
})
print(df)

# Output:
#  Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
# 5       R  22000   45days
# 6      C#  32000   50days
# 7     AWS  40000   60days

2. DataFrame.head() Example

The pandas DataFrame.head() method returns the top N rows of the DataFrame. When a positive number is used, it returns the top N rows.

For negative numbers, it returns all rows except the last N.

This function is mostly used for quick checks, to confirm that the DataFrame contains the right data.

By default, when no value is passed for N, it returns the top 5 rows.


# Returns the top 5 rows by default
print(df.head())

# Output:
#  Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days

To get the top 3 rows, pass 3 as the N parameter.


# Top N rows
print(df.head(3))

# Output:
#  Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days

Use a negative number for N to get all rows except the last N. You can achieve the same result with slicing, for example df[:-3]. The example below ignores the last 3 records and returns the rest.


# Except last N rows
print(df.head(-3))

# Output:
#  Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
# 5       R  22000   45days
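
The equivalence between head() with a negative N and slicing can be verified with a short sketch. It rebuilds the same DataFrame so that it runs on its own; the via_head, via_slice, and via_iloc names are assumptions used only here.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
    'Duration': ['30days', '50days', '30days', '35days', '40days', '45days', '50days', '60days']
})

n = -3
via_head = df.head(n)    # all rows except the last 3
via_slice = df[:n]       # positional row slice
via_iloc = df.iloc[:n]   # explicit positional indexing

# equals() compares both values and index labels
print(via_head.equals(via_slice))  # True
print(via_head.equals(via_iloc))   # True
```

Of the two slicing forms, df.iloc[:n] states the positional intent most explicitly.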

3. pandas Series.head() Example

The pandas Series.head() function returns the top N elements. When called with a negative integer, it returns all elements except the last N. Since each column of a DataFrame is a Series, I will use one column from the above DataFrame to explain.

By default, when no value is passed for N, it returns the top 5 elements.


# head() example
print(df['Fee'].head())

# Output:
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64

To get the top N elements from a Series, pass N to head().


# Top N elements from Series
print(df['Fee'].head(3))

# Output:
# 0    22000
# 1    25000
# 2    23000
# Name: Fee, dtype: int64

To get all elements except the last N, use a negative N value.


# Except last N elements
print(df['Fee'].head(-3))

# Output:
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64
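
The examples above use values of N smaller than the Series length. As a small sketch of the boundary cases, the snippet below uses a standalone Series named fees (an assumption for illustration, holding the same values as the Fee column) to show what head() returns when N is zero or larger than the number of elements.

```python
import pandas as pd

# Hypothetical Series with the same 8 values as the 'Fee' column
fees = pd.Series(
    [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
    name='Fee',
)

everything = fees.head(100)       # N exceeds the length: all elements returned
nothing = fees.head(0)            # N is 0: empty Series
nothing_left = fees.head(-100)    # dropping more than exist: empty Series

print(len(everything))    # 8
print(len(nothing))       # 0
print(len(nothing_left))  # 0
```

In none of these cases does head() raise an error, which makes it safe to call on objects of unknown length.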

Conclusion

In this article, you have learned the syntax and usage of the head() function with examples. You have also learned that head() returns the top N elements and is available on both DataFrame and Series. It returns an object of the same type as the caller.

Naveen Nelamali