Spark By {Examples}

pandas head() – Returns Top N Rows


The pandas head() function returns the top N rows of a DataFrame or the top N elements of a Series. When you pass a negative number, it returns everything except the last N rows. It is mainly used for quick checks, such as confirming that an object contains the expected type of data.

When you want only the top N rows after all your filtering and transformations, use the head() method, which is defined in the pandas library.
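Here is a minimal sketch of that pattern. It filters, sorts, and then takes the first rows. The fee threshold of 22000 and the descending sort are illustrative choices and do not come from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
})

# Keep courses costing more than 22000 (illustrative threshold),
# sort by fee from highest to lowest, then take the top 3 rows of the result
top3 = df[df['Fee'] > 22000].sort_values('Fee', ascending=False).head(3)
print(top3)  # Courses: AWS, C#, Hadoop
```

head() is called last here, so it sees the already filtered and sorted data. Calling it first would only take the first rows of the unsorted DataFrame.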

pandas head() Key Points –

1. head() Syntax

The following is the syntax of the head() method for DataFrame and Series.


# Syntax of head() method (n defaults to 5)
DataFrame.head(n=5)
Series.head(n=5)
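The short sketch below checks two properties of that signature. First, omitting n behaves like n=5, and n can be passed positionally or by keyword. Second, the result has the same type as the object head() was called on. The single-column DataFrame with column `x` is a hypothetical stand-in used only for illustration.

```python
import pandas as pd

# Hypothetical single-column DataFrame; the column name 'x' is illustrative
df = pd.DataFrame({'x': range(8)})

print(df.head().equals(df.head(5)))     # True: n defaults to 5
print(df.head(n=3).equals(df.head(3)))  # True: n can be passed by keyword
print(type(df.head(3)).__name__)        # DataFrame
print(type(df['x'].head(3)).__name__)   # Series
```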

Let’s create a DataFrame from a dict.


# Create a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
    'Duration': ['30days', '50days', '30days', '35days', '40days', '45days', '50days', '60days']
})
print(df)

# Output:
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days
# 5       R  22000   45days
# 6      C#  32000   50days
# 7     AWS  40000   60days

2. DataFrame.head() Example

The pandas DataFrame.head() method returns the first N rows of the DataFrame. When N is positive, it returns the top N rows.

When N is negative, it returns all rows except the last N.

This method is mostly used for quick checks, for example to confirm that a DataFrame contains the expected data after loading or transforming it.
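Below is a minimal sketch of such a check. It previews the first rows and asserts on their shape and types. The two-column DataFrame and the specific assertions are assumptions chosen for illustration. In practice, you would check whatever your own data should look like.

```python
import pandas as pd

# Hypothetical sample data standing in for freshly loaded data
df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java'],
    'Fee': [22000, 25000, 23000],
})

# Peek at the first 2 rows instead of printing the whole DataFrame
preview = df.head(2)
print(preview)

# Confirm the preview has the expected columns and an integer Fee column
assert list(preview.columns) == ['Courses', 'Fee']
assert preview['Fee'].dtype.kind == 'i'
```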

When called without an argument, it returns the top 5 rows.


# Default return 5 rows
print(df.head())

# Output:
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days

To get the top 3 rows, pass 3 as N.


# Top N rows
print(df.head(3))

# Output:
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
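There are two edge cases for a positive N. If N is larger than the number of rows, head() returns every row instead of raising an error. If N is 0, it returns an empty DataFrame that keeps the column labels. The three-row DataFrame below is a small hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'Courses': ['Spark', 'Python', 'Java']})

# Asking for more rows than exist is not an error; all 3 rows come back
print(len(df.head(10)))  # 3

# n=0 gives an empty DataFrame that still has the 'Courses' column
empty = df.head(0)
print(empty.empty, list(empty.columns))  # True ['Courses']
```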

Pass a negative number as N to get all rows except the last N. Slicing with a negative stop, such as df[:-3], gives the same result. The example below drops the last 3 records and returns the rest.


# Except last N rows
print(df.head(-3))

# Output:
#   Courses    Fee Duration
# 0   Spark  22000   30days
# 1  Python  25000   50days
# 2    Java  23000   30days
# 3     C++  22000   35days
# 4  Hadoop  30000   40days

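To check the slicing equivalence described above, the sketch below compares head() with a negative N against plain slicing and against positional slicing with iloc. On the 8-row DataFrame, all three keep the first 5 rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R', 'C#', 'AWS'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000],
})

n = -3
# All three drop the last 3 of the 8 rows
a = df.head(n)
b = df[:n]
c = df.iloc[:n]
print(a.equals(b), a.equals(c))  # True True
print(len(a))  # 5
```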
3. pandas Series.head() Example

The pandas Series.head() method returns the top N elements of a Series. When you pass a negative integer, it returns all elements except the last N. Since each column of a DataFrame is a Series, I will use one column from the DataFrame above to explain.

When called without an argument, it returns the top 5 elements.


# head() example
print(df['Fee'].head())

# Output:
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64
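Because a single column is a Series, you can select the column first or call head() on the DataFrame first and then select the column. The sketch below checks that both orders give the same Series. It uses a shortened version of the course data.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ['Spark', 'Python', 'Java', 'C++', 'Hadoop', 'R'],
    'Fee': [22000, 25000, 23000, 22000, 30000, 22000],
})

fee = df['Fee']            # selecting one column gives a Series
print(type(fee).__name__)  # Series

# Column first then head(), or head() first then column: same values and index
print(fee.head(3).equals(df.head(3)['Fee']))  # True
```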

To get the top N elements from a Series, pass N to head().


# Top N elements from Series
print(df['Fee'].head(3))

# Output:
# 0    22000
# 1    25000
# 2    23000
# Name: Fee, dtype: int64

To get all elements except the last N, pass a negative value.


# Except last N elements
print(df['Fee'].head(-3))

# Output:
# 0    22000
# 1    25000
# 2    23000
# 3    22000
# 4    30000
# Name: Fee, dtype: int64
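The sketch below checks that a negative N on a Series behaves like positional slicing with iloc. It also shows what happens when you drop more elements than the Series holds: the result is empty. The Series is built directly from the Fee values used above.

```python
import pandas as pd

fee = pd.Series([22000, 25000, 23000, 22000, 30000, 22000, 32000, 40000], name='Fee')

# A negative n drops elements from the end, like iloc[:-3]
print(fee.head(-3).equals(fee.iloc[:-3]))  # True
print(fee.head(-3).tolist())  # [22000, 25000, 23000, 22000, 30000]

# Dropping more elements than the Series holds leaves an empty Series
print(fee.head(-10).empty)  # True
```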

Conclusion

In this article, you have learned the syntax and usage of the head() function with examples. You also learned that head() returns the top N rows or elements and is available on both DataFrame and Series. It returns an object of the same type as the caller.
