Get First N Rows of Pandas DataFrame

The DataFrame.head() function returns the first N rows of a Pandas DataFrame. It takes an optional argument n, the number of rows to return from the top. If the argument is not specified, it returns the first 5 rows of the DataFrame.

1. Quick Examples of Getting the First N Rows of DataFrame

If you are in a hurry, below are quick examples of how to get the first N rows of a DataFrame.


# Below are some quick examples

# Example 1: Get the first 5 rows of
# DataFrame using head() with its default
print(df.head())

# Example 2: Get the first 2 rows of DataFrame
# using head(n)
print(df.head(2))

# Example 3: Get the first n rows of specified columns
print(df[['Courses', 'Fee', 'Duration', 'Discount']].head(3))

# Example 4: Get first n rows using a positional slice
print(df.iloc[:4])

# Example 5: Get first n rows and last n columns from DataFrame
print(df.iloc[:4, -4:])

# Example 6: Get first n rows using the values attribute
print(df.values[:3])

Let’s create a DataFrame from a Python dictionary and run the above examples to get the first N rows of the DataFrame.


# Import pandas library
import pandas as pd

# Create pandas DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'], index=index_labels)
print("Create DataFrame:\n", df)

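Yields below output.

# Output:
# Create DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
# r5   pandas  24000   60days      2000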

2. Pandas Get the First N Rows of DataFrame using head()

When you want to extract only the top N rows of a Pandas DataFrame after all your filtering and transformations, use the head() method. It returns the top N rows of a DataFrame or the top N elements of a Series. With a negative number -N, it returns all rows except the last N. When called with no argument, head() returns the top 5 rows by default.


# Get the first 5 rows of DataFrame (the default)
# using head() function
df2 = df.head()
print("Get the first 5 rows of DataFrame:\n", df2)

Because this DataFrame has exactly five rows, head() returns all of them here.

We can also pass a value for n to the head() function to get a specific number of rows from the top of the Pandas DataFrame.


# Get first 2 rows of DataFrame
df2 = df.head(2)
print("Get the first 2 rows of DataFrame:\n", df2)

# Output:
# Get the first 2 rows of DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
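The negative-number behavior described above is easy to miss, so here is a minimal sketch of it. It rebuilds a two-column version of the sample data so that it runs on its own; the variable name all_but_last_two is only for illustration.

```python
import pandas as pd

# Reduced copy of the article's sample data (two columns only)
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# head(-2) keeps every row except the last 2
all_but_last_two = df.head(-2)
print(all_but_last_two)

# Asking for more rows than exist returns the whole DataFrame, not an error
print(len(df.head(10)))
```

With five rows in the DataFrame, head(-2) keeps rows r1 to r3, and head(10) simply returns all five rows.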

2.1 Get the First N Rows of Particular Columns

To get the first n rows of selected columns, pass a list of column names to the DataFrame [] notation and then call the head() function. It returns the first n rows of only the specified columns.


# Get first n rows of specified columns 
df2 = df[['Courses', 'Fee', 'Duration', 'Discount']].head(3)
print("Get the first 3 rows of DataFrame:\n", df2)

# Output:
# Get the first 3 rows of DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
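The example above happens to list all four columns, so the result looks the same as a plain head(3). As a hedged sketch of selecting a true subset, the snippet below picks two columns and also shows what a single column returns. The names first_three and first_fees are illustrative.

```python
import pandas as pd

# Same sample data as in this article
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# A list of column names (double brackets) returns a DataFrame
first_three = df[["Courses", "Fee"]].head(3)
print(first_three)

# A single column name returns a Series, and head() works on it too
first_fees = df["Fee"].head(3)
print(first_fees)
```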

3. Get the First N Rows of Pandas DataFrame using iloc[]

We can get the first N rows of a Pandas DataFrame by passing a slice, i.e. [:n], to the iloc[] indexer. This selects the rows at positions 0 through n-1 and returns them as a DataFrame.

Related: You can use the df.iloc[] indexer to get the first row and the last row of a DataFrame.


# Get first n rows using range index
df2 = df.iloc[:4]
print("Get the first 4 rows of DataFrame:\n", df2)

# Output:
# Get the first 4 rows of DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
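To make the link between the two approaches concrete, the sketch below checks that a positional slice with iloc[] gives the same result as head(). It uses a reduced copy of the sample data, and the variable names are illustrative.

```python
import pandas as pd

# Reduced copy of the article's sample data (two columns only)
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# iloc slices by position, so the r1..r5 labels play no role here
by_position = df.iloc[:4]
by_head = df.head(4)
print(by_position.equals(by_head))

# A slice that runs past the end is truncated rather than raising an error
print(len(df.iloc[:10]))
```

Because iloc[] works on positions, this holds no matter what the index labels are.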

Alternatively, we can select the columns first and then slice with iloc[] to get the first n rows of those columns.


# Get first n rows of selected columns using a positional slice
df2 = df[['Courses', 'Fee', 'Duration', 'Discount']].iloc[:2]
print("Get the first 2 rows of DataFrame:\n", df2)

# Output:
# Get the first 2 rows of DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300

We can also get the first n rows of the last m columns by passing both a row slice and a column slice to iloc[]. Since this DataFrame has only four columns, the column slice -4: selects all of them.


# Get first n rows and last n columns from DataFrame
df2 = df.iloc[:4, -4:]
print("Get the first 4 rows of DataFrame:\n", df2)

Yields below output.

# Output:
# Get the first 4 rows of DataFrame:
#     Courses    Fee Duration  Discount
# r1    Spark  20000    30day      1000
# r2  PySpark  25000   40days      2300
# r3   Hadoop  26000   35days      1200
# r4   Python  22000   40days      2500
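Since the sample DataFrame has exactly four columns, a narrower column slice shows the effect more clearly. The following sketch, with illustrative variable names, takes the first four rows of only the last two columns, and then of two columns picked by position.

```python
import pandas as pd

# Same sample data as in this article
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# First 4 rows, last 2 columns (Duration and Discount)
last_two_cols = df.iloc[:4, -2:]
print(last_two_cols)

# First 4 rows, columns at positions 0 and 1 (Courses and Fee)
picked_cols = df.iloc[:4, [0, 1]]
print(picked_cols)
```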

4. Get the First N Rows of Pandas DataFrame using values

The Pandas DataFrame.values attribute returns a NumPy representation of the given DataFrame. Slicing it gives the first n rows as a NumPy array rather than a DataFrame, so the index and column labels are not included. Let’s get the first n rows.


# Get first n rows using the values attribute
df2 = df.values[:3]
print("Get the first N rows of DataFrame:\n", df2)

# Output:
# Get the first N rows of DataFrame:
# [['Spark' 20000 '30day' 1000]
#  ['PySpark' 25000 '40days' 2300]
#  ['Hadoop' 26000 '35days' 1200]]
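The pandas documentation recommends DataFrame.to_numpy() over the values attribute for new code. As a sketch of that alternative, not part of the original examples, the snippet below takes the first three rows with head() and converts only those rows to a NumPy array. The variable name first_rows is illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in this article
df = pd.DataFrame(
    {
        "Courses": ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        "Fee": [20000, 25000, 26000, 22000, 24000],
        "Duration": ["30day", "40days", "35days", "40days", "60days"],
        "Discount": [1000, 2300, 1200, 2500, 2000],
    },
    index=["r1", "r2", "r3", "r4", "r5"],
)

# Slice first, then convert just those rows to a NumPy array
first_rows = df.head(3).to_numpy()
print(first_rows.shape)

# The result holds the same values as slicing the values attribute
print(np.array_equal(first_rows, df.values[:3]))
```

Slicing with head() before converting avoids building an array for the whole DataFrame when only a few rows are needed.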

5. Conclusion

In this article, I have explained how to get the first n rows of a Pandas DataFrame using the head() function. You have also learned how to get these rows using iloc[] and the values attribute.

References

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html