The DataFrame.head() function returns the first N rows of a Pandas DataFrame. It takes an optional argument n, the number of rows to return counting from the top. If the argument is not specified, the function returns the first 5 rows of the given DataFrame.
We can also use DataFrame.iloc[:n] to get the first n rows. iloc[] is a property that selects rows and columns by integer position. Requesting a single position that does not exist raises an IndexError, whereas a slice that runs past the end simply returns the rows that exist. Pandas loc[] is another property, which selects rows and columns by label. For a better understanding of these two, learn the differences and similarities between Pandas loc[] vs iloc[].
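As a minimal sketch of the positional and label behaviour described above, the snippet below uses a small three-row DataFrame. The variable names (clipped, raised, by_label) are only illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

# Small illustrative DataFrame with string row labels
df = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# A position slice that runs past the end is clipped: all 3 rows, no error
clipped = df.iloc[:10]
print(len(clipped))

# A single out-of-range position raises IndexError
try:
    df.iloc[10]
    raised = False
except IndexError:
    raised = True
print(raised)

# loc[] works on labels, and a label slice includes its end label
by_label = df.loc[:"r2"]
print(list(by_label.index))
```

Note the difference in slice ends: iloc[:2] stops before position 2, while loc[:"r2"] includes the row labelled r2.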
1. Quick Examples of Get First N Rows of DataFrame
If you are in a hurry, below are quick examples of how to get the first N rows of a DataFrame.
# Below are some quick examples
# Example 1: Get the first 5 rows of
# DataFrame using head() with its default
print(df.head())
# Example 2: Get the first n rows of DataFrame
# using head(n)
print(df.head(2))
# Example 3: Get the first n rows of specified columns
print(df[['Courses', 'Fee', 'Duration', 'Discount']].head(3))
# Example 4: Get the first n rows using a position slice with iloc[]
print(df.iloc[:4])
# Example 5: Get first n rows and last n columns from DataFrame
print(df.iloc[:4, -4:])
# Example 6: Get the first n rows using the values attribute
print(df.values[:3])
Let’s create a DataFrame from a Python dictionary and run the above examples to get the first N rows of the DataFrame.
# Import pandas library
import pandas as pd
# Create pandas DataFrame
technologies = {
'Courses':["Spark","PySpark","Hadoop","Python","pandas"],
'Fee' :[20000,25000,26000,22000,24000],
'Duration':['30day','40days','35days','40days','60days'],
'Discount':[1000,2300,1200,2500,2000]
}
index_labels=['r1','r2','r3','r4','r5']
df = pd.DataFrame(technologies, columns = ['Courses', 'Fee', 'Duration', 'Discount'], index = index_labels)
print("Create DataFrame:\n", df)
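Yields below output.
# Output:
# Create DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000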
2. Pandas Get the First N Rows of DataFrame using head()
Use the head() method when you want to extract only the top N rows from a Pandas DataFrame, typically after all your filtering and transformations. It returns the top N rows of a DataFrame or the top N elements of a Series. With a negative N, it returns all rows except the last |N|. When no argument is passed, head() returns the top 5 rows by default.
# By default, get the first 5 rows of DataFrame
# Using head() function
df2 = df.head()
print("Get the first N rows of DataFrame:\n", df2)
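Yields below output.
# Output:
# Get the first N rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000
We can also pass a value to head() to override the default and get the top N rows from the Pandas DataFrame.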
# Get first n rows of DataFrame
df2 = df.head(2)
print("Get the first 2 rows of DataFrame:\n", df2)
# Output:
# Get the first 2 rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
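The other head() behaviours mentioned in this section (calling it after filtering, passing a negative N, and calling it on a Series) can be sketched as follows. The filter threshold of 21000 and the variable names are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Filter and sort first, then keep only the top 2 rows of the result
top_fees = df[df['Fee'] > 21000].sort_values('Fee', ascending=False).head(2)
print(top_fees)

# Negative N: every row except the last 2
all_but_last_two = df.head(-2)
print(all_but_last_two)

# head() on a Series returns its first N elements
first_courses = df['Courses'].head(3)
print(first_courses)
```

Here top_fees holds rows r3 and r2 (the two highest fees), and all_but_last_two holds rows r1 to r3.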
2.1 Get the First N Rows of Particular Columns
If we want the first n rows of selected columns only, we can pass a list of column names to the DataFrame [] notation and then call the head() function. It returns the first n rows of the specified columns.
# Get first n rows of specified columns
df2 = df[['Courses', 'Fee', 'Duration', 'Discount']].head(3)
print("Get the first 3 rows of DataFrame:\n", df2)
# Output:
# Get the first 3 rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
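The example above happens to select all four columns. As a small sketch with a narrower selection, the snippet below also shows that single brackets return a Series while a list of names returns a DataFrame. The shortened sample data and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Hadoop"],
        'Fee': [20000, 25000, 26000],
        'Discount': [1000, 2300, 1200],
    },
    index=['r1', 'r2', 'r3'],
)

# Single brackets with one name: the result is a Series
one_col_series = df['Courses'].head(2)
print(type(one_col_series).__name__)

# A list with one name: the result is a one-column DataFrame
one_col_frame = df[['Courses']].head(2)
print(type(one_col_frame).__name__)

# A list with several names: the first 2 rows of just those columns
two_cols = df[['Courses', 'Fee']].head(2)
print(two_cols)
```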
3. Get the First N Rows of Pandas using iloc[]
We can get the first N rows of a Pandas DataFrame by passing a position slice, i.e. [:n], to the iloc[] attribute. This syntax selects the rows at positions 0 through n-1 and returns them as a DataFrame.
Related: You can also use the df.iloc[] attribute to get the first row of a DataFrame and the last row of a DataFrame.
For example,
# Get first n rows using range index
df2 = df.iloc[:4]
print("Get the first 4 rows of DataFrame:\n", df2)
# Output:
# Get the first 4 rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
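Because iloc[] works purely on position, iloc[:n] gives the same rows as head(n), whatever the row labels are. The following sketch checks this and shows that after sorting, position 0 is simply whichever row comes first. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        'Fee': [20000, 25000, 26000, 22000, 24000],
    },
    index=['r1', 'r2', 'r3', 'r4', 'r5'],
)

# iloc[:4] and head(4) select the same rows
same = df.iloc[:4].equals(df.head(4))
print(same)

# After sorting, iloc[:2] still takes the first 2 rows by position,
# so the labels follow the new order
by_fee = df.sort_values('Fee', ascending=False)
first_two = by_fee.iloc[:2]
print(list(first_two.index))
```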
Alternatively, we can get the first n rows of selected columns using a position slice with the iloc[] attribute.
# Get the first n rows of selected columns using a position slice
df2 = df[['Courses', 'Fee', 'Duration', 'Discount']].iloc[:2]
print("Get the first 2 rows of DataFrame:\n", df2)
# Output:
# Get the first 2 rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
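The row slice and the column selection can also be combined in a single iloc[] call. In the sketch below, the column positions 0 and 1 are an assumption about the column order of this sample data. Index.get_indexer() is used to look the positions up by name instead of hard-coding them.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Hadoop"],
        'Fee': [20000, 25000, 26000],
        'Duration': ['30day', '40days', '35days'],
    },
    index=['r1', 'r2', 'r3'],
)

# First 2 rows of the columns at positions 0 and 1
by_position = df.iloc[:2, [0, 1]]
print(by_position)

# Look up the positions from the column names, then use them with iloc[]
col_positions = df.columns.get_indexer(['Courses', 'Fee'])
by_name = df.iloc[:2, col_positions]
print(by_name)
```

Both selections return the same two rows and two columns.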
We can also get the first n rows of the last n columns using the syntax below. Note that the sample DataFrame has only four columns, so -4: selects all of them.
# Get first n rows and last n columns from DataFrame
df2 = df.iloc[:4, -4:]
print("Get the first 4 rows of DataFrame:\n", df2)
Yields below output.
# Output:
# Get the first 4 rows of DataFrame:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
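To see the column slice actually narrow the result, here is a sketch that keeps only the last 2 columns. The choice of 2 and the variable names are illustrative.

```python
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000],
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# First 4 rows, last 2 columns (Duration and Discount)
subset = df.iloc[:4, -2:]
print(subset)

# With only four columns in total, -4: keeps every column
all_cols = df.iloc[:4, -4:]
print(all_cols.equals(df.iloc[:4]))
```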
4. Get the First N Rows of Pandas using values
The Pandas DataFrame.values attribute returns a NumPy representation of the given DataFrame. Slicing this array gives the first n rows of the DataFrame as a NumPy array. Let’s get the first n rows:
# Get the first n rows using the values attribute
df2 = df.values[:3]
print("Get the first N rows of DataFrame:\n", df2)
# Output:
# Get the first N rows of DataFrame:
# [['Spark' 20000 '30day' 1000]
# ['PySpark' 25000 '40days' 2300]
# ['Hadoop' 26000 '35days' 1200]]
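The pandas documentation recommends DataFrame.to_numpy() over the values attribute. The sketch below uses it for the same slicing. It also shows that a DataFrame mixing strings and numbers becomes an object-dtype array, while a single numeric column keeps an integer dtype. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
        'Fee': [20000, 25000, 26000, 22000, 24000],
    },
    index=['r1', 'r2', 'r3', 'r4', 'r5'],
)

# Convert to a NumPy array, then slice the first 3 rows
arr = df.to_numpy()[:3]
print(arr.shape)
print(arr.dtype)

# A single numeric column converts to an integer array
fees = df['Fee'].to_numpy()[:3]
print(fees.tolist())
```

Row and column labels are lost in the conversion, so this approach fits best when only the raw values are needed.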
5. Conclusion
In this article, I have explained how to get the first n rows of a Pandas DataFrame using the head() function. You have also learned how to get the rows by using iloc[] and the values attribute.