The DataFrame.head() function returns the first N rows of a Pandas DataFrame. It accepts an optional argument n, the number of rows to return from the top. If the argument is not specified, the function returns the first 5 rows of the DataFrame.
We can also use DataFrame.iloc[:n] to get the first n rows. iloc[] is a property that selects rows and columns by integer position. A single position that does not exist raises an IndexError, whereas a slice that runs past the end is simply truncated. Pandas loc[] is another property that selects rows and columns by label. For a better understanding of these two, learn the differences and similarities between Pandas loc[] vs iloc[].
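The difference between position-based and label-based selection is easier to see on a small frame. The following is a minimal sketch that assumes a hypothetical three-row DataFrame; the names `demo`, `first_two_by_position`, `first_two_by_label`, and `oversized_slice` are illustrative only.

```python
import pandas as pd

# Hypothetical three-row frame used only for this illustration
demo = pd.DataFrame(
    {"Courses": ["Spark", "PySpark", "Hadoop"], "Fee": [20000, 25000, 26000]},
    index=["r1", "r2", "r3"],
)

# iloc slices by integer position; the stop position is excluded
first_two_by_position = demo.iloc[:2]

# loc slices by label; the stop label is included
first_two_by_label = demo.loc[:"r2"]

# A slice that runs past the end is truncated instead of raising
oversized_slice = demo.iloc[:10]

# A single out-of-range position raises IndexError
try:
    demo.iloc[10]
    raised_index_error = False
except IndexError:
    raised_index_error = True

print(first_two_by_position)
print(first_two_by_label)
print(len(oversized_slice), raised_index_error)
```

Both slices select rows r1 and r2 here, but they get there differently: iloc counts positions, while loc matches labels and includes the end label.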
1. Quick Examples of Getting the First N Rows of DataFrame
If you are in a hurry, below are quick examples of how to get the first N rows of a DataFrame.
# Below are some quick examples
# Example 1: Get the first 5 rows of
# DataFrame using head() with its default
print(df.head())
# Example 2: Get the first 2 rows of DataFrame
# using head(n)
print(df.head(2))
# Example 3: Get the first n rows of specified columns
print(df[['Courses', 'Fee', 'Duration', 'Discount']].head(3))
# Example 4: Get first n rows using a positional slice
print(df.iloc[:4])
# Example 5: Get first n rows and last n columns from DataFrame
print(df.iloc[:4, -4:])
# Example 6: Get first n rows using the values attribute
print(df.values[:3])
Let’s create a DataFrame from a Python dictionary and run the above examples to get the first N rows of the DataFrame.
# Import pandas library
import pandas as pd
# Create pandas DataFrame
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
index_labels = ['r1', 'r2', 'r3', 'r4', 'r5']
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'], index=index_labels)
print(df)
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000
2. Pandas Get the First N Rows of DataFrame using head()
When you want to extract only the top N rows after all your filtering and transformations, use the head() method. It returns the top N rows from a DataFrame or the top N elements from a Series. With a negative number, head(-N) returns all rows except the last N. When no argument is passed, head() returns the top 5 rows by default.
# By default, get the first 5 rows of DataFrame
# Using head() function
print(df.head())
Yields below output.
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
# r5 pandas 24000 60days 2000
We can also pass a value for n to the head() function to get the top N rows from the Pandas DataFrame instead of the default 5.
# Get the first 2 rows of DataFrame
print(df.head(2))
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
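The negative-number and Series behaviours of head() described above can be sketched as follows. This is a minimal illustration that rebuilds the article's DataFrame so it runs on its own; the variable names `all_but_last_two`, `more_than_available`, and `first_fees` are assumptions made for the example.

```python
import pandas as pd

# Rebuild the article's DataFrame so this snippet is self-contained
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Negative n: every row except the last 2
all_but_last_two = df.head(-2)

# n larger than the number of rows returns the whole DataFrame
more_than_available = df.head(100)

# head() also works on a Series (a single column)
first_fees = df['Fee'].head(2)

print(all_but_last_two)
print(len(more_than_available))
print(first_fees)
```

Asking for more rows than exist does not raise an error, which makes head() safe to call on DataFrames of unknown length.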
2.1 Get the First N Rows of a Particular Column
If we want to get the values of the first n rows of selected columns, we can pass the list of columns into the DataFrame [] notation and then call the head() function. It returns the first n rows of the specified columns.
# Get first n rows of specified columns
print(df[['Courses', 'Fee', 'Duration', 'Discount']].head(3))
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
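The example above lists all four columns, so its result matches df.head(3). The sketch below selects a true subset instead; it rebuilds the article's DataFrame to stay self-contained, and the names `subset_head` and `courses_head` are illustrative assumptions.

```python
import pandas as pd

# Rebuild the article's DataFrame so this snippet is self-contained
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# A list of column names returns a DataFrame with only those columns
subset_head = df[['Courses', 'Fee']].head(3)

# A single column name returns a Series
courses_head = df['Courses'].head(3)

print(subset_head)
print(courses_head)
```

Note the double brackets in the first selection: the inner list is what tells pandas to return a DataFrame rather than a Series.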
3. Get the First N Rows of Pandas using iloc[]
We can get the first N rows of a Pandas DataFrame by passing the slice [:n] to the iloc[] attribute. This selects the rows at positions 0 through n-1 and returns them as a DataFrame.
Related: You can use the df.iloc[] attribute to get the first row of DataFrame and the last row of DataFrame.
For example,
# Get first n rows using range index
print(df.iloc[:4])
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
Alternatively, we can get the first n rows of selected columns by applying a positional slice with the iloc[] attribute.
# Get first n rows of selected columns using a positional slice
print(df[['Courses', 'Fee', 'Duration', 'Discount']].iloc[:3])
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
We can also get the first n records from the last n columns using the syntax below. Here the DataFrame has only four columns, so -4: selects all of them.
# Get first n rows and last n columns from DataFrame
print(df.iloc[:4, -4:])
Yields below output.
# Output:
# Courses Fee Duration Discount
# r1 Spark 20000 30day 1000
# r2 PySpark 25000 40days 2300
# r3 Hadoop 26000 35days 1200
# r4 Python 22000 40days 2500
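To make the row and column slicing in iloc[] more visible, here is a minimal sketch that picks fewer columns than the DataFrame has. It rebuilds the article's DataFrame to stay self-contained; `n`, `same_as_head`, `top_left`, and `top_right` are names assumed for illustration.

```python
import pandas as pd

# Rebuild the article's DataFrame so this snippet is self-contained
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

n = 3

# For a non-negative n, iloc[:n] and head(n) select the same rows
same_as_head = df.iloc[:n].equals(df.head(n))

# First n rows of the first 2 columns
top_left = df.iloc[:n, :2]

# First n rows of the last 2 columns
top_right = df.iloc[:n, -2:]

print(same_as_head)
print(top_left)
print(top_right)
```

The first slice in iloc[] applies to rows and the second to columns, so both can be narrowed in a single call.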
4. Get the First N Rows of Pandas using values
The Pandas DataFrame.values attribute returns a NumPy representation of the given DataFrame. Using this attribute we can get the first n rows of the DataFrame in the form of a NumPy array. Let’s get the first 3 rows.
# Get first n rows using the values attribute
print(df.values[:3])
print(df.values[:3])
# Output:
# [['Spark' 20000 '30day' 1000]
# ['PySpark' 25000 '40days' 2300]
# ['Hadoop' 26000 '35days' 1200]]
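The pandas documentation recommends DataFrame.to_numpy() over the values attribute for new code. The sketch below is one possible way to get the same array while converting only the rows that are needed; it rebuilds the article's DataFrame, and the names `first_three` and `fees` are illustrative assumptions.

```python
import pandas as pd

# Rebuild the article's DataFrame so this snippet is self-contained
technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "pandas"],
    'Fee': [20000, 25000, 26000, 22000, 24000],
    'Duration': ['30day', '40days', '35days', '40days', '60days'],
    'Discount': [1000, 2300, 1200, 2500, 2000]
}
df = pd.DataFrame(technologies, index=['r1', 'r2', 'r3', 'r4', 'r5'])

# Take the first 3 rows first, then convert only those rows to NumPy
first_three = df.head(3).to_numpy()

# Mixed string and integer columns produce an object-dtype array
print(first_three.dtype)
print(first_three)

# A single numeric column keeps a numeric dtype
fees = df['Fee'].head(3).to_numpy()
print(fees)
```

Calling head() before the conversion avoids building an array for the whole DataFrame when only a few rows are wanted. Row labels and column names are not carried over into the array.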
5. Conclusion
In this article, I have explained how to get the first n rows of a Pandas DataFrame using the head() function. You have also learned how to get the rows by using iloc[] and the values attribute.