Hadoop Count Command – Returns HDFS File Size and File Counts

  • Post category:Apache Hadoop
  • Post last modified:October 5, 2023

The Hadoop HDFS count option reports the number of directories, the number of files, and the total content size in bytes under a given path. Below is a quick example of how to use the count command.


$ hadoop fs -count /hdfs-file-path
or
$ hdfs dfs -count /hdfs-file-path

For example, the command hadoop fs -count /tmp/data.txt returns 0 1 52: 0 directories, 1 file, and a content size of 52 bytes for data.txt. The next example uses -count on a directory.

Running hadoop fs -count /data on a directory that contains 2 files returns 1 2 775: 1 directory, 2 files, and 775 bytes across the two files. If the directory has subdirectories, the counts include all files within those subdirectories as well.
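
In a script, the three counts are easier to work with once each column has a name. The following is a minimal sketch, assuming the default column order (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME) and a path name without spaces. The function name summarize_count is hypothetical, and the sample line stands in for real command output.

```shell
#!/bin/sh
# summarize_count (hypothetical helper): label the columns of one line of
# default `hadoop fs -count` output read from stdin.
# Assumes the column order DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# and a path name that contains no spaces.
summarize_count() {
  awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Sample line based on the /data example; on a real cluster, pipe the
# command instead:  hadoop fs -count /data | summarize_count
echo "           1            2                775 /data" | summarize_count
```

For the sample line this prints dirs=1 files=2 bytes=775 path=/data.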

Related: Hadoop HDFS Commands with Examples

Now let’s check other count options.

Hadoop fs -count Option

The hadoop fs shell option count returns the number of directories, the number of files, and the number of bytes under the paths that match the specified file pattern.

By default, hadoop fs -count returns the following information for each matching path. You can also use hdfs dfs -count, which behaves the same way.

  • Directory count
  • File count
  • Content size
  • Path name

$ hadoop fs -count [-q] [-u] [-t] [-h] [-v] [-x] [-e] /hdfs-file-path
or
$ hdfs dfs -count [-q] [-u] [-t] [-h] [-v] [-x] [-e] /hdfs-file-path

Options:

Option  Description
-q      Shows quotas. Output columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
-u      Limits the output to quotas and usage only. Output columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, PATHNAME
-t      Shows the quota and usage for each storage type: "all", "ram_disk", "SSD", "disk" or "archive". Takes effect only together with -q or -u.
-h      Shows sizes in a human-readable format.
-v      Displays a header line.
-x      Excludes snapshots from the result calculation.
-e      Shows the erasure coding policy for each file. Output columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, ERASURECODING_POLICY, PATHNAME
Hadoop HDFS File count Options

Hadoop fs -count Options Examples

Below are examples of how to use hadoop fs -count with each of these options.

Example 1: Shows Quotas

A quota is a hard limit on the number of names (files and directories) and on the amount of space used under an individual directory. The -q option shows these limits together with the remaining amounts.


$ hadoop fs -count -q /hdfs-file-path
or
$ hdfs dfs -count -q /hdfs-file-path
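
When no quota is set on a path, the quota columns of -q output read none and inf rather than numbers, which a script has to handle. The following is a minimal sketch, assuming the column order given in the Hadoop FileSystem shell documentation (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME). The function name quota_status is hypothetical and the sample lines are illustrative, not captured from a cluster.

```shell
#!/bin/sh
# quota_status (hypothetical helper): report the name quota from one line of
# `hadoop fs -count -q` output read from stdin.
# Assumed column order: QUOTA REMAINING_QUOTA SPACE_QUOTA
# REMAINING_SPACE_QUOTA DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
quota_status() {
  awk '{
    if ($1 == "none")
      printf "%s: no name quota set\n", $8
    else
      printf "%s: %d of %d names used\n", $8, $1 - $2, $1
  }'
}

# Illustrative sample lines (not real cluster output).
echo "none inf none inf 1 2 775 /data" | quota_status
echo "100 97 none inf 1 2 775 /data" | quota_status
```

On a real cluster the input would come from hadoop fs -count -q /hdfs-file-path piped into quota_status.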

Example 2: Limits the Output to Show Quotas and Usage only


$ hadoop fs -count -u /hdfs-file-path
or
$ hdfs dfs -count -u /hdfs-file-path

Example 3: Shows the Quota and Usage for Each Storage Type

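-t shows the quota and usage for each storage type. According to the Hadoop FileSystem shell documentation, -t is ignored unless -q or -u is also given, so the example combines it with -q.


$ hadoop fs -count -q -t /hdfs-file-path
or
$ hdfs dfs -count -q -t /hdfs-file-path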

Example 4: Shows Sizes in a Human-Readable Format

-h shows sizes in a human-readable format (M for megabytes, G for gigabytes, etc.).


$ hadoop fs -count -h /user
   62          232            216.9 M /user
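
Scripts usually work with the raw byte count and convert it for display only at the end, because the -h output puts a space between the number and its unit. The following is a minimal sketch of such a conversion, assuming binary units (1 K = 1024 bytes). The function name to_human is hypothetical, and the formatting only approximates the style of -h output; it is not Hadoop's own formatter.

```shell
#!/bin/sh
# to_human (hypothetical helper): convert a raw byte count to a short
# human-readable string using binary units (1 K = 1024 bytes).
# This approximates the style of `-h` output; it is not Hadoop's formatter.
to_human() {
  awk -v bytes="$1" 'BEGIN {
    split("B K M G T P", unit, " ")
    i = 1
    # Divide by 1024 until the value fits the current unit.
    while (bytes >= 1024 && i < 6) { bytes /= 1024; i++ }
    if (i == 1) printf "%d %s\n", bytes, unit[i]
    else        printf "%.1f %s\n", bytes, unit[i]
  }'
}

to_human 775
to_human 1536
```

The two calls print 775 B and 1.5 K.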

Example 5: Displays Header Line for command output

-v displays a header line that names each output column (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME).


$ hadoop fs -count -v /tmp/data.txt
   DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
           0            1                 52 /tmp/data.txt
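
Because the header names each column, a script can look up a value by name instead of by position, which keeps working when options such as -q or -e change the column layout. The following is a minimal sketch, assuming path names contain no spaces. The function name count_field is hypothetical.

```shell
#!/bin/sh
# count_field (hypothetical helper): print one named column from
# `hadoop fs -count -v` output read from stdin.
# Uses the header line to find the column, so it does not depend on a
# fixed column position. Assumes path names contain no spaces.
count_field() {
  awk -v name="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    col     { print $col }
  '
}

# Sample output copied from the -v example.
printf '%s\n' \
  "   DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME" \
  "           0            1                 52 /tmp/data.txt" |
  count_field CONTENT_SIZE
```

For the sample input this prints 52. On a real cluster the input would come from hadoop fs -count -v /tmp/data.txt.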

Example 6: Excludes Snapshots from the Result Calculation

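-x excludes snapshots from the result calculation. Without -x (the default), the result is always calculated from all INodes, including all snapshots under the given path. The -x option is ignored if -q or -u is given.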


$ hadoop fs -count -x /hdfs-file-path
or
$ hdfs dfs -count -x /hdfs-file-path

Example 7: Shows the Erasure Coding Policy

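-e adds a column with the erasure coding policy. A file that uses plain replication instead of erasure coding is reported as Replicated, as in the output below.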


$ hadoop fs -count -e /tmp/data.txt
           0            1                 52 Replicated /tmp/data.txt

Naveen

Naveen (NNK) is a Data Engineer with 20+ years of experience in transforming data into actionable insights. Over the years, he has honed his expertise in designing, implementing, and maintaining data pipelines with frameworks like Apache Spark, PySpark, Pandas, R, Hive and Machine Learning. Naveen's journey in the field of data engineering has been one of continuous learning, innovation, and a strong commitment to data integrity. In this blog, he shares his experiences with data as he comes across them.
