Hadoop Count Command – Returns HDFS File Size and File Counts

Hadoop HDFS File Count

The Hadoop HDFS count option reports the number of directories, the number of files, and the total content size in bytes under a given path. Below is a quick example of how to use the count command.


$ hadoop fs -count /hdfs-file-path
or
$ hdfs dfs -count /hdfs-file-path

For example, the command hadoop fs -count /tmp/data.txt returns 0 1 52 (0 directories, 1 file, and 52 bytes of content in data.txt). The -count option also works on a directory.


For example, when the /data directory contains 2 files, hadoop fs -count /data returns 1 2 775 (1 directory, 2 files, and 775 bytes across the 2 files). If the directory has subdirectories, the result also includes every file and directory inside them.
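Because the default output is three numbers followed by the path, it is straightforward to label in a script. The following is a minimal sketch rather than an official tool: the function name label_count is hypothetical, and it assumes the default column layout and paths that contain no whitespace.

```shell
# label_count: read `hadoop fs -count` output on stdin and label each column.
# Assumes the default layout (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME)
# and paths without whitespace. The function name is illustrative only.
label_count() {
    awk 'NF >= 4 && $1 ~ /^[0-9]+$/ {
        printf "%s: dirs=%s files=%s bytes=%s\n", $4, $1, $2, $3
    }'
}

# Example usage on a cluster:
#   hadoop fs -count /data | label_count
```

Fed the /data result described above, this would print /data: dirs=1 files=2 bytes=775.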


Now let’s check other count options.

Hadoop fs -count Option

The hadoop fs shell option count returns the number of directories, the number of files, and the number of bytes under the paths that match the specified file pattern.

The hadoop fs -count option gives the following information. Alternatively, you can use hdfs dfs -count.

  • Directory count
  • File count
  • Content size
  • Path name

$ hadoop fs -count [-q] [-u] [-t] [-h] [-v] [-x] [-e] /hdfs-file-path
or
$ hdfs dfs -count [-q] [-u] [-t] [-h] [-v] [-x] [-e] /hdfs-file-path

Options:

HDFS Count Options    Description
-q    Shows quotas. Output columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME
-u    Limits the output to quotas and usage only. Output columns: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, PATHNAME
-t    Shows the quota and usage for each storage type: “all”, “ram_disk”, “SSD”, “disk” or “archive”. Ignored unless -q or -u is also given.
-h    Shows sizes in a human-readable format.
-v    Displays a header line.
-x    Excludes snapshots from the result calculation.
-e    Shows the erasure coding policy for each file. Output columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, ERASURECODING_POLICY, PATHNAME

Hadoop fs -count Options Examples

Below are the examples of how to use hadoop hdfs count with several options.

Example 1: Shows Quotas

The quota is the hard limit on the number of names and the amount of space used for individual directories.


$ hadoop fs -count -q /hdfs-file-path
or
$ hdfs dfs -count -q /hdfs-file-path
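To turn the -q output into a usage figure, the name quota and remaining name quota columns can be subtracted. This is a rough sketch under stated assumptions: the function name quota_usage is hypothetical, the column order is the one given in the Hadoop FileSystem shell documentation (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME), and paths contain no whitespace.

```shell
# quota_usage: read `hadoop fs -count -q` output on stdin and report how
# much of the name quota is in use for each path.
# Assumed columns: QUOTA REMAINING_QUOTA SPACE_QUOTA REMAINING_SPACE_QUOTA
#                  DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# A directory without a name quota shows "none" in the first column.
quota_usage() {
    awk 'NF >= 8 {
        if ($1 == "none") {
            printf "%s: no name quota set\n", $8
        } else if ($1 ~ /^[0-9]+$/) {
            # used names = quota - remaining quota
            printf "%s: %d of %d names used\n", $8, $1 - $2, $1
        }
        # any other first column (for example a -v header line) is skipped
    }'
}

# Example usage on a cluster:
#   hadoop fs -count -q /hdfs-file-path | quota_usage
```

The same subtraction applies to the space quota columns (third and fourth) if byte usage is of more interest than name usage.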

Example 2: Limits the Output to Show Quotas and Usage only


$ hadoop fs -count -u /hdfs-file-path
or
$ hdfs dfs -count -u /hdfs-file-path

Example 3: Shows the Quota and Usage for Each Storage Type

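-t shows the quota and usage for each storage type. It takes a storage type as its value (for example all) and is ignored unless -q or -u is also given.


$ hadoop fs -count -q -t all /hdfs-file-path
or
$ hdfs dfs -count -q -t all /hdfs-file-path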

Example 4: Shows Sizes in a Human-Readable Format

-h shows sizes in a human-readable format (M for megabytes, G for gigabytes, etc.).


$ hadoop fs -count -h /user
   62          232            216.9 M /user
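One thing to watch when scripting against -h output is that a size such as 216.9 M is two whitespace-separated tokens, so fixed column positions no longer line up. The sketch below works around this by taking the path from the last field. The function name split_human_count is hypothetical, and it assumes that the directory and file counts print as single tokens (as in the example above) and that paths contain no whitespace.

```shell
# split_human_count: read `hadoop fs -count -h` output on stdin and print
# "<path><TAB><size>" for each line.
# Assumes the first two columns are single tokens and that the path has no
# whitespace; the size is everything between the file count and the path.
split_human_count() {
    awk 'NF >= 4 {
        size = $3
        for (i = 4; i < NF; i++) size = size " " $i
        printf "%s\t%s\n", $NF, size
    }'
}

# Example usage on a cluster:
#   hadoop fs -count -h /user | split_human_count
```

For scripts that need to do arithmetic on sizes, leaving out -h and working with the raw byte count is usually simpler.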

Example 5: Displays Header Line for command output

The -v option displays a header line (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) above the output.


$ hadoop fs -count -v /tmp/data.txt
   DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
           0            1                 52 /tmp/data.txt

Example 6: Excludes Snapshots from the Result Calculation

The -x option excludes snapshots from the result. Without -x (the default), the result is always calculated from all INodes, including all snapshots under the given path.


$ hadoop fs -count -x /hdfs-file-path
or
$ hdfs dfs -count -x /hdfs-file-path

Example 7: Shows the Erasure Coding Policy

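The -e option adds the erasure coding policy to the output. A path that uses plain replication rather than erasure coding is reported as Replicated.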


$ hadoop fs -count -e /tmp/data.txt
           0            1                 52 Replicated /tmp/data.txt
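The extra policy column can be used to tell replicated paths apart from erasure-coded ones. The following is only a sketch: the function name ec_policy is hypothetical, it assumes the column order DIR_COUNT, FILE_COUNT, CONTENT_SIZE, ERASURECODING_POLICY, PATHNAME with no whitespace in paths, and it treats any policy value other than Replicated as an erasure coding policy name.

```shell
# ec_policy: read `hadoop fs -count -e` output on stdin and say whether each
# path uses replication or erasure coding.
# Assumed columns: DIR_COUNT FILE_COUNT CONTENT_SIZE ERASURECODING_POLICY PATHNAME
# Any policy value other than "Replicated" is treated as an EC policy name.
ec_policy() {
    awk 'NF >= 5 && $1 ~ /^[0-9]+$/ {
        kind = ($4 == "Replicated") ? "replication" : "erasure coding"
        printf "%s uses %s (%s)\n", $5, kind, $4
    }'
}

# Example usage on a cluster:
#   hadoop fs -count -e /tmp/data.txt | ec_policy
```

Applied to the output line above, this would print /tmp/data.txt uses replication (Replicated).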
