Hadoop – How to Get HDFS File Size (du)


The Hadoop fs -du command is used to get the size of HDFS files and directories. For every file or directory that matches the specified path pattern, it reports the size in bytes before replication (the base size) and, in a second column, the disk space consumed including all replicas.

Hadoop fs -du Command

The hadoop fs -du command displays the sizes of the files and directories contained in the given directory, or the length of a file if the path points to a single file.


$ hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]
or
$ hdfs dfs -du [-s] [-h] [-v] [-x] URI [URI ...]

Options:

Option  Description
-s      Show an aggregate (summary) size for each path argument instead of the size of each individual file or directory under it.
-h      Format sizes in a human-readable form (for example 270.2 M) instead of a raw number of bytes.
-v      Display the names of the columns as a header line.
-x      Exclude snapshots from the result calculation.

Hadoop HDFS du options


Hadoop fs -du Command Examples

Below are examples of how to get file and directory sizes with the hadoop fs -du and hdfs dfs -du commands, using several options.


$ hadoop fs -du /tmp/
52         52          /tmp/data.txt
0          0           /tmp/export
0          0           /tmp/export_csv
283279596  2476986504  /tmp/hadoop-yarn
224        224         /tmp/hive

In the above example, data.txt holds 52 characters of single-byte text, so its size is reported as 52 bytes.
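
The size column counts bytes, not characters. The two only coincide for single-byte text; multi-byte UTF-8 characters take more than one byte each. The local sketch below (no Hadoop needed) shows the difference with wc -c; `byte_count` is a hypothetical helper name. Assuming such a file were copied to HDFS unchanged, for example with hadoop fs -put, its du size would be the same byte count.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes read from stdin, the unit that
# hadoop fs -du reports. tr strips the padding some wc versions add.
byte_count() {
  wc -c | tr -d ' '
}

# Plain ASCII: one byte per character.
printf 'hello' | byte_count          # prints 5

# "héllo" is 5 characters, but é (U+00E9) takes 2 bytes in UTF-8.
# It is written as octal escapes so the script is encoding-independent.
printf 'h\303\251llo' | byte_count   # prints 6
```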

Example 1: Show the total (summary) size with -s


$ hadoop fs -du -s /tmp/
283279872  2476986780  /tmp
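
In this example the -s total for /tmp equals the sum of the entries listed one level below it by plain hadoop fs -du /tmp/. A minimal awk sketch that adds up both columns of that per-entry output reproduces the 283279872 and 2476986780 totals, which is a quick way to cross-check the two views. The `sum_du` name is a hypothetical helper, and the sample lines are the output shown earlier so the sketch runs without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: add up the SIZE and DISK_SPACE columns of
# per-entry `hadoop fs -du` output. On a cluster you would run:
#   hadoop fs -du /tmp/ | sum_du
sum_du() {
  # %.0f rather than %d: some awk versions mangle %d above 2^31
  awk '{ size += $1; disk += $2 } END { printf "%.0f %.0f\n", size, disk }'
}

# Per-entry output of `hadoop fs -du /tmp/` from the example above.
sum_du <<'EOF'
52         52          /tmp/data.txt
0          0           /tmp/export
0          0           /tmp/export_csv
283279596  2476986504  /tmp/hadoop-yarn
224        224         /tmp/hive
EOF
# prints: 283279872 2476986780
```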

Example 2: Show sizes in a human-readable format with -h

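In the du output, the first column shows the raw size of the files that users have placed in the HDFS directories, before replication. The second column shows the disk space those files actually consume in HDFS, including all replicas. With -h, both columns are printed in human-readable units.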


$ hadoop fs -du -h /tmp/
52       52     /tmp/data.txt
0        0      /tmp/export
0        0      /tmp/export_csv
270.2 M  2.3 G  /tmp/hadoop-yarn     <== shown in megabytes (M) and gigabytes (G)
224      224    /tmp/hive
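
The -h values are the byte counts from the first example divided by powers of 1024: 283279596 bytes is about 270.2 M and 2476986504 bytes about 2.3 G. The sketch below approximates that conversion with awk. `human_size` is a hypothetical helper, not Hadoop's own formatter, so its output may differ from hadoop fs -du -h in edge cases.

```shell
#!/bin/sh
# Hypothetical helper: approximate the -h formatting of `hadoop fs -du`
# by dividing by 1024 until the value drops below 1024.
# A sketch only; Hadoop's own formatter may differ in edge cases.
human_size() {
  awk -v bytes="$1" 'BEGIN {
    split("K M G T P E", unit, " ")
    # Values below 1 KiB are printed as a plain byte count.
    if (bytes < 1024) { printf "%d\n", bytes; exit }
    value = bytes; i = 0
    while (value >= 1024 && i < 6) { value /= 1024; i++ }
    printf "%.1f %s\n", value, unit[i]
  }'
}

human_size 52          # prints: 52
human_size 283279596   # prints: 270.2 M
human_size 2476986504  # prints: 2.3 G
```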

Example 3: Display the column names as a header line with -v

The -v option prints a header line naming the columns: SIZE, DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS and FULL_PATH_NAME.


$ hadoop fs -du -v /tmp/
SIZE       DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
52         52                                     /tmp/data.txt
0          0                                      /tmp/export
0          0                                      /tmp/export_csv
283279596  2476986504                             /tmp/hadoop-yarn
224        224                                    /tmp/hive
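
When -v output is piped into other tools, the header line has to be skipped, otherwise it is treated as data. As an illustration, this minimal sketch drops the header with awk and lists the remaining entries largest-first with sort. `largest_first` is a hypothetical helper name, and the sample lines are the -v output shown above so the sketch runs without Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: list `hadoop fs -du -v` entries largest-first.
# On a cluster you would run:
#   hadoop fs -du -v /tmp/ | largest_first
largest_first() {
  # NR > 1 skips the SIZE/DISK_SPACE_.../FULL_PATH_NAME header line;
  # sort then orders numerically by the first (SIZE) column, descending.
  awk 'NR > 1' | sort -k1,1nr
}

largest_first <<'EOF'
SIZE       DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
52         52                                     /tmp/data.txt
0          0                                      /tmp/export
0          0                                      /tmp/export_csv
283279596  2476986504                             /tmp/hadoop-yarn
224        224                                    /tmp/hive
EOF
```

With this sample, /tmp/hadoop-yarn comes first, followed by /tmp/hive and /tmp/data.txt.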
 

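Example 4: Exclude snapshots from the result calculation with -x

By default, the result is calculated from all INodes under the given path, including those held in snapshots. The -x option leaves snapshot data out of the calculation.

$ hadoop fs -du -x /tmp/
or
$ hdfs dfs -du -x /tmp/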
