Hadoop - How To Get HDFS File Size (DU)


Post author: Malli
Post category: Apache Hadoop
Post last modified: October 5, 2023

The hadoop fs -du command is used to get the size of HDFS files and directories. The first column of its output is the base size of the file or directory before replication. The command shows the amount of space, in bytes, used by the files that match the specified file pattern.

Hadoop fs -du Command

$ hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]

Options:

Option	Description
-s	Show an aggregate (summary) size for each path argument instead of the size of each individual file or subdirectory. Without -s, the size of each file and subdirectory directly under the given path is shown.
-h	Format sizes in a human-readable form (for example, 270.2 M) instead of as a number of bytes.
-v	Display the column names as a header line.
-x	Exclude snapshots from the result calculation.


Hadoop fs -du Command Examples

Below are examples of how to get file and directory sizes using the hadoop fs -du and hdfs dfs -du commands with several options.


$ hadoop fs -du /tmp/
52         52          /tmp/data.txt
0          0           /tmp/export
0          0           /tmp/export_csv
283279596  2476986504  /tmp/hadoop-yarn
224        224         /tmp/hive

In the above example, the data.txt file contains 52 single-byte characters, hence its size is shown as 52 bytes.
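Strictly speaking, du counts bytes, not characters; the two numbers agree for data.txt only because each of its characters is one byte. The sketch below illustrates the difference locally with standard printf and wc (no HDFS involved); the helper name byte_count is hypothetical.

```shell
# byte_count (hypothetical helper): print the number of bytes in the string $1.
# HDFS du sizes are byte counts, so this is the number to compare against.
byte_count() {
    # printf '%s' adds no trailing newline; tr strips the padding spaces
    # that some wc implementations print before the count.
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count 'hello'                     # prints 5 (5 characters, 5 bytes)
byte_count "$(printf 'caf\303\251')"   # prints 5 ("café": 4 characters, 5 bytes in UTF-8)
```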

Example 1: Show the total (summary) size


$ hadoop fs -du -s /tmp/
283279872  2476986780  /tmp
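The -s totals are the sums of the per-entry values from the first example: 52 + 0 + 0 + 283279596 + 224 = 283279872 for the first column, and 52 + 0 + 0 + 2476986504 + 224 = 2476986780 for the second. A minimal awk sketch of that cross-check, assuming the two numeric columns come first as in the output above (sum_du_columns is a hypothetical helper name):

```shell
# sum_du_columns (hypothetical helper): read `hadoop fs -du` output on stdin
# and print the totals of the first two columns
# (raw size, disk space consumed by all replicas).
sum_du_columns() {
    # printf "%.0f" prints the totals as plain integers in any awk implementation.
    awk '{ size += $1; consumed += $2 }
         END { printf "%.0f %.0f\n", size, consumed }'
}

# Feed it the sample output from the first example.
sum_du_columns <<'EOF'
52         52          /tmp/data.txt
0          0           /tmp/export
0          0           /tmp/export_csv
283279596  2476986504  /tmp/hadoop-yarn
224        224         /tmp/hive
EOF
# prints: 283279872 2476986780
```

On a cluster the same helper could read live output, for example hadoop fs -du /tmp/ | sum_du_columns, as long as -h is not used, since awk cannot add suffixed values such as 270.2 M.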

Example 2: Show file sizes in a human-readable format

The first column shows the raw size of the files that users have placed in the various HDFS directories. The second column shows the disk space actually consumed by those files in HDFS, including all replicas.
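For replicated (non-erasure-coded) files, the consumed space is the file size multiplied by its replication factor, so dividing the second column by the first gives the size-weighted average replication factor of each entry. In the sample output, data.txt and hive give a ratio of 1, while /tmp/hadoop-yarn comes out at about 8.74, which suggests its files were written with different replication factors. A sketch, assuming paths without spaces; replication_ratio is a hypothetical helper name:

```shell
# replication_ratio (hypothetical helper): read `hadoop fs -du` output on stdin
# and print each path with (space consumed by all replicas) / (raw size).
# Assumes the path is the third whitespace-separated field (no spaces in paths).
replication_ratio() {
    # Skip zero-size entries (empty files/directories) to avoid dividing by zero.
    awk '$1 > 0 { printf "%s %.2f\n", $3, $2 / $1 }'
}

replication_ratio <<'EOF'
52         52          /tmp/data.txt
0          0           /tmp/export
283279596  2476986504  /tmp/hadoop-yarn
224        224         /tmp/hive
EOF
# prints:
# /tmp/data.txt 1.00
# /tmp/hadoop-yarn 8.74
# /tmp/hive 1.00
```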


$ hadoop fs -du -h /tmp/
52       52     /tmp/data.txt
0        0      /tmp/export
0        0      /tmp/export_csv
270.2 M  2.3 G  /tmp/hadoop-yarn     <====== Shown in megabytes & gigabytes
224      224    /tmp/hive
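The -h values use binary (1024-based) units: 283279596 / 1024² ≈ 270.16, displayed as 270.2 M, and 2476986504 / 1024³ ≈ 2.31, displayed as 2.3 G. The sketch below reproduces that conversion for the sample values. It is an approximation written for illustration, not Hadoop's own formatting code, and the helper name human_size is hypothetical.

```shell
# human_size (hypothetical helper): format a byte count like the -h column above:
# plain bytes below 1024, otherwise one decimal place with a binary
# (1024-based) K, M, G, T or P suffix. Approximation, not Hadoop's code.
human_size() {
    awk -v n="$1" 'BEGIN {
        n = n + 0                        # force numeric interpretation
        split("K M G T P", unit, " ")
        if (n < 1024) { printf "%d\n", n; exit }
        i = 0
        while (n >= 1024 && i < 5) { n /= 1024; i++ }
        printf "%.1f %s\n", n, unit[i]
    }'
}

human_size 52           # prints 52
human_size 283279596    # prints 270.2 M
human_size 2476986504   # prints 2.3 G
```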

Example 3: Display the names of columns as a header line

The -v option adds a header line to the output with the column names SIZE, DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS, and FULL_PATH_NAME.


$ hadoop fs -du -v /tmp/
SIZE       DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
52         52                                     /tmp/data.txt
0          0                                      /tmp/export
0          0                                      /tmp/export_csv
283279596  2476986504                             /tmp/hadoop-yarn
224        224                                    /tmp/hive
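Because -v puts a header line first, scripts that process this output need to skip it. As a sketch (largest_consumer is a hypothetical helper name, and paths are assumed to contain no spaces), the following prints the entry with the largest DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS value:

```shell
# largest_consumer (hypothetical helper): read `hadoop fs -du -v` output on stdin,
# skip the header line, and print the path and all-replica disk usage
# of the entry that consumes the most space.
largest_consumer() {
    awk 'BEGIN { max = -1 }
         NR == 1 { next }                          # skip the -v header line
         $2 + 0 > max { max = $2 + 0; path = $3 }
         END { if (path != "") printf "%s %.0f\n", path, max }'
}

largest_consumer <<'EOF'
SIZE       DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
52         52                                     /tmp/data.txt
0          0                                      /tmp/export
0          0                                      /tmp/export_csv
283279596  2476986504                             /tmp/hadoop-yarn
224        224                                    /tmp/hive
EOF
# prints: /tmp/hadoop-yarn 2476986504
```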

Example 4: Exclude snapshots from the result calculation
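
The -x option excludes snapshots from the calculation. Without -x (the default), the result is calculated from all INodes under the given path, including all snapshots. The syntax is:


$ hadoop fs -du -x URI [URI ...]

For example, to calculate the sizes under /tmp/ without counting snapshots:

$ hadoop fs -du -x /tmp/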

