• Post author:
  • Post category:Apache Hadoop
  • Post last modified:September 25, 2024
  • Reading time:24 mins read

Apache Hadoop's hadoop fs and hdfs dfs are file system commands for interacting with HDFS. They are very similar to Unix commands, although some syntax and output formats differ between Unix and HDFS commands.


Hadoop is an open-source distributed framework used to store and process large datasets. Hadoop uses HDFS to store data and MapReduce and YARN to process it. In this article, I will mainly focus on the Hadoop HDFS commands used to interact with files.

Hadoop provides two commands for interacting with the file system: hadoop fs and hdfs dfs. The main difference is that hadoop fs is the generic file system shell that works with the various file systems Hadoop supports, such as S3 and Azure, while hdfs dfs is the HDFS-oriented form of the command.

What is HDFS?

HDFS is a distributed file system that stores data on commodity machines and provides very high aggregate bandwidth across the cluster.

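Once a file is written to HDFS, you cannot change its existing contents in place; HDFS follows a write-once, read-many access model. You can, however, append data to a file or truncate it, as the appendToFile and truncate commands below show.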

Start Hadoop Services

To run hdfs dfs or hadoop fs commands, you first need to start the HDFS services by running the start-dfs.sh script from the Hadoop installation. If you don't have a Hadoop setup, follow the Apache Hadoop Installation on Linux guide.


ubuntu@namenode:~$ start-dfs.sh
Starting namenodes on [namenode.socal.rr.com]
Starting datanodes
Starting secondary namenodes [namenode]
ubuntu@namenode:~$

Note that the start-dfs.sh script starts the NameNode, the Secondary NameNode, and the DataNodes.
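To check that the daemons actually started, you can list the running Java processes with the JDK's jps tool. The following is a minimal sketch, assuming a single-node setup where the NameNode, DataNode, and SecondaryNameNode all run on the same host (on a multi-node cluster, DataNodes run on other machines). check_hdfs_daemons is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical helper: read `jps` output (lines like "12345 NameNode") from
# stdin and print every expected HDFS daemon that is not running.
# Returns 1 if at least one daemon is missing, 0 otherwise.
check_hdfs_daemons() {
  local running missing=0 daemon
  running=$(awk '{ print $2 }')
  for daemon in NameNode DataNode SecondaryNameNode; do
    # -x matches the whole line, so "NameNode" does not match "SecondaryNameNode"
    if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
      echo "missing: $daemon"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on a single-node setup:
#   jps | check_hdfs_daemons && echo "HDFS daemons are running"
```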

Basic HDFS DFS Commands

Below are the basic hdfs dfs and hadoop fs commands.

Command            Description
-ls                List files with permissions and other details
-mkdir             Create a directory in HDFS
-rm                Remove a file or directory
-rmr               Remove a directory and its subdirectories recursively (deprecated; use -rm -r)
-rmdir             Delete an empty directory
-put               Upload a file/folder from the local disk to HDFS
-cat               Display the contents of a file
-du                Show the size of files and directories on HDFS
-dus               Show the total size of a directory/file (deprecated; use -du -s)
-get               Copy a file/folder from HDFS to the local file system
-getmerge          Merge multiple HDFS files into a single local file
-count             Count directories, files, and bytes under a path
-setrep            Change the replication factor of a file
-mv                Move files from source to destination within HDFS
-moveFromLocal     Move a file/folder from the local disk to HDFS
-moveToLocal       Move a file from HDFS to the local disk
-cp                Copy files from source to destination
-tail              Display the last kilobyte of a file
-touch             Create a file or update its access and modification timestamps
-touchz            Create a new file on HDFS with size 0 bytes
-appendToFile      Append local content to a file on HDFS
-copyFromLocal     Copy a file from the local file system to HDFS
-copyToLocal       Copy files from HDFS to the local file system
-usage             Return the help for an individual command
-checksum          Return the checksum information of a file
-chgrp             Change the group of a file or path
-chmod             Change the permissions of a file
-chown             Change the owner and group of a file
-df                Display the capacity, used space, and free space
-head              Display the first kilobyte of a file
-createSnapshot    Create a snapshot of a snapshottable directory
-deleteSnapshot    Delete a snapshot from a snapshottable directory
-renameSnapshot    Rename a snapshot
-expunge           Empty the trash (delete old checkpoints and create a new one)
-stat              Print statistics about a file/directory
-truncate          Truncate all files matching a pattern to a specified length
-find              Find files that match an expression
HDFS Basic Commands

ls – List Files and Folder

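The HDFS ls command displays the list of files and directories in HDFS. It shows each entry with its permissions, replication, owner, group, size, and modification time. Without a path argument, it lists the current user's home directory in HDFS (/user/<username>). For more information, see the ls – List Files and Folder article.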


$ hadoop fs -ls
or
$ hdfs dfs -ls
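The ls command also accepts options such as -R (list recursively) and -h (human-readable sizes), and because its output is plain columns, it combines well with standard Unix tools. The sketch below is a hypothetical helper, not part of Hadoop, that keeps only directories; it assumes the default eight-column layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output from stdin and print only
# the paths of directories (entries whose permission string starts with "d").
# The "Found N items" header is skipped because it does not start with "d".
list_dirs() {
  awk '$1 ~ /^d/ { print $8 }'
}

# Usage (requires a running cluster):
#   hdfs dfs -ls -R /user/ubuntu | list_dirs
```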

mkdir – Make Directory

The HDFS mkdir command creates a directory in HDFS. By default, the directory is owned by the user who creates it. A path that begins with "/" is absolute, so /directory-name is created directly under the root directory.


$ hadoop fs -mkdir /directory-name
or
$ hdfs dfs -mkdir /directory-name
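Like its Unix counterpart, the HDFS mkdir command accepts -p, which creates any missing parent directories and does not fail if the directory already exists. Here is a minimal sketch; run, DRY_RUN, and make_job_dirs are hypothetical names used only for illustration, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Create input and output directories for a job under one base path.
# -p creates missing parent directories and succeeds if they already exist.
make_job_dirs() {
  local base="$1"
  run hdfs dfs -mkdir -p "$base/input" "$base/output"
}

# Usage:
#   make_job_dirs /user/ubuntu/wordcount
#   DRY_RUN=1 make_job_dirs /user/ubuntu/wordcount   # only prints the command
```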

rm – Remove File or Directory

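The HDFS rm command deletes the files specified as arguments. To delete a directory and everything under it, add the -r option (hdfs dfs -rm -r /directory-name). If trash is enabled, deleted files are moved to the trash instead of being removed immediately.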


$ hadoop fs -rm /file-name
or
$ hdfs dfs -rm /file-name
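When trash is enabled (fs.trash.interval greater than 0), -rm moves deleted paths to the user's trash directory; adding -skipTrash deletes them immediately, so they cannot be restored. A minimal sketch that wraps both forms, where run, DRY_RUN, and remove_path are hypothetical helper names:

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Delete an HDFS file or directory tree (-r).
# Pass "no-trash" as the second argument to bypass the trash with -skipTrash.
remove_path() {
  local path="$1" mode="${2:-trash}"
  if [ "$mode" = "no-trash" ]; then
    run hdfs dfs -rm -r -skipTrash "$path"
  else
    run hdfs dfs -rm -r "$path"
  fi
}

# Usage:
#   remove_path /user/ubuntu/tmp            # recoverable from trash, if enabled
#   remove_path /user/ubuntu/tmp no-trash   # permanent
```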

rmr – Remove Directory Recursively

The rmr command deletes a directory and all of its contents recursively, which is useful when you want to delete a non-empty directory. This command is deprecated; use -rm -r instead.


$ hadoop fs -rmr /directory-name
or
$ hdfs dfs -rmr /directory-name

rmdir – Delete a Directory

The rmdir command removes a directory only if it is empty.


$ hadoop fs -rmdir /directory-name
or
$ hdfs dfs -rmdir /directory-name

put – Upload a File to HDFS from Local

The put command copies a file or folder from the local disk to HDFS. Specify the local-file-path to copy from, followed by the hdfs-file-path to copy to.


$ hadoop fs -put /local-file-path /hdfs-file-path
or
$ hdfs dfs -put /local-file-path /hdfs-file-path

cat – Displays the Content of the File

The cat command reads the specified file from HDFS and displays the content of the file on console or stdout.


$ hadoop fs -cat /hdfs-file-path
or 
$ hdfs dfs -cat /hdfs-file-path

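du – Disk Usage of Files and Directories

The du command shows how much space files and directories occupy in HDFS. For a directory, it lists the size of each file and subdirectory it contains. The first column is the base size of the file or directory before replication; in Hadoop 3.x, a second column shows the disk space consumed by all replicas.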


$ hadoop fs -du  /hdfs-file-path
or
$ hdfs dfs -du  /hdfs-file-path
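In Hadoop 3.x, hdfs dfs -du prints three columns (size, disk space consumed by all replicas, and path), and -s summarizes each argument into one line. Dividing the second column by the first gives the effective replication. The following is a minimal sketch of a hypothetical helper, not part of Hadoop, that assumes this three-column format and paths without spaces.

```shell
# Hypothetical helper: read `hdfs dfs -du -s` output from stdin, assuming the
# Hadoop 3.x format SIZE DISK_SPACE_CONSUMED PATH, and print each path with
# its effective replication (consumed / size). Empty entries print 0.0.
replication_ratio() {
  awk '{
    ratio = ($1 > 0) ? $2 / $1 : 0
    printf "%s %.1f\n", $3, ratio
  }'
}

# Usage (quote the glob so HDFS expands it, not the local shell):
#   hdfs dfs -du -s '/user/ubuntu/*' | replication_ratio
```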

dus – Total Size of a Directory/File

The dus command displays the total size of a directory or file. This command is deprecated; use -du -s instead.


$ hadoop fs -dus  /hdfs-directory 
or
$ hdfs dfs -dus  /hdfs-directory 

get – Copy the File from HDFS to Local

The get command copies files from HDFS to the local file system. Specify the HDFS source path first, followed by the local destination path.


$ hadoop fs -get /hdfs-file-path /local-file-path
or
$ hdfs dfs -get /hdfs-file-path /local-file-path
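When scripting downloads, keep the argument order in mind: the HDFS source comes first and the local destination second. A minimal sketch that also creates the local directory first, where run, DRY_RUN, and download are hypothetical helper names:

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Download an HDFS file into a local directory, creating the directory first.
# The download only runs if the local directory could be created.
download() {
  local src="$1" dest_dir="$2"
  run mkdir -p "$dest_dir" && run hdfs dfs -get "$src" "$dest_dir/"
}

# Usage:
#   download /user/ubuntu/data/sales.csv ./downloads
```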

getmerge – Merge Multiple HDFS Files into a Local File

If a directory in HDFS contains multiple files, use the -getmerge command to concatenate them into a single file and download it to the local file system. The optional -nl flag adds a newline character at the end of each file.


$ hadoop fs -getmerge [-nl] /source /local-destination
or
$ hdfs dfs -getmerge [-nl] /source /local-destination

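count – Count Directories, Files, and Bytes

The count command counts the number of directories, the number of files, and the total content size (in bytes) under the given path on HDFS. The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME.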


$ hadoop fs -count /hdfs-file-path
or
$ hdfs dfs -count /hdfs-file-path
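The count output is easy to post-process. For example, the following hypothetical helper (a minimal sketch, not part of Hadoop) turns each DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME line into a readable summary, assuming paths without spaces.

```shell
# Hypothetical helper: read `hdfs dfs -count` output from stdin
# (DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME) and print a readable summary.
# Fields are printed as strings so large byte counts are not truncated.
summarize_count() {
  awk '{ printf "%s: %s dirs, %s files, %s bytes\n", $4, $1, $2, $3 }'
}

# Usage:
#   hdfs dfs -count /user/ubuntu | summarize_count
```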

mv – Moves Files from Source to Destination

The mv (move) command moves files from one location to another within HDFS. It also accepts multiple sources, in which case the destination must be a directory. Moving files across file systems is not permitted.


$ hadoop fs -mv /hdfs-source-path /hdfs-destination-path
or
$ hdfs dfs -mv /hdfs-source-path /hdfs-destination-path
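Because mv accepts several sources when the destination is a directory, you can archive a batch of files in one call. A minimal sketch, where run, DRY_RUN, and archive_files are hypothetical helper names; the destination directory is created first with -mkdir -p so the move does not fail because it is missing.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Move one or more HDFS files into an archive directory.
# First argument: destination directory; remaining arguments: source paths.
archive_files() {
  local dest="$1"
  shift
  run hdfs dfs -mkdir -p "$dest" && run hdfs dfs -mv "$@" "$dest"
}

# Usage:
#   archive_files /archive/2024-09 /data/a.csv /data/b.csv
```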

moveFromLocal – Move file / Folder from Local disk to HDFS

Similar to the put command, moveFromLocal moves the file or source from the local file path to the destination in the HDFS file path. After this command, you will not find the file on the local file system.


$ hadoop fs -moveFromLocal /local-file-path /hdfs-file-path
or
$ hdfs dfs -moveFromLocal /local-file-path /hdfs-file-path

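moveToLocal – Move a File from HDFS to Local

Similar to the get command, moveToLocal is meant to move a file from the HDFS file path to the destination in the local file path. Note that the Apache Hadoop FileSystem shell documentation describes -moveToLocal as displaying a "Not implemented yet" message, so you may need to use -get followed by -rm instead.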


$ hadoop fs -moveToLocal /hdfs-file-path /local-file-path 
or
$ hdfs dfs -moveToLocal /hdfs-file-path /local-file-path 
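A move from HDFS to the local disk can also be written as two explicit steps: copy the file down with -get, then delete the HDFS copy with -rm only if the copy succeeded. A minimal sketch, where run, DRY_RUN, and move_to_local are hypothetical helper names:

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Move a file from HDFS to the local file system in two steps:
# download it with -get, then delete the HDFS copy only if the download succeeded.
move_to_local() {
  local src="$1" dest="$2"
  run hdfs dfs -get "$src" "$dest" && run hdfs dfs -rm "$src"
}

# Usage:
#   move_to_local /user/ubuntu/reports/2024-09.csv ./reports/
```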

cp – Copy Files from Source to Destination

The cp command copies files from one location to another in HDFS. It also accepts multiple sources, in which case the destination must be a directory.


$ hadoop fs -cp /hdfs-source-path /hdfs-destination-path
or
$ hdfs dfs -cp /hdfs-source-path /hdfs-destination-path

setrep – Changes the Replication Factor of a File

This HDFS command is used to change the replication factor of a file. If the path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at the path.


$ hadoop fs -setrep replication-factor /hdfs-file-path
or
$ hdfs dfs -setrep replication-factor /hdfs-file-path
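The replication factor is a plain number, not a path. Adding -w makes the command wait until the blocks actually reach the new replication factor, which can take a long time for large files. The sketch below is a hypothetical wrapper (run, DRY_RUN, and set_replication are illustrative names) that validates the number before calling setrep.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Change the replication factor of a file (or of every file under a directory)
# and wait for it to take effect. Rejects anything that is not a positive integer.
set_replication() {
  local replicas="$1" path="$2"
  case "$replicas" in
    ''|*[!0-9]*) replicas=0 ;;   # treat non-numeric input as invalid
  esac
  if [ "$replicas" -lt 1 ]; then
    echo "replication factor must be a positive integer" >&2
    return 1
  fi
  run hdfs dfs -setrep -w "$replicas" "$path"
}

# Usage:
#   set_replication 2 /user/ubuntu/data/sales.csv
```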

tail – Displays Last Kilobyte of the File

The tail command displays the last kilobyte of the file on stdout. As in Unix, the -f option keeps printing appended data as the file grows.


$ hadoop fs -tail /hdfs-file-path
or
$ hdfs dfs -tail /hdfs-file-path

touch – Create and Modify Timestamps of a File

The touch command updates the access and modification times of the file specified by the URI to the current time. If the file does not exist, a zero-length file is created at the URI with the current time as its timestamp.


$ hadoop fs -touch /hdfs-file-path
or
$ hdfs dfs -touch /hdfs-file-path

touchz – Create a File of Zero Length

The touchz command creates a new file on HDFS with a size of 0 bytes. An error is returned if the file already exists with a non-zero length.


$ hadoop fs -touchz /hdfs-file-path
or
$ hdfs dfs -touchz /hdfs-file-path

appendToFile – Appends the Content to the File

The appendToFile command appends the contents of one or more local files to a destination file on HDFS. It can also read the input from stdin when the source is given as -.


$ hadoop fs -appendToFile /local-file-path /hdfs-file-path
or
$ hdfs dfs -appendToFile /local-file-path /hdfs-file-path
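Passing - as the source makes appendToFile read from stdin, which is handy for appending a line produced by a script. A minimal sketch, where run, DRY_RUN, and append_line are hypothetical helper names:

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Append one line of text to an HDFS file; "-" tells appendToFile to read stdin.
append_line() {
  local text="$1" dest="$2"
  printf '%s\n' "$text" | run hdfs dfs -appendToFile - "$dest"
}

# Usage:
#   append_line "job finished at $(date)" /user/ubuntu/logs/jobs.log
```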

copyFromLocal – Copy File from Local file System

The copyFromLocal command copies a file from the local file system to HDFS. It works like the put command, except that the source is restricted to a local file reference.


$ hadoop fs -copyFromLocal /local-file-path /hdfs-file-path
or
$ hdfs dfs -copyFromLocal /local-file-path /hdfs-file-path

copyToLocal – Copy Files from HDFS to Local file System

The copyToLocal command copies files from HDFS to the local file system. It works like the get command, except that the destination is restricted to a local file reference.


$ hadoop fs -copyToLocal /hdfs-file-path /local-file-path
or
$ hdfs dfs -copyToLocal  /hdfs-file-path /local-file-path

usage – Return the Help for Individual Command

The usage command prints the usage of an individual command.


$ hadoop fs -usage mkdir 
or
$ hdfs dfs -usage mkdir

checksum – Return the Checksum Information of a File

The checksum command returns the checksum information of a file. With the -v option, it also displays the block size of the file.


$ hadoop fs -checksum [-v] URI
or
$ hdfs dfs -checksum [-v] URI

chgrp – Change Group Association of Files

The chgrp command changes the group of a file or path. The -R option applies the change recursively. The user must be the owner of the files, or else a super-user.


$ hadoop fs -chgrp [-R] groupname /hdfs-file-path
or
$ hdfs dfs -chgrp [-R] groupname /hdfs-file-path

chmod – Change the Permissions of a File

The chmod command changes the permissions of a file. The -R option, the only option chmod supports, applies the change recursively through the directory structure. The user must be the owner of the file, or else a super-user.


$ hadoop fs -chmod [-R] 755 /hdfs-file-path
or
$ hdfs dfs -chmod [-R] 755 /hdfs-file-path

chown – Change the Owner and Group of a File

The chown command changes the owner and group of a file. It is similar to the shell's chown command, with a few exceptions. The user must be a super-user.


$ hadoop fs -chown [-R] [owner][:[group]] hdfs-file-path
or
$ hdfs dfs -chown [-R] [owner][:[group]] hdfs-file-path

df – Displays Free Space

The df command shows the capacity, used space, and free space of the HDFS file system. The -h option formats sizes in a human-readable way (for example, 64.0m instead of 67108864) rather than as a number of bytes.


$ hadoop fs -df /user/hadoop/dir1
or
$ hdfs dfs -df /user/hadoop/dir1
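The df output can drive a simple capacity check. The sketch below is a hypothetical helper, not part of Hadoop, that reads hdfs dfs -df output without -h (a header line followed by Filesystem, Size, Used, Available, and Use% columns) and fails when usage reaches a threshold percentage.

```shell
# Hypothetical helper: read `hdfs dfs -df` output (raw bytes, no -h) from stdin.
# Line 1 is the header; line 2 holds Filesystem Size Used Available Use%.
# Prints a warning and returns 1 if usage is at or above the threshold (in %).
check_usage() {
  local threshold="$1"
  awk -v t="$threshold" 'NR == 2 && $2 > 0 {
    pct = $3 * 100 / $2
    if (pct >= t) { printf "HDFS usage %.1f%% >= %s%%\n", pct, t; exit 1 }
  }'
}

# Usage:
#   hdfs dfs -df / | check_usage 80 || echo "time to free some space"
```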

head – Displays first Kilobyte of the File

The head command displays the first kilobyte of the file on stdout.


$ hadoop fs -head /hdfs-file-path
or
$ hdfs dfs -head /hdfs-file-path

createSnapshot – Create a Snapshot of a Snapshottable Directory

This operation requires owner privilege on the snapshottable directory. The first argument is the path of the snapshottable directory. The snapshot name is optional; if it is omitted, a default name is generated from a timestamp.


$ hadoop fs -createSnapshot /path snapshotName
or
$ hdfs dfs -createSnapshot /path snapshotName
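Before a snapshot can be taken, an HDFS administrator must make the directory snapshottable with hdfs dfsadmin -allowSnapshot. After that, snapshots can be taken on a schedule, and each one is readable under the directory's .snapshot subdirectory. A minimal sketch, assuming a hypothetical daily-YYYY-MM-DD naming scheme; run, DRY_RUN, and snapshot_today are hypothetical helper names.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# One-time setup, run by an HDFS administrator:
#   hdfs dfsadmin -allowSnapshot /data/warehouse

# Take a snapshot named after today's date, for example "daily-2024-09-25".
snapshot_today() {
  local dir="$1"
  run hdfs dfs -createSnapshot "$dir" "daily-$(date +%Y-%m-%d)"
}

# Usage:
#   snapshot_today /data/warehouse
#   hdfs dfs -ls /data/warehouse/.snapshot/daily-2024-09-25
```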

deleteSnapshot – Delete a Snapshot

This operation requires owner privilege on the snapshottable directory. The arguments are the path of the snapshottable directory and the name of the snapshot to delete.


$ hadoop fs -deleteSnapshot /path snapshotName
or
$ hdfs dfs -deleteSnapshot /path snapshotName

renameSnapshot – Rename a Snapshot

This operation requires owner privilege of the snapshottable directory.


$ hadoop fs -renameSnapshot /path oldName newName
or
$ hdfs dfs -renameSnapshot /path oldName newName

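expunge – Empty the Trash

The expunge command empties the trash in HDFS. It permanently deletes files in trash checkpoints older than the retention threshold (fs.trash.interval) and creates a new checkpoint. The -immediate option deletes all files in the current user's trash right away, ignoring the retention threshold, and -fs expunges the trash of the specified file system instead of the default one.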


$ hadoop fs -expunge [-immediate] [-fs filesystem-uri]
or
$ hdfs dfs -expunge [-immediate] [-fs filesystem-uri]

stat – Print Statistics About a File/Directory

The stat command prints statistics about the file or directory in the specified format. Format specifiers include %n (name), %b (size in bytes), %r (replication factor), and %y (modification date).


$ hadoop fs -stat "%n %b %r %y" /hdfs-file-path
or
$ hdfs dfs -stat "%n %b %r %y" /hdfs-file-path
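Since -stat accepts a custom format and several paths, it can feed simple calculations. The following hypothetical helper (a minimal sketch, not part of Hadoop) reads lines printed with the format "%n %b %r" (name, size in bytes, replication factor) and prints the raw storage each file uses across all replicas; it assumes file names without spaces.

```shell
# Hypothetical helper: read lines produced by
#   hdfs dfs -stat "%n %b %r" <path>...
# and print each file name with size * replication (raw bytes on disk).
raw_storage() {
  awk '{ printf "%s %.0f\n", $1, $2 * $3 }'
}

# Usage (quote the glob so HDFS expands it, not the local shell):
#   hdfs dfs -stat "%n %b %r" '/user/ubuntu/data/*.csv' | raw_storage
```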

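truncate – Truncate Files to a Specified Length

The truncate command truncates all files that match the specified file pattern to the specified length. The -w flag makes the command wait for block recovery to complete, if necessary.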


$ hadoop fs -truncate [-w] length /hdfs-file-path
or
$ hdfs dfs -truncate [-w] length /hdfs-file-path

find – Find Files Matching an Expression

In Hadoop, the hdfs dfs -find and hadoop fs -find commands find all files that match the specified expression, such as a file name pattern given with -name, and apply the selected action, such as -print, to them. When no path is specified, the command defaults to the current working directory.


$ hadoop fs -find / -name test -print
or
$ hdfs dfs -find / -name test -print
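The -name expression also accepts glob patterns (and -iname matches case-insensitively). Quote the pattern so that your local shell does not expand it before HDFS sees it. A minimal sketch, where run, DRY_RUN, and find_files are hypothetical helper names:

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

# Print every file under a directory whose name matches a glob pattern.
# "$pattern" is passed as one quoted argument, so the local shell never expands it.
find_files() {
  local dir="$1" pattern="$2"
  run hdfs dfs -find "$dir" -name "$pattern" -print
}

# Usage:
#   find_files /user/ubuntu '*.csv'
```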