16 Hadoop fs Commands Every Data Engineer Must Know

Commands in Hadoop

The Hadoop shell is the CLI for the Hadoop cluster. Most Hadoop administrators find themselves using the Hadoop CLI just as much as the HDP, Ambari, or CDH management interface. Learning how to navigate and run commands in the Hadoop shell is essential for any Data Engineer. Whether you need to move data in HDFS or change the cluster's configuration file, all of these tasks can be done from the Hadoop shell. Get ready to learn the 16 commands every Data Engineer must know.

Hadoop fs vs. HDFS dfs

Hadoop shell commands are written as either hadoop fs or hdfs dfs commands. hadoop fs is the generic form and works with any file system Hadoop supports, while hdfs dfs was introduced to make it explicit that Hadoop's own file system, HDFS, is being used. There is a good deal of crossover between the two, and many administrators use both. Here is a breakdown of the commands.
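As a quick illustration, here is a minimal sketch of the same listing issued through both entry points. It assumes the default file system is HDFS and uses /user/nasa, the example directory from later in this post. The guard around each call is my own addition so the snippet only prints the command on a machine without a Hadoop client.

```shell
# Example directory used later in this post.
TARGET_DIR="/user/nasa"

# Generic file system shell: works with any file system Hadoop supports.
GENERIC_CMD="hadoop fs -ls ${TARGET_DIR}"
# HDFS-specific shell: the same operation, but only for HDFS.
HDFS_CMD="hdfs dfs -ls ${TARGET_DIR}"

for cmd in "$GENERIC_CMD" "$HDFS_CMD"; do
  # ${cmd%% *} is the first word of the command (hadoop or hdfs).
  # Run it only when that binary is on the PATH; otherwise just show it.
  if command -v "${cmd%% *}" >/dev/null 2>&1; then
    $cmd
  else
    echo "would run: $cmd"
  fi
done
```

When the default file system is HDFS, both commands should return the same listing.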

Hadoop fs Basics

When you run hadoop fs commands from a machine that does not have the cluster's configuration, you need to give the full URI of the NameNode (hdfs://<ip or dns name>:<port>). On the cluster itself the full URI isn't required because the default file system is already set in the Hadoop configuration file (the fs.defaultFS property in core-site.xml). When logged in to the Hadoop cluster you can just type hadoop fs -<command> and the configuration will direct all commands to the default URI. Note that the port in an hdfs:// URI is the NameNode RPC port (commonly 8020), not 50070, which is the NameNode web UI port in Hadoop 2.x.
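Here is a rough sketch of building a fully qualified URI for a command run from outside the cluster. The host name namenode.example.com and the hdfs_uri helper are hypothetical, and port 8020 is only a common default for the NameNode RPC port, so check the fs.defaultFS value for your own cluster before using it.

```shell
# Hypothetical NameNode host and port; substitute the values for your cluster.
NAMENODE_HOST="namenode.example.com"
NAMENODE_PORT="8020"

# Hypothetical helper: build a fully qualified HDFS URI for an absolute path.
hdfs_uri() {
  printf 'hdfs://%s:%s%s\n' "$NAMENODE_HOST" "$NAMENODE_PORT" "$1"
}

REMOTE_DIR="$(hdfs_uri /user/nasa)"
echo "$REMOTE_DIR"

# Only contact the cluster when a Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$REMOTE_DIR"
fi
```

On a node that already has the cluster configuration, hdfs getconf -confKey fs.defaultFS prints the default URI that the short form of the commands uses.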

Hadoop fs Commands

For the commands below I will use a data set from NASA's public posting on Data.World. If you haven't used the Data.World site, make sure you check it out for great data sets.

Hadoop Commands for General Tasks

hadoop fs -ls – Lists the files and directories in the URI path, just like the UNIX ls command.

hadoop fs -ls
....
-rw-r--r--   1 nasa hdfs    4978151 2017-07-18 20:23 /user/nasa/meteorite.csv
drwxr-xr-x   - nasa hdfs          0 2017-07-08 20:08 /user/nasa/test
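Each line of the listing has eight columns: permissions, replication factor, owner, group, size in bytes, modification date, modification time, and path. As a small sketch of how those columns can be picked apart, the snippet below runs awk over the two sample lines from the listing above. The list_files helper is my own, and it assumes paths contain no spaces.

```shell
# Sample lines copied from the listing above. On a cluster you would pipe
# the output of `hadoop fs -ls /user/nasa` into list_files instead.
SAMPLE='-rw-r--r--   1 nasa hdfs    4978151 2017-07-18 20:23 /user/nasa/meteorite.csv
drwxr-xr-x   - nasa hdfs          0 2017-07-08 20:08 /user/nasa/test'

# Hypothetical helper: print "path size" for regular files only.
# Regular files have a permission string starting with "-";
# directories start with "d" and are skipped.
list_files() {
  awk '$1 ~ /^-/ { print $8, $5 }'
}

printf '%s\n' "$SAMPLE" | list_files
# prints: /user/nasa/meteorite.csv 4978151
```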

hadoop fs -mkdir – Creates a new directory at the given directory path.

hadoop fs -mkdir /user/nasa/results
....
Creates a new directory, results, in the /user/nasa directory.
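If the parent directories might not exist yet, -mkdir also accepts a -p flag. The sketch below uses a nested path of my own invention under the results directory, so treat the 2017/07 part as a placeholder rather than anything from the NASA data set.

```shell
RESULTS_DIR="/user/nasa/results"
# Hypothetical nested path, used only to illustrate -p.
NESTED_DIR="${RESULTS_DIR}/2017/07"

if command -v hadoop >/dev/null 2>&1; then
  # -p creates any missing parent directories and does not fail
  # if the directory already exists.
  hadoop fs -mkdir -p "$NESTED_DIR"
  # Confirm the new directory shows up under results.
  hadoop fs -ls "$RESULTS_DIR"
else
  echo "would run: hadoop fs -mkdir -p $NESTED_DIR"
fi
```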

hadoop fs -touchz – Creates an empty, zero-length file. It is primarily used as a quick way to create a file when testing permissions and new directories.

hadoop fs -touchz /user/nasa/results/file
....
Creates an empty file inside the /user/nasa/results directory.
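Since -touchz is mostly a permissions check, one way to use it is as a write probe: create the file, confirm it exists and is empty, then remove it. This is only a sketch of that idea. The check_empty_file helper is hypothetical, and it relies on the -test command's -e (path exists) and -z (file is zero length) options.

```shell
PROBE_FILE="/user/nasa/results/file"

# Hypothetical helper: succeeds when the path exists and is zero bytes long.
check_empty_file() {
  hadoop fs -test -e "$1" && hadoop fs -test -z "$1"
}

if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -touchz "$PROBE_FILE"
  if check_empty_file "$PROBE_FILE"; then
    echo "write permission OK: $PROBE_FILE"
  fi
  # Clean up the probe file so it does not linger in the directory.
  hadoop fs -rm "$PROBE_FILE"
else
  echo "would run: hadoop fs -touchz $PROBE_FILE"
fi
```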

Hadoop Commands for Reading Files

hadoop fs -help – Displays help for a given command, or the list of commands available in the Hadoop shell.
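For example, a minimal sketch of looking up a single command might look like this, using ls as the topic. The related -usage command prints a shorter summary of the same command.

```shell
# Command to look up; ls is just an example.
HELP_TOPIC="ls"

if command -v hadoop >/dev/null 2>&1; then
  # Full description of the command and its options.
  hadoop fs -help "$HELP_TOPIC"
  # Short usage summary for the same command.
  hadoop fs -usage "$HELP_TOPIC"
else
  echo "would run: hadoop fs -help $HELP_TOPIC"
fi
```

Running hadoop fs -help with no command name lists every command in the shell.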

Learn More Commands

Did you enjoy my list of Hadoop fs commands? Be sure to let me know about any commands I left off the list. If you would like to keep learning about Hadoop and Data Engineering, be sure to sign up for the thomashenson.com email list.