Backup Node:

The Backup node in Hadoop is an extended Checkpoint node that performs checkpointing and also supports online streaming of file system edits.

Its advantage over the Checkpoint node is that the namespace (metadata) held in its main memory is always in sync with the primary NameNode's file system namespace. The Backup node maintains an in-memory, up-to-date copy of the file system namespace, accepts a real-time online stream of file system edits, and applies these edits to its own in-memory copy of the namespace.

Thus, at any point in time, it holds an up-to-date backup of the current file system namespace.
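To make the streaming idea concrete, here is a minimal Python simulation under a toy model: the namespace is a dict mapping path to metadata, and each edit is a ("create" or "delete", path, metadata) tuple. The classes PrimaryNameNode and BackupNode and the function apply_edit are hypothetical and are not HDFS APIs; real edits are typed operations written to a binary edit log. The sketch only shows that applying every streamed edit as it arrives keeps the Backup node's copy identical to the NameNode's.

```python
from dataclasses import dataclass, field

# Toy model (hypothetical, not HDFS code): namespace = {path: metadata},
# edit = (op, path, metadata) where op is "create" or "delete".


def apply_edit(namespace, edit):
    """Apply one edit to an in-memory namespace dict."""
    op, path, meta = edit
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


@dataclass
class BackupNode:
    namespace: dict = field(default_factory=dict)

    def receive_edit(self, edit):
        # Each streamed edit is applied immediately to the in-memory copy.
        apply_edit(self.namespace, edit)


@dataclass
class PrimaryNameNode:
    backup: BackupNode
    namespace: dict = field(default_factory=dict)
    edit_log: list = field(default_factory=list)

    def log_and_apply(self, edit):
        apply_edit(self.namespace, edit)  # update the NameNode's own namespace
        self.edit_log.append(edit)        # journal the edit
        self.backup.receive_edit(edit)    # stream the same edit to the Backup node


backup = BackupNode()
nn = PrimaryNameNode(backup=backup)
nn.log_and_apply(("create", "/data/a.txt", {"replication": 3}))
nn.log_and_apply(("create", "/data/b.txt", {"replication": 2}))
nn.log_and_apply(("delete", "/data/a.txt", None))
print(backup.namespace == nn.namespace)  # True
```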

The Secondary NameNode and the Checkpoint node create checkpoints by downloading the fsimage and edits log files from the active NameNode, merging the two, and saving the resulting new fsimage on their local file systems. Unlike them, the Backup node does not need to download fsimage and edits from the active NameNode to create a checkpoint, because it already holds an up-to-date namespace in its own main memory. Creating a checkpoint on the Backup node is therefore just a matter of saving the in-memory file system metadata (namespace) to its local file system.

As a result, checkpoint creation on the Backup node is faster than on a Secondary NameNode or Checkpoint node, since it skips both the file download and the merge step.
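The difference between the two checkpoint paths can be sketched in Python under a toy model in which fsimage is a JSON snapshot of the namespace and the edit log is a list of (op, path, metadata) tuples. The functions checkpoint_node_checkpoint and backup_node_checkpoint are hypothetical illustrations, not Hadoop code: the first has to fetch both files and replay the edits, the second only serializes what is already in memory, and both arrive at the same image.

```python
import json

# Toy model (hypothetical, not HDFS's on-disk formats).


def apply_edit(namespace, edit):
    """Replay one (op, path, metadata) edit onto a namespace dict."""
    op, path, meta = edit
    if op == "create":
        namespace[path] = meta
    elif op == "delete":
        namespace.pop(path, None)


def checkpoint_node_checkpoint(fetch_fsimage, fetch_edits):
    # Secondary NameNode / Checkpoint node style: download both files from
    # the active NameNode, replay the edits on the old image, save a new image.
    namespace = json.loads(fetch_fsimage())
    for edit in fetch_edits():
        apply_edit(namespace, edit)
    return json.dumps(namespace, sort_keys=True)


def backup_node_checkpoint(in_memory_namespace):
    # Backup node style: the namespace is already current in memory, so a
    # checkpoint is only a serialization; nothing is downloaded or replayed.
    return json.dumps(in_memory_namespace, sort_keys=True)


old_image = json.dumps({"/data/a.txt": {"replication": 3}})
edits = [("create", "/data/b.txt", {"replication": 2}),
         ("delete", "/data/a.txt", None)]
current = {"/data/b.txt": {"replication": 2}}  # what the Backup node holds

cp1 = checkpoint_node_checkpoint(lambda: old_image, lambda: edits)
cp2 = backup_node_checkpoint(current)
print(cp1 == cp2)  # True
```

The cost of the first path grows with the size of the image and of the accumulated edits that must be transferred and replayed; the second path avoids that work entirely.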

Because the Backup node keeps a copy of the namespace in main memory just as the NameNode does, its RAM requirements are the same as the NameNode's.

Unlike Checkpoint nodes, of which several can be registered with the NameNode, only one Backup node may be registered with the NameNode at any time. If a Backup node is in use, Checkpoint nodes are not needed and may not be registered with the NameNode.
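These registration rules can be modelled with a small Python class, assuming (as the HDFS user guide states) that at most one Backup node is allowed and that no Checkpoint node may register while a Backup node is in use. NameNodeRegistry and RegistrationError are hypothetical names for illustration only; this is not how Hadoop implements registration.

```python
# Hypothetical model of the NameNode's registration rules; not Hadoop code.


class RegistrationError(Exception):
    """Raised when a node is not allowed to register."""


class NameNodeRegistry:
    def __init__(self):
        self.backup_node = None        # at most one Backup node
        self.checkpoint_nodes = set()  # any number of Checkpoint nodes

    def register_backup(self, node_id):
        if self.backup_node is not None:
            raise RegistrationError(
                f"Backup node {self.backup_node!r} is already registered")
        self.backup_node = node_id

    def register_checkpoint(self, node_id):
        if self.backup_node is not None:
            raise RegistrationError(
                "a Backup node is in use; Checkpoint nodes may not register")
        self.checkpoint_nodes.add(node_id)


# Multiple Checkpoint nodes are accepted when no Backup node is in use.
cp_only = NameNodeRegistry()
cp_only.register_checkpoint("checkpoint-1")
cp_only.register_checkpoint("checkpoint-2")

# With a Backup node in use, a second Backup node or any Checkpoint node is rejected.
with_backup = NameNodeRegistry()
with_backup.register_backup("backup-1")
for attempt in (lambda: with_backup.register_backup("backup-2"),
                lambda: with_backup.register_checkpoint("checkpoint-3")):
    try:
        attempt()
    except RegistrationError as err:
        print("rejected:", err)
```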

The Backup node can be started with the following command on the node dedicated to it in the cluster:

$ hdfs namenode -backup

The following two configuration properties specify the addresses of the Backup node and its web interface:

dfs.namenode.backup.address (default: 0.0.0.0:50100)
The Backup node server address and port. If the port is 0, the server will start on a free port.

dfs.namenode.backup.http-address (default: 0.0.0.0:50105)
The Backup node HTTP server address and port. If the port is 0, the server will start on a free port.
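These properties would normally be set in hdfs-site.xml. The Python sketch below only builds that <configuration> fragment with the standard library's xml.etree.ElementTree (ET.indent needs Python 3.9 or later); the function name backup_node_site_xml and the host name backup-host.example.com are hypothetical placeholders, and the ports default to the values listed above.

```python
import xml.etree.ElementTree as ET

# Sketch: build the hdfs-site.xml <configuration> fragment for the Backup node.
# backup_node_site_xml and "backup-host.example.com" are hypothetical.


def backup_node_site_xml(host="0.0.0.0", rpc_port=50100, http_port=50105):
    conf = ET.Element("configuration")
    for name, value in [
        ("dfs.namenode.backup.address", f"{host}:{rpc_port}"),
        ("dfs.namenode.backup.http-address", f"{host}:{http_port}"),
    ]:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(conf, space="  ")  # pretty-print (Python 3.9+)
    return ET.tostring(conf, encoding="unicode")


print(backup_node_site_xml("backup-host.example.com"))
```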

Note: One of the main advantages of a Backup node is that it provides the option of running the NameNode with no persistent storage, delegating all responsibility for persisting the namespace to the Backup node.

To do this, start the NameNode with the -importCheckpoint option, as shown below, and do not specify an edits directory (dfs.namenode.edits.dir) in hdfs-site.xml.

$ hdfs namenode -importCheckpoint
