Data Nodes' role in the cluster
Data nodes play an important role in HDFS because they are where the actual data is stored.
- The data nodes can communicate with the Name Node; they can send block reports and heartbeats to the Name Node.
- The data nodes can communicate with the client code and also with the other data nodes present in the cluster.
Here are some of the functions of the data nodes:
- The data nodes allow clients to read blocks from them. The block data does not pass through the Name Node; once a client knows where the blocks are located, it reads them directly from the data nodes.
- The data nodes allow clients to write data to them.
- Data nodes can copy (replicate) blocks to other data nodes when the Name Node instructs them to. They can also delete blocks on instruction from the Name Node. In short, data nodes follow the instructions of the Name Node.
- On start-up the data nodes send block reports to the Name Node. A block report is the information about the data blocks that the particular data node contains. From this information, the Name Node can work out whether blocks are under-replicated or over-replicated.
- Data nodes can also receive blocks from other data nodes during the pipeline process.
These are some of the functions of the data node.
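The replication check that the Name Node performs on block reports can be sketched as follows. This is only a minimal illustration, not Hadoop code: the function name `replication_status`, the node and block ids, and the target of 3 replicas per block are assumptions made for the example.

```python
from collections import Counter


def replication_status(block_reports, target_replication=3):
    """Classify blocks by comparing replica counts with a target.

    block_reports: dict mapping a data node id to the set of block ids
    that node listed in its block report (hypothetical data layout).
    """
    replica_counts = Counter()
    for blocks in block_reports.values():
        # A block counts once per data node that reports it.
        replica_counts.update(set(blocks))

    status = {}
    for block_id, count in replica_counts.items():
        if count < target_replication:
            status[block_id] = "under-replicated"
        elif count > target_replication:
            status[block_id] = "over-replicated"
        else:
            status[block_id] = "ok"
    return status


# Hypothetical block reports from four data nodes.
reports = {
    "dn1": {"blk_1", "blk_2", "blk_3"},
    "dn2": {"blk_1", "blk_2"},
    "dn3": {"blk_1", "blk_3"},
    "dn4": {"blk_1", "blk_3"},
}
status = replication_status(reports, target_replication=3)
```

Here `blk_1` is reported by four nodes, `blk_2` by two and `blk_3` by three, so against a target of 3 they come out as over-replicated, under-replicated and correctly replicated respectively. A block that no data node reports does not appear in the result at all; a real Name Node would know about it from its own metadata.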
- During start-up, the data node first registers with the Name Node. It then sends heartbeat signals and block reports to the Name Node.
- The heartbeat signal is sent to the Name Node regularly, at a prescribed time interval, to indicate that the particular data node is alive and working. In response to a heartbeat, the Name Node can send commands back to the data node, which the data node then executes. Heartbeat signals are sent every 3 seconds by default.
- A block report is the list of healthy blocks currently present on the data node. The data node sends it to the Name Node periodically: every hour by default in older Hadoop releases, and every six hours by default in newer ones. The interval is configurable through dfs.blockreport.intervalMsec.
- When a data node receives a new block, it sends a short report to the Name Node as an acknowledgment that the block has been received. This is called a block received signal.
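The start-up sequence and the three kinds of messages described above can be put together in a small simulation. This is a rough sketch under stated assumptions, not the Hadoop implementation: the classes `ToyNameNode` and `ToyDataNode`, their methods, the `("delete", block_id)` command format and the simulated clock are all hypothetical. The heartbeat interval of 3 seconds and a block report interval of one hour are used as the timing values.

```python
HEARTBEAT_INTERVAL_S = 3        # heartbeat interval, in seconds
BLOCK_REPORT_INTERVAL_S = 3600  # full block report interval, taken as one hour


class ToyNameNode:
    """Hypothetical stand-in for the Name Node; not the Hadoop API."""

    def __init__(self):
        self.last_heartbeat = {}    # node id -> time of last heartbeat
        self.blocks = {}            # node id -> block ids known for that node
        self.pending_commands = {}  # node id -> commands for the next heartbeat

    def register(self, node_id, now):
        self.last_heartbeat[node_id] = now
        self.blocks[node_id] = set()
        self.pending_commands[node_id] = []

    def heartbeat(self, node_id, now):
        # Record liveness and hand back any queued commands as the response.
        self.last_heartbeat[node_id] = now
        commands = self.pending_commands[node_id]
        self.pending_commands[node_id] = []
        return commands

    def block_report(self, node_id, block_ids):
        # A full report replaces whatever was known about this node.
        self.blocks[node_id] = set(block_ids)

    def block_received(self, node_id, block_id):
        # Short acknowledgment for a single newly received block.
        self.blocks[node_id].add(block_id)


class ToyDataNode:
    def __init__(self, node_id, name_node):
        self.node_id = node_id
        self.name_node = name_node
        self.stored = set()

    def start(self, now):
        # Register first, then send an initial block report.
        self.name_node.register(self.node_id, now)
        self.name_node.block_report(self.node_id, self.stored)
        self.next_heartbeat = now
        self.next_block_report = now + BLOCK_REPORT_INTERVAL_S

    def receive_block(self, block_id):
        self.stored.add(block_id)
        self.name_node.block_received(self.node_id, block_id)

    def tick(self, now):
        """Advance the simulated clock and send whatever is due."""
        if now >= self.next_heartbeat:
            for action, block_id in self.name_node.heartbeat(self.node_id, now):
                if action == "delete":
                    self.stored.discard(block_id)
            self.next_heartbeat = now + HEARTBEAT_INTERVAL_S
        if now >= self.next_block_report:
            self.name_node.block_report(self.node_id, self.stored)
            self.next_block_report = now + BLOCK_REPORT_INTERVAL_S


nn = ToyNameNode()
dn = ToyDataNode("dn1", nn)
dn.stored = {"blk_1", "blk_2"}
dn.start(now=0)
dn.receive_block("blk_3")
nn.pending_commands["dn1"].append(("delete", "blk_1"))  # queued instruction
for t in range(BLOCK_REPORT_INTERVAL_S + 1):            # simulate one hour
    dn.tick(now=t)
```

In this run the delete command is delivered in the response to the first heartbeat, and the data node drops `blk_1`. In this toy model the Name Node's view of the node only catches up with the deletion when the next full block report arrives, which is a simplification.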