Core components of Hadoop
Here we are going to understand the core components of the Hadoop Distributed File System (HDFS). The two main components of HDFS are the NameNode and the DataNode. HDFS follows a master-slave architecture, where the NameNode is the master and the DataNodes are the slaves.
Here are some of the important functions of the NameNode:
- It tracks which blocks are stored on which DataNodes.
- It manages the file system tree and the metadata for all files and directories.
- It is responsible for maintaining the namespace image (fsimage) and the edit log, which together describe the entire hierarchical file system.
- Any change made to the file system namespace or its properties is recorded by the NameNode in the edit log.
- And finally, the NameNode maps files to their blocks and blocks to the DataNodes that store them; that is, it maintains both the file-to-block and the block-to-location mappings.
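The two mappings above can be pictured as a pair of lookup tables plus an append-only edit log. Here is a minimal in-memory sketch, assuming hypothetical names like `NameNodeSketch` and `register_replica` (the real NameNode is a Java service and is far more involved):

```python
# Toy model of the NameNode's metadata: file -> blocks, block -> locations,
# and an append-only edit log of namespace changes. All names are illustrative.

class NameNodeSketch:
    def __init__(self):
        self.file_to_blocks = {}      # namespace: file path -> ordered list of block IDs
        self.block_to_datanodes = {}  # block ID -> set of DataNode IDs holding a replica
        self.edit_log = []            # every namespace change is logged here

    def create_file(self, path, block_ids):
        # record the new file in the namespace and log the change
        self.file_to_blocks[path] = list(block_ids)
        for b in block_ids:
            self.block_to_datanodes.setdefault(b, set())
        self.edit_log.append(("CREATE", path, tuple(block_ids)))

    def register_replica(self, block_id, datanode_id):
        # called when a DataNode reports that it stores a copy of this block
        self.block_to_datanodes[block_id].add(datanode_id)

    def locate(self, path):
        # resolve a file into (block, locations) pairs, as a client read would need
        return [(b, sorted(self.block_to_datanodes[b]))
                for b in self.file_to_blocks[path]]

nn = NameNodeSketch()
nn.create_file("/logs/app.log", ["blk_1", "blk_2"])
nn.register_replica("blk_1", "dn-A")
nn.register_replica("blk_1", "dn-B")
nn.register_replica("blk_2", "dn-B")
print(nn.locate("/logs/app.log"))
# [('blk_1', ['dn-A', 'dn-B']), ('blk_2', ['dn-B'])]
```

Note that the NameNode never stores file data itself, only this metadata; a client asks it where the blocks live and then reads the data directly from the DataNodes.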
Here are some of the things that the DataNode daemon does on each slave node:
- They are the workhorses that actually serve read/write requests and handle the storage, because the disks live on those machines. A DataNode daemon is deployed on each slave machine and provides the actual storage.
- They serve read and write requests from clients: when a client reads or writes a file, the DataNodes are the machines that process those requests and hold the data.
- They store and retrieve blocks when asked to, whether by clients or by the NameNode.
- They also report back to the NameNode periodically with the list of blocks they hold. This continuous communication happens via the heartbeat mechanism, and the accompanying block report tells the NameNode which blocks are available on each DataNode's disks.
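The DataNode's duties above (serve reads and writes, report back via heartbeats) can be sketched as follows. This is a toy model under assumed names (`DataNodeSketch`, `heartbeat`); real daemons talk to the NameNode over Hadoop's RPC protocol:

```python
# Toy model of a DataNode: store blocks locally, serve reads/writes,
# and build the periodic heartbeat / block report. Names are illustrative.
import time

class DataNodeSketch:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}  # block ID -> bytes stored on this node's local disk

    def write_block(self, block_id, data):
        # serve a client write by persisting the block locally
        self.blocks[block_id] = data

    def read_block(self, block_id):
        # serve a client read from local storage
        return self.blocks[block_id]

    def heartbeat(self):
        # periodically tell the NameNode this node is alive
        # and which blocks it currently holds
        return {"node": self.node_id,
                "timestamp": time.time(),
                "block_report": sorted(self.blocks)}

dn = DataNodeSketch("dn-A")
dn.write_block("blk_1", b"hello")
report = dn.heartbeat()
print(report["block_report"])
# ['blk_1']
```

If heartbeats from a DataNode stop arriving, the NameNode treats that node as dead and re-replicates its blocks elsewhere, which is why this continuous reporting matters.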