The Name Node
Hadoop follows a master/slave architecture: one master and many slaves. There is no fixed limit on the number of slaves the master can handle, and no specific master-to-slave ratio needs to be maintained. The master node is called the Name Node (NN), and each slave is called a Data Node (DN).
So, why is the Name Node called that, and what does it actually do?
- It is a physical machine that acts as the directory namespace manager and maintains the inode table for HDFS.
- The Name Node interacts with clients, with Data Nodes, and with the Secondary Name Node.
- The Name Node contains two critical tables. The first is the filename-to-block-sequence mapping, called the namespace. When you store a file in HDFS it is split into blocks, and the namespace records which file is split into which blocks. The second is the block-to-Data-Node mapping, called the inode. Each block is stored on as many Data Nodes as the replication factor specifies, which is 3 by default, and the inode records which block is stored on which Data Nodes. These are the two critical tables controlled by the Name Node.
- The Name Node's filesystem metadata consists of the FSimage and the edit log. The FSimage is a snapshot of the filesystem namespace. The edit log is the file that records the transactions (modifications to the filesystem) made after the latest FSimage was created. Both the FSimage and the edit log are on disk before the Name Node starts; on startup, both are loaded into the Name Node's RAM.
- The FSimage and the edit log together constitute the state of the filesystem, so the Name Node merges the two; this process is called a checkpoint. The Name Node performs a checkpoint only during startup.
- A checkpoint produces a new FSimage by merging the previous FSimage with the edit log, and the new FSimage is written back to disk. A new, empty edit log file is then created. Any further modifications to the filesystem are recorded in this new edit log.
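
To make the two tables and the checkpoint step more concrete, here is a minimal Python sketch. It is a toy model, not Hadoop's actual implementation: the class `ToyNameNode`, its methods, the tuple format of the edit log entries, and the sample file, block, and node names are all assumptions made for illustration. The FSimage and edit log are held as in-memory Python objects here, whereas the real ones are files on disk.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical, simplified model of the Name Node's bookkeeping."""

    replication: int = 3  # default replication factor described above
    # Stand-in for FSimage: namespace snapshot as of the last checkpoint.
    fsimage: dict = field(default_factory=dict)
    # Stand-in for the edit log: modifications made since that snapshot.
    edit_log: list = field(default_factory=list)
    # Table 1 (namespace): filename -> ordered list of block IDs.
    namespace: dict = field(default_factory=dict)
    # Table 2 (block mapping): block ID -> Data Nodes storing that block.
    block_locations: dict = field(default_factory=dict)

    def add_file(self, filename, placements):
        """Record a file; placements maps block ID -> list of Data Nodes."""
        for block_id, nodes in placements.items():
            if len(set(nodes)) != self.replication:
                raise ValueError(f"{block_id} needs {self.replication} distinct Data Nodes")
        blocks = list(placements)  # dicts keep insertion order
        self.edit_log.append(("add", filename, blocks))
        self.namespace[filename] = blocks
        for block_id, nodes in placements.items():
            self.block_locations[block_id] = list(nodes)

    def delete_file(self, filename):
        """Remove a file from the namespace and log the modification."""
        self.edit_log.append(("delete", filename))
        for block_id in self.namespace.pop(filename, []):
            self.block_locations.pop(block_id, None)

    def checkpoint(self):
        """Merge the previous FSimage with the edit log, then start an empty log."""
        merged = {name: list(blocks) for name, blocks in self.fsimage.items()}
        for entry in self.edit_log:
            if entry[0] == "add":
                merged[entry[1]] = list(entry[2])
            elif entry[0] == "delete":
                merged.pop(entry[1], None)
        self.fsimage = merged  # the new FSimage
        self.edit_log = []     # the new, empty edit log


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.add_file("/logs/a.txt", {
        "blk_1": ["dn1", "dn2", "dn3"],
        "blk_2": ["dn2", "dn3", "dn4"],
    })
    print(nn.namespace)        # which file is split into which blocks
    print(nn.block_locations)  # which block is stored on which Data Nodes
    nn.checkpoint()
    print(nn.fsimage, nn.edit_log)
```

In this sketch every modification is appended to the edit log first, and the FSimage only changes when `checkpoint()` runs, which mirrors the division of labour between the two files described in the list above.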