How does Hadoop provide hardware failure protection?
Here we look at how Hadoop protects against hardware failure by replicating data. Hadoop has its own system for storing files: HDFS. In HDFS, files are broken into blocks; each block has a fixed size and is stored on a target computer in the cluster, chosen by Hadoop's placement policy. So multiple computers play a role in storing a single file.
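The block idea can be sketched in a few lines of Python: a byte stream is cut into fixed-size chunks, and each chunk would then be shipped to a different machine. (HDFS blocks are much larger, commonly 128 MB by default in modern versions; a tiny block size is used here purely for illustration.)

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    # Cut the data into consecutive fixed-size blocks; the last block
    # may be smaller than block_size, just as in HDFS.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
```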
Since HDFS generally runs on commodity hardware, if we have, say, ten computers, the chance that at least one fails is quite high. A hardware failure can be as small as a hard disk dying or as big as an entire computer going down. The bottom line: without protection, if even one computer holding a block fails, the whole file becomes unavailable. So how does Hadoop address this issue?
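To see why "quite high" is not an exaggeration, here is a rough back-of-the-envelope calculation. Assuming each machine fails independently with some probability p over a given period (the 5% figure below is a hypothetical number, not a measured rate), the chance of at least one failure grows quickly with cluster size:

```python
def prob_at_least_one_failure(n: int, p: float) -> float:
    # If each of n machines independently fails with probability p,
    # the probability that at least one fails is 1 - (1 - p)^n.
    return 1 - (1 - p) ** n

# With a hypothetical 5% failure rate per machine:
print(round(prob_at_least_one_failure(10, 0.05), 3))   # 0.401
print(round(prob_at_least_one_failure(100, 0.05), 3))  # 0.994
```

Even at a modest per-machine failure rate, a ten-node cluster has roughly a 40% chance of seeing at least one failure, and a hundred-node cluster makes a failure all but certain.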
The issue of hardware failure is addressed in Hadoop through a process called replication. Hadoop was designed from the ground up for hardware failure protection; it is not something that was bolted on top of the software later. Because Hadoop deals with Big Data, many computers are required to store that data, so hardware failure protection is a must-have feature.
So this is how it works. A file to be stored is broken into blocks, say block 1, block 2, block 3 and block 4. Three copies are made of each block, and each copy goes onto a different computer selected by the software. The number of copies, known as the replication factor, is three by default; you can change this setting if required. If it is changed to one, there is zero protection against hardware failure. You can increase it to a higher value if you want more protection and have more computers in your cluster.
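The default replication factor is controlled by the `dfs.replication` property in `hdfs-site.xml`:

```xml
<!-- hdfs-site.xml: cluster-wide default replication factor -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication factor can also be changed for an existing file with `hdfs dfs -setrep -w 2 /path/to/file`, where the path here is just a placeholder for a file in your cluster.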
In this example, block 1 is stored on three computers, say computer 1, computer 4 and computer 6. Now suppose a computer fails, say computer 4. The file is still accessible, since block 1 is still available on computers 1 and 6. Even if a second node, say computer 1, also fails, block 1 is still available on computer 6, so the file remains accessible.
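The walkthrough above can be expressed as a small simulation. This is a toy model only, not HDFS's actual placement policy (which is rack-aware, not purely random): each block gets three copies on distinct, randomly chosen computers, and the file is readable only while every block still has at least one surviving replica.

```python
import random

def place_blocks(blocks, computers, replication=3, seed=0):
    # Toy placement: each block's replicas go on `replication`
    # distinct, randomly chosen computers.
    rng = random.Random(seed)
    return {b: set(rng.sample(computers, replication)) for b in blocks}

def file_available(placement, failed):
    # The file is readable only if every block has at least
    # one replica on a computer that has not failed.
    return all(replicas - failed for replicas in placement.values())

computers = [f"computer{i}" for i in range(1, 7)]
placement = place_blocks(["block1", "block2", "block3", "block4"], computers)

print(file_available(placement, failed=set()))                 # True
# With three replicas per block, any single failure is survivable:
print(all(file_available(placement, {c}) for c in computers))  # True
```

With a replication factor of three, any one failure, and often two, leaves every block reachable; only losing all replicas of some block makes the file unavailable.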
Thus, the problem of hardware failure is taken care of by replicating blocks across different nodes and racks.