Hardware failures are bound to happen, and the good thing about Hadoop is that it is built with hardware failures in mind: it has built-in fault tolerance. By default, Hadoop maintains three copies of each file block, and these copies are scattered across different computers. When a computer fails, the system keeps running because the data is still available from other nodes, and once you fix the failed node, Hadoop takes care of re-replicating data onto it. That fault tolerance is one of the most important features of the Hadoop Distributed File System.
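As a sketch of how this is configured: in a standard Apache Hadoop deployment, the replication factor is controlled by the `dfs.replication` property in `hdfs-site.xml` (the value 3 shown here is the default):

```xml
<!-- hdfs-site.xml: number of copies HDFS keeps of each block -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

The replication of an individual file can also be changed after it is written, for example with `hdfs dfs -setrep -w 3 /path/to/file`.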
The fault tolerance is not limited to the failure of a slave node. It also applies to the TaskTracker services running on the slave computers: if a computer fails, or even if just the service fails, the JobTracker detects the failure and asks some other TaskTracker to perform the same task. One can argue that this fault tolerance only goes as far as the slave computers are concerned, and that if the master computer dies, that would be a single point of failure. But Hadoop has taken care of that as well. The tables maintained by the NameNode, the metadata index that records which data resides on which computer, are backed up, and the backup copies are copied over to different computers. Enterprise versions of Hadoop also keep two masters, one as the active master and one as a standby master in case the active one dies. So the master is not a single point of failure either.
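In current Apache Hadoop releases, the two-master setup described above is available as HDFS High Availability, with an active and a standby NameNode. A minimal sketch of the relevant `hdfs-site.xml` entries follows; the nameservice `mycluster`, the NameNode IDs `nn1`/`nn2`, and the hostnames are placeholder assumptions, not values from the original text:

```xml
<!-- hdfs-site.xml: one logical nameservice backed by two NameNodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC addresses of the two masters (placeholder hostnames) -->
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>master1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>master2.example.com:8020</value>
</property>
```

With this configuration, clients address the logical nameservice rather than a single machine, so if the active NameNode dies, the standby can take over without clients being reconfigured.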