
Hadoop can overcome the limitations of the Network File System (NFS)


Distributed file systems were created to hold large amounts of data and to serve a large number of clients. Though the Network File System (NFS) is the most commonly used distributed file system, it has some limitations: it can hold only a limited amount of data, it offers no protection against hardware failures, and it can cause network overload. Hadoop was created to overcome all of these limitations of NFS. With Hadoop we can store very large amounts of data, it protects against accidental hardware failures, and it provides very fast access to the data. So, how is this made possible?


Hadoop is designed to address Big Data problems, so the data you store using Hadoop can run to terabytes and petabytes. It lets you store this much data by spreading it across a large number of machines. Even a single file can be much bigger than what you can store using a conventional distributed file system. Next comes scalability: for more storage you can simply keep adding more machines. With Hadoop we are talking about multiple machines in a cluster, unlike NFS, where all the data sits on a single server.
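The scaling idea above can be shown with a small back-of-the-envelope calculation. This is a sketch with hypothetical numbers (10 nodes, 8 TB disks): usable capacity grows linearly with the number of machines, but raw disk space is divided by the replication factor because every block is stored multiple times (replication is covered in the next section).

```python
# Sketch with hypothetical numbers: usable cluster capacity grows linearly
# with the number of machines; raw space is divided by the replication
# factor because each block is stored that many times.

def usable_capacity_tb(num_nodes, disk_per_node_tb, replication_factor=3):
    """Usable capacity (TB) for a cluster of identical machines."""
    return num_nodes * disk_per_node_tb / replication_factor

print(usable_capacity_tb(10, 8))   # 10 nodes x 8 TB / 3 replicas
print(usable_capacity_tb(20, 8))   # doubling the nodes doubles usable capacity
```

Doubling the machines doubles the usable capacity, which is exactly the "just keep adding machines" scalability described above.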


Since Hadoop uses multiple computers in a cluster, hardware failure is unavoidable: individual computers fail randomly across the cluster, and sometimes a whole rack fails too. (For your information, a rack holds more than one computer, and some hardware failures affect the rack itself.) To cope with this problem, Hadoop is designed to provide excellent protection against hardware failure. It splits each file that you store in the Hadoop file system into equal-sized parts, called blocks, and distributes these blocks to different computers. Not only that, it makes copies of each block; by default the number of copies (the replication factor) is three, and it can be increased or decreased as per requirement. The original blocks and their replicas are spread all over the cluster, so if any computer fails, the data can still be served from the surviving replicas and nothing is lost.
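The split-and-replicate scheme just described can be sketched in a few lines of Python. This is a toy simulation, not the HDFS implementation: the tiny block size, the node names, and the round-robin placement are made up for illustration (real HDFS uses 128 MB blocks and rack-aware placement).

```python
import itertools

TOY_BLOCK_SIZE = 8   # bytes, for illustration only; HDFS defaults to 128 MB
REPLICATION = 3      # HDFS default replication factor

def split_into_blocks(data, block_size=TOY_BLOCK_SIZE):
    """Split a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Assign each block's replicas to `replication` distinct nodes, round-robin."""
    cycle = itertools.cycle(nodes)
    return {i: {next(cycle) for _ in range(replication)}
            for i in range(len(blocks))}

def blocks_lost(placement, failed_nodes):
    """A block is lost only if *every* one of its replicas was on a failed node."""
    failed = set(failed_nodes)
    return [i for i, replicas in placement.items() if replicas <= failed]

data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)
placement = place_replicas(blocks, ["node1", "node2", "node3", "node4", "node5"])

# With three replicas on distinct nodes, losing any single node loses no data:
print(blocks_lost(placement, ["node3"]))   # []
```

The key property is in `blocks_lost`: because every block lives on three different machines, a single machine (or even two) failing still leaves at least one replica to serve the data.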


Finally, Hadoop is designed to provide very fast access to the data and to serve a large number of clients. Here, too, it is scalable: if you want to serve more clients, you can add more computers without any performance issues. In other words, performance grows roughly in proportion to the number of computers.
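That proportionality can be illustrated with hypothetical numbers. Assuming client reads are spread evenly across the machines (a simplification of how Hadoop clients fetch blocks from many machines in parallel), the load on each machine falls as machines are added:

```python
# Hypothetical figures: 600 concurrent clients, spread evenly across the
# cluster. Per-machine load drops linearly as machines are added, which is
# why serving capacity scales with cluster size.

def per_node_clients(num_clients, num_nodes):
    """Average concurrent clients each machine serves when load spreads evenly."""
    return num_clients / num_nodes

print(per_node_clients(600, 10))   # 60.0 clients per machine
print(per_node_clients(600, 20))   # 30.0 -- doubling the machines halves the load
```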


So this is how Hadoop overcomes the limitations that are generally encountered when using commonly used distributed file systems like NFS.