Network File System vs. HDFS – Limitations of NFS
To understand Hadoop, and why it was needed, it helps to look at the technology that existed before it and why a new kind of file system had to be created. Hadoop's storage layer, HDFS, belongs to the category of distributed file systems, which were created mainly for two purposes: to hold large amounts of data and to serve multiple clients over a network. To see the motivation clearly, let us look at the oldest and most widely used distributed file system, NFS, and examine why it could not meet these needs, which is why Hadoop had to be created.
NFS, the Network File System, gives clients remote access over the network to a single volume stored on a single machine. It works like this: the server makes a portion of its local file system visible to external clients, and a client mounts that remote file system directly onto its own machine. From then on the client interacts with it exactly as if it were a local file system; in other words, a file system that actually resides on a server appears local to the client's machine.
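As a concrete sketch of the export-and-mount flow described above (the host name `fileserver`, the subnet, and the paths are illustrative assumptions, not values from the text):

```shell
# --- on the NFS server (hypothetical host "fileserver") ---
# /etc/exports entry: expose one directory of the local file system
# to clients on an example subnet, read-write:
#   /srv/share  192.168.1.0/24(rw,sync)

# re-export after editing /etc/exports
exportfs -ra

# --- on the client ---
# mount the remote volume so it appears as a local directory
mount -t nfs fileserver:/srv/share /mnt/share

# from here on, ordinary commands work as if the data were local
ls /mnt/share
```

Note that only one directory tree on one server is ever exported; every client mounts the same single volume, which is exactly where the limitations discussed below come from.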
The key benefit of the network file system is:
- Transparency: clients do not have to be aware that they are dealing with an external, remote file system. They can work with it as if it were part of their own machine, using the same commands to open, write, and read files that they would use on a local file system.
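This transparency is easy to demonstrate: the very same commands operate on a mounted remote path and on a local path. The snippet below uses a local scratch directory so it runs anywhere; in practice the directory would be an NFS mount point such as `/mnt/share` (a hypothetical path):

```shell
# Stand-in for an NFS mount point; with a real mount this would be
# something like /mnt/share and nothing below would change.
mount_point=$(mktemp -d)

# Same shell redirection and cat as for any local file
echo "quarterly numbers" > "$mount_point/report.txt"
cat "$mount_point/report.txt"

# Same cp, ls, rm -- no NFS-specific commands needed
cp "$mount_point/report.txt" "$mount_point/report_copy.txt"
ls "$mount_point"
```

The client never issues an "NFS command"; the kernel's NFS client translates ordinary file operations into network requests behind the scenes.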
What are the limitations of the network file system?
- The major drawback is limited storage. Because NFS exposes a single volume stored on a single machine, the amount of data that can be stored is limited to the disk capacity of that one machine.
- Second, there is no protection against hardware failure. If the machine storing the data goes down, every client that uses the data loses its connection and all of their applications stop working.
- Third, the network can become a bottleneck: every client connects to the same server that hosts the volume mounted on their local machines, and the combined load can overwhelm that single server.
These are the main reasons a new technology was needed that could overcome all of the problems above, and this led to the development of Hadoop: HDFS scales capacity by spreading a file's blocks across many machines, survives hardware failure by keeping replicated copies of each block, and distributes client load across the whole cluster instead of a single server.
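For contrast with the NFS workflow, here is a rough sketch of the equivalent client-side commands against HDFS (this requires a running Hadoop cluster; the file and directory names are illustrative):

```shell
# copy a local file into HDFS; its blocks are split up and
# distributed across the cluster's DataNodes, so capacity is the
# sum of many machines, not one disk
hdfs dfs -put bigfile.csv /data/bigfile.csv

# each block is stored on multiple machines (3 copies by default),
# so one failed disk loses nothing; the factor can be raised per file
hdfs dfs -setrep 4 /data/bigfile.csv

# familiar, file-system-like commands, but served by many nodes
hdfs dfs -ls /data
```

Note how the interface stays file-system-like, preserving the transparency that made NFS attractive, while the storage and the serving of data are distributed.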