+91 70951 67689 datalabs.training@gmail.com

Relational databases vs. Hadoop


As more and more data becomes available, companies and organizations want to embark on Big Data projects, but they are running into the limits of relational databases. What are those limits?

  1. Scalability: Many companies have projects measured in gigabytes; the new world of Big Data is moving into terabytes or even petabytes. Although it is possible to scale traditional databases to multi-TB or even single-PB ranges, doing so is complex, difficult, and very expensive.
  2. Speed: Some Big Data projects also have demanding requirements for data ingest rate and processing speed. Customers often require real-time results, and relational database systems are not built to bring in data at that scale and speed.
  3. Other complications arise around queryability and around applying sophisticated processing such as machine learning.

Relational database technology is not going away; the Hadoop ecosystem is designed to solve a different set of data problems than relational databases.


There are a number of database choices available today. Before the Hadoop ecosystem was broadly available, it was common to keep information in plain file systems, sometimes even as XML files. This plays directly into the creation of Hadoop, because one of its core components is an alternative file system called HDFS (Hadoop Distributed File System), designed as a replacement for whatever file system people were using. HDFS brings a level of sophistication to managing data in a file system: it splits files into large blocks and replicates each block across several machines in a cluster.
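To make that concrete: HDFS stores each file as a sequence of large fixed-size blocks (128 MB by default since Hadoop 2, controlled by the `dfs.blocksize` property) and keeps several copies of every block (three by default, controlled by `dfs.replication`) on different DataNodes. The Python sketch below is a simplified model assuming those defaults. The function names and the round-robin placement are hypothetical illustrations; real HDFS uses a rack-aware placement policy managed by the NameNode.

```python
from __future__ import annotations

# Assumed defaults (Hadoop 2+): dfs.blocksize = 128 MB, dfs.replication = 3
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size in bytes of each block a file of file_size bytes occupies.

    Every block is full-size except possibly the last one.
    """
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Hypothetical toy placement: put each block's replicas on distinct nodes, round-robin.

    Real HDFS uses a rack-aware policy; this only illustrates that every block
    ends up on several different machines.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB)
    print([b // MB for b in blocks])  # [128, 128, 44]
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on three machines, losing one machine does not lose data, and a job can read different blocks of the same file from many machines at once.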


Alongside relational databases, we now have other kinds of databases to consider. In the NoSQL world, there are several groups of databases that can be broadly categorized as key/value stores, column stores, etc. For some Big Data projects, a NoSQL database is the appropriate solution. It is important to understand that while some components of the Hadoop ecosystem (HBase, for example) can be categorized as NoSQL, Hadoop itself is not a database; it is an alternative file system (HDFS) with a processing library (MapReduce).
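The processing half of Hadoop, MapReduce, structures a job as a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group into a result. The minimal, single-process Python sketch below shows that model on the classic word-count task. The function names are hypothetical and this is not Hadoop's Java API; in a real cluster, many map and reduce tasks run in parallel, with map tasks ideally scheduled on the machines that already hold the relevant HDFS blocks.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Each line stands in for a record read from one input split.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    records = ["Hadoop is not a database", "HDFS is a file system"]
    print(word_count(records))
    # {'hadoop': 1, 'is': 2, 'not': 1, 'a': 2, 'database': 1, 'hdfs': 1, 'file': 1, 'system': 1}
```

The map and reduce functions only ever see one record or one key at a time, which is what lets the framework spread the work across many machines without the programmer managing that distribution.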


As mentioned earlier, relational databases are still around and are not going away, because they are designed to solve a specific set of data problems that Hadoop is not designed to handle. So when you consider Hadoop as a solution, treat it as an addition to your existing RDBMS, not a replacement for it.