Hadoop – Some Myths Busted (4 to 6)
Fact Number 4: One thing people need to remember is that Hadoop is built on a file system; it is not a database management system. If you look at the acronym, the FS in HDFS stands for file system! A lot of people refer to it as a database. It certainly can manage data, file-based data of a wide variety, and there are a lot of great things about data management in the Hadoop tools, but that does not make Hadoop a database management system. The Hadoop tools simply don't have the same capabilities and functionality as a DBMS. And this is actually a good thing about Hadoop and its attendant tools: they do a different set of things for us than, say, a DBMS-based data warehouse does. That makes the Hadoop technologies a good complement to the BI and data warehousing work we are already doing. So be careful, and don't describe HDFS as a database, because it isn't one.
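To make the distinction concrete, here is a small Python sketch. It does not use Hadoop at all; the sample data and the function names are made up for illustration, and SQLite merely stands in for "a DBMS". The point it tries to show is that with plain file storage the application has to parse and filter the bytes itself, whereas a database engine owns the schema, the index and the query.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; on a file system this is just the bytes of a file.
RAW_FILE = "id,name,city\n1,Asha,Pune\n2,Ben,Leeds\n3,Chen,Pune\n"


def scan_file(raw, city):
    """File-system style: the application parses and filters every row itself."""
    reader = csv.DictReader(io.StringIO(raw))
    return [row["name"] for row in reader if row["city"] == city]


def query_database(raw, city):
    """DBMS style: the engine holds the schema and index and answers a query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.execute("CREATE INDEX idx_city ON people (city)")
    rows = [
        (int(r["id"]), r["name"], r["city"])
        for r in csv.DictReader(io.StringIO(raw))
    ]
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT name FROM people WHERE city = ? ORDER BY id", (city,)
    ).fetchall()
    conn.close()
    return [name for (name,) in result]


if __name__ == "__main__":
    print(scan_file(RAW_FILE, "Pune"))
    print(query_database(RAW_FILE, "Pune"))
```

Both functions return the same names, but only the second one gets a schema, an index and a declarative query from the storage layer. That gap is roughly what the tools layered on top of HDFS try to close.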
Now that we are clear about HDFS, you may be wondering: what about HBase? Well, the way it typically works is that there is a combination of products; you start with HDFS as the base and then layer other Hadoop products on top of it. If you layer HBase on top of HDFS, it does bring some of the capabilities of a database management system. But it is still nothing like the rich maturity one associates with the older DBMS brands. So HBase really does help turn HDFS into a database, but it's a pretty rudimentary one.
Fact Number 5: Now moving on to fact number five: there is a high-level query language available through the Hive product, which one would layer on top of HDFS, and one might also use Hive with other products such as the HBase mentioned above. Hive has its own query language, HiveQL, which resembles SQL, the standard query language we all depend upon. And yet HiveQL is not standard SQL.
Fact Number 6: People often think Hadoop and MapReduce require each other. In particular, people think that MapReduce is one of those products layered on HDFS and that it cannot be used anywhere else. But that's not quite true either, so let us look at the facts. It was people at Google who originally developed MapReduce, and they described it in a published paper rather than releasing it as an open-source product. They developed it before HDFS even existed, so that alone tells us the two don't require each other.
There are also a lot of variations of MapReduce that work with various storage technologies, including HDFS, but also other file systems and even some database management systems. So you get the idea: Hadoop and MapReduce quite often go together, but they don't have to, and MapReduce can work without Hadoop.
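A quick way to convince yourself of this is to write the MapReduce pattern with no Hadoop in sight. The sketch below is a toy, single-process word count in plain Python; the function names (map_phase, shuffle, reduce_phase, run_mapreduce) are invented for this example and are not any real framework's API. The input is an ordinary in-memory list, but it could just as well come from a local file or a database query.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map, shuffle and reduce over any iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["Hadoop is not a database", "HDFS is a file system"]
    print(run_mapreduce(lines))
```

Nothing here knows or cares where the records are stored. A real framework adds distribution, fault tolerance and scheduling on top of the same three steps, and the storage layer underneath is a separate choice.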