The Basic Concepts Behind Hadoop
There are many misconceptions around Hadoop these days, creating a fog that sometimes keeps us from engaging with it for fear of getting tangled in the complexities we assume must follow. In large part, the mystique around Hadoop may be a side effect of its name, which is unfamiliar and has no obvious meaning. So let's start demystifying Hadoop by asking: what is in the name? Hadoop was created by Doug Cutting, who named his new project after his son's favourite toy, a stuffed elephant to which the boy had given a made-up name of his own; the elephant's name was Hadoop.
So the name carries no deep meaning and is not an acronym. Now that we understand the origin of Hadoop's name, it's worth emphasizing that Hadoop was not designed with the typical enterprise architecture in mind. Hadoop was designed for cluster architectures built out of commodity hardware, and trying to understand Hadoop's concepts in the context of the wrong architecture can be genuinely confusing. Cluster architecture is based on a set of simple, basic components that are available in the thousands or hundreds of thousands and can be easily assembled. It all starts with a node: a set of commodity processing cores and main memory attached to a set of commodity disks. A stack of nodes forms a rack, and a group of racks forms a cluster, all connected via a high-speed network that enables the exchange of information. Hadoop was designed to leverage the resources made available within this architectural layout. So, when thinking about Hadoop, put your mind in the right context and think about cluster architecture first.
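The node/rack/cluster hierarchy described above can be sketched as a simple data model. This is a minimal illustration with hypothetical class names and hardware sizes, not part of Hadoop itself:

```python
# A toy model of the cluster hierarchy: commodity nodes stack into racks,
# and racks are grouped into a cluster. All names and sizes are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    cores: int        # commodity processing cores
    memory_gb: int    # main memory
    disks: int        # attached commodity disks

@dataclass
class Rack:
    nodes: List[Node] = field(default_factory=list)

@dataclass
class Cluster:
    racks: List[Rack] = field(default_factory=list)

    def total_cores(self) -> int:
        # Aggregate capacity across every node in every rack.
        return sum(n.cores for r in self.racks for n in r.nodes)

# Assemble a toy cluster: 2 racks, each holding 3 identical commodity nodes.
node = Node(cores=8, memory_gb=64, disks=4)
cluster = Cluster(racks=[Rack(nodes=[node] * 3) for _ in range(2)])
print(cluster.total_cores())  # 2 racks x 3 nodes x 8 cores = 48
```

The point of the sketch is that capacity scales by adding more cheap, identical parts, which is exactly the assumption Hadoop's design leans on.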
Now, before we dive deeper into the concepts and elements that make up Hadoop, we need to add one more layer of context, answering the vital question of why: why was Hadoop created? The answer is actually very simple: people were looking for a way to continuously index massive amounts of data. Specifically, the initial research was done by a then-small company called Google, which needed a way to index the entire World Wide Web every day. Needless to say, that investment has paid off! Doug Cutting got the inspiration for Hadoop after reading Google's papers on the Google File System and on MapReduce, which described designs that used commodity hardware to process massive amounts of data in a fraction of the time previously required. Why is that critical to us? Because just about all companies now face their own Big Data challenges which, no matter how hard they try, cannot be solved with a traditional enterprise approach to analytics. As a result of the breakthroughs made by Google and those who joined in early on, including Yahoo! and the Apache community, a fast-growing industry has emerged within high tech, focused on achieving things never thought possible before. Facebook, for example, uses Hadoop to process and analyze data patterns across its 845 million users' worth of personal data, almost one-seventh of the world's population.