
How Did Hadoop Come into Being?

Data and its possibilities are only going to get bigger. So today we are going to discuss what triggered Big Data technologies and how things unfolded from the time when Hadoop wasn’t even Hadoop. It all started around 1998, when the internet boom took off and search engines had huge amounts of data to process. They were the first to be hit by the enormous volume of data; they were trying to categorize all the web pages on the World Wide Web. If you remember, the web was a chaotic place then: there were just text indexes of links that went from one sub-category to another, and finding something meaningful and relevant was very difficult.

That was the situation just 10-15 years ago, but now there is Google, which provides quite accurate results in a few milliseconds. Back then, Yahoo was the leader among search engines. Even then, the industry had realized that a cluster of computers was needed to process such a large amount of data, as it was too large to be processed by a single powerful server. Moreover, the data was constantly increasing and changing. And so Yahoo started to face problems, as they were looking at data from the whole World Wide Web. After a while, they even built their own version of distributed computing.

And then of course Google came into the picture and showed the world how it achieved the most accurate results by deploying crawlers and page-ranking algorithms. Google was one of the first to build a distributed computing framework that solved its search engine problems with accuracy. Around the same time, the creator of Hadoop, Doug Cutting, and his colleague Mike Cafarella were working on the Nutch project, which was also a search engine deployed on distributed clusters.

In fact, at the start of this century, most projects in Silicon Valley were related to search engines, as search was the main big idea, or pain point, at that time. But Google’s design was clearly the most accurate one. In 2003 Google published the paper “The Google File System”, and in December of the following year it published another paper, on MapReduce, which explained the underlying framework it used. These papers gave only high-level ideas on how the clusters were managed, not the low-level designs. Doug and Mike saw these papers and found them very interesting, and by building on these ideas they created the Nutch Distributed File System in 2004 and moved Nutch to the MapReduce framework by 2005.

Doug saw this as an opportunity to create a framework that could solve many problems related to computation on data, rather than being useful just for web search. Around the end of 2005 and the start of 2006, this work was moved out of Nutch and named Hadoop, with the idea of creating a framework that would apply generically to various Big Data problems and not be limited to search engines. So Hadoop was actually part of Nutch before it was named Hadoop.