+91 70951 67689 datalabs.training@gmail.com

Does your company need to get into Hadoop?


Have you ever wondered how Google runs queries against its mountains of data, or how Facebook handles such quantities of data so quickly? To understand this we need to understand the concept of Big Data. Even if you have not heard the term yet, you can be sure to hear it in the coming years: roughly 90% of the world’s data was created in the last two years, and that trend is accelerating. All this data comes from machines, social networking platforms, trading platforms, smartphones and other sources.


Since all this data is already available, the question is whether to use it or not. In the past, when larger and larger quantities of data needed to be interrogated, businesses had to write larger and larger cheques to their database vendor of choice. In the early 2000s, however, companies like Google ran into a wall: their data was simply too large to pump through a single database bottleneck, and no cheque was large enough. To address this, the Google labs team developed an algorithm that chopped a large data calculation into smaller chunks and mapped them to many computers; when the calculations were done, the partial results were brought back together to form the resulting dataset. They called this algorithm MapReduce.


This algorithm was later used to develop an open-source project called Hadoop, which allows applications to run using the MapReduce algorithm. With all these new terms it is easy to lose track of what is going on here. Simply put, we are processing data in parallel rather than in serial. All this development is great, but even though the MapReduce algorithm was released 11 years ago, implementing it still relies on writing Java code. The market is rapidly evolving, though, and tools are becoming available that help businesses adopt this powerful architecture without the steep learning curve of Java.
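The "parallel rather than serial" point can be made concrete with a small sketch. The example below, using Python's standard thread pool as a stand-in for a cluster of machines, shows that farming chunks out to several workers yields the same answer as processing them one by one; the `count_words` helper and the sample chunks are illustrative assumptions, not anything from Hadoop:

```python
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    # The per-chunk work: count the words in one slice of the data.
    return len(chunk.split())

chunks = ["big data big", "data big deal", "lots of data"]

# Serial: a single worker walks the chunks one after another.
serial_total = sum(count_words(c) for c in chunks)

# Parallel: the chunks are handed out to a pool of workers at once,
# which is the idea Hadoop applies across many machines.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel_total = sum(pool.map(count_words, chunks))

print(serial_total, parallel_total)  # 9 9
```

The answers match; what changes is that the parallel version's wall-clock time shrinks as workers (or machines) are added, which is what makes the architecture attractive at scale.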


So, should your business be getting into Hadoop? There are two ingredients that drive businesses to investigate Hadoop. One is a lot of data, generally more than 10 TB. The other is high computational complexity, such as statistical simulations. Any combination of these two ingredients, together with the need to get results faster and cheaper, will drive your return on investment. In the long run Hadoop will become part of our day-to-day information architecture, and we will see it play a central role in statistical analysis, ETL processing and business intelligence.