Why do we need Hadoop to analyze Big Data?
You can’t turn a corner without seeing the phrase Big Data; data has grown large, and the evidence is everywhere — you only have to look inside your own organization to see it. One thing the term Big Data tends to obscure is that the challenge is not just about the scale or size of the data; unstructured data is arguably the most undervalued data in an organization, and it is growing five to six times faster than structured data.
Many people are concluding that Hadoop is the answer to working with any amount of any type of data. Why is that? Why are people reaching this conclusion? What can we actually do with Hadoop? How much of the hype around it should we believe? How should we think about it, and how should we get started with it? These are some of the questions on people’s minds today, and we are going to answer them here.
In the context of using Hadoop, we should think about data along two vectors. First, as mentioned, it isn’t just about scale; it is also about the degree of unstructuredness. Second, it is about the sophistication of the analytics we have to apply to the data. Before data got big, it was small, and then it was small with more sophisticated, more complex analytics applied to it. But as data got big, even with Moore’s Law helping, single-threaded CPUs could no longer keep up with increasingly complex analytics on that volume of data. So today, customers are coming to the conclusion that Big Data needs Big Analytics: processing that is distributed across many machines rather than bound to one CPU.
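To make the idea concrete, here is a minimal, hedged sketch of the map/reduce pattern that underlies Hadoop, using a toy word count in plain Python. The chunk data and function names are illustrative assumptions, not Hadoop APIs; in a real Hadoop cluster, the map phase runs in parallel on many machines against file blocks stored in HDFS, and a shuffle step routes keys to reducers.

```python
# Toy illustration of the MapReduce pattern behind Hadoop.
# In real Hadoop, each mapper runs on a separate node against a
# block of an HDFS file; here everything runs in one process.
from collections import Counter


def map_phase(chunk: str) -> Counter:
    # Each mapper independently counts words in its slice of the data.
    return Counter(chunk.split())


def reduce_phase(partials: list) -> Counter:
    # The reducer merges the partial counts produced by every mapper.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# The input is split into chunks, as Hadoop splits files into blocks.
chunks = [
    "big data needs big analytics",
    "unstructured data is growing",
    "big analytics for big data",
]
partials = [map_phase(c) for c in chunks]  # the (here sequential) map step
counts = reduce_phase(partials)            # the reduce step
print(counts["big"])   # prints 4
print(counts["data"])  # prints 3
```

The point of the pattern is that each `map_phase` call touches only its own chunk, so the work parallelizes naturally across machines — which is exactly how Hadoop sidesteps the single-threaded CPU bottleneck described above.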