Hadoop – The Big Data game-changer


We hear a lot about data science, and science is an endeavor to discover knowledge. So many people assume that the knowledge we discover from Big Data consists of insights about the real world. That is true, but something else matters more to your competitive advantage: the techniques, the procedures, the input data, and knowing how to string them together are what actually make an impact on your business. Knowing that we can take some data, put it in a database, provide it to a customer or an analyst, and analyze it until the result fits on the back of a business card is far more valuable, because that process can be optimized and can have an impact on the bottom line.


Hence, data science is the exploration and discovery of what the data is, and Hadoop has been such a game changer because it has dramatically lowered the time and cost of running basic experiments. You have a hypothesis that a certain dataset or a certain mash-up will have a certain impact. That hypothesis covers not only the statistical properties of the data, its size, and whether we can do something with it, but also whether anybody cares about the result.


Another point is that while Hadoop may not be perfect for everything, it is a great multipurpose tool for exploring the Big Data space and how it can have an impact on business. And it isn't just about Hadoop; it is about NoSQL databases and all the tools in the ecosystem. Hadoop has been so meaningful to us data scientists because it lowers operating costs; it scales with the volume of data, handling data so large that it would cripple, say, your Oracle database; it scales with the complexity of the data; and it lets you put all your different datasets in a single system, at your fingertips, so you can explore them quickly. Hadoop is a game changer because it makes the most of disk I/O, runs on commodity hardware, and is built to scale; these three principles give good reason to expect it to remain useful long into the future.