
What is Data Science?

Data science is, at its core, about understanding data and building the mathematical models needed to classify it and make predictions from it. For example, in the financial domain, whenever a transaction comes into the system, it first has to be analyzed to ensure that it isn’t fraudulent; if it is fraudulent, it needs to be flagged immediately and the customer contacted, or further steps taken as required. Hence, deciding whether or not a transaction is fraudulent is both a machine learning problem and a data science problem.

A similar example is the classification of emails and the detection of spam. Previously, spam messages were often delivered straight to the inbox. Spam filters have now matured to such an extent that we hardly see any spam in our inbox these days. There are, however, occasions when a useful message that resembles spam to some extent ends up in the spam folder. Though this is an example of the filters being too aggressive, these spam filters are essentially doing machine learning, which is closely related to data science: they learn from all the emails that regularly come in to the people in a particular company, and using that data a specific email is classified as spam or not spam. That is a classification problem, which is again relevant to data science.
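
To make the spam example concrete, here is a minimal sketch of a word-count (naive Bayes) classifier. It is only an illustration of the idea, not how any real mail provider's filter works; the tiny training set and the names `TRAINING`, `train` and `classify` are made up for this example.

```python
import math
from collections import Counter

# Hypothetical, hand-written training emails; a real filter learns from far more data.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

def train(examples):
    """Count how often each word appears in spam and in non-spam ('ham') emails."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the higher naive Bayes score for the given text."""
    word_counts, label_counts, vocab = model
    total_emails = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: how common this label is in the training data.
        # Assumes both labels occur at least once in the training data.
        score = math.log(label_counts[label] / total_emails)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("claim your free prize", model))
print(classify("project meeting on monday", model))
```

The classifier only ever sees word frequencies, which is also why a legitimate message that happens to use "spammy" words can end up in the spam folder.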

So, given the above examples, one can see what data science really is: essentially, data science is understanding data and extracting meaningful and useful information from raw datasets. We can come up with a number of examples to show what data science is. Take Google, for example, which we use every day; Google is essentially a search engine. We type in a search query and get relevant results within seconds. So is Google search data science?

Well, back in the days when Google had just been created, and even before that, there were a few search engines such as AltaVista, one of the early search engines on the web. What these search engines did was simply index each webpage: each term was indexed as, say, term A appeared once in webpage 1, 5 times in webpage 2, and so on. When a search was made for term A, they simply looked up which webpages the term appeared in and retrieved them for the searcher. That was how it worked.
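
The term-by-term indexing described above is usually called an inverted index. The following is a minimal sketch of that idea, assuming pages are plain strings split on whitespace; the sample `PAGES` and the function names `build_index` and `search` are hypothetical and not taken from any real search engine.

```python
from collections import defaultdict

# Hypothetical mini "web": page id -> page text.
PAGES = {
    "page1": "data science uses data",
    "page2": "search engines index pages",
    "page3": "data about search",
}

def build_index(pages):
    """Map each term to the pages it appears in, with a per-page count."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term][page_id] = index[term].get(page_id, 0) + 1
    return index

def search(index, term):
    """Return the ids of all pages that contain the term, in sorted order."""
    postings = index.get(term.lower(), {})
    return sorted(postings)

index = build_index(PAGES)
print(index["data"])            # how many times "data" appears on each page
print(search(index, "search"))  # pages that contain "search"
```

Note that this lookup only checks whether a term appears on a page. It makes no attempt to judge which page is the most relevant, and the per-page counts are stored but not used for ordering.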