What is a data scientist? Why does this appear to be such a hot job, and what skill set do data scientists need today? There are two reasons why organizations are asking this question.
The first one is big data. We know that data is a big trend: organizations want to make use of all their data. They have a lot of data, but they're not doing as much with it as they could be, and a data scientist's job is to find the meaning in it. So if you want to do big data, you have to have a data scientist.
The second reason organizations are asking what the skills of a data scientist are is that they're finding data scientists hard to hire. So do you need to hire a Ph.D. in mathematics or statistics, or can you grow a data scientist within your existing organization?
Let's start with what a data scientist is. Fundamentally, a data scientist is someone who finds and discovers. That's where the "scientist" comes in: scientists make a hypothesis and then try to investigate that hypothesis. In the case of a data scientist, they do it with data. They look for meaning and knowledge in the data, and they do that in a couple of different ways. One is that they visualize the data: they look at the data, create reports, and then look for patterns in the data. That's very similar to what you might think of as a traditional business intelligence analyst or data analyst.
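The "look at the data and create reports" style of exploration can be as simple as aggregating one metric by segment and eyeballing the result for a pattern. A minimal sketch, assuming hypothetical customer records tagged with a region and whether the customer churned; the regions and data are made up for illustration.

```python
from collections import defaultdict

# Hypothetical customer records: (region, churned?)
records = [
    ("east", True), ("east", False), ("east", False), ("east", False),
    ("west", True), ("west", True), ("west", False),
    ("north", False), ("north", False), ("north", True), ("north", False), ("north", False),
]

totals = defaultdict(int)
churned = defaultdict(int)
for region, did_churn in records:
    totals[region] += 1
    churned[region] += did_churn  # True counts as 1, False as 0

# A tiny text "report": churn rate per region, highest first, with a bar to eyeball.
report = {region: churned[region] / totals[region] for region in totals}
for region, rate in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{region:<6} {rate:6.1%} {'#' * round(rate * 20)}")
```

A report like this shows that one region stands out, but it does not explain why; that is where the algorithmic tools come in.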
So that's one of the tools data scientists use, but what really distinguishes a data scientist is that they use algorithms, advanced algorithms that actually run through the data looking for meaning. You may have heard of machine learning algorithms, such as neural networks, regression, or k-means clustering. There are dozens of these algorithms out there, and essentially they run through the data looking for meaning. Those algorithms are one of a data scientist's fundamental tools.
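To make "algorithms that run through the data looking for meaning" concrete, here is a minimal k-means clustering sketch in plain Python; k-means is a classic example of this kind of algorithm. The customer points, the two features, and the choice of k=2 are made-up assumptions for illustration; a real project would typically use a library implementation rather than this hand-written version.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's algorithm (a minimal sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers as (monthly visits, average spend) pairs.
customers = [(1, 20), (2, 22), (1, 18), (9, 95), (10, 100), (8, 90)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Nobody told the algorithm which customers belong together; it discovers the two groups from the data alone, which is the kind of "finding meaning" the talk describes.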
A data scientist has to have a strong foundational knowledge of mathematics and statistics, in some cases computer science, and domain knowledge. So a data scientist is given an enormous dataset, or maybe not such a big one, and a question. A typical business question might be: which customers are likely to churn, that is, which customers are likely to go to a competitor? That may be a very good question for a data scientist to answer, and the data scientist would go about it by gathering all the data and running algorithms until they find a reliable pattern that answers the question.
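The churn question is usually framed as binary classification: estimate, for each customer, the probability that they leave. A minimal sketch using logistic regression trained by plain batch gradient descent, assuming a hypothetical dataset with two made-up features (support calls last month, tenure in years); a real model would use far more data, more features, and a library implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights for P(churn) = sigmoid(w . x + b) by batch gradient descent."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def churn_probability(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical features: (support calls last month, tenure in years); label 1 = churned.
rows = [(5, 0.5), (4, 1.0), (6, 0.2), (0, 5.0), (1, 4.0), (0, 3.0)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(rows, labels)
print(churn_probability(w, b, (5, 0.4)))  # at-risk profile
print(churn_probability(w, b, (0, 4.5)))  # loyal profile
```

The learned weights are themselves a "pattern" in the sense of the talk: here more support calls push the churn probability up and longer tenure pushes it down.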
Another question a data scientist might answer is about recommendations: how can I improve the recommendations in a movie recommendation engine? In fact, that's exactly what Netflix did. Netflix ran a contest in which it would pay a million dollars to anyone who could improve its recommendation engine by 10 percent.
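The Netflix Prize scored entries by root-mean-square error (RMSE) between predicted and actual star ratings, and the 10 percent target was a reduction in RMSE relative to Netflix's own system. The sketch below shows that scoring plus the simplest possible ensemble, averaging two models' predictions. All ratings and both "models" are made up; they were deliberately chosen so that their errors partly cancel, which is the situation in which averaging helps. Real prize entries blended many far more sophisticated models.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement_percent(baseline_rmse, new_rmse):
    """Relative RMSE reduction versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def blend(*model_predictions):
    """Simplest ensemble: average each rating across several models."""
    return [sum(preds) / len(preds) for preds in zip(*model_predictions)]


# Made-up star ratings (1-5) for five user/movie pairs.
actual = [4, 3, 5, 2, 4]
baseline = [3.5, 3.5, 3.5, 3.5, 3.5]  # hypothetical "always predict the average" system
model_a = [4.6, 2.4, 4.8, 2.9, 3.4]   # hypothetical model A
model_b = [3.4, 3.6, 4.4, 1.5, 4.6]   # hypothetical model B with different errors
ensemble = blend(model_a, model_b)

baseline_score = rmse(baseline, actual)
for name, preds in [("model A", model_a), ("model B", model_b), ("ensemble", ensemble)]:
    score = rmse(preds, actual)
    print(f"{name}: RMSE={score:.3f}, improvement={improvement_percent(baseline_score, score):.1f}%")
```

Averaging only pays off when the models make different mistakes; two models that err the same way gain nothing from being blended.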
Data scientists actually won that contest with ensemble algorithms, which combine the predictions of many models. So data scientists answer questions, and they use a combination of data and algorithms to do it; when you have large datasets, you rely even more on algorithms. So the fundamental knowledge a data scientist needs is an understanding of those algorithms. Now, when you think about who a data scientist is, there are a lot of myths out there. Here is what a data scientist is not: a data scientist is not a programmer, say a Java programmer who knows Hadoop.
Many people are billing themselves as data scientists because they have certain technical skills. But someone who knows how to run Hadoop and how to integrate data, without also understanding these discovery techniques and algorithms, doesn't in my mind qualify as a data scientist.
Many software developers have actually taken classes in artificial intelligence and machine learning and understand some of the techniques. Another thing a data scientist is not is a business intelligence analyst. A business intelligence analyst may create reports and dashboards, but they build them based on what they think is important in the data. A data scientist might hypothesize what is important, but they're going to use these algorithms to confirm that hypothesis. So there's a gap between a business intelligence person and a programmer: the programmer might have the technical skills, while the business intelligence person may have a lot of business domain knowledge.
The data scientist, however, needs both of those skills: some technical skill and some domain knowledge. But above all, they need fundamental knowledge of statistics, mathematics, and the scientific method. That's why it's a little difficult to find data scientists right now; it's a very unusual skill set. Having said that, there are ways you can do data science without getting a master's or a Ph.D.: by using tools, more sophisticated tools that do some of the heavy lifting a data scientist would normally have to do. More of these software tools are being developed every day because of market demand, and they give the business person the ability to do some types of data science work. Will a business person be able to do all of it? No. A large organization is still going to need a hardcore data scientist, but a hardcore data scientist is not going to be able to handle all of the work that's available, so some of that work can be farmed out to business analysts.
Another point about the data scientist is the fundamental life cycle of data science, the discovery life cycle, which has a couple of stages.
The first stage is understanding the data and preparing it, that is, getting all the data you need before you run the algorithms on it to see if you can find some meaning. Now, survey after survey of data scientists shows that between … and … percent of a data scientist's time is spent assembling that data. That is not a very efficient use of those unique data science skills, because assembling the data, such as writing a SQL statement or doing text mining to get the dataset together, is a job that can be done by a data integration specialist or a technical specialist.
So one of the things you want to do when you hire a data scientist into your organization is to assign other roles that help leverage the data scientist's skill set. You want people who understand the data in your organization and who understand the techniques for integrating it, whether through ETL or other means, to get that dataset together. That way the data scientist can spend most of their time on what's most valuable: the discovery process, running the algorithms, and finding predictive models or new knowledge within that dataset.
Hire a data scientist who has those skills, but make sure you also have internal people who understand your datasets. If you hire a data scientist from outside, look at how much data your company has just in a data warehouse or database. You may have hundreds of databases, or hundreds of database tables with different field names. For that data scientist to be productive on his or her own, it would take months just to understand all those data sources. That's why, if you hire a data scientist from the outside, you should team them up with someone internal who knows your data resources and who can put all those data sources together. I hope this explains what you need to look for in a data scientist and how best to leverage a data scientist's skills.
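The data-assembly work described in the talk often comes down to SQL that reconciles tables whose key columns are named differently. A minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the tables `crm_customers` and `billing_invoices` and all of their columns are hypothetical examples, not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical source tables that name the same customer key differently.
cur.execute("CREATE TABLE crm_customers (cust_id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE billing_invoices (customer_no INTEGER, amount REAL)")
cur.executemany("INSERT INTO crm_customers VALUES (?, ?)",
                [(1, "east"), (2, "west"), (3, "east")])
cur.executemany("INSERT INTO billing_invoices VALUES (?, ?)",
                [(1, 120.0), (1, 80.0), (2, 50.0)])

# Assemble one analysis-ready row per customer.
# LEFT JOIN keeps customers with no invoices; COALESCE turns their NULL sum into 0.
rows = cur.execute("""
    SELECT c.cust_id,
           c.region,
           COUNT(b.customer_no)       AS invoice_count,
           COALESCE(SUM(b.amount), 0) AS total_billed
    FROM crm_customers AS c
    LEFT JOIN billing_invoices AS b ON b.customer_no = c.cust_id
    GROUP BY c.cust_id, c.region
    ORDER BY c.cust_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Knowing that `cust_id` and `customer_no` mean the same thing is exactly the kind of internal knowledge the talk says an outside hire would lack, which is why pairing them with someone who knows the data sources matters.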