The Role of a Data Scientist in Today’s World
Today we have the new role of the data scientist who, much like the webmaster of the 90s, is supposed to understand everything about data. And understanding data isn’t sufficient; every challenge that comes up in connection with data has to be dealt with by the data scientist. A data scientist is supposed to understand how data is stored in the backend, how to manage a Hadoop cluster, how to analyze data using machine learning and statistics, and ultimately how to build visualizations so that the results can be presented to top-level management. All of these aspects are expected to be understood by one person; the data scientist is expected to be the jack-of-all-trades of data.
Probably, after a couple of decades, as data science matures, there will be more roles carved out of this field. But currently, data science is a confluence of many components, and as of today there is no settled definition of what the role of a data scientist should be. If you are trying to enter the field of data science today, it is strongly recommended that you get some knowledge of each of the components that data science comprises. Say you are already working in data warehousing: you would have some idea of data engineering and advanced computing, so you should go ahead and get some knowledge of statistics, machine learning and visualization.
Or let us say there is a BI person who creates and visualizes reports day in and day out, and that person wants to get into the field of data science. In that case, some knowledge of statistics and of data engineering, which is the backend processing, needs to be acquired. Machine-learning experts and core mathematical statisticians who want to get into data science need to acquire the systems side: data engineering and advanced computing. And people with domain expertise, that is, functional people, need to acquire knowledge of all the other components to become good data scientists, viz., statistics, visualization, data engineering and advanced computing.
Thus, for people who are planning to get into the field of data science today, it is a good idea to get some exposure to each of these components: statistics, domain expertise, visualization, advanced computing and data engineering. Expecting one single person to have expertise in such diverse fields seems unrealistic, but that is what is currently expected from anybody who has the designation of data scientist. To get a job as a data scientist, you should have as much knowledge as possible about all these components. So you need to understand the Hadoop platform or Cassandra, whichever is being used, as part of data engineering, along with classification models, recommendation engines and the like, which fall under statistics and machine learning. And depending on the company, the visualization aspect might also be required.
Though this is the current situation, over time it may improve, and individual roles for the different components may be carved out. In the future, we might see data science teams whose members have different skill sets that together fulfill the requirements. This is already being done at big companies like Google and IBM.