The era of Big Data has given rise to “data-driven” organizations that collect and process large amounts of data as a means of optimizing their business processes and managerial decision making. This gives them opportunities to improve their competitiveness and gain strategic advantages over their competitors. To this end, data-driven enterprises need to build effective data science teams that enable them to fully leverage Big Data in their operations. However, building such a team is a challenging task: it requires attracting and bringing together several people with different profiles and skillsets, which is difficult given the known talent gap in data science technologies and skills.
Data Science: A Changing Landscape
The advent of Big Data has drastically changed the data science landscape and the skillsets required from the members of data science teams. In the recent past, data science was mostly about traditional business intelligence in enterprise environments. Data scientists had to deal with conventional data types and common data analysis techniques, including standard and ad-hoc reporting, the development of dashboards, and the formulation of queries over enterprise databases. Likewise, they had to deal with structured datasets of manageable volume that usually resided in data warehouses. In this context, data scientists were usually offered tools for querying and exploring these warehouses as a means of answering questions about what happened and whether things were on track or deviating from a plan (e.g., planned sales). These tools could also support data mining tasks for forecasting future activities and creating future plans.
In the Big Data era, this conventional data processing is no longer sufficient. Data scientists have to deal with much larger and more diverse datasets in order to dynamically predict future business scenarios and evaluate different alternatives. Data are no longer exclusively structured, but rather stem from a variety of heterogeneous sources that include unstructured datasets such as data from social media. Likewise, data may arrive in databases at very high ingestion rates, as is the case, for example, with data stemming from sensors and internet-of-things devices. Most importantly, the processing of the data is no longer confined to reporting and the use of simple data models. Rather, an enterprise data science team needs to be competent in formulating and solving complex optimization problems, which are typically based on predictive modeling, forecasting, and statistical analysis. This is because businesses are not only concerned with finding out whether things are working well. They also want to identify optimal business scenarios and hidden trends, and to explain why something happens in a certain way. Furthermore, the availability of very large amounts of data enables advanced data analysis techniques (e.g., deep neural networks) that were hardly used in the past. Such techniques are at the heart of Artificial Intelligence (AI), which is currently attracting wide interest.
When assembling a data science team, enterprises must keep in mind the contemporary Big Data environment, rather than the traditional business intelligence one.
Profiles and Skills of Data Scientists
In this context, a data science team needs to bring together individuals with knowledge and skills in the following areas:
- Machine Learning: Data science involves automating decisions based on machine learning agents, which are able to process data and make a decision on one’s behalf. A data science team requires experts in various machine learning models and techniques, which allow the creation of programs that learn from data.
- Statistics: Statistics provides an alternative way of creating machines that learn from data, using statistical rules to identify, verify and extract patterns within datasets. As a result, data science teams must include statisticians or mathematicians who can statistically analyze datasets.
- Deep Learning: Deep learning can be considered a subset of machine learning. Nevertheless, deep learning experts usually specialize in the design and implementation of deep neural network models, which are very different from conventional machine learning techniques such as decision trees, regression, and models based on Bayesian statistics. Therefore, data science teams that need to explore and apply AI techniques for knowledge discovery are likely to employ deep learning experts in addition to experts in conventional machine learning.
- Big Data Infrastructures and Databases: As already outlined, enterprise data reside in corporate databases and data warehouses. Moreover, Big Data applications leverage a new wave of scalable databases such as NoSQL databases and associated distributed filesystems such as the Hadoop Distributed File System (HDFS). Experts in all these database technologies are therefore indispensable members of a data science team. They make sure that the data are stored and managed in a secure and reliable way that scales cost-effectively. At the same time, they ensure that the data are easily accessible to analysts, such as machine learning experts and statisticians.
- Visualization: Data-intensive applications must present their results in an intuitive and user-friendly way. This calls for ergonomic visualizations of very large datasets, which need to be designed and developed by visualization experts. These experts should be part of the enterprise data science team.
- Programming: The tasks of accessing datasets and implementing machine learning models and statistical processing techniques require software development expertise. Data science teams must, therefore, include competent programmers who have mastered programming languages, frameworks, and platforms for data-intensive applications, such as R, Python, Java, and Julia.
- Business/Domain Knowledge: The development of data mining applications cannot be based on statistics or machine learning alone, without considering the business domain of the problem at hand. Domain knowledge makes a difference in data science by alleviating issues that stem from overfitting to the data or from the poor expressiveness of machine learning models. It can provide insights beyond what is visible in the datasets being processed. Thus, a data science team should include business experts who will establish the problem domain.
The members of a data science team are likely to possess more than one of the above-listed skills. For example, it is quite common for programmers and software engineers to have a very good knowledge of databases as well. Likewise, machine learning and deep learning experts are usually competent in statistics. However, it’s highly unlikely for a single member of the team to be proficient in all of the above areas, which shows how complex the team assembly task is.
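The overlap between skills described above can be tracked with a simple skills matrix. The following is a minimal sketch, not a prescribed method: the team members, their skill assignments, and the helper functions `missing_skills` and `skill_holders` are all hypothetical and serve only to illustrate how gaps in the seven areas might be spotted.

```python
# Skill areas taken from the list above (labels shortened for illustration).
REQUIRED_SKILLS = {
    "machine learning", "statistics", "deep learning", "databases",
    "visualization", "programming", "domain knowledge",
}

# Hypothetical team: names and skill assignments are made up.
team = {
    "Alice": {"machine learning", "statistics", "programming"},
    "Bob": {"programming", "databases"},
    "Carol": {"domain knowledge", "visualization"},
}


def missing_skills(team, required):
    """Return the required skills that no team member covers."""
    covered = set().union(*team.values())
    return required - covered


def skill_holders(team, required):
    """Map each required skill to the sorted names of members who have it."""
    return {
        skill: sorted(name for name, skills in team.items() if skill in skills)
        for skill in sorted(required)
    }


if __name__ == "__main__":
    print("Uncovered skills:", sorted(missing_skills(team, REQUIRED_SKILLS)))
    for skill, holders in skill_holders(team, REQUIRED_SKILLS).items():
        print(f"{skill}: {', '.join(holders) or 'nobody'}")
```

In this made-up example no member covers deep learning, which would point to either a hire or in-house training in that area.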
Beyond their skills, the members of a data science team should share the following general characteristics:
- They should possess excellent technical knowledge in a variety of IT and data-related areas.
- They should be able to quantify concepts using mathematics, statistics and analytical formulas.
- They must be curious, creative and always interested in experimenting with the data, thereby leading to new discoveries.
- They should be excellent team players.
Best Practices
While looking for the above profiles and properties, enterprises can take advantage of the following best practices:
- Seek proof and credentials for the required skillsets. Proof may lie in a relevant STEM degree or even in a certificate from an online training program (e.g., Udemy or Coursera).
- Seek practical experience and expertise. In addition to theoretical knowledge, the members of the team should have practical experience. This can be demonstrated through previous employment or through participation in real-life problem solving online (e.g., on the Kaggle platform).
- Look for domain knowledge close to home. Domain knowledge may reside within the company or with partners in the same industry, so start the search for domain experts there.
- Consider developing and enriching in-house skillsets. Sometimes it’s easier to advance the skills of existing employees than to hire new ones.
- Place emphasis on the interview and hiring processes. These processes should be tailored to the skillsets you need, while also providing the means of assessing the soft skills of the candidates, including their team spirit, analytic ability, and creativity.
- Ensure executive engagement. Commitment from senior management is essential for designing and executing the data science team assembly process, including some of the above-listed steps.
In the coming years, more and more organizations will be trying to create highly effective data science teams. The task is challenging, but there are best practices and solution guidelines for putting it on the right track.