One of the most popular quotes of our time comes from a recent article in The Economist: “The world’s most valuable resource is no longer oil, but data”. The quote rings true, as data are being generated at an unprecedented pace. Research firms estimate that by 2020 more than 1.7MB of data will be created every second for every person on the planet. Data generation is nowadays combined with advances in data storage and processing, which together enable a wealth of business opportunities. These opportunities are usually based on the development and deployment of data-driven applications in a variety of vertical sectors (e.g., trade, commerce, industry and healthcare), and for many different purposes (e.g., marketing, sales, production, finance, accounting and human resources). The majority of modern enterprises are currently trying to ride the wave of Big Data analytics applications, as a means of making their processes more intelligent and more agile, while at the same time improving their decision making. This holds true not only for large industries, but also for Small and Midsize Businesses (SMBs).
Both large enterprises and SMBs face the challenges of deploying and operating Big Data analytics applications. These challenges tend to be common regardless of the type of enterprise. They include establishing a secure and cost-effective infrastructure, building a competent data science team, gaining access to proper datasets, and experimenting with and selecting proper machine learning and data mining models. While the challenges are common, SMBs typically have a harder time confronting them, as they usually lack the in-house knowledge and the equity capital needed to cope with them. Fortunately, the Big Data ecosystem offers some smart solutions for SMBs that wish to adopt and fully leverage Big Data analytics in their business operations.
Building the Big Data Analytics Team
Establishing a competent Big Data analytics team is a major challenge, given the talent gap in Big Data technologies and the need to involve business experts who are usually busy with their primary business tasks. Large companies are in most cases able to hire competent full-time data scientists, as they can afford the associated personnel costs. SMBs, on the other hand, might have to resort to less costly alternatives, such as:
- Hiring in-house consultants on short-term contracts. Such consultants can work closely with the SMB's employees for a specified time period. During this period, the consultants get the work done, while at the same time helping the SMB accumulate in-house experience in a cost-effective way.
- Outsourcing Big Data analytics tasks to offshore experts. This may not help build in-house expertise, but it is usually a very cost-effective option. Moreover, many companies nowadays offer high-quality Big Data analytics services, which gives SMBs a wide range of alternatives that can match many different requirements.
In both cases, SMBs will also have to involve some business experts in the team. This is costly, as it reduces the time these experts can devote to other business-critical tasks. Nevertheless, there is no easy remedy for this, as a data science team without a business expert can hardly produce any useful results.
Addressing the Infrastructure Challenges
Big Data analytics requires a scalable IT infrastructure equipped with proper tools. Large enterprises in several cases build this infrastructure on premises, as part of their data centers. This is not always possible for SMBs, however, which typically operate smaller-scale data centers and in some cases have no on-premises IT services at all. For this reason, SMBs are considering hosting their data in a cloud infrastructure. This is a very good option, as cloud providers offer a variety of high-quality infrastructure services, including elastic provisioning of resources, access to storage as needed, as well as backup and recovery services. Furthermore, SMBs can nowadays benefit from cloud-based machine learning libraries, which form the core of so-called Machine Learning as a Service (MLaaS) offerings. MLaaS eases experimentation with different machine learning and deep learning libraries, while obviating the need for SMBs to deploy data mining tools themselves. Hence, it allows SMBs to use machine learning tools without bearing the costs of installing, deploying and upgrading machine learning libraries and toolkits.
Overall, cloud-based services for Big Data analytics extend the known benefits of cloud computing to Machine Learning applications. These benefits include flexibility, as cloud services are provided on a pay-as-you-go basis, which alleviates the need for significant capital investments in IT infrastructure and ML tools. In the long run, SMBs can leverage this payment model to invest in Big Data analytics infrastructure as they grow, rather than paying upfront.
Dataset Availability
Big Data analytics is all about processing large volumes of data that come from a variety of heterogeneous sources and that feature velocity and veracity as well. Big organizations tend to produce more data, which they can use for training algorithms and for experimentation. SMBs, on the other hand, may not be able to collect very large amounts of training data to kick-start their Big Data analytics developments. Fortunately, they can take advantage of open datasets, which they can access freely and use for testing models and gaining experience.
Finding the Right Model
One of the main challenges of Big Data analytics involves finding the right machine learning or deep learning models. This entails experimentation with different models on real datasets. To carry out such experimentation, SMBs should adopt standards-based iterative methodologies for evaluating and comparing alternative models. Two of the most popular iterative methodologies that fit this purpose are the Cross Industry Standard Process for Data Mining (CRISP-DM) and the Knowledge Discovery in Databases (KDD) process. Both entail the steps of defining a business question, preparing the available datasets and evaluating alternative ML/DL models in an iterative way. Therefore, SMBs should acquaint themselves with these methodologies and their use for discovering the best model for the problem at hand.
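To make the "evaluate and compare alternative models" step more concrete, here is a minimal sketch in Python that uses only the standard library. It is not part of CRISP-DM or KDD themselves; it merely illustrates one common way to compare candidates, namely k-fold cross-validation. The dataset is synthetic, and all names (`make_dataset`, `fit_mean`, `fit_linear`, `k_fold_mse`, `compare_models`) are hypothetical helpers introduced for this illustration. In practice, an SMB would plug in its own prepared dataset and models from the library or MLaaS offering of its choice.

```python
import random
import statistics


def make_dataset(n=200, seed=0):
    """Synthetic (x, y) pairs with a noisy linear relation.

    Assumption: this stands in for a real, already prepared business dataset.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 3.0 * x + 5.0 + rng.gauss(0, 1.0)
        data.append((x, y))
    return data


def fit_mean(train):
    """Baseline candidate: always predict the mean target of the training set."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_linear(train):
    """Second candidate: least-squares line fitted in closed form."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def k_fold_mse(data, fit, k=5):
    """Average mean squared error over k held-out folds."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        # Train on every fold except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)
        errors.append(statistics.fmean((model(x) - y) ** 2 for x, y in test))
    return statistics.fmean(errors)


def compare_models(data, candidates, k=5):
    """Return (best_name, scores); scores maps each name to its cross-validated MSE."""
    scores = {name: k_fold_mse(data, fit, k) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


data = make_dataset()
candidates = {"mean_baseline": fit_mean, "linear_regression": fit_linear}
best, scores = compare_models(data, candidates)
print("Best candidate:", best)
for name, mse in scores.items():
    print(f"{name}: cross-validated MSE = {mse:.2f}")
```

Keeping a trivial baseline among the candidates is a deliberate choice: it shows whether a more elaborate model actually earns its extra cost. Each pass of the iterative methodology can then add or drop candidates and rerun the same comparison.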
SMBs cannot afford to ignore Big Data analytics, as sooner or later most of their business operations will be data driven. In particular, Big Data analytics will be applied in all functions of a company, such as marketing, finance and human resources. Thus, SMBs should plan their Big Data analytics investments, even when operating on a budget. To this end, the guidelines listed above can help them make smart and cost-effective decisions.