Two of the most important trends shaping the era of digitization and the fourth industrial revolution are data modernization and cloud computing. Data modernization refers to the use of advanced data management tools for handling very large amounts of data from a variety of heterogeneous sources, including both structured and unstructured datasets. Specifically, data modernization tools enable organizations to collect and analyze the proliferating volumes of social media and Internet of Things (IoT) datasets, which can hardly be stored and processed within conventional databases and data warehouses. The use of such tools facilitates the development of Big Data and Artificial Intelligence (AI) applications. Cloud computing, in turn, provides access to large amounts of IT resources such as storage and computing cycles, which are also among the prerequisites for the development, deployment and operation of data-driven applications at scale.
These two trends complement and reinforce each other. Data modernization tools run on the cloud in order to take advantage of its scalability, capacity, elasticity and quality of service. Hence, the expanded use of cloud computing boosts data modernization. At the same time, data modernization drives the development of cloud computing, given that the execution of data-centric applications like AI hinges on the availability of large amounts of computing resources. It is no accident that some of the most popular cloud applications, like Machine Learning as a Service (MLaaS), are directly related to data modernization tools. In this context, it’s not clear whether data modernization drives cloud computing or the other way round. To explore this question, one has to take a closer look at these two technological trends.
The Process of Data Modernization
Traditionally, organizations have focused on transactional data and reporting applications based on legacy databases. This is no longer the case, as modern enterprises also have to deal with unstructured data such as images, voice audio, comments on social networks and the content of e-mails, as well as data from sensors and internet-connected devices. Conventional databases fall short when it comes to handling large volumes of such unstructured data. That’s where data modernization infrastructures and tools come in. As a first step, they complement legacy transactional databases with other datastores like Big Data databases, Data Lakes and NoSQL databases. The latter offer significant scalability and cost-effectiveness advantages over conventional data management infrastructures.
Note, however, that legacy databases and data warehouses are still present in enterprise data management infrastructures. This is because such databases are superior when it comes to working with structured data in transactional applications. Data modernization infrastructures complement them by providing the means for developing novel Big Data applications that open new opportunities for improving business processes and generating new revenue streams.
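The coexistence described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the legacy transactional store and a plain list of JSON documents as a stand-in for a Data Lake or NoSQL store. The table name, field names and sample records are all hypothetical and are not taken from any particular product.

```python
import json
import sqlite3

# Structured, transactional data: a fixed schema in a relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
with conn:  # commits the transaction on success
    conn.execute(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)", ("acme", 120.0)
    )

# Unstructured or semi-structured data: schemaless JSON documents.
# A list of JSON strings stands in for a Data Lake or NoSQL collection.
lake = []


def ingest(doc: dict) -> None:
    """Store a document as-is, without enforcing a schema."""
    lake.append(json.dumps(doc))


ingest({"source": "sensor", "device": "t-17", "readings": [21.4, 21.9]})
ingest({"source": "social", "customer": "acme", "text": "Delivery was fast"})

# Combine both worlds: transactional totals plus related unstructured content.
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("acme",)
).fetchone()[0]
docs = [json.loads(line) for line in lake]
mentions = [d for d in docs if d.get("customer") == "acme"]
```

The relational table keeps its strict schema for transactions, while the document store accepts records of any shape. An application can then join insights from both, which is the complementary role the two kinds of datastores play.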
The Role of the Cloud
The cloud is nowadays the enabling infrastructure for data modernization, as it offers a number of compelling features such as:
- Capacity: Cloud computing infrastructures give organizations access to as many resources (e.g., computing cycles) as their Big Data applications need.
- Elasticity: Cloud infrastructures provide the means for automatically provisioning more or fewer resources as required by the application. There is no longer a need to process manually placed requests for provisioning or deprovisioning resources.
- Pay as you Go: With the cloud, companies pay only for the resources they use, i.e., the resources needed for storing and processing Big Data. This provides flexibility and obviates the need for large upfront capital expenditure.
- Advanced Tools as a Service: Nowadays cloud service providers offer data modernization tools (e.g., data management and machine learning tools) as a service. This facilitates the deployment of data modernization applications, as it allows companies to access the tools they need without having to deal with conventional software licensing schemes.
- Data Freshness and Automated Pipelines: Cloud infrastructures provide the means for implementing high-performance, automated data pipelines. In particular, they enable enterprises to consolidate data sources and their processing in a single cloud repository, while at the same time easing the implementation of analytics workflows on them. This is essential for many Big Data applications where data are refreshed at very short intervals and must be processed with minimal delay.
Based on the above features, the cloud is an essential infrastructure for data modernization. This does not mean that data modernization cannot be implemented within an on-premises data center infrastructure. Many companies operate their own private cloud for their Big Data applications. In several cases they have good reasons for doing so, such as the need to alleviate trust concerns and to comply with privacy and data protection regulations. Nevertheless, the cloud offers important scalability, automation and cost-effectiveness advantages that data modernization experts can hardly ignore.
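The elasticity and pay-as-you-go features listed in this section can be illustrated with a small sketch. It is not tied to any cloud provider: the scaling rule, the per-worker capacity, the hourly load figures and the hourly rate are all hypothetical values chosen for illustration.

```python
import math


def desired_workers(queue_depth: int, per_worker_capacity: int,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Return how many workers the current backlog needs, within fixed bounds."""
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    needed = math.ceil(queue_depth / per_worker_capacity)
    return max(min_workers, min(max_workers, needed))


def cost_cents(worker_hours: int, hourly_rate_cents: int) -> int:
    """Pay as you go: the bill is simply usage multiplied by the rate."""
    return worker_hours * hourly_rate_cents


# Hypothetical number of jobs waiting at the start of four consecutive hours.
hourly_load = [200, 1500, 4200, 900]

# Elastic provisioning: resize the worker pool every hour to match demand.
workers_per_hour = [
    desired_workers(load, per_worker_capacity=500) for load in hourly_load
]

# Elastic bill: pay only for the workers actually running each hour.
elastic_cost_cents = cost_cents(sum(workers_per_hour), hourly_rate_cents=10)

# Fixed provisioning for comparison: size for the peak and keep it all day.
fixed_cost_cents = cost_cents(
    max(workers_per_hour) * len(hourly_load), hourly_rate_cents=10
)
```

With these made-up figures the elastic pool runs 15 worker-hours, whereas a pool sized for the peak runs 36. The gap between the two bills is the saving that elasticity and usage-based pricing aim to deliver.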
Four Guidelines for Successful Data Modernization
There is no silver bullet for how enterprises should distribute their budget between cloud and data modernization investments. There are, however, some best practices for successful data modernization in the cloud:
- The importance of a strategy: In most cases it’s best to avoid a quick-and-dirty migration of data modernization functions to the cloud. Rather, companies should invest in creating a strategy for data modernization and in reengineering processes prior to deploying them to the cloud. Such a strategy should aim at avoiding the transfer of flawed processes to the cloud.
- Business Objectives First: Sometimes companies get fascinated by the opportunities of automating data processing workflows and deploying them at scale in the cloud. However, they should keep in mind that cloud costs can be considerable. Hence, it’s important to deploy only what is needed to meet the business objectives of the data modernization projects at hand. This calls for the active engagement of key business stakeholders when making cloud and data modernization decisions.
- Going beyond business intelligence: The goal of any data modernization project is to provide insights and reports that traditional datastores cannot offer. Thus, it’s important to invest in machine learning techniques in order to derive value from the large volumes of unstructured data. The knowledge extracted from such data can be combined with traditional reports in order to provide enterprises with advanced predictive and prescriptive capabilities that are not possible with traditional databases and analytics alone.
- Invest in education, training and change management: Data modernization projects in the cloud won’t succeed unless stakeholders are properly trained to create, use and fully leverage advanced data analytics. Likewise, it’s important to pay attention to change management, as data modernization is likely to impact processes and employees, including their roles, responsibilities and day-to-day activities.
Overall, data modernization is a core element of digital transformation, while the cloud is both a means and a consequence of data modernization. The two trends go hand in hand, and it seems that more data modernization gives rise to increased cloud spending and vice versa. Enterprises should therefore plan their investments in cloud and data modernization carefully. Best practices like the ones listed above can help them maximize the value for money of their data modernization projects and their deployment in the cloud.