Running AI models on cloud platforms has been the norm in recent years. Nevertheless, with the recent rise of edge computing, it is increasingly being asked whether running AI in the cloud is always necessary. Indeed, it is not always required to run artificial intelligence (AI) in the cloud. Rather, there are many cases where machine learning algorithms are better deployed and executed on edge devices. This has given rise to the concept of Edge AI, a novel AI paradigm where some computations are performed at the "edge," or boundary, of a network rather than moving all the data to a centralized cloud location. Edge AI is one of the most popular buzzwords right now: it promises to make AI accessible to more enterprises, while improving the energy consumption, privacy, and cybersecurity of AI solutions.
The rise of Edge AI is propelled by advancements in neural network architectures, which facilitate the offloading of certain computations to edge devices. Leveraging such architectures, Edge AI platforms can execute a variety of machine learning and deep learning algorithms on different types of edge devices, ranging from edge clusters to embedded devices and microcontrollers. Within the Edge AI paradigm, it is possible to combine local intelligence with cloud capacity. For instance, there are neural network-based inference engines that can be distributed across cloud and edge infrastructures.
Introducing Neural Architectures and Edge AI Configurations
There are a variety of neural network architectures that can be used for Edge AI applications. These architectures can address different functional and non-functional requirements of AI applications. Some common types of neural network architectures for Edge AI include convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and gated recurrent unit (GRU) networks. These types of neural networks are well suited to Edge AI applications because they can process data in real time and can be implemented on devices with limited computational resources.
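To make this concrete, the following is a minimal sketch (in Python with Keras, not taken from any particular Edge AI product) of a deliberately small CNN whose footprint stays far below that of cloud-scale models, which is the kind of design that fits on constrained edge hardware:

```python
import tensorflow as tf

def build_compact_cnn(input_shape=(96, 96, 1), num_classes=10):
    """A deliberately small CNN intended for resource-constrained edge devices."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),  # avoids a large flatten + dense block
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_compact_cnn()
model.summary()  # roughly 1.4k parameters, versus millions in typical cloud-scale CNNs
```

The input resolution, layer sizes, and class count are illustrative assumptions; the point is that keeping the parameter count small is what makes on-device execution feasible.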
Nowadays, there are also frameworks (e.g., TensorFlow Lite) that enable the deployment of neural networks on devices with very limited memory and computing resources. In general, when designing a neural network architecture for Edge AI, it is important to consider the size and complexity of the network, the available computational resources, and the specific requirements of the application. It may also be necessary to produce designs that trade off accuracy for efficiency in order to meet the real-time processing requirements of the system.
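As an illustration of what such frameworks do, a trained Keras model (such as the compact CNN sketched above) can be converted into the TensorFlow Lite format, which produces a single compact file suitable for embedded targets. This is a minimal sketch assuming a trained `model` object:

```python
import tensorflow as tf

# Assume `model` is a trained tf.keras model, e.g. the compact CNN above.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# The resulting flat buffer can be shipped to the edge device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

print(f"TFLite model size: {len(tflite_model) / 1024:.1f} KiB")
```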
Edge AI applications can be deployed in a variety of configurations across the cloud/edge computing continuum. For instance, machine learning algorithms that must be trained on many data points should be run in the cloud, because they require significant computational power. Some image analysis use cases (e.g., visual scene analysis and face recognition) make use of such cloud-based algorithms, since they require training on millions of images in order to achieve high accuracy levels. Other typical examples include recommendation systems and natural language processing (NLP) systems. Nevertheless, there are also NLP systems that require far fewer training data points (e.g., sentiment analysis). These systems can run directly on devices without heavy preprocessing or data cleaning requirements.
On the other hand, there are many use cases where an edge computing deployment is preferred over a cloud deployment. This is the case when one or more of the following requirements are prioritized:
- Low latency: Edge deployments provide lower latency than cloud deployments, as there is no need to transfer large amounts of data over wide area networks (i.e., from the edge to the cloud). This makes them more suitable for real-time use cases such as real-time defect detection in industrial quality control applications (see the local-inference sketch after this list).
- Energy efficiency: When computation occurs at the edge, energy is saved by reducing the amount of traffic that travels over networked connections to the cloud.
- Stronger protection of sensitive data: With AI deployments at the edge, sensitive information is processed locally and never travels to another location. In this way, there is no risk of an unencrypted file being intercepted by a hacker at an intermediate location. Thus, Edge AI boosts data protection and helps meet regulatory requirements such as compliance with the European General Data Protection Regulation (GDPR).
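As a concrete illustration of the local processing described above, the sketch below loads a converted `.tflite` model with the TensorFlow Lite interpreter and runs inference entirely on the device; the model path and the camera frame are hypothetical placeholders:

```python
import numpy as np
import tensorflow as tf  # on small devices, the lighter tflite_runtime package can be used instead

# Load the converted model; all inference happens locally, so raw data never leaves the device.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Hypothetical input: one grayscale 96x96 frame captured by a local camera.
frame = np.random.rand(1, 96, 96, 1).astype(np.float32)

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]["index"])
print("Local prediction:", prediction)
```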
As already outlined, there is usually a need to combine cloud services with edge services. For example, if you have a service that predicts when your car engine needs maintenance, it makes sense to run the predictions locally on the vehicle and send the results back to the cloud for further analysis, while training the underlying model in the cloud. This is because training complex models often requires a lot of data, which cannot be stored on the device. Moreover, a trained model needs to be stored in the cloud where it can be accessed by other applications. Training deep neural networks is one of the most energy-intensive operations in AI. It is also one of the most computationally expensive ones, because it involves computing large matrices whose size depends on how many parameters are used. This means that training a deep neural network takes a lot of time and resources, especially if there are millions of records that need to be processed. Nevertheless, once a model is trained, it can be executed on an edge device such as an embedded system. To this end, the size of the model sometimes has to be shrunk so that it fits within the device.
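One common way to shrink a trained model so that it fits on the device is post-training quantization. The sketch below applies TensorFlow Lite's default dynamic-range quantization; the `model` object is assumed to be the trained Keras model discussed above, and the exact size and accuracy impact depend on the network:

```python
import tensorflow as tf

# Assume `model` is the trained Keras model that is too large for the target device.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Dynamic-range quantization stores the weights as 8-bit integers,
# typically shrinking the model to roughly a quarter of its float32 size
# at the cost of a small accuracy drop.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_model)
```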
The Trade-Offs of Neural Network Architectures at the Edge
There are many factors that should be considered when designing a neural network for an Edge AI application. This is because most Edge AI applications have different constraints compared to traditional data center applications. These constraints include limited resources (e.g., limited memory and compute power) and bandwidth limitations. Latency requirements, security requirements, power-efficiency requirements, as well as the volumes of training data, must be considered too. In most cases it is almost impossible to satisfy all these requirements at the same time, as some of them tend to conflict. Therefore, the relevant trade-offs must be identified and resolved. For instance, when there is a need to train and execute an AI model on very large volumes of data, the model should be deployed in the cloud to leverage the capacity, scalability, and quality of service of cloud computing infrastructures. Big Data use cases can hardly be supported at the edge, as edge devices fall short in collecting, storing, and processing the required data points. On the other hand, real-time use cases require neural architectures that can be executed within the device. This is key to avoiding the time-consuming operations associated with moving data back and forth between edge and cloud. There are also many use cases that require splitting the neural architecture across cloud data centers, edge clusters, and edge devices. Such use cases include, for example, near-real-time use cases that require accurate analytics; these involve training models in the cloud and deploying them at the edge.
In summary, the edge deployment of neural network systems creates unique optimization challenges. Therefore, designing neural networks that can be split across the different systems of the edge/cloud continuum is preferable to monolithic networks where individual layers cannot be isolated and optimized. Such split designs make it possible to simulate and optimize the performance of AI models, and then deploy the optimized models across the different cloud and edge layers.
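As a rough sketch of what such a split can look like, the example below divides a simple Keras network into an edge sub-model, which produces a compact feature vector on the device, and a cloud sub-model, which consumes that vector; the layer names and sizes are illustrative assumptions rather than a prescribed architecture:

```python
import tensorflow as tf

# A simple feed-forward network defined with the functional API so it can be split later.
inputs = tf.keras.Input(shape=(128,))
x = tf.keras.layers.Dense(64, activation="relu", name="edge_dense_1")(inputs)
edge_features = tf.keras.layers.Dense(16, activation="relu", name="edge_dense_2")(x)
y = tf.keras.layers.Dense(32, activation="relu", name="cloud_dense_1")(edge_features)
outputs = tf.keras.layers.Dense(1, activation="sigmoid", name="cloud_output")(y)

# Trained as a whole (e.g., in the cloud) before being split for deployment.
full_model = tf.keras.Model(inputs, outputs)

# Edge sub-model: raw input -> 16-dimensional features; only this small tensor crosses the network.
edge_model = tf.keras.Model(inputs, edge_features)

# Cloud sub-model: reuses the trained upper layers and consumes the features produced at the edge.
feature_input = tf.keras.Input(shape=(16,))
z = full_model.get_layer("cloud_dense_1")(feature_input)
cloud_output = full_model.get_layer("cloud_output")(z)
cloud_model = tf.keras.Model(feature_input, cloud_output)
```

Splitting at a narrow intermediate layer keeps the data that has to travel to the cloud small, which is exactly the bandwidth trade-off discussed above.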
As the Edge AI ecosystem builds up, it is critical to understand and manage the trade-offs of the various types of neural networks. There are deep neural networks that are very efficient at supporting inference at the edge. With proper network configuration, near-identical performance can be achieved on both cloud and edge. This opens an avenue for new ways of thinking about how Edge AI applications may be built when data-transfer bandwidth is costly and constrained.
Overall, optimizing neural network architectures for edge deployment is a new and growing field, yet many companies are still stumped by the optimization decisions that need to be made. As the edge environment becomes more powerful, we will likely see more networks designed specifically for this environment. Until edge AI matures, it’s smart to design your network with multiple options in mind. This will provide you with opportunities to exploit the deployment configuration that makes the most sense at the time.