
Federated Machine Learning: Enabling Collaborative Learning

by Sanjeev Kapoor 04 Nov 2022

Machine learning systems are among the most powerful tools for extracting insights from large datasets. These systems are usually built on top of a centralized model that can be accessed by different users and applications. In most cases, the development of high-quality machine learning systems requires access to large volumes of data, which are usually collected by a single organization and integrated within a centralized database. Nevertheless, it is also possible to collect and aggregate large amounts of data from multiple distributed systems through the collaboration of different organizations.

Unfortunately, most enterprises do not adequately consider this opportunity, as they tend to be reluctant to share data with other organizations in their industry. This is because they have considerable data privacy and data protection concerns. Nowadays, however, it is possible to alleviate such concerns with Federated Machine Learning (FML) systems that preserve user privacy and boost data protection. Specifically, FML provides a framework that allows companies to jointly build and share powerful machine learning models while preserving the privacy of their data sources. In this way, FML enables the creation of more accurate and more valuable machine learning models (e.g., models that are less affected by noise or irrelevant data), using a decentralized approach that does not threaten the privacy of individual data providers. FML obviates the need for collecting and integrating data from different sources in a centralized data repository. Rather, companies share the burden of creating a large volume of high-quality data for machine learning, which lowers the data protection barriers to their collaboration.

 

Federated Learning Explained

FML is a form of distributed machine learning in which many users from different organizations each hold small amounts of data that are used for collaborative learning tasks. FML trains models at the edge of a network or locally on the devices themselves, instead of relying on centralized server farms to perform these tasks. Hence FML produces many “local models” (i.e., models trained with local data). These models are then combined into a more accurate global model. The combination is based on aggregating the local models rather than on training a single model over aggregated data. For instance, an FML infrastructure for deep learning and artificial intelligence will produce the weights of a centralized (“global”) deep neural network by averaging the parameter values of the locally trained neural networks. The resulting global model is typically more accurate than any of the locally trained models, because it reflects the data of all participants.
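
The parameter averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not a production aggregator: it assumes each participant reports its parameters as a flat list of floats, and the function name `federated_average` and the weighting by local sample count are assumptions made for the example.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Combine local parameter vectors into one global vector.

    Each local vector is weighted by the number of examples it was
    trained on (an assumption; equal counts give a plain mean).
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    dim = len(local_weights[0])
    global_weights = [0.0] * dim
    for weights, count in zip(local_weights, sample_counts):
        if len(weights) != dim:
            raise ValueError("all local models must have the same shape")
        for i, w in enumerate(weights):
            global_weights[i] += w * count / total
    return global_weights


# Three hypothetical participants with different amounts of local data.
local_models = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [100, 100, 200]
global_model = federated_average(local_models, counts)
print(global_model)  # [3.5, 4.5]
```

Note that the aggregator only ever sees parameter vectors and counts, never the records the local models were trained on.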

In practice, the FML process consists of two parts, namely model training and model deployment. Model training refers to the process where each individual user trains their own (“local”) model using their own data. Likewise, model deployment refers to the sharing of these trained models across users in order to create a collective intelligence system. This collective intelligence system enables everyone to benefit from each other’s models without sharing any private information with each other.
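
To make the two parts concrete, here is a small sketch of repeated rounds in which each participant trains locally and the results are then combined. It assumes a deliberately tiny one-parameter model (y = w * x) trained by gradient descent; the names `train_local` and `run_round`, the learning rate and the toy datasets are all hypothetical.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, target) pairs held by one participant


def train_local(w: float, data: Dataset, lr: float = 0.05, epochs: int = 20) -> float:
    """Model training: gradient descent on one participant's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error of y = w * x with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def run_round(global_w: float, datasets: List[Dataset]) -> float:
    """One round: every participant trains locally, then models are combined."""
    local_ws = [train_local(global_w, d) for d in datasets]
    total = sum(len(d) for d in datasets)
    # Only the trained parameters are combined; the datasets are not merged.
    return sum(w * len(d) for w, d in zip(local_ws, datasets)) / total


# Two hypothetical participants whose private data both follow y = 3x.
site_a = [(1.0, 3.0), (2.0, 6.0)]
site_b = [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)]

w = 0.0
for _ in range(10):
    w = run_round(w, [site_a, site_b])
print(round(w, 3))
```

In this sketch the shared model converges towards the slope that both datasets have in common, although neither participant ever reads the other's data.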

FML gives data providers a positive incentive to participate, given that it enables them to build and use improved services without significant privacy and data protection risks. This paradigm is particularly useful when there is limited trust between parties, or when users don’t want their data to leave their devices or networks.

Overall, the advantages of federated learning models when compared to conventional centralized approaches can be summarized as follows:

  • Scalability and Flexibility: They allow for flexible and scalable training on any number of devices, even if there are thousands or millions of them.
  • Reduced Communication Overhead: They reduce communication overhead because only model updates, rather than the underlying raw data, are exchanged between devices during training. As a result, they do not require high-bandwidth communication channels.
  • Privacy Protection: They offer privacy protection, as raw data never leaves the individual device. Combined with suitable aggregation techniques, they can also prevent others from seeing what has been learned by an individual device or even which parameters have been updated.
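
As a rough illustration of the communication point, the sketch below compares the serialized size of a local dataset with the size of one model update. The record layout, the five-parameter model and the helper `pack_floats` are assumptions chosen for the example; real savings depend on how large the model is relative to the data.

```python
import struct
from typing import List


def pack_floats(values: List[float]) -> bytes:
    """Serialize floats as 8-byte little-endian doubles."""
    return struct.pack(f"<{len(values)}d", *values)


# Hypothetical local dataset: 10,000 records with 4 numeric fields each.
records = [[float(i), 1.0, 2.0, 3.0] for i in range(10_000)]
raw_bytes = sum(len(pack_floats(r)) for r in records)

# Hypothetical 5-parameter model before and after local training.
before = [0.10, -0.20, 0.30, 0.00, 0.50]
after = [0.12, -0.18, 0.30, 0.01, 0.47]
update = [a - b for a, b in zip(after, before)]
update_bytes = len(pack_floats(update))

print(raw_bytes, update_bytes)  # 320000 40
```

For very large models an update can itself be sizeable, so this comparison should be read as an illustration of the principle rather than a general guarantee.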

Nowadays, the implementation of FML infrastructures takes advantage of state-of-the-art privacy-preserving techniques such as multi-party computation (MPC) methods and algorithms. These techniques provide an efficient way to share computation across multiple parties while preserving privacy. Specifically, they facilitate analytics and queries over decentralized data in ways that expose the analytics results without disclosing the source data. MPC techniques are fully in line with the FML concept and objectives. This is why they are very commonly used in conjunction with FML for secure and privacy-friendly collective intelligence.
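
One simple MPC building block is additive secret sharing, which lets several parties compute the sum of their values without any of them revealing its own value. The sketch below simulates all parties in a single process and works on non-negative integers (for example, model parameters scaled to integers). The modulus and the names `make_shares` and `secure_sum` are illustrative assumptions; a real deployment would exchange the shares over authenticated network channels.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # all arithmetic is done modulo this prime (illustrative choice)


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Return the sum of all private values without exposing any single one."""
    n = len(private_values)
    # Row i holds the shares party i hands out; column j is what party j receives.
    all_shares = [make_shares(v, n) for v in private_values]
    # Each party adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS


# Three hypothetical parties, each holding one integer-encoded parameter.
values = [1250, 980, 1110]
total = secure_sum(values)
print(total, total / len(values))
```

Each individual share is uniformly random on its own, so a party that sees a single share learns nothing about the value it came from; only the final total is revealed.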

 

FML Applications

The FML paradigm enables a wide range of value-added applications that rely on collaboration across organizations. Two of the most characteristic examples are collaborative credit card fraud detection and collaborative learning for improved clinical decisions.

 

Credit Card Fraud

FML enables banks to collaborate in order to improve their credit card fraud detection capabilities. The idea is that two or more banks can use FML to jointly develop an accurate model for predicting credit card fraud based on the traits, behavior and transactions of their customers. In this case, each bank trains a local model for fraud detection based on its own customer data. The FML infrastructure then combines the local models into a more accurate global model that can be shared across all banks that contributed to its development. As a result, the banks end up with a credible model without needing to share data about their customers.
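
The scenario can be sketched with a small logistic regression fraud classifier. Everything in this example is made up for illustration: the two features (a scaled transaction amount and a foreign-merchant flag), the transactions, the hyperparameters and the function names. It is a toy under those assumptions, not a realistic fraud model.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label), where label 1 means fraud


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Fraud probability from a logistic model; weights[0] is the bias."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_local(weights: Sequence[float], data: List[Example],
                lr: float = 0.5, epochs: int = 50) -> List[float]:
    """Gradient descent on one bank's own transactions."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


def average(models: List[List[float]], sizes: List[int]) -> List[float]:
    """Combine local models, weighting each by its bank's number of examples."""
    total = sum(sizes)
    return [sum(m[i] * s for m, s in zip(models, sizes)) / total
            for i in range(len(models[0]))]


# Made-up transactions: [amount scaled to 0..1, 1.0 if foreign merchant].
bank_a = [([0.1, 0.0], 0), ([0.2, 0.0], 0), ([0.9, 1.0], 1), ([0.8, 1.0], 1)]
bank_b = [([0.15, 0.0], 0), ([0.3, 0.0], 0), ([0.25, 0.0], 0),
          ([0.95, 1.0], 1), ([0.7, 1.0], 1)]

global_model = [0.0, 0.0, 0.0]
for _ in range(20):
    local_models = [train_local(global_model, d) for d in (bank_a, bank_b)]
    global_model = average(local_models, [len(bank_a), len(bank_b)])

suspicious = predict(global_model, [0.9, 1.0])
routine = predict(global_model, [0.1, 0.0])
print(suspicious > 0.5, routine < 0.5)
```

Only the weight vectors cross bank boundaries in this sketch; each transaction list is read solely inside that bank's own `train_local` call.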

 

Disease Prognosis

Modern machine learning tools make it possible to develop powerful predictive analytics tools that help healthcare organizations identify trends in illnesses and other health-related events. However, these tools usually require large amounts of high-quality data that can take a single organization years to collect on its own. One way that companies can solve this problem is by building partnerships with other companies (e.g., other healthcare organizations) that share similar interests or goals. This allows them to pool their resources and build better predictive models faster than they could alone.

The problem with this approach is that it puts sensitive personal information at risk if one organization leaks it or uses it inappropriately. This type of collaboration can only work if all parties keep each other’s data private while still being able to use the predictions of the resulting model. Federated machine learning offers a solution in this direction. It allows multiple organizations to collaborate on building predictive models without having access to each other’s private data sets. For example, if two hospitals are connected through federated machine learning, they can learn from each other’s patient data without exchanging patient records or compromising patient privacy. This allows them to build a more accurate disease prediction model for their community as a whole.

 

Overall, modern organizations must deal with increasing amounts of data, which creates a clear need for new data sharing strategies. Current centralized models are hard to sustain in this respect because they pay too little attention to privacy and data protection. Therefore, the industry is currently considering decentralized, privacy-friendly approaches to data sharing. Federated Machine Learning is one of the most promising approaches to privacy-friendly collaborative learning and collective intelligence. We expect to see a proliferation of FML applications in the years to come.
