Artificial Intelligence (AI), Big Data and Cloud/Edge computing are three of the most important tech trends of our era. For over a decade, unprecedented data generation has given rise to the development and deployment of a host of AI applications, notably Machine Learning (ML) applications. This explosion in the use of AI/ML applications has been enabled by developments in the Cloud computing space: Big Data were collected, stored and managed in the cloud in order to benefit from the scalability and capacity of cloud computing. More recently, we are witnessing a shift of data processing from cloud infrastructures to edge nodes. This shift is driven by the need to reduce latency in data analytics tasks, while at the same time limiting the amount of data that travels from the data sources, towards increasing privacy and power efficiency. Thus, the execution of several ML tasks is also shifting towards the edge, which has given rise to the emergence of the Edge AI paradigm.
In Edge AI, data are generated and processed locally, without the need for an internet connection and communication with a cloud-based main application. Provided that there is enough computational power available on the Edge, there are some clear benefits stemming from this approach. Next to the shorter processing and lag time, operations can be completed over smaller bandwidth. This can significantly reduce the costs associated with internet provider services and data storage. Furthermore, Edge AI reduces the security attack surface of AI applications and alleviates safety concerns regarding remote storage and data transfer to the Cloud.
Introducing Federated Machine Learning (FML)
One of the most prominent approaches to Edge AI and machine learning at the edge is Federated Machine Learning (FML). FML enables the development of a global model without the need for data sharing with a cloud infrastructure. Specifically, FML trains ML algorithms on the Edge based on local data and then combines the parameters of the local models (e.g., neural network weights) into a global model. The latter turns out to be more accurate and efficient than any of the local models. The different phases of federated learning involve local data at edge nodes and central server(s) within cloud infrastructures. At a high level, they operate based on the following four steps (a minimal code sketch follows the list):
- Step 1: The central server picks a new training model.
- Step 2: It pushes the training model to several distributed edge nodes.
- Step 3: The distributed nodes train the model using local data, without ever sharing them.
- Step 4: The central server receives model results from each node and collectively generates a global model.
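To make these steps concrete, the snippet below sketches one possible federated-averaging loop in Python. It is purely illustrative: the linear model, the node data and all function names are hypothetical assumptions, not the INFINITECH implementation. It shows how the server can aggregate locally trained parameters without ever seeing the local data.

```python
# Minimal federated-averaging sketch (hypothetical names, not the INFINITECH code).
# Each "edge node" trains on its own local data; only model parameters travel
# to the central server, which averages them into a global model.
import numpy as np

def local_training(global_weights, local_data, lr=0.01, epochs=5):
    """Step 3: train locally, starting from the current global weights."""
    weights = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        preds = X @ weights                   # simple linear model for illustration
        grad = X.T @ (preds - y) / len(y)     # mean-squared-error gradient
        weights -= lr * grad
    return weights

def federated_round(global_weights, nodes):
    """Steps 2-4: push the model, collect locally trained weights, average them."""
    local_weights = [local_training(global_weights, data) for data in nodes]
    return np.mean(local_weights, axis=0)     # aggregation into the global model

# Step 1: the server initialises a model; here three nodes hold private data.
rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
global_weights = np.zeros(4)
for _ in range(10):                           # repeated federated rounds
    global_weights = federated_round(global_weights, nodes)
print(global_weights)
```

In practice the aggregation is usually weighted by the amount of data on each node and the local models are neural networks, but the data never leaves the nodes in either case.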
How Federated Machine Learning can be used in Digital Finance Applications of the INFINITECH project
The very nature of the FML framework, which eliminates data transfer from the Edge to a central location, makes it ideal for high-security and privacy-sensitive financial applications, notably applications where different parties are reluctant to share data. This is, for example, the case in many fraud detection, anti-money laundering and cybersecurity risk assessment applications. All these applications can greatly benefit from data aggregation across financial organizations. Nevertheless, banks and financial institutions are usually reluctant (or even not allowed) to share some data with other organizations in their business network. This is a lost opportunity for creating robust and highly effective ML models for the above-listed tasks (e.g., fraud detection, cybersecurity risk assessment).
In the scope of the INFINITECH H2020 project, FBK and IBM have closely collaborated on the development of an FML approach for a credit card fraud detection application. Specifically, FBK developed an FML algorithm for credit card fraud detection and ran it on the ("Edge") nodes of different organizations, each one using its own local data. For higher precision, the nodes share intermediate learning results with each other. On top of this FML solution, IBM provided a blockchain-based secure execution framework for the distributed execution of the fraud detection algorithm. The blockchain solution provides safe and tamper-proof recording of the metadata and the shared intermediate results on a verifiable, immutable ledger. The computation results in a completed model which includes searchable information and is further stored in a Blockchain Data marketplace. This is a foundation for trading with external organizations, i.e., giving organizations the opportunity to locate and use the data assets of their interest.
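As a rough illustration of the tamper-proof recording idea, the sketch below shows how metadata and a digest of intermediate results could be packaged into a ledger record, so that any later modification of the results becomes detectable by recomputing the hash. The record fields and function names are hypothetical assumptions for illustration, not the API of the IBM framework.

```python
# Illustrative only: packaging metadata and a digest of intermediate FML results
# into a record suitable for an append-only ledger. NOT the IBM framework's API;
# names and fields are hypothetical.
import hashlib
import json
import time

def make_ledger_record(node_id, round_no, intermediate_results: bytes) -> str:
    """Build a ledger record containing metadata and a hash of the results."""
    digest = hashlib.sha256(intermediate_results).hexdigest()
    record = {
        "node_id": node_id,
        "round": round_no,
        "timestamp": time.time(),
        "results_sha256": digest,          # only the digest needs to go on-chain
    }
    return json.dumps(record, sort_keys=True)

def verify_record(record_json: str, intermediate_results: bytes) -> bool:
    """Recompute the digest and compare it with the one stored on the ledger."""
    record = json.loads(record_json)
    return record["results_sha256"] == hashlib.sha256(intermediate_results).hexdigest()

results = b"serialized intermediate model parameters"
record = make_ledger_record("bank-A", 3, results)
assert verify_record(record, results)      # tampering with the results would fail here
```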
The following diagram illustrates the different execution stages of the FBK-IBM federated machine learning algorithm for fraud detection (a simplified sketch of the sign-and-verify step in Stages 8 and 9 follows the list):
Diagram 2: The FBK-IBM fraud detection FML execution flow
- Stage 1: The Federated Learning algorithm creates the algorithm image.
- Stage 2: The algorithm image is pushed to a Docker Image Depository.
- Stage 3: The image metadata are published.
- Stage 4: A script is run to initiate the learning process, while image metadata record is read from the ledger.
- Stage 5: The Docker image, which is based on the image metadata, is pulled from the image repository.
- Stage 6: The distributed learning process is initiated by pulling the appropriate configuration information of the participating organizations and the respective computation nodes, thus creating the learning process record on the chain.
- Stage 7: The learning process Orchestrator script runs the first iteration of the computation on all nodes.
- Stage 8: The Orchestrator script creates a Docker container from the Federated Learning image for each node running the computation. It then passes the private key as initialization to this container, which is executed on the specific organizational node, using the local organization’s data as input.
- Stage 9: The results of the computation are signed with the container’s private key and are used to update the execution record created for the container. The chain code that writes the result on the ledger checks that the signed result matches the public key stored in the execution record, verifying that the result is reported by the correct execution entity.
- Stage 10: The previous stages 7-9 are executed repeatedly in a loop, until each node finishes its calculation. Then, the global model is published and stored in the Blockchain Data marketplace.
- Stage 11: External organizations can search and find the specific asset.
- Stage 12: If interested, they can purchase the right to access the asset in the Blockchain Data marketplace.
- Stage 13: In this case, the token balance is checked, the necessary tokens are transferred, and access to the asset is granted.
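Stages 8 and 9 rely on a sign-and-verify pattern to guarantee that each result really comes from the container that was authorised to produce it. A minimal sketch of that pattern is shown below, using Ed25519 keys from Python's cryptography library; the key handling, messages and print statements are illustrative assumptions rather than the actual INFINITECH code.

```python
# Simplified sketch of the sign-and-verify step in Stages 8-9 (illustrative only).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Stage 8: a key pair is created for the container; the public key is stored in
# the execution record, the private key is passed to the container at start-up.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Inside the container: the computation result is signed with the private key.
result = b"serialized local model update for round 7"
signature = private_key.sign(result)

# Stage 9: before writing the result to the ledger, the chain code checks the
# signature against the public key stored in the execution record.
try:
    public_key.verify(signature, result)
    print("result accepted: reported by the expected execution entity")
except InvalidSignature:
    print("result rejected: signature does not match the recorded public key")
```

This is the mechanism that lets the ledger attribute every reported result to a specific, authorised execution entity, without revealing the underlying local data.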
For more insights into the technical aspects and results of the FBK-IBM FML approach, you can:
- Visit the Infinitech Marketplace Asset’s page regarding “Blockchain (BC) based secure execution environment and data marketplace for federated learning”.
- Watch the demo video on “IBM & FBK BC based federated learning environment and data marketplace”.