
Federated Multi-view Learning for Private Medical Data Integration and Analysis

Published: 28 June 2022


Abstract

Along with the rapid expansion of information technology and the digitalization of health data, there is increasing concern about maintaining data privacy while garnering the benefits of medical data. Two critical challenges are identified: First, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from the BiAffect study. Our results demonstrate that the proposed approach can make full use of multi-view data in a privacy-preserving way, and that both V-FedMV and H-FedMV perform better than their single-view and pairwise counterparts. Moreover, the framework can be easily adapted to deal with multi-view sequential data: we have developed a sequential model (S-FedMV) that takes sequences of multi-view data as input and demonstrated it experimentally. To the best of our knowledge, this framework is the first to consider both the vertical and horizontal settings of multi-view federated learning, as well as its sequential variant.


1 INTRODUCTION

With recent advances in technology, medical treatment has gradually become digitized, and large and heterogeneous volumes of medical data (e.g., electronic health records (EHRs) [22], claims, laboratory tests, imaging, and genomics) have been accumulated in medical institutions. The availability of these data offers many potential benefits for healthcare. It not only facilitates sharing information in care-related activities but also reduces medical errors and service time. Moreover, data sharing in the multi-institutional and large-scale collaboration context can effectively improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources [42]. For these reasons, substantial efforts have been made in the past to improve medical information systems, enabling the collection and processing of huge amounts of data and working toward a multi-institutional collaborative setting. For example, District Health Information Software 2 (DHIS2) [13] is an open-source, web-based Health Management Information System (HMIS) widely used by countries for national-level aggregation of medical data. With the broad adoption of HMISs, security and privacy become critical [47], as medical data are highly sensitive to the patients.

In the past two decades, great efforts have been made to facilitate the availability and integrity of medical data while protecting confidentiality and privacy in medical information systems [2, 3]. Most of the research is centered around data de-identification [16, 17, 50, 53] and data anonymization [14, 39, 43, 55], which remove identifiable information from published medical data to prevent an adversary from inferring patients' private information. However, as pointed out in Reference [38], published medical data is not the only source that adversaries can count on: with the large amount of information that people voluntarily share on the Web, sophisticated attacks that join disparate pieces of information from multiple sources against medical data privacy become practical. As a result, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Even in multi-institutional collaborative research, institutions are expected to perform integrated analysis without their medical data leaving their premises. Therefore, security and privacy become an obstacle and challenge for data integration in medical research, especially when the data are required to be shared for secondary use.

However, machine learning methods [1, 30] often require huge amounts of data to achieve good performance, and these circumstances make developing and deploying medical AI applications extremely difficult. To address the data limitation and isolation issues, great progress has been made in the development of secure machine learning frameworks in recent years. A popular approach is the use of federated learning (FL) to support collaborative and distributed learning processes, which enables a large number of resource-limited client nodes to cooperatively train a model without data sharing [44]. For example, FL can be used to process EHR data distributed across multiple hospitals by sharing the local models instead of the patients' data to prevent raw data leakage [6, 7, 27, 37].

In the literature, many studies have explored variants of FL schemes to support complicated real-life tasks. For example, Yang et al. [58] introduced a comprehensive federated learning framework that contains vertical federated learning (VFL) and horizontal federated learning (HFL), based on different local data availability. Specifically, VFL deals with the case where the datasets share the same sample ID space but their feature spaces are different, while HFL deals with the case where the datasets share the same feature space but differ in sample ID space. However, previous works mainly focus on the difficulties of configuring FL under different data distributions and pay less attention to data complexity. In medical applications, many datasets are complex, heterogeneous, and often collected from different instruments and/or measures, known as "multi-view" data [56]. For example, EHRs contain different types of patient-level variables, such as demographics, diagnoses, problem lists, medications, vital signs, and laboratory data. Mobile keyboard data are composed of various sensor data and keystroke records. It has been shown that multi-view learning can achieve better performance than its single-view counterpart, especially when the strengths of one view complement the weaknesses of another. Nevertheless, federated learning from multi-view data is still in its infancy, as most current solutions can only handle single-view data. Moreover, existing works are only designed for a static setting without considering the temporal information of the data (e.g., EHRs).

Motivated by the aforementioned problems, this work focuses on multi-view learning tasks and the associated problems of federated learning. Specifically, we first present a basic multi-view learning method to fuse multi-view data. Then, based on two types of local data availability, i.e., horizontal and vertical, we adapt the basic multi-view learning method to develop two federated multi-view algorithms: V-FedMV and H-FedMV. Both approaches perform the same tasks, but for different multi-view data distributions. In the V-FedMV approach, each client has the same participants but only single-view data for each participant. In the H-FedMV approach, every client has multi-view data corresponding to a subset of the overall data, but each of them has very limited participants. Third, we investigate how multi-view sequential data can be arranged in the setting of federated learning and present a sequential modeling strategy (S-FedMV) to consider the temporal information of the multi-view data. Together, these methods provide flexible and effective tools to support multi-institutional collaborations for multi-view data mining while solving the privacy and security challenges of data sharing in medical research.

Our contributions can be summarized as follows:

  • To the best of our knowledge, we are the first to propose a systematic solution for federated multi-view learning.

  • We have considered two types of distributed multi-view data and proposed V-FedMV and H-FedMV to deal with these two cases separately.

  • This is the first work to consider multi-view sequential data in the federated learning setting, and the S-FedMV method is developed for this purpose.

  • Based on the experimental results, all three proposed methods (V-FedMV, H-FedMV, and S-FedMV) can make full use of multi-view data and are more effective than local training.

The rest of this article is organized as follows. Section 2 describes the problem definition. Section 3 introduces the multi-view learning method and optimization. The V-FedMV and H-FedMV methods are presented in Sections 4 and 5, respectively. Then in Section 6, we show how to adapt the model to deal with sequential data. In Section 7, we briefly discuss the privacy issue. The dataset, experiments and results are presented in Section 8. Section 9 discusses the related work, followed by the conclusion in Section 10.


2 PROBLEM DEFINITION

In this section, we state the federated multi-view learning problem. Without loss of generality, we consider the classification problem. Assume that we have a distributed multi-view dataset available with N instances from K views: let \( \mathbf {X}_k\in \mathbb {R}^{N\times d_k} \) denote the data matrix in the kth view, where the ith row \( \mathbf {x}_{ik} \in \mathbb {R}^{d_{k}} \) is the feature descriptor of the ith instance in the kth view. Suppose these N instances are sampled from C classes and denote \( \mathbf {Y}= [\mathbf {y}_1, \ldots , \mathbf {y}_N]^T \in \lbrace 0, 1\rbrace ^{N \times C} \), where \( \mathbf {y}_i \in \lbrace 0, 1\rbrace ^{1 \times C} \) is the one-hot label indicator vector of the ith instance.
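For concreteness, the following minimal sketch shows this data layout with synthetic values; the number of views, their dimensions, and the class count are illustrative assumptions, not the dataset used later in the article.

```python
import numpy as np

# Hypothetical sizes: N instances, K = 3 views, C = 2 classes (illustrative only).
N, C = 100, 2
view_dims = [16, 8, 12]                       # d_k for each view

# One data matrix X_k per view; row i of every X_k describes the same instance i.
X = [np.random.randn(N, d_k) for d_k in view_dims]

# One-hot label matrix Y in {0, 1}^{N x C}.
labels = np.random.randint(0, C, size=N)
Y = np.eye(C)[labels]

print([X_k.shape for X_k in X], Y.shape)      # [(100, 16), (100, 8), (100, 12)] (100, 2)
```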

The key challenge to the adoption of distributed data is the security and privacy of highly sensitive data. For example, in many situations, medical data cannot leave the institutions, which limits the usefulness of these data for analytics in the data aggregation process. To solve this issue, federated learning has emerged as a promising technique for distributed machine learning (ML) model training: it performs local model training and uploads only model parameters for global aggregation, thus enabling collaborative model training while preserving each participant's privacy, which is particularly beneficial in the medical field.

Based on the local data availability, we consider two types of federated multi-view learning: horizontal and vertical. Figure 1 shows an example of horizontal and vertical distributions of data. Assume that there are three hospitals that want to build a predictive model to diagnose a specific disease in a patient. In some cases, each hospital has only a single view of each patient (as illustrated in Figure 1(a)), and the hospitals are not willing to share the data due to its confidentiality. As a result, we cannot directly combine the data from different views. In other cases, each hospital has multiple devices and can collect data from different views on its own (as illustrated in Figure 1(b)). However, each of them has only a limited number of patients, which is not enough to develop robust and accurate models for real-world applications. Our goal is to leverage all available data for model learning in these respective scenarios without sharing data between institutions, by distributing the model training to the data owners and aggregating their results. The problems are formally defined as follows:


Fig. 1. Two types of federated learning in multi-view scenarios.

  • Vertical Federated Multi-view Learning (V-FedMV): This scenario assumes that K clients share the same ID space, but each of them has different single-view data. The data \( \mathbf {X}=\lbrace \mathbf {X}_1,\mathbf {X}_2,\cdots ,\mathbf {X}_K\rbrace \) is distributed over K clients, which are denoted by \( \lbrace L_1,L_2,\ldots ,L_K\rbrace \); in other words, each client \( L_k \) owns a specific single-view dataset \( \mathbf {X}_k \). Sharing the same ID space means the ith rows across different \( \mathbf {X}_k \) represent the same instance. Our goal is to generate more accurate and robust classification results by collectively training multiple single-view models through federated learning, exploiting vertical-level view information across multiple decentralized sites that hold local private data samples without exchanging them.

  • Horizontal Federated Multi-view Learning (H-FedMV): This scenario assumes that there are M clients, each of which owns K-view data but has a different ID space, denoted by \( \lbrace L_l\rbrace ^M_{l=1} \). Let \( \mathbf {X}_k^l \in \mathbb {R}^{N_l \times d_k} \) denote the kth single-view data on the lth client, where \( N_l \) represents the number of samples on the lth client. Then the data owned by \( L_l \) is defined as \( \mathbf {X}^l=\lbrace \mathbf {X}_1^l,\cdots ,\mathbf {X}^l_K\rbrace \) and can be seen as a subset of the overall data. Our goal is to incorporate all samples in the model training to improve the results, by collaboratively training multiple multi-view models within the federated learning framework.

Table 1 provides the list of important notations and their definitions that are used in this study.

Notation | Definition
\( \mathbf {X} \); \( \mathbf {X}^{test} \); \( \mathbf {Y} \) | The training data matrix; the testing data matrix; the one-hot label matrix of \( \mathbf {X} \)
\( \mathbf {W} \) | The transformation matrix
\( \mathbf {Z} \); \( \mathbf {Z}^{test} \) | The pseudo-label matrix in the training phase; the pseudo-label matrix in the testing phase
\( \mathbf {w}_t^k \), \( \mathbf {w}_t^{k,l} \) | The parameters of the kth view model on the server and on the lth client in round t
k | The client index in V-FedMV or the view index
K | The maximum number of clients in V-FedMV or the maximum number of views
l | The client index in H-FedMV
M | The maximum number of clients in H-FedMV
L | A client
D | A sequential dataset
\( \beta \), \( \zeta \), \( \eta \) | The hyperparameters

Table 1. Glossary of Important Notations and Definitions


3 MULTI-VIEW LEARNING

Multi-view learning, also known as data fusion, aims to integrate information from multiple views to improve the performance of learning tasks [23, 60]. To deal with the federated multi-view learning problems in the vertical and horizontal situations, we start with a basic multi-view learning method for fusing multi-view data in this section, and then show how the formulas are modified to accommodate the vertical and horizontal cases in the following two sections. Our approach is inspired by Reference [18]; however, that work aimed at learning a multi-class label-sharing model in a privacy-preserving way, whereas the goal of this study is to process distributed and heterogeneous multi-view data in a privacy-preserving manner.

3.1 Objective Function

Let us define \( \mathbf {X}=\lbrace \mathbf {X}_1,\mathbf {X}_2,\ldots ,\mathbf {X}_K\rbrace \) as the multi-view data matrix with K views, where \( \mathbf {X}_k\in \mathbb {R}^{N\times d_k} \) is the kth view data, \( k=1,\ldots ,K \). The objective function of multi-view learning can be formally written as follows: (1) \( \begin{align} \underset{\mathbf {W}_k,\,\mathbf {Z}_k,\, \mathbf {Z}}{min}\sum _{k=1}^K\Vert \mathbf {X}_k\mathbf {W}_k-\mathbf {Z}_k\Vert _F^2 +\beta _k\Vert \mathbf {W}_k\Vert _{2,1}+\zeta _k\Vert \mathbf {Z}_k-\mathbf {Z}\Vert _F^2+\eta \Vert \mathbf {Z}-\mathbf {Y}\Vert _F^2, \end{align} \) where \( \mathbf {W}_k\in \mathbb {R}^{d_k\times C} \) is the transformation matrix, \( \mathbf {Z}_k\in \mathbb {R}^{N\times C} \) serves as a pseudo-label matrix for \( \mathbf {X}_k \), \( \mathbf {Z}\in \mathbb {R}^{N\times C} \) is a common matrix for all views, and \( \Vert \cdot \Vert _{2,1} \) denotes the \( \ell _{2,1} \) norm of a matrix, which is defined as the sum of the \( \ell _{2} \) norms of all row vectors of the matrix and can promote row-sparsity for feature selection [45]. Note that the first term projects the multi-view data to a new space using the transformation matrix \( \mathbf {W}_k \). The second term is used as a regularization function. The third and fourth terms measure how similar \( \mathbf {Z}_k \) is to \( \mathbf {Z} \) and how similar \( \mathbf {Z} \) is to \( \mathbf {Y} \), respectively.
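As a reading aid, here is a minimal NumPy sketch of how the value of Equation (1) could be evaluated for given \( \mathbf {W}_k \), \( \mathbf {Z}_k \), and \( \mathbf {Z} \). The function and variable names are ours; the \( \eta \) term is counted once, consistent with Equations (9) and (10) below.

```python
import numpy as np

def l21_norm(W):
    """l_{2,1} norm: sum of the l_2 norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def objective(Xs, Ws, Zs, Z, Y, beta, zeta, eta):
    """Value of Equation (1); Xs, Ws, Zs, beta, zeta are per-view lists of length K."""
    value = eta * np.linalg.norm(Z - Y, 'fro') ** 2                 # label-fitting term
    for X_k, W_k, Z_k, b_k, z_k in zip(Xs, Ws, Zs, beta, zeta):
        value += np.linalg.norm(X_k @ W_k - Z_k, 'fro') ** 2        # projection term
        value += b_k * l21_norm(W_k)                                # row-sparse regularizer
        value += z_k * np.linalg.norm(Z_k - Z, 'fro') ** 2          # view-consensus term
    return value
```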

Our aim is to get \( \mathbf {W}_k \) from Equation (1), \( k=1,\ldots ,K \), so when new testing data \( \mathbf {X}^{test} = \lbrace \mathbf {X}_1^{test}, \ldots , \mathbf {X}_K^{test}\rbrace \) comes, we can use \( \mathbf {W}_k \) to transform the data matrix \( \mathbf {X}_k^{test} \) to a label matrix. The objective function in the testing phase can be written as follows: (2) \( \begin{equation} \underset{\mathbf {Z}^{test}_k,\,\mathbf {Z}^{test}}{min}\sum _{k=1}^K\left\Vert \mathbf {X}^{test}_k\mathbf {W}_k-\mathbf {Z}^{test}_k\right\Vert _F^2 +\zeta _k\left\Vert \mathbf {Z}^{test}_k-\mathbf {Z}^{test}\right\Vert _F^2. \end{equation} \)

Finally, \( \mathbf {Z}^{test} \) will serve as a label matrix to facilitate the federated multi-view learning.

3.2 Optimization

We illustrate how to solve Equation (1) in the training phase and Equation (2) in the testing phase, respectively. Since the objective functions couple several blocks of variables and include a non-smooth regularizer, we iteratively update each parameter while fixing the others [18]. The overall algorithm is summarized in Algorithm 1.

3.2.1 Training Phase.

In the training phase, all parameters are iteratively updated according to the following three steps until Equation (1) converges or reaches a maximum number of iterations.

Step 1: update \( \mathbf {W}_k \). When \( \mathbf {Z}_k \) and \( \mathbf {Z} \) are fixed, the optimization problem of minimizing Equation (1) over \( \mathbf {W}_k \) can be written as (3) \( \begin{equation} \underset{\mathbf {W}_k}{min}\sum _{k=1}^K\Vert \mathbf {X}_k\mathbf {W}_k-\mathbf {Z}_k\Vert _F^2 +\beta _k\Vert \mathbf {W}_k\Vert _{2,1}. \end{equation} \)

Following Reference [26], Equation (3) can be written as (4) \( \begin{equation} \underset{\mathbf {W}_k,\,\mathbf {A}_k}{min}\Vert \mathbf {X}_k\mathbf {W}_k-\mathbf {Z}_k\Vert _F^2 +\beta _k\mathrm{Tr}\left(\mathbf {W}_k^\mathrm{T}\mathbf {A}_k\mathbf {W}_k\right)\!, \end{equation} \) where \( \mathbf {A}_k\in \mathbb {R}^{d_k\times d_k} \) is a diagonal matrix whose entries are given by (5) \( \begin{equation} \mathbf {A}_k^{(i,j)} = \left\lbrace \!\!\! \begin{array}{cc} 1/\left[2\left(\left\Vert \mathbf {W}_k^{(i)}\right\Vert _2+\epsilon \right)\right] &i=j, \\ [3pt] 0 & i\ne j. \end{array} \right. \end{equation} \)

Here, \( \Vert \mathbf {W}_k^{(i)}\Vert _2 \) denotes the \( \ell _{2} \) norm of \( \mathbf {W}_k^{(i)} \), the ith row vector of \( \mathbf {W}_k \), and \( \epsilon \) denotes a small constant that avoids division by zero.

Then, when \( \mathbf {A}_k \) is fixed, \( \mathbf {W}_k \) can be updated by (6) \( \begin{equation} \mathbf {W}_k = \left(\mathbf {X}_k^\mathrm{T}\mathbf {X}_k + \beta _k\mathbf {A}_k\right)^{-1}\mathbf {X}_k^\mathrm{T}\mathbf {Z}_k. \end{equation} \)

Specifically, when \( \mathbf {Z}_k \) and \( \mathbf {Z} \) are fixed, we can repeat the following operation to get \( \mathbf {W}_k \) until the value of Equation (3) converges: fix \( \mathbf {W}_k \) to update \( \mathbf {A}_k \) through Equation (5) and then fix \( \mathbf {A}_k \) to update \( \mathbf {W}_k \) through Equation (6).

Step 2: update \( \mathbf {Z}_k \). When \( \mathbf {Z} \) and \( \mathbf {W}_k \) are fixed, Equation (1) becomes (7) \( \begin{equation} \underset{\mathbf {Z}_k}{min}\sum _{k=1}^K\Vert \mathbf {X}_k\mathbf {W}_k-\mathbf {Z}_k\Vert _F^2 +\zeta _k\Vert \mathbf {Z}_k-\mathbf {Z}\Vert _F^2. \end{equation} \)

\( \mathbf {Z}_k \) can be updated directly by (8) \( \begin{equation} \mathbf {Z}_k = (\mathbf {X}_k\mathbf {W}_k + \zeta _k\mathbf {Z})/(1+\zeta _k). \end{equation} \)

Step 3: update \( \mathbf {Z} \). When \( \mathbf {Z}_k \) and \( \mathbf {W}_k \) are fixed, Equation (1) becomes (9) \( \begin{equation} \underset{\mathbf {Z}}{min}\sum _{k=1}^K \zeta _k\Vert \mathbf {Z}_k-\mathbf {Z}\Vert _F^2+\eta \Vert \mathbf {Z}-\mathbf {Y}\Vert _F^2. \end{equation} \)

Then \( \mathbf {Z} \) can be updated directly by (10) \( \begin{equation} \mathbf {Z} = \left(\sum _{k=1}^K\zeta _k\mathbf {Z}_k + \eta \mathbf {Y}\right)/\left(\sum _{k=1}^K\zeta _k + \eta \right)\!. \end{equation} \)
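A compact sketch of one training round implementing Steps 1-3 (Equations (5), (6), (8), and (10)) is given below. This is our illustrative re-implementation of the update rules, not the authors' released code; the initialization of \( \mathbf {A}_k \) and the inner iteration count are assumptions.

```python
import numpy as np

EPS = 1e-8  # the small constant epsilon in Equation (5)

def update_W(X_k, Z_k, beta_k, n_inner=5):
    """Step 1: alternate Equations (6) and (5) with Z_k and Z fixed."""
    A_diag = np.ones(X_k.shape[1])            # assumed start: A_k = I (plain ridge solution)
    for _ in range(n_inner):
        W_k = np.linalg.solve(X_k.T @ X_k + beta_k * np.diag(A_diag),
                              X_k.T @ Z_k)                              # Eq. (6)
        A_diag = 1.0 / (2.0 * (np.linalg.norm(W_k, axis=1) + EPS))      # Eq. (5)
    return W_k

def update_Z_k(X_k, W_k, Z, zeta_k):
    """Step 2: closed-form update of the per-view pseudo-label matrix, Eq. (8)."""
    return (X_k @ W_k + zeta_k * Z) / (1.0 + zeta_k)

def update_Z(Z_list, Y, zeta, eta):
    """Step 3: closed-form update of the shared pseudo-label matrix, Eq. (10)."""
    num = sum(z_k * Z_k for z_k, Z_k in zip(zeta, Z_list)) + eta * Y
    return num / (sum(zeta) + eta)

# One outer round (repeat until Eq. (1) converges or a maximum number of iterations):
# W = [update_W(X_k, Z_k, b_k) for X_k, Z_k, b_k in zip(X, Z_views, beta)]
# Z_views = [update_Z_k(X_k, W_k, Z, z_k) for X_k, W_k, z_k in zip(X, W, zeta)]
# Z = update_Z(Z_views, Y, zeta, eta)
```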

3.2.2 Testing Phase.

First, initialize \( \mathbf {Z}^{test}_k \) as \( \mathbf {Z}^{test}_k = \mathbf {X}^{test}_k\mathbf {W}_k \). When \( \mathbf {Z}^{test}_k \) is fixed, Equation (2) can be written as (11) \( \begin{equation} \underset{\mathbf {Z}^{test}}{min}\sum _{k=1}^K\zeta _k\Vert \mathbf {Z}^{test}_k-\mathbf {Z}^{test}\Vert _F^2. \end{equation} \)

Then \( \mathbf {Z}^{test} \) can be updated directly by (12) \( \begin{equation} \mathbf {Z}^{test} = \sum _{k=1}^K\zeta _k\mathbf {Z}^{test}_k/\sum _{k=1}^K\zeta _k. \end{equation} \)

When \( \mathbf {Z}^{test} \) is fixed, then \( \mathbf {Z}^{test}_k \) can be updated directly by (13) \( \begin{equation} \mathbf {Z}^{test}_k = (\mathbf {X}^{test}_k\mathbf {W}_k+\zeta _k\mathbf {Z}^{test})/(1+\zeta _k). \end{equation} \)
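The testing phase thus reduces to alternating the two closed-form updates in Equations (12) and (13); a minimal sketch, again with our own variable names and an assumed fixed number of iterations:

```python
import numpy as np

def predict(X_test, W, zeta, n_iter=10):
    """Alternate Eqs. (12) and (13) over K test views; returns the fused label matrix Z^test."""
    Z_views = [X_k @ W_k for X_k, W_k in zip(X_test, W)]   # initialization: Z_k^test = X_k^test W_k
    for _ in range(n_iter):
        Z_test = sum(z * Z_k for z, Z_k in zip(zeta, Z_views)) / sum(zeta)       # Eq. (12)
        Z_views = [(X_k @ W_k + z * Z_test) / (1.0 + z)
                   for X_k, W_k, z in zip(X_test, W, zeta)]                      # Eq. (13)
    return Z_test

# Predicted class labels are then the column-wise argmax of Z^test:
# y_hat = predict(X_test, W, zeta).argmax(axis=1)
```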


4 VERTICAL FEDERATED MULTI-VIEW LEARNING

In this section, we elaborate on how the multi-view learning mechanism above can be designed to yield vertical federated multi-view learning gains. This design not only allows us to connect multi-view relations through a global server for remote collaboration but also sets up an environment across multiple decentralized datasets that distributes model training to the data owners, without sharing data between institutions.

The overall algorithm of V-FedMV is summarized in Algorithm 3. Briefly, the main procedures of V-FedMV can be described as follows: We first establish a server with multiple distributed clients; the computations involving the original sensitive data are executed on the clients, while the computations involving only insensitive information from all clients are executed on the server. For example, Equations (5), (6), (8), and (13) can be computed on each client \( L_k \), while Equations (10) and (12) can be computed on the server.

Figure 2 provides an overview of V-FedMV for the vertical accumulation of different views of data for multi-view classification, where the cycle at the bottom of the figure shows the workflow of this system. Assume that there are a server and K hospitals, each holding private single-view data. First, the server sends \( \mathbf {Z} \) to each hospital, which uses it together with its local data to update the local model. Each hospital then obtains \( \mathbf {Z}_k \) and sends it to the server for aggregation, resulting in a new \( \mathbf {Z} \).


Fig. 2. V-FedMV.
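To make the division of labor concrete, the following sketch mimics one communication round of this exchange: Equations (5), (6), and (8) run on the clients and Equation (10) runs on the server. The Client/Server scaffolding and the local initialization are our illustrative assumptions (it reuses the update_W helper from the training sketch in Section 3.2), not the exact Algorithm 3.

```python
import numpy as np

class ViewClient:
    """Client L_k in V-FedMV: holds the private single-view data X_k, which never leaves it."""
    def __init__(self, X_k, Y, beta_k, zeta_k):
        self.X_k, self.beta_k, self.zeta_k = X_k, beta_k, zeta_k
        self.Z_k = Y.astype(float)                  # assumed local initialization of Z_k

    def local_round(self, Z):
        # Client-side computation: Eqs. (5)-(6) for W_k and Eq. (8) for Z_k.
        self.W_k = update_W(self.X_k, self.Z_k, self.beta_k)
        self.Z_k = (self.X_k @ self.W_k + self.zeta_k * Z) / (1.0 + self.zeta_k)
        return self.zeta_k, self.Z_k                # only zeta_k and Z_k are uploaded

def server_aggregate(uploads, Y, eta):
    """Server-side computation: Eq. (10) fuses the pseudo-labels uploaded by all K clients."""
    num = sum(z_k * Z_k for z_k, Z_k in uploads) + eta * Y
    return num / (sum(z_k for z_k, _ in uploads) + eta)

# Orchestration over T communication rounds (sketch):
# Z = Y.astype(float)
# for _ in range(T):
#     Z = server_aggregate([client.local_round(Z) for client in clients], Y, eta)
```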


5 HORIZONTAL FEDERATED MULTI-VIEW LEARNING

In this section, we illustrate how the multi-view learning mechanism can be used for horizontal federated multi-view learning, which allows us to leverage all available multi-view samples from a collection of decentralized local datasets through a central global server for remote collaboration, without delivering any raw data to the global server. Figure 3 provides an overview of H-FedMV for the horizontal accumulation of samples for multi-view classification. Compared to the vertical case, each client in this setting has data from all views available locally. Thus, the hospitals and the server exchange the transformation matrices instead of \( \mathbf {Z} \) and \( \mathbf {Z}_k \).


Fig. 3. H-FedMV.

Before proceeding, we first redefine the notation, since the setting of H-FedMV is different from that of V-FedMV. Suppose there are M clients, each of which owns multi-view data but has different patients, denoted by \( \lbrace L_l\rbrace _{l = 1}^M \). Let \( \mathbf {X}^l_k \in \mathbb {R}^{N_l \times d_k} \) denote the kth single-view data on the lth client, where \( N_l \) represents the number of samples on the lth client. Then the data owned by \( L_l \) is defined as \( \mathbf {X}^l=\lbrace \mathbf {X}^l_1,\ldots ,\mathbf {X}^l_K\rbrace \). The goal of H-FedMV is to train models on samples from different institutes or hospitals without data sharing, thereby broadening the spectrum of patients incorporated in learning, achieving better predictive performance than individual local models, and preserving data privacy.

Based on the multi-view learning mechanism, we can easily generalize it to the case of horizontal federated learning, in which we perform the multi-view learning algorithm on each client separately. To indicate the data is owned by the lth client, we add the superscript l to each equation. Accordingly, Equations (5), (6), (8), (10), (12), and (13) can be turned into the following forms: (14) \( \begin{equation} \mathbf {A}_k^{l (i,j)} = \left\lbrace \!\!\! \begin{array}{cc} \frac{1}{2\left(\left\Vert \mathbf {W}_k^{l (i)}\right\Vert _2+\epsilon \right)} &i=j, \\ 0 &i\ne j. \end{array} \right. \end{equation} \) (15) \( \begin{equation} \mathbf {W}_k^l = \left[\left(\mathbf {X}_k^l\right)^\mathrm{T}\mathbf {X}_k^l + \beta _k^l\mathbf {A}_k^{l}\right]^{-1}\left(\mathbf {X}_k^l\right)^\mathrm{T}\mathbf {Z}_k^l, \end{equation} \) (16) \( \begin{equation} \mathbf {Z}_k^l = \left(\mathbf {X}_k^l\mathbf {W}_k^l + \zeta _k^l\mathbf {Z}^l\right)/(1+\zeta _k^l), \end{equation} \) (17) \( \begin{equation} \mathbf {Z}^l = \left(\sum _{k=1}^K\zeta _k^l\mathbf {Z}_k^l + \eta ^l\mathbf {Y}^l\right)/\left(\sum _{k=1}^K\zeta _k^l + \eta ^l\right)\!, \end{equation} \) (18) \( \begin{equation} \mathbf {Z}^{l,test} = \sum _{k=1}^K\zeta _k^l\mathbf {Z}^{l,test}_k/\sum _{k=1}^K\zeta _k^l, \end{equation} \) (19) \( \begin{equation} \mathbf {Z}^{l,test}_k = \left(\mathbf {X}^{l,test}_k\mathbf {W}_k+\zeta _k^l\mathbf {Z}^{l,test}\right)/\left(1+\zeta _k^l\right)\!. \end{equation} \)

The overall algorithm of H-FedMV is summarized in Algorithm 3. Briefly, the main procedures can be described as follows: First, all clients apply the multi-view learning algorithm on their own data and devices; each of them obtains \( \mathbf {W}_k^l \), which is sent to the global server to compute a weighted average. The server then sends the resulting weighted average transformation matrix \( \mathbf {W}_k \) back to each local client \( L_l \) for distributed model training. This process is repeated until convergence is reached or some stopping criterion is met. Finally, the \( \mathbf {W}_k \) of all K views are obtained on the server and sent to the clients, where they are used to predict each client's testing data.
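A minimal sketch of the server-side aggregation step follows: each client uploads its local transformation matrices \( \mathbf {W}_k^l \), and the server returns one averaged \( \mathbf {W}_k \) per view. The sample-count weights \( N_l/N \) are an assumption borrowed from the FedAvg-style weighting described for S-FedMV in Section 6; the exact weighting used in Algorithm 3 may differ.

```python
import numpy as np

def aggregate_transforms(client_Ws, client_sizes):
    """
    client_Ws:    list over the M clients; each entry is [W_1^l, ..., W_K^l].
    client_sizes: number of training samples N_l on each client.
    Returns one averaged W_k per view, to be broadcast back to every client.
    """
    total = float(sum(client_sizes))
    weights = [n_l / total for n_l in client_sizes]       # assumed weights N_l / N
    K = len(client_Ws[0])
    return [sum(w * Ws[k] for w, Ws in zip(weights, client_Ws)) for k in range(K)]

# One communication round (sketch):
# 1. every client refreshes its local W_k^l via Eqs. (14)-(17);
# 2. W_global = aggregate_transforms([c.W_list for c in clients], [c.N_l for c in clients]);
# 3. the server broadcasts W_global and clients continue training from it.
```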


6 FEDERATED MULTI-VIEW SEQUENTIAL LEARNING

In medical practice, sequential or longitudinal data is very common, for example in clinical research and epidemiological studies. However, V-FedMV and H-FedMV cannot be directly applied to sequential data, because they do not consider the temporal information of the multi-view data. We therefore propose a federated multi-view sequential learning method to adapt our model to multi-view sequential data in a federated environment.

In the vertical setting, each client holds single-view data and can independently run a sequence model (such as a GRU) on its own data to obtain a feature embedding matrix from the sequential data, and then apply the proposed V-FedMV method; the raw data are never exposed in this process. In the horizontal setting, however, each client owns only part of the multi-view data and cannot preprocess its data independently, because the locally trained GRU models for each view would be inconsistent across clients. Hence, how to perform federated multi-view learning in the horizontal setting when the raw data are sequential remains a problem. Here, we develop a sequential version of H-FedMV, named S-FedMV, to address this problem.

Figure 4 provides an overview of S-FedMV for the horizontal accumulation of different samples of multi-view sequential data for feature extraction. As shown in Figure 4, there are M hospitals containing K-view sequential data. Since the data of each view are processed separately, this finally results in K models on the server, which can be used to extract features for H-FedMV. Briefly, the idea is based on Federated Averaging (FedAvg) and follows a server-client setup with two repeated stages: (i) the clients train their models locally on their data, and (ii) the server collects and aggregates the models to obtain a global model by weighted averaging. For each client, we assign the aggregation weight as the ratio of the number of data samples on that client to the total number of samples, namely \( N_l/N \). It is appropriate to process the data of each view separately in the case of distributed horizontal multi-view data, because there exists inconsistency among the models of different views. FedAvg is flexible with respect to the model and the optimizer used for training; here we use a bidirectional GRU as the model and RMSProp as the optimizer in the experiments. For simplicity, all clients participate in every round.


Fig. 4. Feature extraction process in S-FedMV.

The overall algorithm is summarized in Algorithm 1, and the notation used in the proposed model is as follows: As before, we use \( \lbrace L_l\rbrace \) to denote the M clients, \( l=1,\ldots ,M \). Let \( \lbrace D^l_k\rbrace \) denote the sequential data on the lth client in the kth view, and let \( \mathbf {w}_t^k \) and \( \mathbf {w}_t^{k,l} \) denote the parameters of the kth view model on the server and on the lth client in round t, respectively. \( \ell ^k(\mathbf {w};b) \) denotes the loss of the model of the kth view with parameters \( \mathbf {w} \), and b denotes a batch.
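The aggregation in S-FedMV is plain FedAvg over the per-view GRU weights. The sketch below illustrates the weighted averaging of Keras weight lists; the model architecture, layer sizes, and training settings are placeholders rather than the exact configuration used in Algorithm 1.

```python
import tensorflow as tf

def build_view_model(timesteps, feat_dim, units=32):
    """Placeholder bidirectional GRU classifier for one view (layer sizes are assumptions)."""
    return tf.keras.Sequential([
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(units),
                                      input_shape=(timesteps, feat_dim)),
        tf.keras.layers.Dense(2, activation='softmax'),
    ])

def fedavg(weight_lists, sample_counts):
    """w_{t+1}^k = sum_l (N_l / N) * w_t^{k,l}: element-wise weighted average of Keras weights."""
    total = float(sum(sample_counts))
    return [sum((n / total) * weights[i] for n, weights in zip(sample_counts, weight_lists))
            for i in range(len(weight_lists[0]))]

# One round for the kth view (sketch):
# for model, (x_l, y_l) in zip(client_models, client_data):
#     model.set_weights(global_weights)                 # download w_t^k
#     model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
#     model.fit(x_l, y_l, epochs=local_epochs, batch_size=128, verbose=0)
# global_weights = fedavg([m.get_weights() for m in client_models],
#                         [len(x_l) for x_l, _ in client_data])
```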


7 DISCUSSION

In V-FedMV, both \( \mathbf {W}_k \) and \( \mathbf {Z}_k \) are updated on local devices, while \( \mathbf {Z} \) is updated on the server. \( \zeta _k \) and \( \mathbf {Z}_k \) are sent to the server from the local devices when the server updates \( \mathbf {Z} \); this does not leak the raw data, because \( \mathbf {X}_k \) remains on the local devices throughout the process. Besides, in some cases, the server may not be set up by these clients but by a specialized organization that only performs the computation; in this case, although \( \mathbf {Y} \) is also needed when the server updates \( \mathbf {Z} \), it does not cause leakage, because \( \mathbf {Y} \) is only a one-hot label matrix that is not useful without extra information.

In H-FedMV and S-FedMV, each client sends the parameters of its own model to the server in each round, and the server averages them and returns the result. Each client then continues to update locally with the parameters returned by the server. Throughout the process, the server only accesses the parameters, not the raw data, and the local devices only receive the updated parameters from the server, never the raw data of other clients, which ensures privacy.


8 EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of the proposed V-FedMV, H-FedMV, and S-FedMV on a new multi-view sequential keystroke dataset collected from the BiAffect study.

8.1 Data Description

BiAffect, the first study on mood and cognition using mobile typing kinematics, provides the multi-view sequential data for our experiments. BiAffect invited 40 participants to use customized smartphones in their daily life. These phones are equipped with a custom keyboard that collects keypress duration, typing behaviors, accelerometer values, and other data. In the experiment, we use three types of metadata, alphanumeric characters, special characters, and accelerometer values, which can be seen as three views. Alphanumeric character features include the duration of a keypress, the time elapsed since the last key was pressed, and the distance between the current key and the previous key along two axes. Special characters include auto-correct, backspace, space, suggestion, switching-keyboard, and other special characters. Accelerometer values are collected by the smartphone's sensor; the device records accelerometer values every 60 ms, regardless of typing speed.

The diagnosis of patients with bipolar disorder was obtained by weekly assessment of participants using the Hamilton Depression Rating Scale (HDRS) [54] and the Young Mania Rating Scale (YMRS) [59], which are reliable assessment standards for bipolar disorder. In this work, we study the metadata collected from participants who provided at least one week of data. There are seven participants with bipolar I disorder, five participants with bipolar II disorder, and eight participants with no diagnosis per DSM-IV TR criteria [32].

8.2 Experimental Setup

Data Preprocessing. In the experiment, we investigate a session-based mood prediction problem, the same as prior work in the BiAffect project [8], which utilizes the features of three views in a session to predict a user's mood score. However, our goal is to train a binary mood classifier. To obtain binary labels, after consulting with professional experts, we label sessions with an HDRS score between 0 and 7 (inclusive) as negative samples and those with an HDRS score higher than 7 as positive samples.
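In code, this labeling rule is a simple threshold on the weekly HDRS score; a trivial sketch (the function name is ours):

```python
def session_label(hdrs_score):
    """Binary mood label: 0 (negative) for HDRS in [0, 7], 1 (positive) for HDRS > 7."""
    return 0 if hdrs_score <= 7 else 1
```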

However, the original input data does not naturally fit our proposed V-FedMV and H-FedMV, because the raw keystroke data are in a time-sequential format, whereas these methods are designed for feature matrices. To address this data format problem, we applied a GRU [10], a simplified version of the Long Short-Term Memory (LSTM) [24], to each view separately to preprocess the data and extract a feature embedding matrix as the input of the V-FedMV and H-FedMV methods.

In this project, we use Keras with TensorFlow as the backend to implement the code. We use RMSProp [51] as the optimizer for GRU training. We retain sessions in which any view contains between 10 and 100 keypresses, which yields 14,971 samples in total. We set the batch size to 128 and the number of epochs to 500, and select the learning rate from \( \lbrace 0.001,0.005\rbrace \) and the dropout rate from \( \lbrace 0.1,0.3\rbrace \). We use the validation dataset to select the optimal parameters and obtain the feature matrices that serve as the input of our V-FedMV and H-FedMV frameworks. Furthermore, the visualization of the three views after data preprocessing is shown in Figure 5(a), illustrating the original space of the input data of V-FedMV and H-FedMV.
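As an illustration of this preprocessing step, the following Keras sketch builds a per-view GRU classifier and reads out a penultimate-layer embedding as the feature matrix. The layer sizes, the embedding dimension, and the way the embedding is extracted are our assumptions; the paper only specifies the use of a GRU, the RMSProp optimizer, and the search ranges above.

```python
import tensorflow as tf

def build_gru_extractor(timesteps, feat_dim, embed_dim=32, dropout=0.1, lr=0.001):
    """GRU classifier for one view; the penultimate layer serves as the feature embedding."""
    inputs = tf.keras.Input(shape=(timesteps, feat_dim))
    h = tf.keras.layers.GRU(embed_dim, dropout=dropout)(inputs)
    embedding = tf.keras.layers.Dense(embed_dim, activation='relu', name='embedding')(h)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(embedding)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# After training on the session sequences of one view, read out the embedding layer:
# extractor = tf.keras.Model(model.input, model.get_layer('embedding').output)
# X_k = extractor.predict(sequences_k)      # feature matrix fed into V-FedMV / H-FedMV
```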


Fig. 5. Summary of visualization in various settings.

Hyperparameter Setting. In V-FedMV, we set up three clients, split the processed feature matrices according to the type of view, and make each client own one of the three views. We set four clients in both H-FedMV and S-FedMV. In H-FedMV, we split the processed feature matrices so that each client owns the same number of samples with all three views. We also assign positive and negative samples equally to each client for H-FedMV. The data for S-FedMV are partitioned similarly to H-FedMV, but remain in time-sequential form.

For the hyperparameter settings, V-FedMV sets \( \beta _k \) to 4 and selects \( \zeta _k \) and \( \eta \) from \( 2^0 \) to \( 2^5 \). In H-FedMV, we set the maximum number of communication rounds to 20 and \( \beta _k^l \) to 4, and select \( \zeta _k^l \) and \( \eta ^l \) from \( 2^0 \) to \( 2^5 \), keeping them the same across clients. In S-FedMV, the maximum number of communication rounds is 150 for the alphanumeric and special views and 170 for the accelerometer view. We select the number of local epochs from \( \lbrace 10,15\rbrace \) and the dropout rate from \( \lbrace 0.1,0.3\rbrace \), and set \( \beta _k^l \), \( \zeta _k^l \), and \( \eta ^l \) as in H-FedMV. For all experiments, the optimal parameters are selected by validation. We repeat all experiments ten times and report the average and standard deviation of four metrics: accuracy, precision, recall, and F1 score.

8.3 Vertical Federated Multi-view Learning

Baselines. In this part, we evaluate V-FedMV with six other baselines. All approaches are summarized as follows:

  • V-FedMV: This is our proposed federated multi-view learning approach in the vertical setting, as described in Algorithm 3. Note that all three views are used in the V-FedMV experiment.

  • Pairwise FL: This approach is similar to V-FedMV and also applies the federated multi-view learning algorithm in the vertical setting. The difference is that only two views are considered. In this case, we have three experiments with two views: Pairwise FL w/o Special, Pairwise FL w/o Accel, and Pairwise FL w/o Alphanum.

  • Single View w/o FL [61]: In this approach, we optimize the following objective function on a single view: (20) \( \begin{equation} \underset{\mathbf {W}_k}{min}\Vert \mathbf {X}_k\mathbf {W}_k-\mathbf {Y}\Vert _F^2 +\beta _k\Vert \mathbf {W}_k\Vert _{2,1}, \end{equation} \) where \( \mathbf {Y} \) denotes the one-hot label matrix. Similar to Pairwise FL, we also have three Single View w/o FL experiments: Alphanum w/o FL, Accel w/o FL, and Special w/o FL.

Performance. Experimental results of V-FedMV and its baselines are shown in Table 2. V-FedMV shows the best performance, with 88.76% accuracy and an 88.93% F1 score. The visualizations of each view and \( \mathbf {Z} \) after V-FedMV are shown in Figure 5(b). All methods have a low standard deviation, which means all approaches are very stable across the repeated experiments. Meanwhile, we find that the special view contributes the least to this task, based on the experimental results of Pairwise FL w/o Special and Special w/o FL. In particular, Pairwise FL w/o Special performs almost the same as V-FedMV. One of the main reasons is that the special view provides less information than the other two views. Besides, we can see that the alphanum view achieves the best accuracy (84.53%) and F1 score (84.59%) among all three views, and the accel view lies between the other two. Comparing all approaches together, we can conclude that the model's performance increases with more views, demonstrating that the proposed method is effective for multi-view data. Finally, the proposed V-FedMV improves accuracy by 4.23%, precision by 3.31%, recall by 5.28%, and F1 score by 4.34% over Alphanum w/o FL, the best single-view method without FL.

Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
V-FedMV | 88.76 \( \pm \) 0.74 | 90.89 \( \pm \) 0.83 | 87.09 \( \pm \) 1.89 | 88.93 \( \pm \) 0.84
Pairwise FL w/o Special | 88.30 \( \pm \) 0.64 | 89.54 \( \pm \) 0.68 | 87.72 \( \pm \) 1.47 | 88.61 \( \pm \) 0.70
Pairwise FL w/o Accel | 84.92 \( \pm \) 0.50 | 89.00 \( \pm \) 1.34 | 81.01 \( \pm \) 2.00 | 84.79 \( \pm \) 0.67
Pairwise FL w/o Alphanum | 78.71 \( \pm \) 0.74 | 79.63 \( \pm \) 1.59 | 79.42 \( \pm \) 3.34 | 79.45 \( \pm \) 1.11
Alphanum w/o FL | 84.53 \( \pm \) 0.68 | 87.58 \( \pm \) 0.88 | 81.81 \( \pm \) 1.40 | 84.59 \( \pm \) 0.75
Accel w/o FL | 76.94 \( \pm \) 0.91 | 75.51 \( \pm \) 1.03 | 82.32 \( \pm \) 2.73 | 78.73 \( \pm \) 1.13
Special w/o FL | 62.30 \( \pm \) 0.64 | 70.19 \( \pm \) 1.81 | 47.83 \( \pm \) 3.09 | 56.79 \( \pm \) 1.87

Table 2. Prediction Performance of Compared Methods in V-FedMV Experiment

8.4 Horizontal Federated Multi-view Learning

Baselines. Here, we evaluate H-FedMV with seven other baselines. All approaches are summarized as follows:

  • H-FedMV: It is our proposed federated multi-view learning approach in the horizontal setting, as described in Algorithm 3. H-FedMV also uses all three views like V-FedMV.

  • MV w/o FL: In this approach, each client applies the multi-view learning method on its local device. Clients do not share parameters with the server for aggregation. As with H-FedMV, MV w/o FL also uses all three views.

  • Pairwise FL: This approach also applies the proposed federated multi-view learning approach in the horizontal setting, but with only two views. Thus, we have three experiments with two views: Pairwise FL w/o Special, Pairwise FL w/o Accel, and Pairwise FL w/o Alphanum.

  • Single View FL: In this approach, the data on each client is represented by a single view. All clients first optimize Equation (20) on their own devices. Then each client sends its \( \mathbf {W}_k \) to the server for aggregation. After aggregation, the global \( \mathbf {W}_k \) is sent back to update the local model on all clients. This process continues until the maximum number of communication rounds is reached. Similar to Pairwise FL, we have three Single View FL experiments: Single View FL—Alphanum, Single View FL—Accel, Single View FL—Special.

Performance. Table 3 shows the experimental results of H-FedMV and its baselines. It can be observed that H-FedMV outperforms the compared baselines, with 88.98% accuracy and an 89.17% F1 score. The visualizations of each view and \( \mathbf {Z} \) after H-FedMV are shown in Figure 5(c). MV w/o FL trains each model locally, and the average performance of all local models is worse than that of H-FedMV. One of the main reasons is that federated learning enables H-FedMV to obtain more training sample information than MV w/o FL, resulting in better performance. In addition, the significance of each view in model training is similar to the experimental results of V-FedMV and its baselines in the previous section, i.e., alphanum view \( \gt \) accel view \( \gt \) special view. Finally, the proposed H-FedMV improves accuracy by 1.06%, precision by 0.8%, recall by 1.27%, and F1 score by 1.08% compared to MV w/o FL.

Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
H-FedMV | 88.98 \( \pm \) 0.52 | 91.01 \( \pm \) 0.80 | 87.42 \( \pm \) 1.33 | 89.17 \( \pm \) 0.57
MV w/o FL | 87.92 \( \pm \) 0.51 | 90.21 \( \pm \) 1.53 | 86.15 \( \pm \) 2.58 | 88.09 \( \pm \) 0.72
Pairwise FL w/o Special | 88.28 \( \pm \) 0.65 | 89.67 \( \pm \) 0.69 | 87.52 \( \pm \) 1.54 | 88.57 \( \pm \) 0.72
Pairwise FL w/o Accel | 84.72 \( \pm \) 1.01 | 89.06 \( \pm \) 1.10 | 80.47 \( \pm \) 2.40 | 84.52 \( \pm \) 1.23
Pairwise FL w/o Alphanum | 78.77 \( \pm \) 0.72 | 79.70 \( \pm \) 1.60 | 79.45 \( \pm \) 3.17 | 79.51 \( \pm \) 0.99
Single View FL—Alphanum | 84.57 \( \pm \) 0.65 | 87.53 \( \pm \) 0.79 | 81.97 \( \pm \) 1.44 | 84.65 \( \pm \) 0.73
Single View FL—Accel | 76.96 \( \pm \) 0.91 | 75.51 \( \pm \) 1.07 | 82.39 \( \pm \) 2.60 | 78.77 \( \pm \) 1.09
Single View FL—Special | 62.26 \( \pm \) 0.58 | 70.20 \( \pm \) 1.77 | 47.65 \( \pm \) 3.02 | 56.67 \( \pm \) 1.81

Table 3. Prediction Performance of Compared Methods in H-FedMV Experiment

8.5 Federated Multi-view Sequential Learning

Baselines. We further evaluate the S-FedMV approach with three other baselines.

  • S-FedMV: It is our proposed federated learning framework for multi-view sequential data, as described in Algorithm 1.

  • Local Sequential H-FedMV: Each client runs the feature representation learning locally, with no interaction of feature learning between any two clients, and then applies H-FedMV.

  • Local Sequential LocalMV: There is no federated learning involved in this approach. Each client trains the multi-view sequential model by only using the local data.

  • Centralized Sequential H-FedMV: The centralized sequential approach has access to all training data across all clients for feature learning with H-FedMV. In general, it should achieve the best performance because its feature representation is learned from all the data.

Performance. Experimental results of the different federated multi-view learning methods for sequential data are shown in Table 4. First, we can see that S-FedMV achieves outstanding performance, with 88.90% accuracy and an 89.22% F1 score. This demonstrates that S-FedMV is the best approach for distributed multi-view sequential data, since its performance is close to that of Centralized Sequential H-FedMV. Note that Centralized Sequential H-FedMV is the upper bound on the performance of all approaches, since its feature representation learning can access all training data. We also find that the results of Local Sequential H-FedMV confirm our statement that it is infeasible for each client to preprocess its data independently: the trained models are inconsistent between clients, which leads to a poor feature representation for H-FedMV. Finally, the proposed S-FedMV is better than Local Sequential LocalMV, which trains the local model on local sequential data only.

Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%)
S-FedMV | 88.90 \( \pm \) 0.53 | 89.93 \( \pm \) 0.82 | 88.54 \( \pm \) 1.16 | 89.22 \( \pm \) 0.55
Local Sequential H-FedMV | 68.31 \( \pm \) 5.58 | 74.77 \( \pm \) 5.39 | 58.10 \( \pm \) 9.06 | 65.24 \( \pm \) 7.82
Local Sequential LocalMV | 82.27 \( \pm \) 0.91 | 84.87 \( \pm \) 1.50 | 80.34 \( \pm \) 0.44 | 82.44 \( \pm \) 0.77
Centralized Sequential H-FedMV | 88.98 \( \pm \) 0.52 | 91.01 \( \pm \) 0.80 | 87.42 \( \pm \) 1.33 | 89.17 \( \pm \) 0.57

Table 4. Prediction Performance of Compared Methods in S-FedMV Experiment

8.6 Hyperparameter Analysis

We use the hyperparameters \( \zeta _1,\zeta _2,\zeta _3 \) to correspond to the alphanumeric, special, and accelerometer views, respectively; they control the balance among the three views. For instance, a large \( \zeta _k \) indicates a higher impact of the kth view on model training. In this experimental setting, we test each hyperparameter \( \in \lbrace 2^0,2^1,2^2,2^3,2^4,2^5\rbrace \). Note that while testing one hyperparameter, we fix the others at \( 2^3 \). The evaluation results for four metrics, i.e., accuracy, precision, recall, and F1 score, are shown in Figure 6. We find that the hyperparameters affect the accuracy and F1 score consistently, showing similar trend lines for the same hyperparameter evaluation. In summary, when \( \zeta _1=2^4 \), \( \zeta _2=2^2 \), and \( \zeta _3=2^3 \), the model achieves the best accuracy and F1 score for both the H-FedMV and V-FedMV approaches. When \( \eta =2^4 \), H-FedMV is better than V-FedMV, but V-FedMV is better than H-FedMV when \( \eta =2^5 \). Moreover, we also find that the performance of the trained model is most sensitive to \( \zeta _1 \) and \( \zeta _2 \) and less sensitive to \( \eta \).


Fig. 6. Hyperparameter analysis with V-FedMV and H-FedMV.


9 RELATED WORK

In this section, we review the related work, which can be placed into three main categories: multi-view learning, federated learning, and federated multi-view learning.

Multi-view Learning: Sun et al. [49] provided a survey of multi-view machine learning and pointed out that multi-view learning addresses machine learning problems in which the data are represented by multiple distinct feature sets. Xu et al. analyzed different multi-view algorithms and indicated that the consensus and complementary principles ensure their promising performance [56]. The aim of the consensus principle is to minimize the disagreement among multiple distinct views. A connection between the consensus of two hypotheses on two views and their error rates was given by Dasgupta et al. [12]. The complementary principle means that multiple views can complement each other and can be exploited comprehensively to produce better learning performance, because each view may contain specific information that the other views do not have. Wang and Zhou [52] demonstrated that the performance of co-training algorithms is largely affected by the complementary information in distinct views. These two principles are very important for multi-view learning and should be taken into consideration when designing multi-view learning algorithms.

Xu et al. also categorized the classical approaches of combining multiple views into co-training style algorithms, multiple kernel learning algorithms, and subspace learning-based approaches [56].

  • Co-training style algorithm: Blum and Mitchell [4] proposed the original co-training algorithm to solve semi-supervised classification problems. First, two classifiers are trained separately on each view, and then each classifier labels the unlabeled data, which are then added to the training set of the other classifier. Kumar and Daumé [35] extended the idea of co-training to an unsupervised setting. Since co-training algorithms usually train the base learners separately, they can be seen as a late combination method.

  • Multiple kernel learning algorithm: Combining different kernels is another way to integrate multiple views and can be regarded as an intermediate method for the reason that kernels are integrated just before or during the training phase. These methods include linear combination methods [29, 36] and nonlinear combination methods [11].

  • Subspace learning-based approach: In the subspace learning-based approach, there is an assumption that multiple views are generated from a latent subspace. This approach can be regarded as a prior combination of multiple views, and the goal is to obtain the latent subspace. Canonical correlation analysis (CCA) [25] is a classical subspace learning-based approach that can be applied to the datasets containing two views. Besides, it can be extended to cope with datasets represented by more than two views [33] and to kernel CCA [20].

Federated Learning: Federated Learning (FL) was proposed by McMahan et al. [44]. It is a collaborative machine learning paradigm for training models based on locally stored data from multiple organizations in a privacy-preserving way. Yang et al. gave a comprehensive survey for FL [58], which introduced horizontal federated learning, vertical federated learning, and federated transfer learning.

  • Horizontal federated learning: In horizontal federated learning, or sample-based federated learning, the data sets share the same feature space but differ in sample ID space [58]. Smith et al. proposed a novel framework for federated multi-task learning and considered high communication cost, stragglers, and fault tolerance in the federated environment for the first time [48]. Bonawitz et al. designed a secure aggregation scheme that allows a server to securely compute users’ data from mobile devices [5].

  • Vertical federated learning: In vertical federated learning, or feature-based federated learning, the data sets share the same sample ID space but have different feature spaces [58]. Some algorithms and models have been proposed for vertical federated learning. References [21, 31, 46] address secure linear regression. Du et al. defined two types of secure two-party multivariate statistical analysis problems, linear regression and classification, and proposed secure methods to solve them [15]. Liu et al. proposed asymmetrical vertical federated learning and showed how to achieve asymmetrical ID alignment [41].

  • Federated transfer learning: In federated transfer learning, the data sets are different in both samples and feature space [58]. Liu et al. proposed an end-to-end approach to the FTL problem and demonstrated it was comparable to transfer learning without privacy protection [40]. A federated transfer learning framework for wearable healthcare was proposed in Reference [9].

Federated Multi-view Learning: Federated learning with multi-view data is currently less well explored in the literature. Flanagan et al. integrated multi-view matrix factorization with a federated learning framework for personalized recommendations and introduced a solution to the cold-start problem [19]. Huang et al. proposed FL-MV-DSSM, a generic content-based federated multi-view framework for recommendation scenarios that can address the cold-start problem [28]. Xu et al. [57] extended the DeepMood model [8] to a multi-view federated learning framework that suits the horizontal case. Feng et al. extended the idea of multi-view learning and proposed MMVFL, which enables label sharing from its owner to other participants [18]. MMVFL suits the vertical setting and can deal with multi-participant, multi-class problems, whereas other existing VFL approaches can only handle two participants and binary classification problems. Kim et al. proposed a federated tensor factorization framework for horizontally partitioned data [34]. To sum up, today's federated multi-view learning is mainly applied to recommendation systems or tensor data, or considers only the horizontal or vertical situation rather than both together.


10 CONCLUSION

In this article, we proposed a generic multi-view learning framework using the federated learning paradigm for privacy-preserving and secure sharing of medical data among institutions, which protects patient privacy by keeping the data within their own confines while achieving multi-view data integration. Specifically, we investigated two types of multi-view learning (i.e., vertical and horizontal data integration) in the setting of federated learning based on different local data availability, and developed vertical federated multi-view learning (V-FedMV) and horizontal federated multi-view learning (H-FedMV) algorithms to solve this problem. Moreover, we adapted our model to deal with multi-view sequential data in a federated environment and introduced a federated multi-view sequential learning method (S-FedMV). Extensive experiments on real-world keyboard data demonstrated that our methods can make full use of multi-view data and obtain better classification results than local training. Moreover, the result of S-FedMV is comparable to that of the centralized method, which cannot protect the privacy of sensitive data, thus showing the effectiveness of our federated method.


ACKNOWLEDGMENTS

We thank CAAI-Huawei MindSpore Open Fund and Huawei MindSpore platform for providing the computing infrastructure.


REFERENCES

[1] Omar Y. Al-Jarrah, Paul D. Yoo, Sami Muhaidat, George K. Karagiannidis, and Kamal Taha. 2015. Efficient machine learning for big data: A review. Big Data Res. 2, 3 (2015), 87–93.
[2] Paul Barach and Stephen D. Small. 2000. Reporting and preventing medical mishaps: Lessons from non-medical near miss reporting systems. BMJ 320, 7237 (2000), 759–763.
[3] Randolph C. Barrows Jr. and Paul D. Clayton. 1996. Privacy, confidentiality, and electronic medical records. J. Amer. Med. Info. Assoc. 3, 2 (1996), 139–148.
[4] Avrim Blum and Tom Mitchell. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory. 92–100.
[5] Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H. Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. 2017. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security. 1175–1191.
[6] Sabri Boughorbel, Fethi Jarray, Neethu Venugopal, Shabir Moosa, Haithum Elhadi, and Michel Makhlouf. 2019. Federated uncertainty-aware learning for distributed hospital EHR data. arXiv:1910.12191.
[7] Theodora S. Brisimi, Ruidi Chen, Theofanie Mela, Alex Olshevsky, Ioannis Ch. Paschalidis, and Wei Shi. 2018. Federated learning of predictive models from federated electronic health records. Int. J. Med. Info. 112 (2018), 59–67.
[8] Bokai Cao, Lei Zheng, Chenwei Zhang, Philip S. Yu, Andrea Piscitello, John Zulueta, Olu Ajilore, Kelly Ryan, and Alex D. Leow. 2017. DeepMood: Modeling mobile phone typing dynamics for mood detection. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 747–755.
[9] Yiqiang Chen, Xin Qin, Jindong Wang, Chaohui Yu, and Wen Gao. 2020. FedHealth: A federated transfer learning framework for wearable healthcare. IEEE Intell. Syst. 35, 4 (2020), 83–93.
[10] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078.
[11] Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. 2009. Learning non-linear combinations of kernels. In Advances in Neural Information Processing Systems, Vol. 22.
[12] Sanjoy Dasgupta, Michael L. Littman, and David McAllester. 2002. PAC generalization bounds for co-training. Adv. Neural Info. Process. Syst. 1 (2002), 375–382.
[13] Reza Dehnavieh, AliAkbar Haghdoost, Ardeshir Khosravi, Fahime Hoseinabadi, Hamed Rahimi, Atousa Poursheikhali, Nahid Khajehpour, Zahra Khajeh, Nadia Mirshekari, Marziyeh Hasani, et al. 2019. The District Health Information System (DHIS2): A literature review and meta-synthesis of its strengths and operational challenges based on the experiences of 11 countries. Health Info. Manage. J. 48, 2 (2019), 62–75.
[14] Alin Deutsch and Yannis Papakonstantinou. 2005. Privacy in database publishing. In Proceedings of the International Conference on Database Theory. Springer, 230–245.
[15] Wenliang Du, Yunghsiang S. Han, and Shigang Chen. 2004. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 222–233.
[16] Khaled El Emam. 2008. Heuristics for de-identifying health data. IEEE Secur. Priv. 6, 4 (2008), 58–61.
[17] Khaled El Emam, Fida Kamal Dankar, Romeo Issa, Elizabeth Jonker, Daniel Amyot, Elise Cogo, Jean-Pierre Corriveau, Mark Walker, Sadrul Chowdhury, Regis Vaillancourt, et al. 2009. A globally optimal k-anonymity method for the de-identification of health data. J. Amer. Med. Info. Assoc. 16, 5 (2009), 670–682.
[18] Siwei Feng and Han Yu. 2020. Multi-participant multi-class vertical federated learning. arXiv:2001.11154.
[19] Adrian Flanagan, Were Oyomno, Alexander Grigorievskiy, Kuan Eeik Tan, Suleiman A. Khan, and Muhammad Ammad-Ud-Din. 2020. Federated multi-view matrix factorization for personalized recommendations. arXiv:2004.04256.
[20] Colin Fyfe and Pei Ling Lai. 2000. ICA using kernel canonical correlation analysis. In Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation. Citeseer.
[21] Adrià Gascón, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans. 2016. Secure linear regression on vertically partitioned datasets. IACR Cryptol. ePrint Arch. 2016 (2016), 892.
[22] Tracy D. Gunter and Nicolas P. Terry. 2005. The emergence of national electronic health record architectures in the United States and Australia: Models, costs, and questions. J. Med. Internet Res. 7, 1 (2005), e3.
[23] Lifang He, Chun-Ta Lu, Yong Chen, Jiawei Zhang, Linlin Shen, Philip S. Yu, and Fei Wang. 2018. A self-organizing tensor architecture for multi-view clustering. In Proceedings of the IEEE International Conference on Data Mining (ICDM'18). IEEE, 1007–1012.
[24] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (1997), 1735–1780.
[25] Harold Hotelling. 1992. Relations between two sets of variates. In Breakthroughs in Statistics. Springer, 162–190.
[26] Chenping Hou, Feiping Nie, Xuelong Li, Dongyun Yi, and Yi Wu. 2013. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Trans. Cybernet. 44, 6 (2013), 793–804.
[27] Li Huang, Andrew L. Shea, Huining Qian, Aditya Masurkar, Hao Deng, and Dianbo Liu. 2019. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. J. Biomed. Info. 99 (2019), 103291.
[28] Mingkai Huang, Hao Li, Bing Bai, Chang Wang, Kun Bai, and Fei Wang. 2020. A federated multi-view deep learning framework for privacy-preserving recommendations. arXiv:2008.10808.
[29] Thorsten Joachims, Nello Cristianini, and John Shawe-Taylor. 2001. Composite kernels for hypertext categorisation. In Proceedings of the 18th International Conference on Machine Learning, Vol. 1. 250–257.
[30] Michael I. Jordan and Tom M. Mitchell. 2015. Machine learning: Trends, perspectives, and prospects. Science 349, 6245 (2015), 255–260.
[31] Alan F. Karr, Xiaodong Lin, Ashish P. Sanil, and Jerome P. Reiter. 2009. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J. Offic. Stat. 25, 1 (2009), 125.
[32] Ronald C. Kessler, Patricia Berglund, Olga Demler, Robert Jin, Kathleen R. Merikangas, and Ellen E. Walters. 2005. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders in the National Comorbidity Survey Replication. Arch. Gen. Psych. 62, 6 (2005), 593–602.
[33] Jon R. Kettenring. 1971. Canonical analysis of several sets of variables. Biometrika 58, 3 (1971), 433–451.
[34] Yejin Kim, Jimeng Sun, Hwanjo Yu, and Xiaoqian Jiang. 2017. Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 887–895.
[35] Abhishek Kumar and Hal Daumé. 2011. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning. 393–400.
[36] Gert R. G. Lanckriet, Nello Cristianini, Peter Bartlett, Laurent El Ghaoui, and Michael I. Jordan. 2004. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5 (Jan. 2004), 27–72.
[37] Junghye Lee, Jimeng Sun, Fei Wang, Shuang Wang, Chi-Hyuck Jun, and Xiaoqian Jiang. 2018. Privacy-preserving patient similarity learning in a federated environment: Development and analysis. JMIR Med. Info. 6, 2 (2018), e20.
[38] Fengjun Li, Xukai Zou, Peng Liu, and Jake Y. Chen. 2011. New threats to health data privacy. In BMC Bioinformatics, Vol. 12. BioMed Central, 1–7.
[39] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. 2007. t-closeness: Privacy beyond k-anonymity and l-diversity. In Proceedings of the IEEE 23rd International Conference on Data Engineering. IEEE, 106–115.
[40] Yang Liu, Yan Kang, Chaoping Xing, Tianjian Chen, and Qiang Yang. 2020. A secure federated transfer learning framework. IEEE Intell. Syst. 35, 4 (2020), 70–82.
[41] Yang Liu, Xiong Zhang, and Libin Wang. 2020. Asymmetrically vertical federated learning. arXiv:2004.07427.
[42] Yao Lu, Tianshu Zhou, Yu Tian, Shiqiang Zhu, and Jingsong Li. 2020. Web-based privacy-preserving multicenter medical data analysis tools via threshold homomorphic encryption: Design and development study. J. Med. Internet Res. 22, 12 (2020), e22555.
[43] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. 2007. l-diversity: Privacy beyond k-anonymity. ACM Trans. Knowl. Discov. Data 1, 1 (2007), 3–es.
[44] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. PMLR, 1273–1282.
[45] Feiping Nie, Heng Huang, Xiao Cai, and Chris H. Ding. 2010. Efficient and robust feature selection via joint ℓ2,1-norms minimization. In Advances in Neural Information Processing Systems. MIT Press, 1813–1821.
[46] Ashish P. Sanil, Alan F. Karr, Xiaodong Lin, and Jerome P. Reiter. 2004. Privacy preserving regression modelling via distributed computation. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 677–682.
[47] Eleanor Singer, Nancy A. Mathiowetz, and Mick P. Couper. 1993. The impact of privacy and confidentiality concerns on survey participation: The case of the 1990 US census. Public Opin. Quart. 57, 4 (1993), 465–482.
[48] Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet Talwalkar. 2017. Federated multi-task learning. arXiv:1705.10467.
[49] Shiliang Sun. 2013. A survey of multi-view machine learning. Neural Comput. Appl. 23, 7 (2013), 2031–2038.
[50] Latanya Sweeney. 2002. k-anonymity: A model for protecting privacy. Int. J. Uncert. Fuzz. Knowl.-Based Syst. 10, 5 (2002), 557–570.
[51] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4, 2 (2012), 26–31.
[52] Wei Wang and Zhi-Hua Zhou. 2007. Analyzing co-training style algorithms. In Proceedings of the European Conference on Machine Learning. Springer, 454–465.
[53] Ben Wellner, Matt Huyck, Scott Mardis, John Aberdeen, Alex Morgan, Leonid Peshkin, Alex Yeh, Janet Hitzeman, and Lynette Hirschman. 2007. Rapidly retargetable approaches to de-identification in medical records. J. Amer. Med. Info. Assoc. 14, 5 (2007), 564–573.
[54] Janet B. W. Williams. 1988. A structured interview guide for the Hamilton Depression Rating Scale. Arch. Gen. Psych. 45, 8 (1988), 742–747.
[55] Xiaokui Xiao and Yufei Tao. 2006. Anatomy: Simple and effective privacy preservation. In Proceedings of the International Conference on Very Large Data Bases, Vol. 6. 139–150.
[56] Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv:1304.5634.
[57] Xiaohang Xu, Hao Peng, Lichao Sun, Yan Niu, Hongyuan Ma, Lianzhong Liu, and Lifang He. 2021. FedMood: Federated learning on mobile health data for mood detection. arXiv:2102.09342.
[58] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. 2019. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 10, 2 (2019), 1–19.
[59] Robert C. Young, Jeffery T. Biggs, Veronika E. Ziegler, and Dolores A. Meyer. 1978. A rating scale for mania: Reliability, validity and sensitivity. Brit. J. Psych. 133, 5 (1978), 429–435.
[60] Jing Zhao, Xijiong Xie, Xin Xu, and Shiliang Sun. 2017. Multi-view learning overview: Recent progress and new challenges. Info. Fusion 38 (2017), 43–54.
[61] Zheng Zhao, Lei Wang, and Huan Liu. 2010. Efficient spectral feature selection with minimum redundancy. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 24.

Published in

ACM Transactions on Intelligent Systems and Technology, Volume 13, Issue 4 (August 2022), 364 pages
ISSN: 2157-6904; EISSN: 2157-6912; DOI: 10.1145/3522732
Editor: Huan Liu

Copyright © 2022 held by the owner/author(s). This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 28 June 2022
• Online AM: 18 May 2022
• Accepted: 1 November 2021
• Revised: 1 August 2021
• Received: 1 May 2021
