Identifying associations among genomic, proteomic and imaging biomarkers via adaptive sparse multi-view canonical correlation analysis

https://doi.org/10.1016/j.media.2021.102003

Highlights

  • We proposed two adaptive sparse multi-view canonical correlation analysis (AdaSMCCA) methods.

  • An efficient optimization algorithm was proposed, which was guaranteed to converge.

  • AdaSMCCA iteratively reweighted each sub-objective to ensure better performance.

  • AdaSMCCA improved on state-of-the-art methods with more reasonable canonical weights.

Abstract

To uncover the genetic underpinnings of brain disorders, brain imaging genomics usually jointly analyzes genetic variations and imaging measurements. Meanwhile, other biomarkers such as proteomic expressions can also carry valuable complementary information. Therefore, it is necessary yet challenging to investigate the underlying relationships among genetic variations, proteomic expressions, and neuroimaging measurements, which could yield new insights into the pathogenesis of brain disorders. Given multiple types of biomarkers, using sparse multi-view canonical correlation analysis (SMCCA) and its variants to identify the multi-way associations is straightforward. However, due to the gradient domination issue caused by the naive fusion of multiple sparse canonical correlation analysis (SCCA) objectives, SMCCA is suboptimal. In this paper, we proposed two adaptive SMCCA (AdaSMCCA) methods, i.e. the robustness-aware AdaSMCCA and the uncertainty-aware AdaSMCCA, to analyze the complicated associations among genetic, proteomic, and neuroimaging biomarkers. We also imposed a data-driven feature grouping penalty on the genetic data with the aim of uncovering the joint inheritance of neighboring genetic variations. An efficient optimization algorithm, which is guaranteed to converge, was provided. Using two state-of-the-art SMCCA methods as benchmarks, we evaluated robustness-aware AdaSMCCA and uncertainty-aware AdaSMCCA on both synthetic data and real neuroimaging, proteomics, and genetic data. Both proposed methods obtained higher associations and cleaner canonical weight profiles than the comparison methods, indicating their promising capability for association identification and feature selection. In addition, the subsequent analysis showed that the identified biomarkers were related to Alzheimer’s disease, demonstrating the power of our methods in identifying multi-way bi-multivariate associations among multiple heterogeneous biomarkers.

Introduction

Alzheimer’s disease (AD) is a multifactorial neurodegenerative disorder that involves many abnormal alterations to the brain. For example, the hippocampus usually exhibits atrophic patterns in AD-affected brains, and the apolipoprotein E (APOE) concentration is also altered in AD patients and thus shows relevance to AD pathology (Gupta, Laws, Villemagne, Ames, Bush, Ellis, Lui, Masters, Rowe, Szoeke, et al., 2011, Soares, Potter, Pickering, Kuhn, Immermann, Shera, Ferm, Dean, Simon, Swenson, et al., 2012). Despite an increasing number of studies during the last decade, the pathological mechanism of AD remains uncertain (Association, 2019). Therefore, jointly analyzing multiple types of biomarkers, such as magnetic resonance imaging (MRI) derived imaging measurements (Feldman, McPherson, Biesecker, Wiers, Manza, Volkow, Wang, 2020, Fan, Cheng, Gou, Liu, Deng, Liu, Chen, Bu, Zhang, 2020), blood-based proteomic expression levels and genetic variations, and investigating their associations could deepen our understanding of the pathology of AD. Additionally, combining multiple types of biomarkers and their interplay could also increase the reliability and specificity of AD diagnosis, as many biomarkers are not exclusive to AD.

During the last decade, many brain imaging genomic studies have investigated the association between two types of biomarkers. A recent systematic review (Shen and Thompson, 2020) showed that most of them were designed to identify the association between single nucleotide polymorphisms (SNPs) and brain imaging quantitative traits (QTs) (Du, Liu, Zhang, Yao, Yan, Risacher, Han, Guo, Saykin, Shen, 2018, Du, Liu, Liu, Yao, Risacher, Han, Guo, Saykin, Shen, 2020, Du, Liu, Liu, Yao, Risacher, Han, Saykin, Shen, 2020, Du, Liu, Yao, Risacher, Han, Saykin, Guo, Shen, 2020, Bi, Hu, Wu, Wang, 2020, Bi, Liu, Xie, Hu, Jiang, 2020). Technically, both regression methods and sparse canonical correlation analysis (SCCA) methods have been widely used. For example, based on regression alone, Wang et al. (2012) proposed multi-task regression and classification to combine SNPs and imaging QTs to predict memory deterioration and diagnostic status. Using SCCA alone, Yan et al. (2017) studied the association between proteomic analytes and brain imaging QTs. In addition, the integration of regression and SCCA was also proposed to identify associations among SNPs, imaging QTs and diagnostic outcomes (Zille et al., 2018). To the best of our knowledge, regression methods are typically not designed to directly identify SNP-QT correlations (Wang et al., 2012), and the classical SCCA can only handle two distinct types of biomarkers (Lin, Calhoun, Wang, 2014, Fang, Lin, Schulz, Xu, Calhoun, Wang, 2016, Yan, Risacher, Nho, Saykin, Shen, 2017, Du, Liu, Zhang, Yao, Yan, Risacher, Han, Guo, Saykin, Shen, 2018). Their combination still faces the same issue as SCCA. Consequently, it is essential to develop novel methods to efficiently and practically identify multi-way associations among more than three different types of biomarkers. Looking into these complex multi-way associations would deepen our understanding of the pathological characteristics of AD.

To identify associations among multiple different types of biomarkers, the results combination strategy could be an alternative. It first analyzes each kind of biomarker independently, and then combines the results to draw a meta conclusion. Consequently, the interplay among different types of biomarkers is overlooked. SMCCA (Witten and Tibshirani, 2009) is another alternative, but directly applying it to identify multi-way associations usually suffers from the gradient domination issue (Kendall et al., 2018), which arises from the unfair combination of objectives (Hu et al., 2017). This is a common problem in imaging genomics since, in general, different types of biomarkers exhibit significantly different correlation levels. For example, the correlation coefficient obtained by SCCA, whose value range is [-1, 1] (or [0, 1] in absolute value), between SNPs and structural imaging QTs such as grey matter loss is in the range [0.2, 0.3] (Du et al., 2021), while that between proteomic markers and structural imaging QTs such as cortical thickness is much higher, with values around 0.7 for training and 0.38 for testing (Yan et al., 2017). This significant difference incurs gradient domination and thus leads to biased optimization. Moreover, as the number of biomarker types increases, the gradient domination gets worse. This further deteriorates SMCCA’s performance due to its naive fusion strategy. Hu et al. (2017) proposed an adaptive SMCCA, named AdaSMCCA in this paper, which assigns an adaptive weight to each SCCA model. Unfortunately, this method still suffers from gradient domination. In addition, since it treats the covariance matrices as identity matrices, AdaSMCCA lacks a theoretical guarantee of consistency and convergence and might therefore be unreliable (Chen et al., 2013). Therefore, to better identify multi-way bi-multivariate associations, developing more adaptive methods with solid theoretical properties to handle the gradient domination issue would be very valuable.
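The gradient domination effect described above can be illustrated with a small synthetic sketch. It is a toy under stated assumptions, not the paper's experiment: three views share one latent signal, the signal strengths (0.2, 1.0 and 3.0) and the function name `pairwise_gradients` are hypothetical, and the canonical weights are held fixed rather than learned. In a naive sum of pairwise terms, the gradient with respect to the shared view's weights is the sum of the per-pair gradients, so the strongly correlated pair supplies most of it.

```python
import numpy as np

def pairwise_gradients(X, Y, Z, u, w):
    """Gradients w.r.t. v of the two pairwise terms that involve view Y.

    With p = Xu/||Xu|| and r = Zw/||Zw|| held fixed, the naively fused
    objective p^T Y v + r^T Y v has gradient Y^T p + Y^T r w.r.t. v.
    """
    p = X @ u
    p = p / np.linalg.norm(p)
    r = Z @ w
    r = r / np.linalg.norm(r)
    return Y.T @ p, Y.T @ r

rng = np.random.default_rng(0)
n, d = 500, 10
s = rng.standard_normal(n)       # shared latent signal
a = np.ones(d) / np.sqrt(d)      # unit-norm loading, reused as fixed weights

# Hypothetical views: X carries the signal weakly, Z strongly.
X = 0.2 * np.outer(s, a) + rng.standard_normal((n, d))
Y = 1.0 * np.outer(s, a) + rng.standard_normal((n, d))
Z = 3.0 * np.outer(s, a) + rng.standard_normal((n, d))

g_xy, g_yz = pairwise_gradients(X, Y, Z, u=a, w=a)
norm_xy, norm_yz = np.linalg.norm(g_xy), np.linalg.norm(g_yz)
print(f"||gradient from X-Y pair|| = {norm_xy:.2f}")
print(f"||gradient from Y-Z pair|| = {norm_yz:.2f}")
```

In this setup the Y-Z gradient is expected to be several times larger than the X-Y gradient, so an unweighted sum would mostly follow the Y-Z pair when updating the weights of Y.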

In this article, we revisited SMCCA and its limitation in multi-way association identification for imaging genomics. To overcome gradient domination, we first proposed a robustness-aware AdaSMCCA (rAdaSMCCA) method which adaptively balances multiple pairwise SCCA models. In addition, to ensure the selection of meaningful biomarkers, we imposed the fused pairwise group Lasso (FGL) (Du et al., 2020c) and Lasso to regularize SNPs, and Lasso on both proteomic markers and imaging QTs. We further found that rAdaSMCCA still suffers from the gradient domination issue caused by an extreme SCCA model. Therefore, we proposed a novel uncertainty-aware AdaSMCCA (unAdaSMCCA) which resolves the gradient domination issue well with desirable theoretical properties. The contributions of this study were fourfold. First, we proposed two novel AdaSMCCA methods, i.e. rAdaSMCCA and unAdaSMCCA, which could identify multi-way bi-multivariate associations among multiple (3) types of biomarkers without blindly fusing them. We first introduced rAdaSMCCA since it is an enhancement of AdaSMCCA, and then introduced unAdaSMCCA, which is better than rAdaSMCCA and AdaSMCCA in terms of modeling. Second, both methods overcame the gradient domination issue, with unAdaSMCCA handling it best. In this study, addressing gradient domination enabled a better identification of relationships among SNPs, proteomic analytes, and imaging measurements, which could yield interesting findings about AD. Third, the feature grouping penalty for SNPs automatically learnt the grouping structure embedded within neighbouring SNPs. This data-driven regularization could extract SNPs jointly affecting proteomic QTs and imaging QTs. Fourth, to efficiently solve the two models, we derived an alternating iteration algorithm and demonstrated its convergence.
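The idea of adaptively balancing pairwise SCCA terms can be sketched as follows. This is a generic inverse-magnitude reweighting chosen purely for illustration; it is not the rAdaSMCCA or unAdaSMCCA update rule, which this excerpt does not give. The function name `balanced_weights`, the `eps` parameter, and the example values are all hypothetical.

```python
import numpy as np

def balanced_weights(pair_values, eps=1e-8):
    """Weights inversely proportional to each pairwise term's magnitude.

    Assumption: illustrative scheme only, not the paper's update rule.
    The weights are rescaled to sum to the number of pairs, so equal
    magnitudes give the plain unweighted sum (all weights equal to 1).
    """
    mags = np.abs(np.asarray(pair_values, dtype=float)) + eps
    raw = 1.0 / mags
    return raw * len(mags) / raw.sum()

# Hypothetical pairwise correlation values for three pairs of views.
pair_values = np.array([0.25, 0.70, 0.40])
weights = balanced_weights(pair_values)
contributions = weights * pair_values  # weighted terms are now nearly equal
print(weights)
print(contributions)
```

With weights like these, no single pair dominates the fused objective, at the cost of ignoring how reliable each pairwise estimate is; the two proposed methods are described as addressing that balance in more principled ways.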

In the experiments, we compared rAdaSMCCA and unAdaSMCCA with two state-of-the-art methods, SMCCA (Witten and Tibshirani, 2009) and adaptive SMCCA (Hu et al., 2017), on four synthetic data sets and one real data set including SNPs, proteomic analyte markers and imaging QTs of 244 subjects from the Alzheimer’s disease neuroimaging initiative (ADNI) database. The results on both synthetic and real data sets showed that rAdaSMCCA and unAdaSMCCA identified higher canonical correlation coefficients and better canonical weight patterns, indicating enhanced feature selection capability. In particular, unAdaSMCCA performed the best owing to its well-designed loss balancing strategy. In sum, these results demonstrated that both rAdaSMCCA and unAdaSMCCA, with unAdaSMCCA being the best, were promising methods for identifying multi-way bi-multivariate associations among SNPs, proteomic analytes and imaging QTs, and more generally among multi-omics data in brain imaging genomics.

Method

Throughout this article, we denote vectors as lowercase letters, and matrices as uppercase letters. Specifically, $X = (x_{ij})$ denotes a matrix, and its $i$-th row and $j$-th column are denoted by $x^i$ and $x_j$, respectively. The Euclidean norm of a vector $x$ is denoted as $\|x\|_2 = \sqrt{\sum_i x_i^2}$.
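The notation can be checked numerically with a minimal NumPy sketch. The matrix values are arbitrary and chosen only for illustration; note that NumPy indexes from 0, whereas the text counts rows and columns from 1.

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [0.0, 5.0]])
row_0 = X[0, :]   # first row of X
col_1 = X[:, 1]   # second column of X

# Euclidean norm computed from its definition and via NumPy.
norm_by_definition = np.sqrt(np.sum(row_0 ** 2))
norm_by_numpy = np.linalg.norm(row_0)
print(norm_by_definition, norm_by_numpy)
```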

Experiment results

We used SMCCA (Witten and Tibshirani, 2009) and adaptive SMCCA (AdaSMCCA) as benchmark methods. SMCCA simply combines multiple SCCA models without considering gradient domination. AdaSMCCA combines these SCCA models, each associated with an additional weight. To date, AdaSMCCA has been the state-of-the-art SMCCA method. Therefore, this comparison could help show the efficiency and effectiveness of our proposed methods (The Matlab code of our AdaSMCCA methods is publicly

Conclusion

Alzheimer’s disease is a multifactorial neurodegenerative disorder which could incur many abnormal alterations to the brain. Brain imaging genomics jointly analyzes genetic variations, imaging QTs and other biomarkers such as proteomic expressions. Multiple heterogeneous markers carry valuable complementary information and fusing them might yield interesting findings. However, directly fusing multiple SCCA models might be suboptimal due to undesired gradient domination. We proposed two AdaSMCCA

CRediT authorship contribution statement

Lei Du: Conceptualization, Methodology, Writing - original draft. Jin Zhang: Software, Writing - review & editing. Fang Liu: Software, Visualization, Investigation. Huiai Wang: Validation, Writing - review & editing. Lei Guo: Conceptualization. Junwei Han: Writing - review & editing. the Alzheimer’s Disease Neuroimaging Initiative: .

Declaration of Competing Interest

None.

Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech;

References (37)

  • A. Association

    2019 Alzheimer’s disease facts and figures

    Alzheimer’s & Dementia

    (2019)
  • X.-a. Bi et al.

    Multimodal data analysis of alzheimer’s disease based on clustering evolutionary random forest

    IEEE J Biomed Health Inform

    (2020)
  • X.-a. Bi et al.

    Morbigenous brain region and gene detection with a genetically evolved random neural network cluster approach in late mild cognitive impairment

    Bioinformatics

    (2020)
  • M. Chen et al.

    Sparse cca via precision adjusted iterative thresholding

    arXiv preprint arXiv:1311.6186

    (2013)
  • J. Ciesielski-Treska et al.

    Chromogranin a induces a neurotoxic phenotype in brain microglial cells

    J. Biol. Chem.

    (1998)
  • L.W. De Jong et al.

    Strongly reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study

    Brain

    (2008)
  • R. Deane et al.

    Rage mediates amyloid-beta peptide transport across the blood-brain barrier and accumulation in brain.

    Nat. Med.

    (2003)
  • L. Du et al.

    Structured sparse canonical correlation analysis for brain imaging genetics: an improved graphnet method

    Bioinformatics

    (2016)
  • L. Du et al.

    Identifying diagnosis-specific genotype-phenotype associations via joint multi-task sparse canonical correlation analysis and classification

    Bioinformatics

    (2020)
  • L. Du et al.

    Associating multi-modal brain imaging phenotypes and genetic risk factors via a dirty multi-task learning method

    IEEE Trans Med Imaging

    (2020)
  • L. Du et al.

    Multi-task sparse canonical correlation analysis with application to multi-modal brain imaging genetics

    IEEE/ACM Trans. Comput. Biol. Bioinf.

    (2021)
  • L. Du et al.

    Detecting genetic associations with brain imaging phenotypes in Alzheimer’s disease via a novel structured SCCA approach

    Med Image Anal

    (2020)
  • L. Du et al.

    A novel SCCA approach via truncated 1-norm and truncated group lasso for brain imaging genetics

    Bioinformatics

    (2018)
  • L. Du et al.

    A novel structure-aware sparse learning algorithm for brain imaging genetics

    International Conference on Medical Image Computing and Computer Assisted Intervention

    (2014)
  • C. Fan et al.

    Neuroimaging and intervening in memory reconsolidation of human drug addiction

    Science China Information Sciences

    (2020)
  • J. Fang et al.

    Joint sparse canonical correlation analysis for detecting differential imaging genetics modules.

    Bioinformatics

    (2016)
  • D.E. Feldman et al.

    Neuroimaging of inflammation in alcohol use disorder: a review

    Science China Information Sciences

    (2020)
  • H. Gao et al.

    Robust capped norm nonnegative matrix factorization

    the 24th ACM International on Conference on Information and Knowledge Management

    (2015)

¹ Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
