Identifying associations among genomic, proteomic and imaging biomarkers via adaptive sparse multi-view canonical correlation analysis
Introduction
Alzheimer’s disease (AD) is a multifactorial neurodegenerative disorder that involves many abnormal alterations in the brain. For example, the hippocampus usually exhibits atrophic patterns in AD-affected brains, and the apolipoprotein E (APOE) concentration is also altered in AD patients and thus shows relevance to AD pathology (Gupta, Laws, Villemagne, Ames, Bush, Ellis, Lui, Masters, Rowe, Szoeke, et al., 2011, Soares, Potter, Pickering, Kuhn, Immermann, Shera, Ferm, Dean, Simon, Swenson, et al., 2012). Despite an increasing number of studies during the last decade, the pathological mechanism of AD remains uncertain (Alzheimer’s Association, 2019). Therefore, jointly analyzing multiple types of biomarkers, such as magnetic resonance imaging (MRI) derived imaging measurements (Feldman, McPherson, Biesecker, Wiers, Manza, Volkow, Wang, 2020, Fan, Cheng, Gou, Liu, Deng, Liu, Chen, Bu, Zhang, 2020), blood-based proteomic expression levels and genetic variations, and investigating their associations could deepen our understanding of the pathology of AD. Additionally, combining multiple types of biomarkers and their interplay could increase the reliability and specificity of AD diagnosis, as many biomarkers are not exclusive to AD.
During the last decade, many brain imaging genomic studies have investigated the association between two types of biomarkers. A recent systematic review (Shen and Thompson, 2020) showed that most of them were designed to identify the association between single nucleotide polymorphisms (SNPs) and brain imaging quantitative traits (QTs) (Du, Liu, Zhang, Yao, Yan, Risacher, Han, Guo, Saykin, Shen, 2018, Du, Liu, Liu, Yao, Risacher, Han, Guo, Saykin, Shen, 2020, Du, Liu, Liu, Yao, Risacher, Han, Saykin, Shen, 2020, Du, Liu, Yao, Risacher, Han, Saykin, Guo, Shen, 2020, Bi, Hu, Wu, Wang, 2020, Bi, Liu, Xie, Hu, Jiang, 2020). Technically, both regression methods and sparse canonical correlation analysis (SCCA) methods have been widely used. For example, based on regression alone, Wang et al. (2012) proposed a multi-task regression and classification method that combines SNPs and imaging QTs to predict memory deterioration and diagnostic status. Using SCCA alone, Yan et al. (2017) studied the association between proteomic analytes and brain imaging QTs. In addition, the integration of regression and SCCA was proposed to identify associations among SNPs, imaging QTs and diagnostic outcomes (Zille et al., 2018). To the best of our knowledge, regression methods are typically not designed to directly identify SNP-QT correlations (Wang et al., 2012), and the classical SCCA can only handle two distinct types of biomarkers (Lin, Calhoun, Wang, 2014, Fang, Lin, Schulz, Xu, Calhoun, Wang, 2016, Yan, Risacher, Nho, Saykin, Shen, 2017, Du, Liu, Zhang, Yao, Yan, Risacher, Han, Guo, Saykin, Shen, 2018). Their combination faces the same limitation as SCCA. Consequently, it is essential to develop novel methods that efficiently and practically identify multi-way associations among three or more different types of biomarkers.
Examining these complex multi-way associations would deepen our understanding of the pathological characteristics of AD.
To identify associations among multiple different types of biomarkers, a result combination strategy could be one alternative. It first analyzes each kind of biomarker independently, and then combines the results to draw a meta conclusion. By construction, the interplay among different types of biomarkers is overlooked. Sparse multi-view CCA (SMCCA) (Witten and Tibshirani, 2009) is another alternative, but directly applying it to identify multi-way associations usually suffers from the gradient domination issue (Kendall et al., 2018), which arises from the unfair combination of objectives (Hu et al., 2017). This is a common problem in imaging genomics since, in general, significantly different correlation levels are exhibited among multiple types of biomarkers. For example, the correlation coefficient obtained by SCCA, whose value range is [-1, 1] (or [0, 1] in absolute value), between SNPs and structural imaging QTs such as grey matter loss is around [0.2, 0.3] (Du et al., 2021), while that between proteomic markers and structural imaging QTs such as cortical thickness is much higher, with values around 0.7 for training and 0.38 for testing (Yan et al., 2017). This significant difference incurs gradient domination, and thus leads to biased optimization. More seriously, as the number of biomarker types increases, the gradient domination gets worse. This further deteriorates SMCCA’s performance due to its naive fusion strategy. Hu et al. (2017) proposed an adaptive SMCCA, named AdaSMCCA in this paper, which assigns an adaptive weight to each SCCA model. Unfortunately, this method still suffers from gradient domination. Moreover, since it treats the covariance matrices as identity matrices, AdaSMCCA lacks a theoretical guarantee of consistency and convergence, and might therefore be unreliable (Chen et al., 2013).
Therefore, to better identify multi-way bi-multivariate associations, developing more adaptive methods, with solid theoretical properties to handle the gradient domination issue, would be very valuable and meaningful.
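As a rough illustration of gradient domination, the following minimal sketch builds three synthetic views and compares the two pairwise gradient terms that a naively summed objective would add together when updating the imaging weights. All data, dimensions and names (`standardize`, `pair_gradient`, `unit_weights`) are hypothetical and chosen for illustration only; this is not the authors' code, and the unpenalized covariance term used here is an assumption standing in for a full SCCA model.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pair_gradient(X_i, X_j, w_j):
    """Gradient of the pairwise term w_i^T X_i^T X_j w_j with respect to w_i."""
    return X_i.T @ (X_j @ w_j)

def unit_weights(p):
    """Uniform canonical weight vector with unit Euclidean norm."""
    return np.ones(p) / np.sqrt(p)

rng = np.random.default_rng(0)
n = 200
z = rng.standard_normal(n)  # shared latent signal (assumed)

# Hypothetical views: SNPs track the signal weakly, the other two strongly.
snp = standardize(0.2 * z[:, None] + rng.standard_normal((n, 5)))
prot = standardize(2.0 * z[:, None] + 0.5 * rng.standard_normal((n, 4)))
img = standardize(2.0 * z[:, None] + 0.5 * rng.standard_normal((n, 3)))

# Both terms enter the update of the imaging weights in a naive sum.
g_from_snp = pair_gradient(img, snp, unit_weights(snp.shape[1]))
g_from_prot = pair_gradient(img, prot, unit_weights(prot.shape[1]))

norm_snp = np.linalg.norm(g_from_snp)
norm_prot = np.linalg.norm(g_from_prot)
print(f"SNP-imaging gradient norm:       {norm_snp:.1f}")
print(f"proteomic-imaging gradient norm: {norm_prot:.1f}")
```

In this toy setting the proteomic-imaging term contributes the larger gradient, so an unweighted sum would be steered mostly by that pair, which is the kind of imbalance an adaptive weighting scheme is meant to correct.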
In this article, we revisited SMCCA and its limitation in multi-way association identification for imaging genomics. To overcome gradient domination, we first proposed a robustness-aware AdaSMCCA (rAdaSMCCA) method which adaptively balances multiple pairwise SCCA models. In addition, to ensure the selection of meaningful biomarkers, we imposed the fused pairwise group Lasso (FGL) (Du et al., 2020c) and Lasso penalties on SNPs, and the Lasso penalty on both proteomic markers and imaging QTs. We further found that rAdaSMCCA still suffers from the gradient domination issue when an extreme SCCA model is present. Therefore, we proposed a novel uncertainty-aware AdaSMCCA (unAdaSMCCA) which resolves the gradient domination issue well and has desirable theoretical properties. The contributions of this study were fourfold. First, we proposed two novel AdaSMCCA methods, i.e. rAdaSMCCA and unAdaSMCCA, which could identify multi-way bi-multivariate associations among multiple (more than two) types of biomarkers without blindly fusing them. We first introduced rAdaSMCCA since it is an enhancement of AdaSMCCA, and then we introduced unAdaSMCCA, which is better than rAdaSMCCA and AdaSMCCA in terms of modeling. Second, both methods overcame the gradient domination issue, with unAdaSMCCA handling it best. In this study, addressing gradient domination enabled a better identification of relationships among SNPs, proteomic analytes, and imaging measurements, which could yield interesting findings about AD. Third, the feature grouping penalty for SNPs automatically learnt the grouping structure embedded within neighbouring SNPs. This data-driven regularization could extract SNPs jointly affecting proteomic QTs and imaging QTs. Fourth, to efficiently solve the two models, we derived an alternating iteration algorithm and demonstrated its convergence.
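The two sparsity penalties named above can be sketched in a few lines. The Lasso form is standard. The FGL form below, a sum of Euclidean norms over adjacent weight pairs, is an assumption based on the description of a penalty that groups neighbouring SNPs; the exact definition should be taken from Du et al. (2020c). Function names are hypothetical.

```python
import numpy as np

def lasso_penalty(w):
    """L1 norm: sum of absolute canonical weights (promotes individual sparsity)."""
    return float(np.abs(w).sum())

def fused_pairwise_group_lasso(w):
    """Assumed FGL form: sum over neighbouring pairs of sqrt(w_i^2 + w_{i+1}^2).

    Each adjacent pair acts as a small group, so neighbouring weights
    tend to be selected or discarded together.
    """
    return float(np.sqrt(w[:-1] ** 2 + w[1:] ** 2).sum())

# Toy canonical weight vector for three neighbouring SNPs (hypothetical values).
u = np.array([3.0, 4.0, 0.0])
print(lasso_penalty(u))               # |3| + |4| + |0|
print(fused_pairwise_group_lasso(u))  # sqrt(9 + 16) + sqrt(16 + 0)
```

Under this assumed form, a nonzero weight keeps its neighbours' groups active, which is one way a penalty can learn grouping structure among adjacent SNPs without a predefined partition.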
In the experiments, we compared rAdaSMCCA and unAdaSMCCA with two state-of-the-art methods, including SMCCA (Witten and Tibshirani, 2009) and adaptive SMCCA (Hu et al., 2017), on four synthetic data sets and one real data set including SNPs, proteomic analyte markers and imaging QTs of 244 subjects from the Alzheimer’s disease neuroimaging initiative (ADNI) database. The results on both synthetic and real data sets showed that rAdaSMCCA and unAdaSMCCA identified higher canonical correlation coefficients and better canonical weight patterns indicating enhanced feature selection capability. In particular, unAdaSMCCA performed the best owing to its well-designed loss balancing strategy. In sum, all these results demonstrated that both rAdaSMCCA and unAdaSMCCA held very promising power, with unAdaSMCCA being the best, in identifying multi-way bi-multivariate associations among SNPs, proteomic analytes and imaging QTs. Therefore, our proposed rAdaSMCCA and unAdaSMCCA were promising methods for identifying multi-way associations among multi-omics data in brain imaging genomics.
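The idea of balancing several losses can be illustrated with the generic uncertainty weighting of Kendall et al. (2018), in which each loss L_k is scaled by 1/(2σ_k²) and a log σ_k term keeps the scales from growing without bound. The sketch below is only that generic scheme, under the assumption that every loss is positive; it is not the unAdaSMCCA objective, which this section does not state. Loss values and function names are hypothetical.

```python
import numpy as np

def uncertainty_weighted_total(losses, log_sigma):
    """Sum over k of L_k / (2 * sigma_k^2) + log(sigma_k)."""
    losses = np.asarray(losses, dtype=float)
    log_sigma = np.asarray(log_sigma, dtype=float)
    return float(np.sum(losses / (2.0 * np.exp(2.0 * log_sigma)) + log_sigma))

def optimal_log_sigma(losses):
    """Closed-form minimizer for fixed positive losses: sigma_k^2 = L_k."""
    return 0.5 * np.log(np.asarray(losses, dtype=float))

# Two hypothetical pairwise losses with very different magnitudes.
losses = np.array([4.0, 0.5])
log_sigma = optimal_log_sigma(losses)
weights = 1.0 / (2.0 * np.exp(2.0 * log_sigma))  # equals 1 / (2 * L_k)

balanced = uncertainty_weighted_total(losses, log_sigma)
unbalanced = uncertainty_weighted_total(losses, np.zeros(2))
print("weights:", weights)
print("weighted contributions:", weights * losses)
```

At the optimum each weighted loss contributes the same amount, so the larger loss no longer dominates the sum; in practice σ_k would be learned jointly with the model parameters rather than set in closed form.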
Method
Throughout this article, we denote vectors as lowercase letters, and matrices as uppercase letters. Specifically, $X$ denotes a matrix, and its $i$-th row and $j$-th column are separately denoted by $x^i$ and $x_j$. The Euclidean norm of $x$ is denoted as $\|x\|_2$.
Experiment results
We used SMCCA (Witten and Tibshirani, 2009) and Adaptive SMCCA (AdaSMCCA) as benchmark methods. SMCCA simply combines multiple SCCA models without considering gradient domination. AdaSMCCA combines these SCCA models and associates each of them with an additional weight. To date, AdaSMCCA is the state-of-the-art SMCCA method. Therefore, this comparison study could help show the efficiency and effectiveness of our proposed methods (The Matlab code of our AdaSMCCA methods is publicly
Conclusion
Alzheimer’s disease is a multifactorial neurodegenerative disorder which could incur many abnormal alterations to the brain. Brain imaging genomics jointly analyzes genetic variations, imaging QTs and other biomarkers such as proteomic expressions. Multiple heterogeneous markers carry valuable complementary information and fusing them might yield interesting findings. However, directly fusing multiple SCCA models might be suboptimal due to undesired gradient domination. We proposed two AdaSMCCA
CRediT authorship contribution statement
Lei Du: Conceptualization, Methodology, Writing - original draft. Jin Zhang: Software, Writing - review & editing. Fang Liu: Software, Visualization, Investigation. Huiai Wang: Validation, Writing - review & editing. Lei Guo: Conceptualization. Junwei Han: Writing - review & editing. The Alzheimer’s Disease Neuroimaging Initiative.
Declaration of Competing Interest
None.
Acknowledgements
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech;
References (37)
2019 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia (2019)
Multimodal data analysis of Alzheimer’s disease based on clustering evolutionary random forest. IEEE J Biomed Health Inform (2020)
Morbigenous brain region and gene detection with a genetically evolved random neural network cluster approach in late mild cognitive impairment. Bioinformatics (2020)
Sparse CCA via precision adjusted iterative thresholding. arXiv preprint arXiv:1311.6186 (2013)
Chromogranin A induces a neurotoxic phenotype in brain microglial cells. J. Biol. Chem. (1998)
Strongly reduced volumes of putamen and thalamus in Alzheimer’s disease: an MRI study. Brain (2008)
RAGE mediates amyloid-beta peptide transport across the blood-brain barrier and accumulation in brain. Nat. Med. (2003)
Structured sparse canonical correlation analysis for brain imaging genetics: an improved GraphNet method. Bioinformatics (2016)
Identifying diagnosis-specific genotype-phenotype associations via joint multi-task sparse canonical correlation analysis and classification. Bioinformatics (2020)
Associating multi-modal brain imaging phenotypes and genetic risk factors via a dirty multi-task learning method. IEEE Trans Med Imaging (2020)
Multi-task sparse canonical correlation analysis with application to multi-modal brain imaging genetics. IEEE/ACM Trans. Comput. Biol. Bioinf.
Detecting genetic associations with brain imaging phenotypes in Alzheimer’s disease via a novel structured SCCA approach. Med Image Anal
A novel SCCA approach via truncated ℓ1-norm and truncated group lasso for brain imaging genetics. Bioinformatics
A novel structure-aware sparse learning algorithm for brain imaging genetics. International Conference on Medical Image Computing and Computer Assisted Intervention
Neuroimaging and intervening in memory reconsolidation of human drug addiction. Science China Information Sciences
Joint sparse canonical correlation analysis for detecting differential imaging genetics modules. Bioinformatics
Neuroimaging of inflammation in alcohol use disorder: a review. Science China Information Sciences
Robust capped norm nonnegative matrix factorization. The 24th ACM International on Conference on Information and Knowledge Management
1. Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.