Abstract
Venous thromboembolism (VTE) is the third most common cardiovascular disease, affecting \(\sim \)1,000,000 individuals each year in Europe. VTE has an annual recurrence rate of \(\sim \)6%, and \(\sim \)30% of patients with unprovoked VTE will experience a recurrent event after a six-month course of anticoagulant treatment. Although guidelines recommend lifelong treatment for these patients, \(\sim \)70% of them will never experience a recurrence and will therefore receive unnecessary lifelong anticoagulation, which is associated with an increased risk of bleeding and is highly costly for society. There is thus an urgent need to identify biomarkers that can distinguish VTE patients at high risk of recurrence from those at low risk.
Capitalizing on a sample of 913 patients followed up for VTE recurrence over a median of \(\sim \)10 years and profiled with 376 plasma proteomic antibodies, we develop an artificial neural network (ANN) based strategy to identify a proteomic signature that helps discriminate between patients at low and high risk of recurrence. In a first stage, we implemented a Repeated Edited Nearest Neighbors algorithm to select a homogeneous sub-sample of VTE patients. This sub-sample was then split into a training set and a testing set. The former was used to train our ANN, the latter to test its discriminatory properties. In the testing dataset, our ANN achieved an accuracy of 0.86, compared with an accuracy of 0.79 for a random forest classifier. We then applied a Deep Learning Important FeaTures (DeepLIFT) based approach to identify the variables that contribute the most to the ANN predictions. In addition to sex, the proposed DeepLIFT strategy identified 6 important proteins (DDX1, HTRA3, LRG1, MAST2, NFATC4 and STXBP5) whose exact roles in the etiology of VTE recurrence now deserve further experimental validation.
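The nearest-neighbor editing step used to select a homogeneous sub-sample can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Euclidean distance, k = 3 neighbors, and editing of every class. The function names and the toy data are hypothetical. A sample is kept only if its label agrees with the majority label of its k nearest neighbors, and the pass is repeated until no further sample is removed.

```python
from collections import Counter
import math


def edited_nearest_neighbours(X, y, k=3):
    """One editing pass: return the indices of samples whose label matches
    the majority label among their k nearest neighbours (Euclidean)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == yi:
            keep.append(i)
    return keep


def repeated_enn(X, y, k=3, max_iter=100):
    """Repeat the editing pass until it removes nothing.
    Returns the surviving indices, expressed in the original numbering."""
    indices = list(range(len(X)))
    for _ in range(max_iter):
        if len(indices) <= k:
            break  # too few samples left to find k neighbours
        X_cur = [X[i] for i in indices]
        y_cur = [y[i] for i in indices]
        kept = edited_nearest_neighbours(X_cur, y_cur, k)
        if len(kept) == len(indices):
            break  # converged: every sample agrees with its neighbourhood
        indices = [indices[j] for j in kept]
    return indices


# Hypothetical toy data: two well-separated clusters, each containing one
# sample whose label disagrees with its neighbourhood (indices 4 and 9).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
y = [0, 0, 0, 0, 1,
     1, 1, 1, 1, 0]

kept = repeated_enn(X, y, k=3)
print(kept)
```

In this toy example the two samples whose labels disagree with their neighborhood are discarded and the remaining eight are retained. An odd k avoids ties in the majority vote when there are two classes.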
Acknowledgments
M.R. was financially supported by the GENMED Laboratory of Excellence on Medical Genomics [ANR-10-LABX-0013], a research program managed by the National Research Agency (ANR) as part of the French Investment for the Future. D.-A.T. was partially supported by the EPIDEMIOM-VTE Senior Chair from the Initiative of Excellence of the University of Bordeaux. The proteomics screening was financed by a grant from Stockholm County Council (SLL 2017-0842) and from Familjen Erling Perssons Foundation. G.M. has benefited from training offered by the EUR DPH, a PhD program supported within the framework of the PIA3 (Investment for the Future), project reference 17-EURE-0019.
Additional Information. The script used to build the model is available at https://github.com/misbahch6/paper_script.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Razzaq, M. et al. (2021). Explainable Artificial Neural Network for Recurrent Venous Thromboembolism Based on Plasma Proteomics. In: Cinquemani, E., Paulevé, L. (eds) Computational Methods in Systems Biology. CMSB 2021. Lecture Notes in Computer Science, vol 12881. Springer, Cham. https://doi.org/10.1007/978-3-030-85633-5_7
DOI: https://doi.org/10.1007/978-3-030-85633-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85632-8
Online ISBN: 978-3-030-85633-5
eBook Packages: Computer Science, Computer Science (R0)