Explainable Artificial Neural Network for Recurrent Venous Thromboembolism Based on Plasma Proteomics

  • Conference paper
  • In: Computational Methods in Systems Biology (CMSB 2021)

Abstract

Venous thromboembolism (VTE) is the third most common cardiovascular disease, affecting \(\sim\)1,000,000 individuals each year in Europe. VTE has an annual recurrence rate of \(\sim\)6%, and \(\sim\)30% of patients with unprovoked VTE will face a recurrent event after a six-month course of anticoagulant treatment. Although guidelines recommend life-long treatment for these patients, \(\sim\)70% of them will never experience a recurrence and will therefore receive unnecessary life-long anticoagulation, which is associated with an increased risk of bleeding and is highly costly for society. There is thus an urgent need to identify biomarkers that can distinguish VTE patients at high risk of recurrence from low-risk patients.

Capitalizing on a sample of 913 patients followed up for the risk of VTE recurrence over a median of \(\sim\)10 years and profiled for 376 plasma proteomic antibodies, we develop an artificial neural network (ANN) based strategy to identify a proteomic signature that helps discriminate between patients at low and high risk of recurrence. In a first stage, we implemented a Repeated Edited Nearest Neighbors algorithm to select a homogeneous sub-sample of VTE patients. This sub-sample was then split into a training set and a testing set: the former was used to train our ANN, the latter to test its discriminatory properties. On the testing set, our ANN achieved an accuracy of 0.86, compared with 0.79 for a random forest classifier. We then applied a Deep Learning Important FeaTures (DeepLIFT) based approach to identify the variables that contribute the most to the ANN predictions. In addition to sex, the proposed DeepLIFT strategy identified 6 important proteins (DDX1, HTRA3, LRG1, MAST2, NFATC4 and STXBP5) whose exact roles in the etiology of VTE recurrence now deserve further experimental validation.
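To make two of the steps named in the abstract more concrete, the following is a minimal, library-free sketch of (i) repeated nearest-neighbor editing and (ii) a DeepLIFT-style attribution using the rescale rule. It is an illustration under stated assumptions, not the authors' implementation: the function names (`renn`, `forward`, `deeplift_rescale`), the toy points, the network weights, the all-zero reference input, the choice of k = 3, and the majority-vote editing rule are all hypothetical, since the abstract does not give these settings. The network here has a single hidden ReLU layer and a linear output score, which is likely much simpler than the ANN used in the paper.

```python
import math


def renn(points, labels, k=3):
    """Repeated Edited Nearest Neighbours (majority-vote variant).

    Repeatedly drops every sample whose label disagrees with the majority
    label of its k nearest remaining neighbours, until a pass removes nothing.
    Returns the indices of the retained samples.
    """
    keep = list(range(len(points)))
    while True:
        survivors = []
        for i in keep:
            # k nearest remaining neighbours of sample i (Euclidean distance)
            nearest = sorted(
                (j for j in keep if j != i),
                key=lambda j: math.dist(points[i], points[j]),
            )[:k]
            agree = sum(1 for j in nearest if labels[j] == labels[i])
            if 2 * agree > len(nearest):
                survivors.append(i)
        if len(survivors) == len(keep):
            return keep
        keep = survivors


def relu(z):
    return max(0.0, z)


def forward(x, W, b, v, c):
    """One hidden ReLU layer, then a linear output score."""
    z = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    return z, sum(vj * relu(zj) for vj, zj in zip(v, z)) + c


def deeplift_rescale(x, ref, W, b, v, c):
    """DeepLIFT contributions (rescale rule) of each input to the score change."""
    z, out = forward(x, W, b, v, c)
    z_ref, out_ref = forward(ref, W, b, v, c)
    contribs = [0.0] * len(x)
    for j, row in enumerate(W):
        dz = z[j] - z_ref[j]
        if abs(dz) > 1e-12:
            # Rescale rule: multiplier = change in activation / change in input
            m = (relu(z[j]) - relu(z_ref[j])) / dz
        else:
            # Fall back to the ReLU gradient when the pre-activation is unchanged
            m = 1.0 if z[j] > 0 else 0.0
        for i, w in enumerate(row):
            contribs[i] += (x[i] - ref[i]) * w * m * v[j]
    return contribs, out - out_ref


# Toy data (hypothetical): two clusters plus one class-1 sample
# lying inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (0.5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = renn(points, labels, k=3)
print("retained sample indices:", kept)

# Hypothetical weights for a 3-input, 2-hidden-unit network; all-zero reference.
W = [[1.0, -2.0, 0.5], [-1.0, 0.5, 2.0]]
b = [-0.5, 0.5]
v = [1.5, -1.0]
c = 0.1
contribs, delta = deeplift_rescale([1.0, 0.5, 2.0], [0.0, 0.0, 0.0], W, b, v, c)
print("per-input contributions:", contribs)
print("sum of contributions:", sum(contribs), "score change:", delta)
```

In this toy run, the editing step drops the one sample whose neighbors all carry the other label, and the per-input contributions add up to the difference between the score at the input and at the reference (DeepLIFT's summation-to-delta property). Ranking inputs by the magnitude of such contributions is one plausible way to prioritize features; how the authors aggregated contributions across patients is not stated in the abstract.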



Acknowledgments

M.R. was financially supported by the GENMED Laboratory of Excellence on Medical Genomics [ANR-10-LABX-0013], a research program managed by the National Research Agency (ANR) as part of the French Investment for the Future. D.A.T. was partially supported by the EPIDEMIOM-VTE Senior Chair from the Initiative of Excellence of the University of Bordeaux. The proteomics screening was financed by grants from Stockholm County Council (SLL 2017-0842) and from Familjen Erling Perssons Foundation. G.M. has benefited from training offered by the EUR DPH, a PhD program supported within the framework of the PIA3 (Investment for the Future), project reference 17-EURE-0019.

Additional Information. The script used to build the model is available at the following link: https://github.com/misbahch6/paper_script.

Author information

Corresponding authors

Correspondence to Misbah Razzaq or David Alexandre Tregouet.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Razzaq, M. et al. (2021). Explainable Artificial Neural Network for Recurrent Venous Thromboembolism Based on Plasma Proteomics. In: Cinquemani, E., Paulevé, L. (eds) Computational Methods in Systems Biology. CMSB 2021. Lecture Notes in Computer Science, vol 12881. Springer, Cham. https://doi.org/10.1007/978-3-030-85633-5_7

  • DOI: https://doi.org/10.1007/978-3-030-85633-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85632-8

  • Online ISBN: 978-3-030-85633-5

  • eBook Packages: Computer Science, Computer Science (R0)
