Autoencoder-Based Attribute Noise Handling Method for Medical Data

Ranvier, Thomas; Elgazel, Haytham; Coquery, Emmanuel; Benabdeslem, Khalid

doi:10.1007/978-981-99-1645-0_18

Thomas Ranvier (ORCID: orcid.org/0000-0001-9250-9530), Haytham Elgazel, Emmanuel Coquery & Khalid Benabdeslem

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1793)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Medical datasets are particularly prone to attribute noise, that is, missing and erroneous values. Attribute noise is known to be largely detrimental to learning performance. To maximize future learning performance, it is essential to handle attribute noise before performing any inference. We propose a simple autoencoder-based preprocessing method that can correct mixed-type tabular data corrupted by attribute noise. No other method currently handles attribute noise in tabular data in its entirety. We experimentally demonstrate that our method outperforms both state-of-the-art imputation methods and noise correction methods on several real-world medical datasets.
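
To make the definition of attribute noise concrete, the following minimal sketch corrupts a small numerical table with both missing values (NaN) and erroneous values (Gaussian perturbations). It only illustrates the kind of corruption the abstract describes and is not the authors' method or experimental protocol. The function name `corrupt`, the noise rates, and the toy records are hypothetical, and categorical attributes are left out for brevity.

```python
import math
import random


def corrupt(rows, missing_rate=0.1, error_rate=0.1, seed=0):
    """Return a corrupted copy of `rows` and a per-cell label mask.

    Hypothetical helper for illustration only. `rows` is a list of lists
    of floats. A cell becomes NaN ("missing"), receives Gaussian noise
    scaled to its column's standard deviation ("erroneous"), or is left
    untouched ("clean").
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])

    # Per-column standard deviation, used to scale the erroneous values.
    stds = []
    for j in range(n_cols):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        stds.append(math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)))

    noisy, mask = [], []
    for r in rows:
        new_row, row_mask = [], []
        for j, v in enumerate(r):
            u = rng.random()
            if u < missing_rate:
                new_row.append(float("nan"))
                row_mask.append("missing")
            elif u < missing_rate + error_rate:
                new_row.append(v + rng.gauss(0.0, stds[j]))
                row_mask.append("erroneous")
            else:
                new_row.append(v)
                row_mask.append("clean")
        noisy.append(new_row)
        mask.append(row_mask)
    return noisy, mask


# Made-up toy records: age (years), systolic pressure (mmHg), glucose (mmol/L).
clean = [
    [63.0, 140.0, 5.4],
    [48.0, 122.0, 4.9],
    [71.0, 155.0, 6.8],
    [55.0, 131.0, 5.1],
]
noisy, mask = corrupt(clean, missing_rate=0.2, error_rate=0.2, seed=42)
```

An imputation method would only need to fill the cells labelled `missing`, whereas handling attribute noise in full also requires correcting the `erroneous` cells, whose positions are not known in practice.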

Acknowledgments

This research is supported by the European Union’s Horizon 2020 research and innovation program under grant agreement No 875171, project QUALITOP (Monitoring multidimensional aspects of QUAlity of Life after cancer ImmunoTherapy - an Open smart digital Platform for personalized prevention and patient management).

Author information

Authors and Affiliations

Univ Lyon, UCBL, CNRS, INSA Lyon, LIRIS, UMR 5205, 43 bd du 11 Novembre 1918, 69622, Villeurbanne, France
Thomas Ranvier, Haytham Elgazel, Emmanuel Coquery & Khalid Benabdeslem

Corresponding author

Correspondence to Thomas Ranvier.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

About this paper

Cite this paper

Ranvier, T., Elgazel, H., Coquery, E., Benabdeslem, K. (2023). Autoencoder-Based Attribute Noise Handling Method for Medical Data. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1793. Springer, Singapore. https://doi.org/10.1007/978-981-99-1645-0_18

DOI: https://doi.org/10.1007/978-981-99-1645-0_18
Published: 14 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1644-3
Online ISBN: 978-981-99-1645-0
eBook Packages: Computer Science, Computer Science (R0)
