A Data-Locality-Aware Distributed Learning System

Carneiro, Davide; Oliveira, Filipe; Novais, Paulo

doi:10.1007/978-3-031-06894-2_6

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 483)

Included in the following conference series:

International Symposium on Ambient Intelligence

Abstract

Machine Learning problems are growing significantly in complexity, whether due to an increase in the volume of data, to new forms of data, or to changes in data over time. This poses new challenges that are both technical and scientific. In this paper we propose a Distributed Learning System that runs on top of a Hadoop cluster, leveraging its native functionalities. The system is guided by the principle of data locality: since data are distributed across the cluster, models are also distributed and trained in parallel. Models are thus seen as Ensembles of base models, and predictions are made by combining the predictions of the base models. Moreover, models are replicated and distributed across the cluster, so that multiple nodes can answer requests. The result is a system that is both resilient and highly available.
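The core idea of the abstract (train one base model where each data partition lives, combine the base models' predictions, and replicate models so that any of several nodes can answer) can be illustrated with a small plain-Python sketch. This is a hypothetical illustration, not the authors' implementation: there is no Hadoop or HDFS here, cluster nodes are simulated as dictionary keys, and the nearest-centroid base model, the majority-vote combiner, the round-robin replica placement, and every name (`train_local`, `DistributedEnsemble`, `place_replicas`, `pick_server`) are assumptions chosen for brevity.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, str]  # (features, label)


@dataclass
class CentroidModel:
    """Hypothetical base model: a nearest-centroid classifier for one partition."""
    centroids: Dict[str, Point]

    def predict(self, x: Point) -> str:
        # Return the label whose centroid is closest (squared Euclidean distance).
        return min(
            self.centroids,
            key=lambda label: sum((a - b) ** 2 for a, b in zip(x, self.centroids[label])),
        )


def train_local(partition: Sequence[Sample]) -> CentroidModel:
    """Fit a base model using only the samples stored on one node (data locality)."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in partition:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return CentroidModel({y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()})


class DistributedEnsemble:
    """The global model is the set of base models, one per data partition."""

    def __init__(self, partitions: Dict[str, Sequence[Sample]]):
        # On a real cluster each fit would run on the node holding the partition;
        # here nodes are simulated by dictionary keys and fitting is sequential.
        self.base_models = {node: train_local(p) for node, p in partitions.items()}

    def predict(self, x: Point) -> str:
        # Combine base-model outputs by majority vote (assumed combiner).
        votes = Counter(m.predict(x) for m in self.base_models.values())
        return votes.most_common(1)[0][0]


def place_replicas(model_ids: Sequence[str], nodes: Sequence[str], factor: int) -> Dict[str, List[str]]:
    """Assign each model to `factor` distinct nodes, round-robin (assumed policy)."""
    if not 1 <= factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the number of nodes")
    return {
        mid: [nodes[(i + k) % len(nodes)] for k in range(factor)]
        for i, mid in enumerate(model_ids)
    }


def pick_server(placement: Dict[str, List[str]], model_id: str, alive: Set[str]) -> str:
    """Route a request to the first live node holding a replica of the model."""
    for node in placement[model_id]:
        if node in alive:
            return node
    raise RuntimeError(f"no live replica of {model_id}")


# Usage: three simulated nodes, each holding its own partition of the data.
partitions = {
    "node-a": [((0.0, 0.0), "low"), ((1.0, 1.0), "low"), ((9.0, 9.0), "high")],
    "node-b": [((0.5, 0.0), "low"), ((8.0, 9.0), "high"), ((10.0, 10.0), "high")],
    "node-c": [((0.0, 1.0), "low"), ((9.0, 8.0), "high")],
}
ensemble = DistributedEnsemble(partitions)
print(ensemble.predict((0.2, 0.3)))  # low
print(ensemble.predict((9.0, 9.0)))  # high

placement = place_replicas(list(ensemble.base_models), list(partitions), factor=2)
print(placement["node-a"])  # ['node-a', 'node-b']
print(pick_server(placement, "node-a", alive={"node-b", "node-c"}))  # node-b
```

Majority voting is used here only because it is the simplest combiner for classification; averaging class probabilities or a stacking meta-model would fit the same structure. On an actual Hadoop cluster, the per-partition fits would presumably be scheduled on the nodes that store the corresponding HDFS blocks, so the computation moves to the data rather than the data to the computation.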

Acknowledgements

This work was supported by the Northern Regional Operational Program, Portugal 2020 and European Union, through the European Regional Development Fund (ERDF) in the scope of project number 39900 - 31/SI/2017, and by FCT (Fundação para a Ciência e Tecnologia) within projects UIDB/04728/2020 and UIDB/00319/2020.

Author information

Authors and Affiliations

CIICESI, Escola Superior de Tecnologia e Gestão, Instituto Politécnico do Porto, Porto, Portugal
Davide Carneiro & Filipe Oliveira
Centro ALGORITMI, Universidade do Minho, Braga, Portugal
Davide Carneiro & Paulo Novais

Corresponding author

Correspondence to Davide Carneiro.

Editor information

Editors and Affiliations

Departamento de Informática, University of Minho, Braga, Portugal
Paulo Novais
ISEP/GECAD, Porto, Portugal
Joao Carneiro
Biotechnology, Intelligent Systems, University of Salamanca, Salamanca, Salamanca, Spain
Pablo Chamoso

About this paper

Cite this paper

Carneiro, D., Oliveira, F., Novais, P. (2022). A Data-Locality-Aware Distributed Learning System. In: Novais, P., Carneiro, J., Chamoso, P. (eds) Ambient Intelligence – Software and Applications – 12th International Symposium on Ambient Intelligence. ISAmI 2021. Lecture Notes in Networks and Systems, vol 483. Springer, Cham. https://doi.org/10.1007/978-3-031-06894-2_6

DOI: https://doi.org/10.1007/978-3-031-06894-2_6
Published: 01 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06893-5
Online ISBN: 978-3-031-06894-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
