Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 483)

Abstract

Machine Learning problems are growing significantly in complexity, whether due to larger volumes of data, new forms of data, or changes in the data over time. This poses new challenges that are both technical and scientific. In this paper we propose a Distributed Learning System that runs on top of a Hadoop cluster and leverages its native functionalities. The system is guided by the principle of data locality: since data are distributed across the cluster, models are also distributed and trained in parallel, close to the data. Models are thus seen as Ensembles of base models, and predictions are made by combining the predictions of the base models. Moreover, models are replicated and distributed across the cluster, so that multiple nodes can answer requests. The result is a system that is both resilient and highly available.
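The abstract gives no implementation details, so the following is only a minimal sketch, in plain Python rather than on Hadoop, of the three ideas it names: training one base model per data partition where that partition lives, combining base-model predictions into an ensemble prediction, and replicating base models so a request can still be answered when a node is down. The nearest-centroid base learner, the majority-vote combination rule, the ring-style replica placement, and every function and node name below are hypothetical assumptions for illustration, not the authors' design.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist

# Illustrative sketch only: base learner, voting rule and placement are assumptions.


@dataclass
class CentroidModel:
    """Hypothetical base learner: a nearest-centroid classifier."""
    centroids: dict  # label -> centroid (tuple of floats)

    @classmethod
    def fit(cls, rows):
        # rows: list of (features, label); average the features per label
        sums, counts = {}, Counter()
        for x, y in rows:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        return cls({y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()})

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))


def train_locally(partitions):
    """Data locality: each node fits a base model only on the rows it stores."""
    return {node: CentroidModel.fit(rows) for node, rows in partitions.items()}


def replicate(models, nodes, replication=2):
    """Place each base model on `replication` consecutive nodes, home node first."""
    placement = {}
    for home, model in models.items():
        start = nodes.index(home)
        holders = [nodes[(start + k) % len(nodes)] for k in range(replication)]
        placement[home] = (model, holders)
    return placement


def ensemble_predict(placement, x, alive):
    """Majority vote over the base models reachable through any live replica."""
    votes = Counter()
    for model, holders in placement.values():
        if any(h in alive for h in holders):
            votes[model.predict(x)] += 1
    if not votes:
        raise RuntimeError("no replica of any base model is available")
    return votes.most_common(1)[0][0]


# Usage: three hypothetical nodes, each holding one partition of the data
partitions = {
    "node-a": [((0.0, 0.0), "low"), ((1.0, 1.0), "low"), ((9.0, 9.0), "high")],
    "node-b": [((0.5, 0.2), "low"), ((8.0, 8.5), "high")],
    "node-c": [((10.0, 9.0), "high"), ((0.2, 0.9), "low")],
}
nodes = list(partitions)
placement = replicate(train_locally(partitions), nodes, replication=2)
# node-a is down, but its base model is still served by its replica on node-b
print(ensemble_predict(placement, (8.5, 9.0), alive={"node-b", "node-c"}))  # high
```

In an actual Hadoop deployment, block placement and replication are handled by HDFS itself (its default replication factor is 3); the sketch only mimics that behaviour to show why replicated base models let the ensemble keep answering requests when a node fails.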

Acknowledgements

This work was supported by the Northern Regional Operational Program, Portugal 2020 and European Union, through the European Regional Development Fund (ERDF) in the scope of project number 39900 - 31/SI/2017, and by FCT (Fundação para a Ciência e Tecnologia) within projects UIDB/04728/2020 and UIDB/00319/2020.

Author information

Correspondence to Davide Carneiro.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Carneiro, D., Oliveira, F., Novais, P. (2022). A Data-Locality-Aware Distributed Learning System. In: Novais, P., Carneiro, J., Chamoso, P. (eds) Ambient Intelligence – Software and Applications – 12th International Symposium on Ambient Intelligence. ISAmI 2021. Lecture Notes in Networks and Systems, vol 483. Springer, Cham. https://doi.org/10.1007/978-3-031-06894-2_6