Maintenance of a Long Running Distributed Genetic Programming System for Solving Problems Requiring Big Data

Hodjat, Babak; Hemberg, Erik; Shahrzad, Hormoz; O’Reilly, Una-May

doi:10.1007/978-1-4939-0375-7_4

Chapter
First Online: 01 January 2014

Part of the book series: Genetic and Evolutionary Computation (GEVO)

Abstract

We describe a system, ECStar, that outstrips many scaling aspects of extant genetic programming systems. One instance in the domain of financial strategies has executed for extended durations (months to years) on nodes distributed around the globe. ECStar system instances are almost never stopped and restarted, though they are resource elastic. Instead, they are interactively redirected to different parts of the problem space and updated with up-to-date learning. Their complexity makes them non-reproducible (i.e., a single “play of the tape” process), which makes them similar to real biological systems. In this contribution we focus on how ECStar introduces a provocative, important new paradigm for GP through its sheer size and complexity. ECStar’s scale, volunteer compute nodes, and distributed hub-and-spoke design have implications for how a multi-node instance is managed. We describe the setup, deployment, operation, and update of an instance of such a large, distributed, long-running system. Moreover, we outline how ECStar is designed to allow manual guidance and realignment of its evolutionary search trajectory.

Acknowledgements

We acknowledge the generous support of the Li Ka-Shing Foundation.

Author information

Authors and Affiliations

Genetic Finance, San Francisco, CA, 94105, USA
Babak Hodjat & Hormoz Shahrzad
ALFA Group, CSAIL, MIT, Cambridge, MA, USA
Erik Hemberg & Una-May O’Reilly

Corresponding author

Correspondence to Babak Hodjat.

Editor information

Editors and Affiliations

University of Michigan, Ann Arbor, Michigan, USA
Rick Riolo
Institute for Quantitative Biomedical Science, Dartmouth Medical School, Lebanon, New Hampshire, USA
Jason H. Moore
Evolved Analytics, Midland, Michigan, USA
Mark Kotanchek

About this chapter

Cite this chapter

Hodjat, B., Hemberg, E., Shahrzad, H., O’Reilly, UM. (2014). Maintenance of a Long Running Distributed Genetic Programming System for Solving Problems Requiring Big Data. In: Riolo, R., Moore, J., Kotanchek, M. (eds) Genetic Programming Theory and Practice XI. Genetic and Evolutionary Computation. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-0375-7_4

DOI: https://doi.org/10.1007/978-1-4939-0375-7_4
Published: 10 March 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-0374-0
Online ISBN: 978-1-4939-0375-7
eBook Packages: Computer Science, Computer Science (R0)
