Abstract
A market-based algorithm is presented that autonomously apportions complex tasks among multiple cooperating agents, giving each agent an incentive to improve the performance of the whole system. A specific model, called “The Hayek Machine,” is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to efficiently solve arbitrary BW problems. The Hayek Machine can also be seen as a model of evolutionary economics.
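To make the market idea concrete, here is a minimal sketch of one way such an economy can assign credit, assuming a simplified auction that is not taken from the paper: at each step agents bid for ownership of the world, the highest bidder pays its bid to the previous owner, acts, and keeps any external reward. The `Agent` class, the `run_episode` function, the two toy agents, and the rule that the first bid is paid to no one are all hypothetical. The sketch omits how agents are created, mutated, and removed, which an evolving economy would also need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

World = Dict[str, int]


@dataclass
class Agent:
    name: str
    wealth: float
    # Returns a bid for the current world state, or None to abstain.
    bid: Callable[[World], Optional[float]]
    # Mutates the world and returns any external reward earned.
    act: Callable[[World], float]


def run_episode(agents: List[Agent], world: World, max_steps: int) -> List[str]:
    """Hypothetical simplified auction: the highest eligible bidder buys the
    world from its previous owner, acts on it, and keeps any external reward."""
    owner: Optional[Agent] = None
    trace: List[str] = []
    for _ in range(max_steps):
        offers = []
        for agent in agents:
            b = agent.bid(world)
            # Assumption: an agent may not bid more than it currently owns.
            if b is not None and b <= agent.wealth:
                offers.append((b, agent))
        if not offers:
            break
        price, winner = max(offers, key=lambda pair: pair[0])
        winner.wealth -= price
        if owner is not None:
            # The previous owner is paid for the state it leaves behind.
            # (Assumption: the very first bid is paid to no one.)
            owner.wealth += price
        winner.wealth += winner.act(world)
        owner = winner
        trace.append(winner.name)
        if world.get("done"):
            break
    return trace


# Toy two-step task: one agent does an unrewarded setup step,
# another finishes the task and collects the external reward.
def setup_bid(w: World) -> Optional[float]:
    return 2.0 if w["stacked"] == 0 else None


def setup_act(w: World) -> float:
    w["stacked"] = 1
    return 0.0  # no external reward for the intermediate step


def finish_bid(w: World) -> Optional[float]:
    return 5.0 if w["stacked"] == 1 else None


def finish_act(w: World) -> float:
    w["done"] = 1
    return 8.0  # external reward for reaching the goal


setup = Agent("setup", 10.0, setup_bid, setup_act)
finish = Agent("finish", 10.0, finish_bid, finish_act)
trace = run_episode([setup, finish], {"stacked": 0, "done": 0}, max_steps=10)
print(trace, setup.wealth, finish.wealth)
```

In this toy run both agents end with a profit: the setup agent pays 2 and is later paid 5 by the finisher, and the finisher pays 5 and earns the reward of 8. The unrewarded intermediate step is compensated only because a later agent is willing to buy the state it produced, which is the sense in which each agent is motivated to improve the performance of the whole system.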
About this article
Cite this article
Baum, E.B. Toward a Model of Intelligence as an Economy of Agents. Machine Learning 35, 155–185 (1999). https://doi.org/10.1023/A:1007593124513
DOI: https://doi.org/10.1023/A:1007593124513
Keywords
- reinforcement learning
- multi-agent systems
- planning
- evolutionary economics
- tragedy of the commons
- classifier systems
- agoric systems
- autonomous programming
- cognition
- artificial intelligence
- Hayek
- complex adaptive systems
- temporal difference learning
- evolutionary computation
- economic models of mind
- economic models of computation
- Blocks World
- reasoning
- learning
- computational learning theory
- learning to reason
- meta-reasoning