Journal of Economic Theory

Volume 154, November 2014, Pages 229-244

Convergence in models with bounded expected relative hazard rates

https://doi.org/10.1016/j.jet.2014.09.014

Abstract

We provide a general framework to study stochastic sequences related to individual learning in economics, learning automata in computer science, social learning in marketing, and other applications. More precisely, we study the asymptotic properties of a class of stochastic sequences that take values in [0,1] and satisfy a property called “bounded expected relative hazard rates.” Sequences that satisfy this property and feature “small step-size” or “shrinking step-size” converge to 1 with high probability or almost surely, respectively. These convergence results yield conditions for the learning models in [13], [35], [7] to choose expected payoff maximizing actions with probability one in the long run.

Introduction

Stochastic sequences arising in the analysis of several models in economics often exhibit expected hazard rates that are proportional to the sequence's current value. For instance, models of technology adoption typically assume that the change in the fraction of a population that adopts a new technology is proportional to the product of the current fraction of adopters and the current fraction of non-adopters (see, e.g., [42]). This follows from the assumption that diffusion of technology requires non-adopters to observe adopters in order to learn about the new technology. Similar reasoning applies to models in other disciplines, such as Bass' celebrated model of new product growth (see, e.g., [2], [16]) and selection models in biological evolution (see, e.g., [29]). As we discuss below, models of individual and social learning provide another class of examples of stochastic sequences with expected hazard rates proportional to the sequence's current value. In these models, the sequences represent the probability of choosing optimal actions.
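For illustration, the following minimal simulation (a sketch with our own parameter choices, not the formal model of [42] or [2]) implements a contagion dynamic of this kind: each non-adopter adopts with probability proportional to the current fraction of adopters, so the expected change per non-adopter is proportional to the sequence's current value.

```python
import random

# A back-of-the-envelope illustration (not the cited models' formal setup):
# in a population of n_agents, each non-adopter independently adopts with
# probability beta * x_t, where x_t is the current fraction of adopters.  Then
# E[x_{t+1} - x_t | x_t] = beta * x_t * (1 - x_t), so the expected change per
# non-adopter -- the expected hazard rate -- is proportional to x_t itself.

def simulate_adoption(n_agents=1000, beta=0.1, periods=200, seed=0):
    rng = random.Random(seed)
    adopters = 10                      # a few initial adopters
    path = []
    for _ in range(periods):
        x = adopters / n_agents
        path.append(x)
        new = sum(1 for _ in range(n_agents - adopters)
                  if rng.random() < beta * x)
        adopters += new
    return path

path = simulate_adoption()
print(path[0], path[-1])   # the fraction of adopters drifts toward 1
```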

The analysis of such models usually concerns whether a new technology or product gets fully adopted, a certain type takes over in a biological selection process, or an optimal action is played almost surely in the long run. Towards this end, this paper provides general conditions on the expected hazard rates of a bounded stochastic sequence that guarantee convergence to the upper bound. Here, the sequence is interpreted as the fraction of a certain type or the probability of playing an optimal action at any point in time. This paper thus provides conditions that guarantee that, in the long run, a certain type takes over the whole population or only optimal actions are chosen, as illustrated in the applications discussed below.

It turns out that constraints on the relative hazard rates of a stochastic sequence, i.e., the proportions of the hazard rates to the values of the sequence, provide helpful conditions for convergence to the upper bound. In contrast to the deterministic case, in a stochastic framework, lower bounds on the relative hazard rates are not sufficient for almost sure convergence. For example, in the case of technology adoption, full adoption might fail because the new technology may be completely abandoned at some point in time by chance, or adoption rates may drop too fast. The analysis below reveals that if the underlying submartingale moves in small or shrinking steps, convergence to the upper bound nevertheless holds. Thus, in the long run, new technologies are used or optimal actions chosen if adoption or learning occurs in small or shrinking steps.
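To fix ideas, one reading consistent with this description (the paper's formal definitions are given in Section 2, and may differ in detail) is the following: for a $[0,1]$-valued sequence $(X_t)$ adapted to a filtration $(\mathcal{F}_t)$, take the hazard rate to be the conditional expected increment per unit of remaining mass, and the relative hazard rate to be its proportion to the current value,
$$ h_t \;=\; \frac{\mathbb{E}\left[X_{t+1}-X_t \mid \mathcal{F}_t\right]}{1-X_t}, \qquad r_t \;=\; \frac{h_t}{X_t}. $$
A lower bound $r_t \ge \delta > 0$ then amounts to the logistic-type drift $\mathbb{E}[X_{t+1}-X_t \mid \mathcal{F}_t] \ge \delta\, X_t (1-X_t)$ of the technology-adoption example above.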

The first main result of this paper, Theorem 2.1, analyzes the asymptotic properties of a sequence that changes with small step-size and satisfies weak bounds on its relative hazard rates. Theorem 2.1 asserts that the probability of convergence to optimality, i.e., the event that the stochastic sequence converges to the upper bound, is arbitrarily high for sequences with sufficiently small step-size. This result allows us to obtain novel convergence results in different contexts, including, for instance, the models of individual and social learning that we discuss below. A limitation of Theorem 2.1 is that how small the step-size needs to be in order to achieve any given probability of convergence to 1 usually depends directly on the probability measure of the underlying probability space. In applications, however, this probability measure is assumed to be unknown. This issue is addressed by Theorem 2.2 and Corollary 2.1, which provide sufficient conditions for convergence to optimality almost surely under an extra condition that may be interpreted as requiring a shrinking step-size over time.
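Both regimes can be seen in a self-contained way in the two-armed bandit algorithm discussed in the literature review below. The following sketch (all numerical choices are ours, and whether a given step-size schedule satisfies the exact conditions of Theorem 2.2 and Corollary 2.1 must be checked against Section 2) contrasts a fixed small step-size, under which convergence to 1 is highly likely but a trap at 0 retains positive probability, with a shrinking schedule.

```python
import random

# A sketch of a two-armed bandit algorithm of the kind discussed below (see
# [28], [23]); all numerical choices are ours.  Arm 1 succeeds with
# probability a, arm 2 with probability b < a.  p is the probability of
# choosing arm 1; a success reinforces the chosen arm with step-size
# gamma_t ("linear reward-inaction"), so that
#   E[p_{t+1} - p_t | p_t] = gamma_t * (a - b) * p_t * (1 - p_t) >= 0.

def run(gamma, a=0.6, b=0.4, p=0.5, periods=50_000, rng=None):
    rng = rng or random.Random()
    for t in range(1, periods + 1):
        g = gamma(t)
        if rng.random() < p:          # arm 1 chosen
            if rng.random() < a:      # success: reinforce arm 1
                p += g * (1.0 - p)
        else:                         # arm 2 chosen
            if rng.random() < b:      # success: reinforce arm 2
                p -= g * p
    return p

rng = random.Random(1)
fixed = [run(lambda t: 0.01, rng=rng) for _ in range(100)]
shrinking = [run(lambda t: 1.0 / (t + 10), rng=rng) for _ in range(100)]
print(sum(p > 0.95 for p in fixed))      # most runs end near 1; traps remain possible
print(sum(p > 0.95 for p in shrinking))  # with shrinking steps, runs end near 1 here
```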

These results can be applied to the analysis of several models of boundedly rational learning (see, e.g., [13], [35], [7]). In models of individual learning, in every period individuals choose one action out of a finite set and observe a payoff realization yielded by the action they choose (sometimes along with forgone payoffs). In models of social learning, individuals also observe the payoffs from the actions chosen by a sample of other individuals. Learning is assumed to be “adaptive,” i.e., in every period, individuals make their choice according to a probability distribution over actions, and this distribution is revised as new payoff observations arrive. As discussed in Section 3, our results provide conditions under which learning converges to choosing expected-payoff maximizing actions, either with high probability or almost surely. All details are provided in the online Appendix [31].
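As a minimal sketch of one adaptive rule of this kind, in the spirit of the model of [10] (the parametrization and function names here are ours; the online Appendix treats the general class):

```python
import random

# A minimal sketch of a Cross-style reinforcement rule (our parametrization):
# payoffs are normalized to [0,1]; after receiving payoff u from chosen action
# k, the choice probabilities move toward the degenerate distribution on k
# with a step proportional to u, which keeps them on the simplex.

def update(probs, chosen, payoff, step):
    return [p + step * payoff * ((1.0 if i == chosen else 0.0) - p)
            for i, p in enumerate(probs)]

def learn(success_probs, periods=5000, step=0.05, seed=0):
    rng = random.Random(seed)
    probs = [1.0 / len(success_probs)] * len(success_probs)
    for _ in range(periods):
        k = rng.choices(range(len(probs)), weights=probs)[0]
        u = 1.0 if rng.random() < success_probs[k] else 0.0   # Bernoulli payoff
        probs = update(probs, k, u, step)
    return probs

print(learn([0.8, 0.5, 0.2]))   # weight typically concentrates on the first action
```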

Small and shrinking step-sizes appear often in applications. Small step-size has been used in both theoretical and experimental work in economics (see, e.g., [8] and [39], respectively). Shrinking step-size appears endogenously in the Roth–Erev model (see, e.g., [13]). Researchers using the model of [10] in applications often assume shrinking step-size (see, e.g., [34]), even though the benchmark version of that model has a fixed step-size. The condition of shrinking step-size captures the “power law of practice” in learning (see, e.g., [13] and the references therein): initial periods typically exhibit a substantial response of behavior to experience, followed by gradually decreasing responses, such as those implied by shrinking step-size.
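The following sketch (parameter choices ours) shows why the step-size shrinks endogenously in the basic Roth–Erev scheme: choice probabilities are proportions of accumulated propensities, and each new payoff moves them by roughly the payoff divided by the propensity total, which grows every period.

```python
import random

# Why shrinking step-size arises endogenously in a basic Roth-Erev scheme
# (a sketch; numerical choices are ours): propensities q_i accumulate received
# payoffs, and choice probabilities are q_i / sum(q).  Adding a payoff u to
# the chosen propensity moves the probabilities by roughly u / sum(q), and
# sum(q) grows over time, so the effective step-size shrinks.

def roth_erev(success_probs, periods=1000, q0=1.0, seed=0):
    rng = random.Random(seed)
    q = [q0] * len(success_probs)
    steps = []
    for _ in range(periods):
        k = rng.choices(range(len(q)), weights=q)[0]
        u = 1.0 if rng.random() < success_probs[k] else 0.0
        q[k] += u
        steps.append(1.0 / sum(q))     # effective step-size this period
    return [qi / sum(q) for qi in q], steps

probs, steps = roth_erev([0.8, 0.2])
print(probs)                  # most weight typically on the better action
print(steps[0], steps[-1])    # the effective step has shrunk roughly like 1/t
```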

The question then arises of when and why the “power law of practice” is relevant. Psychologists have long studied this problem. For instance, [6] and [27] study the decrease over time of motivation, psychophysical performance, or cognitive gains from experience as possible explanations of the “power law of practice.” These explanations have appeal in the analysis of economic applications as well. In particular, motivation, psychophysical performance, and cognitive gains play an important role in the analysis of data in experimental economics, where subjects tire and lose concentration. More importantly, in real-world economic problems, the “power law of practice” seems to hold for similar reasons. For instance, [9] analyze reinforcement learning and saving behavior, and provide evidence supporting the “power law of practice” hypothesis: younger investors are more responsive to their personal return realizations than older investors in terms of their 401(k) saving rates. We believe the “power law of practice” plays a role in individual and social learning in economics.

Norman [28] formally analyzes a two-armed bandit algorithm to study the asymptotic properties of reinforcement learning models considered by experimental psychologists (see, e.g., [41]) who study learning when success or failure are the only possible outcomes. In this pioneering work, he shows that certain learning models converge with high probability to choosing the action that is more likely to yield success, provided that changes in the probability of choosing each action are small. Computer scientists (see, e.g., [36], [26], [20], [38]) provide similar results in the context of learning automata. Oyarzun and Sarin [32] adapt these techniques to prove convergence of a class of learning models to risk-averse choice. The settings in these papers are more restrictive than in this work, and their convergence results are implied by Theorem 2.1 below. None of these papers has a counterpart to the almost-sure convergence results in Theorem 2.2 and Corollary 2.1 below, as the models they analyze fail to satisfy our conditions on shrinking step-size over time.

The paper closest to our analysis is [23], who thoroughly analyze the asymptotic properties of the two-armed bandit algorithm. This analysis is of particular interest because the algorithm may have a positive probability of converging to a non-optimal state, i.e., a “trap,” even though the probability of choosing an optimal action is a submartingale. Lamberton et al. [23] take an approach similar to ours, based on shrinking step-size, to provide conditions that yield convergence to optimality almost surely. Their analysis is tailored to the specific characteristics of the two-armed bandit algorithm, whereas this paper's framework allows us to apply its results in more general settings, such as the models in economics that we study in the applications.


Framework

In this subsection, we provide the analytical framework and introduce the condition of bounded expected relative hazard rates.

In our applications to models of individual and social learning, the realization of the state of the world in each period determines the action chosen by each individual, the obtained and forgone payoffs, and the information revealed to each individual. After observing this information, individuals adjust their behavior, i.e., the probability of choosing each action …

Application to learning models

We provide several applications of our results in the online Appendix. In a first application, we consider models of individual learning with partial information. That is, we consider an individual who, in every period, chooses one action out of a finite set according to a probability distribution and observes a payoff realization of her choice. Upon observing this realization, she adjusts the probability of choosing each action according to a function mapping (potentially all) past realizations of …

Discussion

The analysis of systems that satisfy WBERHR or BERHR can be the starting point for the study of slightly more complex dynamics. There are many other models in the literature with characteristics similar to those considered here that do not satisfy these properties. One example is the model of word-of-mouth social learning in [12]. In their model, individuals sample n ∈ ℕ other individuals out of a continuum population and choose the action that has the highest average payoff in their observed …
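A rough finite-population sketch of this sampling dynamic (the model in [12] has a continuum of individuals; all numerical specifics below are ours) illustrates the kind of process that falls outside WBERHR/BERHR:

```python
import random

# A rough finite-population sketch of the word-of-mouth dynamic (the model in
# [12] has a continuum population; all numerical specifics are ours).  x is
# the fraction playing action A; a revising individual samples n others,
# observes noisy payoffs, and adopts the action with the highest average
# payoff among those observed (an action unseen in the sample cannot be chosen).

def sampled_choice(x, n, rng, mean_a=0.6, mean_b=0.4, noise=0.3):
    pay_a, pay_b = [], []
    for _ in range(n):
        if rng.random() < x:
            pay_a.append(mean_a + rng.uniform(-noise, noise))
        else:
            pay_b.append(mean_b + rng.uniform(-noise, noise))
    if not pay_b:
        return 'A'
    if not pay_a:
        return 'B'
    return 'A' if sum(pay_a) / len(pay_a) >= sum(pay_b) / len(pay_b) else 'B'

def evolve(x=0.5, n=5, periods=200, revisers=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(periods):
        x = sum(sampled_choice(x, n, rng) == 'A'
                for _ in range(revisers)) / revisers
    return x

print(evolve())   # the higher-payoff action A tends to take over in this run
```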

References (42)

  • A.G. Bills, General Experimental Psychology (1934)
  • T. Börgers et al., Expedient and monotone learning rules, Econometrica (2004)
  • T. Börgers et al., Learning through reinforcement and replicator dynamics, J. Econ. Theory (1997)
  • J. Choi et al., Reinforcement learning and savings behavior, J. Finance (2009)
  • J. Cross, A stochastic learning model of economic behavior, Quart. J. Econ. (1973)
  • S. Durham et al., A sequential design for maximizing the probability of a favourable response, Can. J. Statist. (1998)
  • G. Ellison et al., Word of mouth communication and social learning, Quart. J. Econ. (1995)
  • I. Erev et al., Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria, Amer. Econ. Rev. (1998)
  • M. Jackson et al., Diffusion, strategic interaction, and social structure
  • J. Kiefer et al., Stochastic estimation of the maximum of a regression function, Ann. Math. Statist. (1952)
  • H. Kushner et al., Stochastic Approximation Methods for Constrained and Unconstrained Systems (1978)
Acknowledgments

We thank J. Hedlund, S. Lakshmivarahan and M.A.L. Thathachar for helpful correspondence on the subject matter of this paper, and two referees and the Associate Editor for insightful and constructive comments. Oyarzun acknowledges financial support of the Ministerio de Ciencia y Tecnología, FEDER funds under project SEJ2007-62656, and of the Instituto Valenciano de Investigaciones. Ruf acknowledges financial support of the Visitors Program of the School of Economics at the University of Queensland and of the Oxford-Man Institute of Quantitative Finance at the University of Oxford, where a major part of this work was completed.
