Abstract
Two-factor theory (Mowrer, 1947, 1951, 1956) remains one of the most influential theories of avoidance, but it is at odds with empirical findings that demonstrate sustained avoidance responding in situations in which the theory predicts that the response should extinguish. This article shows that the well-known actor-critic model seamlessly addresses the problems with two-factor theory, while simultaneously being consistent with the core ideas that underlie that theory. More specifically, the article shows that (1) the actor-critic model bears striking similarities to two-factor theory and explains all of the empirical phenomena that two-factor theory explains, in much the same way, and (2) there are subtle but important differences between the actor-critic model and two-factor theory, which result in the actor-critic model predicting the persistence of avoidance responses that is found empirically.
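The persistence claim can be illustrated with a deliberately minimal simulation. The sketch below is not the article's own implementation: it assumes a single warning-signal state, two actions (respond or not), a shock coded as a reward of -1, a softmax actor, and hypothetical learning rates `alpha` and `beta`. Under these assumptions, once shocks are being avoided the critic's value for the warning signal drifts toward zero, so a successful avoidance response never produces a negative prediction error and the actor's preference for it is never reduced, even after shocks are discontinued.

```python
import math
import random

def prob_avoid(prefs):
    """Softmax probability of the avoidance response (action 1) over two actions."""
    return 1.0 / (1.0 + math.exp(prefs[0] - prefs[1]))

def run_phase(value, prefs, n_trials, shock_on, rng, alpha=0.1, beta=0.5):
    """Run one-step trials; alpha (critic) and beta (actor) are assumed rates."""
    for _ in range(n_trials):
        action = 1 if rng.random() < prob_avoid(prefs) else 0
        # Shock (reward -1) only if shocks are enabled and the response is omitted.
        reward = -1.0 if (shock_on and action == 0) else 0.0
        delta = reward - value          # prediction error; the trial then ends (next value 0)
        value += alpha * delta          # critic update
        prefs[action] += beta * delta   # actor update for the chosen action only
    return value

rng = random.Random(0)
prefs = [0.0, 0.0]   # preferences for [no response, avoidance response]
value = 0.0          # critic's value of the warning signal

value = run_phase(value, prefs, 500, True, rng)    # acquisition: shocks enabled
p_acq = prob_avoid(prefs)
pref_avoid_acq = prefs[1]

value = run_phase(value, prefs, 500, False, rng)   # extinction: shocks disabled
p_ext = prob_avoid(prefs)

print(f"P(avoid) after acquisition: {p_acq:.3f}")
print(f"P(avoid) after extinction:  {p_ext:.3f}")
```

In this toy setting the value of the warning signal stays between -1 and 0, so every prediction error during extinction is zero or positive and the preference for the avoidance response cannot fall. The specific numbers depend on the assumed parameters and should not be read as the article's results.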
References
Adams, C. D. (1982). Variations in the sensitivity of instrumental responding to reinforcer devaluation. Quarterly Journal of Experimental Psychology, 34B, 77–98.
Baird, L. C. (1993). Advantage updating (Tech. Rep. No. WL-TR-93-1146). Dayton, OH: Wright-Patterson Air Force Base.
Barto, A. G. (1995). Adaptive critics and the basal ganglia. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 215–232). Cambridge, MA: MIT Press.
Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, & Cybernetics, 13, 835–846.
Beninger, R. J., Mason, S. T., Phillips, A. G., & Fibiger, H. C. (1980). The use of conditioned suppression to evaluate the nature of neuroleptic-induced avoidance deficits. Journal of Pharmacology & Experimental Therapeutics, 213, 623–627.
Bolles, R. C. (1969). Avoidance and escape learning: Simultaneous acquisition of different responses. Journal of Comparative & Physiological Psychology, 68, 355–358.
Bolles, R. C. (1970). Species-specific defense reactions and avoidance learning. Psychological Review, 77, 32–48.
Bolles, R. C. (1972a). The avoidance learning problem. In G. H. Bower & K. W. Spence (Eds.), The psychology of learning and motivation (Vol. 6, pp. 97–145). New York: Academic Press.
Bolles, R. C. (1972b). Reinforcement, expectancy, and learning. Psychological Review, 79, 394–409.
Bolles, R. C. (1978). The role of stimulus learning in defensive behavior. In S. H. Hulse, H. Fowler, & W. K. Honig (Eds.), Cognitive processes in animal behavior (pp. 89–108). Hillsdale, NJ: Erlbaum.
Bolles, R. C., & Grossen, N. E. (1969). Effects of an informational stimulus on the acquisition of avoidance behavior in rats. Journal of Comparative & Physiological Psychology, 68, 90–99.
Bolles, R. C., Stokes, L. W., & Younger, M. S. (1966). Does CS termination reinforce avoidance behavior? Journal of Comparative & Physiological Psychology, 62, 201–207.
Brady, J. V. (1965). Experimental studies of psychophysiological responses to stressful situations. In Symposium on Medical Aspects of Stress in the Military Climate (pp. 271–289). Washington, DC: Walter Reed Army Institute of Research.
Brady, J. V., & Harris, A. (1977). The experimental production of altered physiological states. In W. K. Honig & J. E. R. Staddon (Eds.), Handbook of operant behavior (pp. 595–618). Englewood Cliffs, NJ: Prentice Hall.
Bridle, J. S. (1990). Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimates of parameters. In D. S. Touretzky (Ed.), Advances in neural information processing systems 2 (pp. 211–217). San Mateo, CA: Morgan Kaufmann.
Chorazyna, H. (1962). Some properties of conditioned inhibition. Acta Biologiae Experimentalis, 22, 5–13.
Cicala, G. A., & Owen, J. W. (1976). Warning signal termination and a feedback signal may not serve the same function. Learning & Motivation, 7, 356–367.
Cook, M., Mineka, S., & Trumble, D. (1987). The role of response-produced and exteroceptive feedback in the attenuation of fear over the course of avoidance learning. Journal of Experimental Psychology: Animal Behavior Processes, 13, 239–249.
Coover, G. D., Ursin, H., & Levine, S. (1973). Plasma-corticosterone levels during active-avoidance learning in rats. Journal of Comparative & Physiological Psychology, 82, 170–174.
Crawford, M., & Masterson, F. A. (1978). Components of the flight response can reinforce bar-press avoidance learning. Journal of Experimental Psychology: Animal Behavior Processes, 4, 144–151.
Crawford, M., & Masterson, F. A. (1982). Species-specific defense reactions and avoidance learning: An evaluative review. Pavlovian Journal of Biological Science, 17, 204–214.
Crespi, L. P. (1942). Quantitative variation of incentive and performance in the white rat. American Journal of Psychology, 55, 467–517.
Daw, N. D. (2003). Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh.
Daw, N. D., Courville, A. C., & Touretzky, D. S. (2006). Representation and timing in theories of the dopamine system. Neural Computation, 18, 1637–1677.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8, 1704–1711.
Daw, N. D., Niv, Y., & Dayan, P. (2006). Actions, policies, values, and the basal ganglia. In E. Bezard (Ed.), Recent breakthroughs in basal ganglia research (pp. 111–130). New York: Nova Science.
Daw, N. D., & Touretzky, D. S. (2002). Long-term reward prediction in TD models of the dopamine system. Neural Computation, 14, 2567–2583.
Dayan, P., & Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36, 285–298.
Dayan, P., Kakade, S., & Montague, P. R. (2000). Learning and selective attention. Nature Neuroscience, 3 (Suppl.), 1218–1223.
Dickinson, A. (1985). Actions and habits: The development of behavioural autonomy. Philosophical Transactions of the Royal Society B, 308, 67–78.
Dickinson, A. (1994). Instrumental conditioning. In N. J. Mackintosh (Ed.), Animal learning and cognition (pp. 45–79). San Diego: Academic Press.
Dinsmoor, J. A. (2001). Stimuli inevitably generated by behavior that avoids electric shock are inherently reinforcing. Journal of the Experimental Analysis of Behavior, 75, 311–333.
Dinsmoor, J. A., & Sears, G. W. (1973). Control of avoidance by a response-produced stimulus. Learning & Motivation, 4, 284–293.
Domjan, M. (2003). The principles of learning and behavior (5th ed.). Belmont, CA: Thomson/Wadsworth.
Estes, W. K., & Skinner, B. F. (1941). Some quantitative properties of anxiety. Journal of Experimental Psychology, 29, 390–400.
Grossberg, S. (1972). A neural theory of punishment and avoidance. I: Qualitative theory. Mathematical Biosciences, 15, 39–67.
Grossen, N. E., & Kelley, M. J. (1972). Species-specific behavior and acquisition of avoidance behavior in rats. Journal of Comparative & Physiological Psychology, 81, 307–310.
Herrnstein, R. (1969). Method and theory in the study of avoidance. Psychological Review, 76, 49–69.
Hodgson, R., & Rachman, S. (1974). II. Desynchrony in measures of fear. Behaviour Research & Therapy, 12, 319–326.
Houk, J. C., Adams, J. L., & Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. C. Houk, J. L. Davis, & D. G. Beiser (Eds.), Models of information processing in the basal ganglia (pp. 249–270). Cambridge, MA: MIT Press.
Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory. New York: Appleton-Century.
Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: New anatomical and computational perspectives. Neural Networks, 15, 535–547.
Johnson, J. D., Li, W., Li, J., & Klopf, A. H. (2002). A computational model of learned avoidance behavior in a one-way avoidance experiment. Adaptive Behavior, 9, 91–104.
Kamin, L. J. (1956). The effects of termination of the CS and avoidance of the US on avoidance learning. Journal of Comparative & Physiological Psychology, 49, 420–424.
Kamin, L. J., Brimer, C. J., & Black, A. H. (1963). Conditioned suppression as a monitor of fear of the CS in the course of avoidance training. Journal of Comparative & Physiological Psychology, 56, 497–501.
Klopf, A. H., Morgan, J. S., & Weaver, S. E. (1993). A hierarchical network of control systems that learn: Modeling nervous system function during classical and instrumental conditioning. Adaptive Behavior, 1, 263–319.
Knapp, R. K. (1965). Acquisition and extinction of avoidance with similar and different shock and escape situations. Journal of Comparative & Physiological Psychology, 60, 272–273.
Levis, D. J. (1966). Effects of serial CS presentation and other characteristics of the CS on the conditioned avoidance response. Psychological Reports, 18, 755–766.
Levis, D. J., Bouska, S. A., Eron, J. B., & McIlhon, M. D. (1970). Serial CS presentation and one-way avoidance conditioning: A noticeable lack of delay in responding. Psychonomic Science, 20, 147–149.
Levis, D. J., & Boyd, T. L. (1979). Symptom maintenance: An infrahuman analysis and extension of the conservation of anxiety principle. Journal of Abnormal Psychology, 88, 107–120.
Levis, D. J., & Brewer, K. E. (2001). The neurotic paradox: Attempts by two-factor fear theory and alternative avoidance models to resolve the issues associated with sustained avoidance responding in extinction. In R. R. Mowrer & S. B. Klein (Eds.), Handbook of contemporary learning theories (pp. 561–597). Mahwah, NJ: Erlbaum.
Logan, F. A. (1951). A comparison of avoidance and nonavoidance eyelid conditioning. Journal of Experimental Psychology, 42, 390–393.
Mackintosh, N. J. (1974). The psychology of animal learning. New York: Academic Press.
Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276–298.
Maia, T. V. (2007). A reinforcement learning theory of avoidance. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh.
Maia, T. V. (2009). Reinforcement learning, conditioning, and the brain: Successes and challenges. Cognitive, Affective, & Behavioral Neuroscience, 9, 343–364.
Malloy, P., & Levis, D. J. (1988). A laboratory demonstration of persistent human avoidance. Behavior Therapy, 19, 229–241.
Masterson, F. A. (1970). Is termination of a warning signal an effective reward for the rat? Journal of Comparative & Physiological Psychology, 72, 471–475.
McAllister, W. R., & McAllister, D. E. (1995). Two-factor fear theory: Implications for understanding anxiety-based clinical phenomena. In W. O’Donohue & L. Krasner (Eds.), Theories of behavior therapy (pp. 145–171). Washington, DC: American Psychological Association.
McAllister, W. R., McAllister, D. E., Scoles, M. T., & Hampton, S. R. (1986). Persistence of fear-reducing behavior: Relevance for the conditioning theory of neurosis. Journal of Abnormal Psychology, 95, 365–372.
Mineka, S. (1979). The role of fear in theories of avoidance learning, flooding, and extinction. Psychological Bulletin, 86, 985–1010.
Mineka, S., & Gino, A. (1980). Dissociation between conditioned emotional response and extended avoidance performance. Learning & Motivation, 11, 476–502.
Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16, 1936–1947.
Morris, R. G. (1974). Pavlovian conditioned inhibition of fear during shuttlebox avoidance behavior. Learning & Motivation, 5, 424–447.
Morris, R. G. (1975). Preconditioning of reinforcing properties to an exteroceptive feedback stimulus. Learning & Motivation, 6, 289–298.
Moutoussis, M., Bentall, R. P., Williams, J., & Dayan, P. (2008). A temporal difference account of avoidance learning. Network, 19, 137–160.
Mowrer, O. H. (1947). On the dual nature of learning—a reinterpretation of conditioning and problem solving. Harvard Educational Review, 17, 102–148.
Mowrer, O. H. (1951). Two-factor learning theory: Summary and comment. Psychological Review, 58, 350–354.
Mowrer, O. H. (1956). Two-factor learning theory reconsidered, with special reference to secondary reinforcement and the concept of habit. Psychological Review, 63, 114–128.
Mowrer, O. H. (1960). Learning theory and behavior. New York: Wiley.
Neuenschwander, N., Fabrigoule, C., & Mackintosh, N. J. (1987). Fear of the warning signal during overtraining of avoidance. Quarterly Journal of Experimental Psychology, 39B, 23–33.
Niv, Y., Duff, M. O., & Dayan, P. (2005). Dopamine, uncertainty and TD learning. Behavioral & Brain Functions, 1, 6.
O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304, 452–454.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the effectiveness of conditioned but not of unconditioned stimuli. Psychological Review, 87, 532–552.
Rachman, S. (1976). The passing of the two-stage theory of fear and avoidance: Fresh possibilities. Behaviour Research & Therapy, 14, 125–131.
Rachman, S., & Hodgson, R. (1974). I. Synchrony and desynchrony in fear and avoidance. Behaviour Research & Therapy, 12, 311–318.
Rescorla, R. A. (1968). Pavlovian conditioned fear in Sidman avoidance learning. Journal of Comparative & Physiological Psychology, 65, 55–60.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Riccio, D. C., & Silvestri, R. (1973). Extinction of avoidance behavior and the problem of residual fear. Behaviour Research & Therapy, 11, 1–9.
Schmajuk, N. A., & Zanutto, B. S. (1997). Escape, avoidance, and imitation: A neural network approach. Adaptive Behavior, 6, 63–129.
Schultz, W. (1998). Predictive reward signal of dopamine neurons. Journal of Neurophysiology, 80, 1–27.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.
Seligman, M. E. P., & Campbell, B. A. (1965). Effect of intensity and duration of punishment on extinction of an avoidance response. Journal of Comparative & Physiological Psychology, 59, 295–297.
Seligman, M. E. P., & Johnston, J. C. (1973). A cognitive theory of avoidance learning. In F. J. McGuigan & D. B. Lumsden (Eds.), Contemporary approaches to conditioning and learning (pp. 69–110). Washington, DC: Winston.
Servatius, R. J., Jiao, X., Beck, K. D., Pang, K. C., & Minor, T. R. (2008). Rapid avoidance acquisition in Wistar-Kyoto rats. Behavioural Brain Research, 192, 191–197.
Sheffield, F. D., & Temmer, H. W. (1950). Relative resistance to extinction of escape training and avoidance training. Journal of Experimental Psychology, 40, 287–298.
Smith, A. J., Becker, S., & Kapur, S. (2005). A computational model of the functional role of the ventral-striatal D2 receptor in the expression of previously acquired behaviors. Neural Computation, 17, 361–395.
Smith, A. [J.], Li, M., Becker, S., & Kapur, S. (2004). A model of antipsychotic action in conditioned avoidance: A computational approach. Neuropsychopharmacology, 29, 1040–1049.
Solomon, R. L., Kamin, L. J., & Wynne, L. C. (1953). Traumatic avoidance learning: The outcomes of several extinction procedures with dogs. Journal of Abnormal Psychology, 48, 291–302.
Solomon, R. L., & Wynne, L. C. (1953). Traumatic avoidance learning: Acquisition in normal dogs. Psychological Monographs, 67(Whole No. 354).
Solomon, R. L., & Wynne, L. C. (1954). Traumatic avoidance learning: The principles of anxiety conservation and partial irreversibility. Psychological Review, 61, 353–385.
Starr, M. D., & Mineka, S. (1977). Determinants of fear over the course of avoidance learning. Learning & Motivation, 8, 332–350.
Stebbins, W. C. (1962). Response latency as a function of amount of reinforcement. Journal of the Experimental Analysis of Behavior, 5, 305–307.
Strub, H. (1963). Instrumental escape conditioning in a water alley: Shifts in magnitude of reinforcement under constant drive conditions. Unpublished master’s thesis, Hollins University, Roanoke, VA.
Suri, R. E., Bargas, J., & Arbib, M. A. (2001). Modeling functions of striatal dopamine modulation in learning and planning. Neuroscience, 103, 65–85.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
Sutton, R. S., & Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In M. R. Gabriel & J. Moore (Eds.), Learning and computational neuroscience: Foundations of adaptive networks (pp. 497–537). Cambridge, MA: MIT Press.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Takahashi, Y., Schoenbaum, G., & Niv, Y. (2008). Silencing the critics: Understanding the effects of cocaine sensitization on dorsolateral and ventral striatum in the context of an actor/critic model. Frontiers in Neuroscience, 2, 86–99.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New Brunswick, NJ: Transaction.
Wahlsten, D. L., & Cole, M. (1972). Classical and avoidance training of leg flexion in the dog. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 379–408). New York: Appleton-Century-Crofts.
Weisman, R. G., & Litner, J. S. (1972). The role of Pavlovian events in avoidance training. In R. A. Boakes & M. S. Halliday (Eds.), Inhibition and learning. New York: Academic Press.
Williams, B. A. (2001). Two-factor theory has strong empirical evidence of validity. Journal of the Experimental Analysis of Behavior, 75, 362–378.
Williams, R. W., & Levis, D. J. (1991). A demonstration of persistent human avoidance in extinction. Bulletin of the Psychonomic Society, 29, 125–127.
Williams, Z. M., & Eskandar, E. N. (2006). Selective enhancement of associative learning by microstimulation of the anterior caudate. Nature Neuroscience, 9, 562–568.
Woods, P. J. (1967). Performance changes in escape conditioning following shifts in the magnitude of reinforcement. Journal of Experimental Psychology, 75, 487–491.
Zeaman, D. (1949). Response latency as a function of the amount of reinforcement. Journal of Experimental Psychology, 39, 466–483.
Zerbolio, D. J., Jr. (1968). Escape and approach responses in avoidance learning. Canadian Journal of Psychology, 22, 60–71.
Additional information
The author is now at Columbia University and the New York State Psychiatric Institute. This article is based on the author’s doctoral dissertation in the Department of Psychology at Carnegie Mellon University. This work was supported in part by a Graduate Research Fellowship from the Calouste Gulbenkian Foundation. The author thanks James McClelland, John Anderson, Marlene Behrmann, and Ahmad Hariri for useful discussions about this work.
Cite this article
Maia, T.V. Two-factor theory, the actor-critic model, and conditioned avoidance. Learning & Behavior 38, 50–67 (2010). https://doi.org/10.3758/LB.38.1.50