Regular Article
Overconfidence: It Depends on How, What, and Whom You Ask☆1

https://doi.org/10.1006/obhd.1999.2847

Abstract

Many studies have reported that the confidence people have in their judgments exceeds their accuracy and that overconfidence increases with the difficulty of the task. However, some common analyses confound systematic psychological effects with statistical effects that are inevitable if judgments are imperfect. We present three experiments using new methods to separate systematic effects from the statistically inevitable. We still find systematic differences between confidence and accuracy, including an overall bias toward overconfidence. However, these effects vary greatly with the type of judgment. There is little general overconfidence with two-choice questions and pronounced overconfidence with subjective confidence intervals. Over- and underconfidence also vary systematically with the domain of questions asked, but not as a function of difficulty. We also find stable individual differences. Determining why some people, some domains, and some types of judgments are more prone to overconfidence will be important to understanding how confidence judgments are made.
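
The statistical confound noted above is, at bottom, a regression effect: if stated confidence equals the true probability of being correct plus random error, then sorting responses by stated confidence guarantees that accuracy falls short of confidence in the highest confidence categories and exceeds it in the lowest, even when the underlying judgments are perfectly calibrated (cf. Erev, Wallsten, & Budescu, 1994, in the references below). The following minimal sketch, with illustrative noise and difficulty parameters of our own choosing rather than anything taken from the paper, makes the point concrete:

```python
# Minimal sketch (illustrative, not the authors' method) of how random
# error alone produces apparent over- and underconfidence, in the spirit
# of Erev, Wallsten, & Budescu (1994). All parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n_judges, n_items = 50, 200

# True probability of answering each two-choice item correctly.
p_true = rng.uniform(0.5, 1.0, n_items)

# Stated confidence = true probability + random error, clipped to the
# valid half-range. The underlying judgments are perfectly calibrated.
conf = np.clip(p_true + rng.normal(0.0, 0.15, (n_judges, n_items)), 0.5, 1.0)

# Whether each judge answers each item correctly.
correct = rng.random((n_judges, n_items)) < p_true

# The common analysis: accuracy conditional on stated confidence.
for lo, hi in [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 0.9), (0.9, 1.01)]:
    in_bin = (conf >= lo) & (conf < hi)
    print(f"stated {lo:.1f}-{hi:.1f}: mean confidence "
          f"{conf[in_bin].mean():.2f}, accuracy {correct[in_bin].mean():.2f}")
```

Running the sketch shows accuracy below mean confidence in the top categories and above it in the bottom ones, with no psychological bias built in; this is the statistically inevitable component that the paper's methods are designed to separate from genuine systematic effects.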

References (61)

  • P.W. Paese et al., Influences on the appropriateness of confidence in judgment: Practice, effort, information, and decision making, Organizational Behavior and Human Decision Processes (1991)
  • P.E. Pfeifer, Are we overconfident in the belief that probability forecasters are overconfident? Organizational Behavior and Human Decision Processes (1994)
  • J.B. Soll, Determinants of overconfidence and miscalibration: The roles of random error and ecological structure, Organizational Behavior and Human Decision Processes (1996)
  • L. Suantak et al., The hard–easy effect in subjective probability calibration, Organizational Behavior and Human Decision Processes (1996)
  • P. Ayton et al., How real is overconfidence? Journal of Behavioral Decision Making (1997)
  • E. Babad, Wishful thinking and objectivity among sports fans, Social Behavior (1987)
  • M. Björkman et al., Realism of confidence in sensory discrimination: The underconfidence phenomenon, Perception & Psychophysics (1993)
  • E. Brunswik, The conceptual framework of psychology (1952)
  • D.V. Budescu et al., On the importance of random error in the study of probability judgment: Part I. New theoretical developments, Journal of Behavioral Decision Making (1997)
  • D.V. Budescu et al., Stochastic and cognitive models of confidence [Special issue], Journal of Behavioral Decision Making (1997)
  • D.V. Budescu et al., On the importance of random error in the study of probability judgment: Part II. Applying the stochastic judgment model to detect systematic trends, Journal of Behavioral Decision Making (1997)
  • R.S. Burt, Structural holes (1992)
  • R.S. Burt, The contingent value of social capital, Administrative Science Quarterly (1997)
  • R.T. Clemen, Calibration and the aggregation of probabilities, Management Science (1986)
  • R.T. Clemen, Combining forecasts: A review and annotated bibliography, International Journal of Forecasting (1989)
  • R.M. Dawes, Confidence in intellectual judgments vs. confidence in perceptual judgments
  • D. Dunning et al., The overconfidence effect in social prediction, Journal of Personality and Social Psychology (1990)
  • I. Erev et al., Simultaneous over- and underconfidence: The role of error in judgment processes, Psychological Review (1994)
  • W.R. Ferrell, Discrete subjective probabilities and decision analysis: Elicitation, calibration and combination
  • B. Fischhoff, Debiasing

Footnotes

    ☆1

    This research was supported by Grant SBR-9409627 from the Decision, Risk, and Management Sciences Program of the National Science Foundation. We thank Peter Juslin, Eldar Shafir, David Budescu, Robin Hogarth, Christopher Hsee, J. Edward Russo, William Ferrell, Terry Connolly, and an anonymous reviewer for their helpful comments on earlier drafts.

    ☆2

    Argentina 71, Canada 77.

    ☆3

    Approximately 200 calories.

    ☆4

    Gigerenzer et al. (1991) used both mixed- and single-domain question sets and found no difference. However, the two sets were not randomly selected. Rather, they were selected to be difficult, and equally so. Also, only one domain was tested in a single-domain presentation. Thus, the Gigerenzer et al. study does not afford a direct comparison of mixed- and single-domain procedures.

    ☆5

    They have also been referred to as “misleading” (May, 1986) and as “deceptive” (Fischhoff, Slovic, & Lichtenstein, 1977). We avoid these terms because they imply some special feature that fools people; in fact, contrary questions merely fail to conform to one's predictions.

    ☆6

    In the models of Ferrell and colleagues, what we refer to as a signal is modeled as the separation between two signals, one for each alternative.

    ☆7

    In our usage, an item is one of the members of the list for a given domain (e.g., the poverty level of Vermont), as distinct from a question posed to participants (e.g., “Which of these states …”). Thus, two-choice questions require a comparison of two items.

    ☆8

    Recall that there are always two comparisons between samples of questions (i.e., two ways of comparing data from different questions and different participants within each domain). Each comparison uses separate data, so where both agree in sign, the results are more reliable than either alone.

    ☆9

    In the absence of complicated proper-scoring rules and incentives, participants could obtain this result by giving impossible answers on 10% of the questions and near-infinite ranges on 90%. However, neither we nor other researchers have found any evidence of this.
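
    As a toy illustration (our construction, not the authors'), the degenerate strategy just described hits the 90% target mechanically, with no knowledge of the quantities involved:

```python
# Toy illustration (our construction, not from the paper): giving
# impossible intervals on 10% of questions and near-infinite intervals
# on the other 90% yields exactly a 90% hit rate by design.
import random

random.seed(1)
true_values = [random.uniform(0, 1000) for _ in range(100)]

hits = 0
for i, value in enumerate(true_values):
    if i % 10 == 0:
        low, high = -2.0, -1.0    # impossible interval: never contains the answer
    else:
        low, high = -1e18, 1e18   # near-infinite interval: always contains it
    hits += low <= value <= high

print(hits / len(true_values))    # 0.9 exactly, regardless of knowledge
```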

    ☆10

    The worst participant's answers were all off by orders of magnitude, and the single within-range answer seems to have been the result of a typing error. However, this participant did appear to take the task seriously (did not use the same number repeatedly, did not use arbitrary numbers such as 0, used different scales of numbers for the different domains, etc.). The next-worst participants had 8% and 9% of their answers within range. We reran our analyses eliminating the worst participant, in case he or she had misunderstood something. The results were substantively the same; the overall proportion of within-range answers rose to .45.

    f2

    Address correspondence and reprint requests to Joshua Klayman, Graduate School of Business, University of Chicago, 1101 East 58th Street, Chicago, IL 60637. E-mail: [email protected].
