Abstract
In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.
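The central quantity in a Bayesian comparison of this kind is the marginal likelihood p(D | M). It is the model's likelihood averaged over its prior parameter range, and the ratio of two marginal likelihoods is the Bayes factor. The Python sketch below is illustrative only. Everything in it is a hypothetical assumption: the toy data, the two one-parameter models (a linear form y = θx and a power form y = x^θ), the known noise level, and the uniform priors. The simple grid integration was chosen for transparency and is not necessarily how the paper computes these quantities.

```python
import math

# Hypothetical toy data (assumption): responses generated exactly from y = 0.5 * x.
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [0.5 * x for x in XS]
SIGMA = 0.2  # assumed known Gaussian noise standard deviation


def linear(theta, x):
    """Hypothetical one-parameter model: y = theta * x."""
    return theta * x


def power(theta, x):
    """Hypothetical one-parameter model with a different functional form: y = x ** theta."""
    return x ** theta


def log_likelihood(model, theta):
    """Gaussian log-likelihood of the toy data under model(theta, x)."""
    ll = 0.0
    for x, y in zip(XS, YS):
        r = y - model(theta, x)
        ll += -0.5 * math.log(2 * math.pi * SIGMA ** 2) - r * r / (2 * SIGMA ** 2)
    return ll


def log_marginal_likelihood(model, lo, hi, n_grid=20001):
    """log p(D | M) under an assumed uniform prior on [lo, hi].

    The likelihood is integrated over the prior range with the trapezoidal
    rule. It is rescaled by its maximum before exponentiating so that very
    small likelihoods do not underflow.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    lls = [log_likelihood(model, t) for t in grid]
    peak = max(lls)
    vals = [math.exp(ll - peak) for ll in lls]
    integral = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    # Uniform prior density is 1 / (hi - lo).
    return peak + math.log(integral / (hi - lo))


if __name__ == "__main__":
    lin = log_marginal_likelihood(linear, 0.0, 2.0)
    pw = log_marginal_likelihood(power, 0.0, 2.0)
    lin_wide = log_marginal_likelihood(linear, 0.0, 20.0)
    print("log Bayes factor, linear vs power:", lin - pw)
    print("log penalty for a 10x wider prior range:", lin - lin_wide)
```

Both toy models have a single parameter, so a criterion that only counts parameters would separate them by best fit alone. The integrated likelihood also depends on the prior range. Widening the linear model's range tenfold, with no gain in fit, lowers its log marginal likelihood by about ln 10. This is one concrete way the extension of the parameter space can enter the comparison.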
Additional information
A portion of this work was presented at the 27th annual meeting of the Society for Mathematical Psychology held at the University of California, Irvine, in August 1995. Many people provided very useful feedback on earlier versions of this paper. They include Greg Ashby, Michael Browne, Jerry Busemeyer, Dan Friedman, Lester Krueger, Duncan Luce, Robert MacCallum, Dominic Massaro, Richard Schweickert, James Townsend, Michael Wenger, and Patricia van Zandt. Greg Ashby and Lester Krueger were especially helpful in sharpening our thinking on model complexity. This research was supported in part by Ohio Supercomputer Center Grant PAS887-1.
About this article
Cite this article
Myung, I.J., Pitt, M.A. Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review 4, 79–95 (1997). https://doi.org/10.3758/BF03210778
DOI: https://doi.org/10.3758/BF03210778