Abstract
The idea that at least some aspects of word meaning can be induced from patterns of word co-occurrence is becoming increasingly popular. However, there is less agreement about the precise computations involved and about the appropriate tests for distinguishing between the various possibilities. If psychological models using these methods are to be reliably evaluated and compared, it is important that the effects of the relevant design choices and parameter values be understood. In this article, we present a systematic exploration of the principal computational possibilities for formulating and validating representations of word meanings from word co-occurrence statistics. We find that, once we have identified the best procedures, a very simple approach is surprisingly successful and robust over a range of psychologically relevant evaluation measures.
Cite this article
Bullinaria, J.A., Levy, J.P. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods 39, 510–526 (2007). https://doi.org/10.3758/BF03193020
DOI: https://doi.org/10.3758/BF03193020