Exploring lexical co-occurrence space using HiDEx

  • Articles From the SCiP Conference
Behavior Research Methods

Abstract

Hyperspace analog to language (HAL) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. We have created and publicly released a computer application, the High Dimensional Explorer (HiDEx), that makes it possible to systematically alter the values of these parameters to examine their effect on the co-occurrence matrix that instantiates the model. We took an empirical approach to understanding the influence of the parameters on the measures produced by the models, looking at how well matrices derived with different parameters could predict human reaction times in lexical decision and semantic decision tasks. New parameter sets give us measures of semantic density that improve the model’s ability to predict behavioral measures. Implications for such models are discussed.
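The windowed co-occurrence counting that underlies a HAL-style matrix can be illustrated with a minimal sketch. This is not the HiDEx implementation: the function name `cooccurrence_counts`, the parameter names `window_size` and `weight`, the backward-only window, and the default linear ramp are all assumptions chosen for illustration. The point is only to show how changing such parameters changes the resulting counts.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window_size=2, weight=None):
    """Count weighted co-occurrences of each target word with the
    words that precede it within `window_size` positions.

    Hypothetical helper for illustration only; not part of HiDEx.
    """
    if weight is None:
        # Assumed default: a linear ramp, so the adjacent word gets
        # weight `window_size` and the farthest word gets weight 1.
        def weight(distance):
            return float(window_size - distance + 1)

    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        for distance in range(1, window_size + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            counts[(target, tokens[j])] += weight(distance)
    return dict(counts)


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    ramped = cooccurrence_counts(tokens, window_size=2)
    flat = cooccurrence_counts(tokens, window_size=2, weight=lambda d: 1.0)
    print(ramped[("cat", "the")], flat[("cat", "the")])
```

Passing a different `window_size` or `weight` function yields a different matrix from the same text, which is the kind of parameter variation the abstract describes exploring systematically.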

Author information

Corresponding author

Correspondence to Cyrus Shaoul.

Additional information

This research was supported by grants to the authors from the Natural Sciences and Engineering Research Council of Canada (to C.S. and C.W.) and the Alberta Heritage Foundation for Medical Research (to C.W.).

About this article

Cite this article

Shaoul, C., Westbury, C. Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods 42, 393–413 (2010). https://doi.org/10.3758/BRM.42.2.393

  • DOI: https://doi.org/10.3758/BRM.42.2.393
