Abstract
Hyperspace analog to language (HAL) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. We have created and publicly released a computer application, the High Dimensional Explorer (HiDEx), that makes it possible to systematically alter the values of these parameters to examine their effect on the co-occurrence matrix that instantiates the model. We took an empirical approach to understanding the influence of the parameters on the measures produced by the models, looking at how well matrices derived with different parameters could predict human reaction times in lexical decision and semantic decision tasks. New parameter sets give us measures of semantic density that improve the model’s ability to predict behavioral measures. Implications for such models are discussed.
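The core idea of the abstract is that each word is represented by how often other words occur near it in a corpus, and that settings like window size and distance weighting shape the resulting matrix. The sketch below illustrates that idea. It is not HiDEx's implementation. Every name in it is hypothetical: `hal_vectors`, `cosine`, `neighborhood_density`, the `window` and `weighting` parameters, the linear-ramp default (nearer words weigh more), and the density measure (mean cosine similarity to the k nearest neighbours). The density measure is only a simple stand-in for the semantic-density measures the abstract mentions, not necessarily the one HiDEx computes.

```python
from collections import Counter, defaultdict
import math


def hal_vectors(tokens, window=5, weighting="linear"):
    """Build HAL-style co-occurrence vectors (illustrative sketch only).

    Each word's vector is keyed by (direction, context_word) pairs:
    'L' for words that precede it, 'R' for words that follow it,
    within `window` positions.
    """
    vectors = defaultdict(Counter)
    for t, target in enumerate(tokens):
        for d in range(1, window + 1):
            # Assumed linear ramp: the adjacent word gets weight `window`,
            # the farthest word in the window gets weight 1.
            # Any other `weighting` value counts every co-occurrence as 1.
            w = window - d + 1 if weighting == "linear" else 1
            if t - d >= 0:
                vectors[target][("L", tokens[t - d])] += w
            if t + d < len(tokens):
                vectors[target][("R", tokens[t + d])] += w
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def neighborhood_density(vectors, word, k=3):
    """Hypothetical density measure: mean cosine to the k nearest neighbours."""
    target = vectors.get(word, Counter())  # .get avoids inserting a new key
    sims = sorted(
        (cosine(target, vectors[other]) for other in vectors if other != word),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


if __name__ == "__main__":
    tokens = "the cat sat on the mat the dog sat on the rug".split()
    vecs = hal_vectors(tokens, window=2)
    print(cosine(vecs["cat"], vecs["dog"]))
    print(neighborhood_density(vecs, "cat", k=3))
```

Re-running with a different `window` or with `weighting="flat"` changes the vectors, and therefore every similarity and density value derived from them. Evaluating those downstream measures against behavioral data is the kind of parameter exploration the abstract describes.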
Additional information
This research was supported by grants to the authors from the Natural Sciences and Engineering Research Council of Canada (to C.S. and C.W.) and the Alberta Heritage Foundation for Medical Research (to C.W.).
About this article
Cite this article
Shaoul, C., Westbury, C. Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods 42, 393–413 (2010). https://doi.org/10.3758/BRM.42.2.393
DOI: https://doi.org/10.3758/BRM.42.2.393