Copyright © 1963 Published by Elsevier Science Ltd. All rights reserved.
The microstatistics of text
Available online 18 July 2002.
Abstract
The role of statistics in text analysis is reappraised, and current inhibiting influences in the use of statistics are discussed. The question of descriptive vs. predictive statistics is explored at some length. A distinction between macrostatistics and microstatistics is made, with the implication that the former should be used in describing libraries whereas the latter should be used in describing written language.
Secondly, a relationship between the probability of occurrence of a word or word group in text and the cognitive effect of such a word or word group is suggested. This relation is illustrated by statistical data on word pairs; statistics of pairs which are directly linked in a sentence-structure tree are compared with statistics of pairs which, though the words are adjacent in text, are not directly linked in such a tree. This study of statistics as a function of sentence structure is then extended to units of text larger than a word pair.
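The word-pair comparison described above can be pictured with a minimal sketch. It is an illustration only, not the article's procedure: the function names `split_adjacent_pairs` and `relative_frequencies`, the representation of tree links as index pairs, and the example sentence with its hand-assigned links are all assumptions made here.

```python
from collections import Counter


def split_adjacent_pairs(tokens, links):
    """Count adjacent word pairs, separated by whether they are tree-linked.

    tokens: list of words in text order.
    links:  set of frozenset({i, j}) token-index pairs that are directly
            linked in a sentence-structure tree (assumed to be supplied by
            hand or by some parser; hypothetical input format).
    """
    linked, unlinked = Counter(), Counter()
    for i in range(len(tokens) - 1):
        pair = (tokens[i].lower(), tokens[i + 1].lower())
        if frozenset((i, i + 1)) in links:
            linked[pair] += 1      # adjacent and directly linked in the tree
        else:
            unlinked[pair] += 1    # adjacent in text only
    return linked, unlinked


def relative_frequencies(counts):
    """Turn raw pair counts into relative frequencies within one group."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# Hypothetical example: links assigned by hand for illustration.
tokens = ["the", "old", "man", "sat", "down"]
links = {frozenset(p) for p in [(0, 2), (1, 2), (2, 3), (3, 4)]}
linked, unlinked = split_adjacent_pairs(tokens, links)
```

In this toy sentence, "the old" is adjacent but not directly linked, while "old man", "man sat", and "sat down" are both adjacent and linked. Over a real corpus, the two frequency tables could then be compared pair by pair.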
In the final section, the problem of selecting and displaying content-indicative word groups in condensed representations of documents is discussed. The statistical approach, by itself or in conjunction with other techniques, is shown to be unavoidable in a problem such as automatic abstracting, and the difficulties of some non-statistical methods described in recent literature are exemplified.
* Center for Research in System Development, System Development Corporation, 2500 Colorado Avenue, Santa Monica, California.











