doi:10.1016/0020-0271(63)90017-2    

Copyright © 1963 Published by Elsevier Science Ltd. All rights reserved.

The microstatistics of text

Lauren B. Doyle*


Available online 18 July 2002.

Abstract

The role of statistics in text analysis is reappraised, and current inhibiting influences in the use of statistics are discussed. The question of descriptive vs. predictive statistics is explored at some length. A distinction between macrostatistics and microstatistics is made, with the implication that the former should be used in describing libraries whereas the latter should be used in describing written language.

Secondly, a relationship between the probability of occurrence of a word or word group in text and the cognitive effect of such a word or word group is suggested. This relation is illustrated by statistical data on word pairs; statistics of pairs which are directly linked in a sentence-structure tree are compared with statistics of pairs which, though the words are adjacent in text, are not directly linked in such a tree. This study of statistics as a function of sentence structure is then extended to units of text larger than a word pair.

In the final section, the problem of selecting and displaying content-indicative word groups in condensed representations of documents is discussed. The statistical approach, by itself or in conjunction with other techniques, is shown to be unavoidable in a problem such as automatic abstracting, and the difficulties of some non-statistical methods described in recent literature are exemplified.

* Center for Research in System Development, System Development Corporation, 2500 Colorado Avenue, Santa Monica, California.


 