DOI: 10.3115/980845.980947

Text segmentation using reiteration and collocation

Published: 10 August 1998

ABSTRACT

A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are automatically located using a combination of three linguistic features: word repetition, collocation and relation weights. This method is shown to successfully detect known subject changes in text and corresponds well to the segmentations placed by test subjects.
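
The windowed comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size, and the rule of picking the lowest-similarity gap are all assumptions made for the example. In particular, the default `related` predicate treats only exact word repetition as a relation; collocation and relation weights would have to be supplied through a different predicate.

```python
def related(a, b):
    # Assumption: only exact repetition counts as a relation here.
    # Collocation or thesaurus-based relations would replace this predicate.
    return a == b


def window_similarity(left, right, is_related=related):
    """Proportion of (left word, right word) pairs judged related."""
    if not left or not right:
        return 0.0
    hits = sum(1 for a in left for b in right if is_related(a, b))
    return hits / (len(left) * len(right))


def similarity_profile(words, window, is_related=related):
    """Similarity at each gap i between words[i-window:i] and words[i:i+window]."""
    return [
        (i, window_similarity(words[i - window:i], words[i:i + window], is_related))
        for i in range(window, len(words) - window + 1)
    ]


def lowest_similarity_gap(profile):
    """Hypothetical boundary rule: the gap where adjacent windows are least similar."""
    return min(profile, key=lambda pair: pair[1])[0]


words = ["cat"] * 4 + ["dog"] * 4
profile = similarity_profile(words, window=2)
boundary = lowest_similarity_gap(profile)
```

On this toy input the similarity falls to zero at the gap between the last "cat" and the first "dog", so that gap is selected as the candidate boundary.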

Published in

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1
August 1998, 768 pages

Publisher: Association for Computational Linguistics, United States

Overall acceptance rate: 85 of 443 submissions (19%)
