Text segmentation using reiteration and collocation

Authors:
Amanda C. Jobbins, Nottingham Trent University, Nottingham, UK
Lindsay J. Evett, Nottingham Trent University, Nottingham, UK

ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1, August 1998, Pages 614–618. https://doi.org/10.3115/980845.980947

Published: 10 August 1998

ABSTRACT

A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are automatically located using a combination of three linguistic features: word repetition, collocation and relation weights. This method is shown to successfully detect known subject changes in text and corresponds well to the segmentations placed by test subjects.
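
The windowed comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window size, the toy token list and the collocation set are all assumptions made for illustration, and the relation weights mentioned in the abstract are not modelled. Two words count as related if they are repetitions of each other or appear together in a hypothetical, hand-supplied set of collocating pairs; the similarity of two adjacent windows is the proportion of cross-window word pairs that are related, and the gap with the lowest score is treated as the most likely subtopic change.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Relation = Callable[[str, str], bool]


def make_relation(collocations: Set[FrozenSet[str]]) -> Relation:
    """Build a relatedness test from word repetition plus a collocation set.

    The collocation set is a hypothetical stand-in for whatever resource
    supplies collocating pairs; relation weights are deliberately omitted.
    """
    def related(a: str, b: str) -> bool:
        a, b = a.lower(), b.lower()
        return a == b or frozenset((a, b)) in collocations
    return related


def window_similarity(left: List[str], right: List[str], related: Relation) -> float:
    """Proportion of (left word, right word) pairs that are related."""
    total_pairs = len(left) * len(right)
    if total_pairs == 0:
        return 0.0
    hits = sum(1 for a in left for b in right if related(a, b))
    return hits / total_pairs


def similarity_profile(tokens: List[str], window: int,
                       related: Relation) -> List[Tuple[int, float]]:
    """Score every gap that has a full window of tokens on each side."""
    scores = []
    for gap in range(window, len(tokens) - window + 1):
        left = tokens[gap - window:gap]
        right = tokens[gap:gap + window]
        scores.append((gap, window_similarity(left, right, related)))
    return scores


def lowest_similarity_gap(scores: List[Tuple[int, float]]) -> int:
    """Return the gap position with the lowest similarity score."""
    return min(scores, key=lambda item: item[1])[0]


# Toy input (assumed): two "topics" of four tokens each.
tokens = "cat dog cat dog sun moon sun moon".split()

repetition_only = make_relation(set())
profile = similarity_profile(tokens, window=2, related=repetition_only)
boundary = lowest_similarity_gap(profile)  # gap index between tokens 3 and 4
```

In this toy input the only gap with no related pairs across it is the one between the fourth and fifth tokens, so it is selected as the boundary. Passing a non-empty collocation set to `make_relation` raises the similarity across gaps whose neighbouring words collocate, which is the intuition behind combining reiteration with collocation.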


Published in
ACL '98/COLING '98: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1
August 1998, 768 pages
Program Chairs: Christian Boitet (Université Joseph Fourier, Grenoble, France), Pete Whitelock (Sharp Laboratories of Europe Ltd., Oxford, United Kingdom)
Publisher: Association for Computational Linguistics, United States

Acceptance Rates
Overall Acceptance Rate: 85 of 443 submissions, 19%