Identifying signs of syntactic complexity for rule-based sentence simplification

RICHARD EVANS; CONSTANTIN ORĂSAN

doi:10.1017/S1351324918000384

Published online by Cambridge University Press: 31 October 2018

RICHARD EVANS: Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK. E-mail: R.J.Evans@wlv.ac.uk
CONSTANTIN ORĂSAN: Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK. E-mail: C.Orasan@wlv.ac.uk

Abstract

This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
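The two-stage design described in the abstract (tag the signs, then rewrite iteratively) can be illustrated with a toy sketch. This is not the authors' system: the regular expression below is a hypothetical stand-in for the sign tagger, the names `CLAUSE_COORD`, `split_once`, and `simplify` are invented for illustration, and the single rule covers only a comma followed by "and" or "but", treated naively as clause coordination.

```python
import re

# Hypothetical stand-in for a sign tagger: treats ", and" / ", but" as a
# clause-coordination sign. A real tagger would classify each sign in context.
CLAUSE_COORD = re.compile(r",\s+(?:and|but)\s+")


def split_once(sentence):
    """Apply one rewrite: split at the first tagged sign, if there is one."""
    match = CLAUSE_COORD.search(sentence)
    if match is None:
        return [sentence]
    left = sentence[:match.start()].rstrip(" .") + "."
    right = sentence[match.end():]
    if not right:
        # Nothing follows the sign, so there is no second clause to emit.
        return [sentence]
    return [left, right[0].upper() + right[1:]]


def simplify(sentence):
    """Rewrite repeatedly until no remaining sentence contains a tagged sign."""
    agenda = [sentence]
    done = []
    while agenda:
        current = agenda.pop(0)
        parts = split_once(current)
        if len(parts) == 1:
            done.append(current)
        else:
            # Put the new sentences back at the front so order is preserved.
            agenda = parts + agenda
    return done


if __name__ == "__main__":
    text = ("The tagger labels each sign, and the rules rewrite the sentence, "
            "but some errors remain.")
    for line in simplify(text):
        print(line)
```

Because the pattern cannot tell clause coordination from other uses of the same signs, it would also split a sentence such as "We bought apples, pears, and plums." incorrectly, which is the kind of error that the abstract attributes to inaccurate sign tagging.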

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 1, January 2019, pp. 69–119

DOI: https://doi.org/10.1017/S1351324918000384
Copyright: © Cambridge University Press 2018

Footnotes

This work was supported by the European Commission under the Seventh (FP7-2007–2013) Framework Programme for Research and Technological Development [287607]. We gratefully acknowledge Emma Franklin, Zoë Harrison, and Laura Hasler for their contribution to the development of the datasets used in our research and Iustin Dornescu for his contribution to the development of the sign tagger. For their participation in the user surveys, we thank Martina Cotella, Francesca Della Moretta, Arianna Fabbri, and Victoria Yaneva. We gratefully acknowledge Larissa Sayuri Futino Castro dos Santos for assistance in collating our survey data.
