
Using the crowd for readability prediction

Published online by Cambridge University Press:  14 December 2012

ORPHÉE DE CLERCQ
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium. e-mail: orphee.declercq@hogent.be
VÉRONIQUE HOSTE
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Linguistics, Ghent University, Ghent, Belgium. e-mail: veronique.hoste@hogent.be
BART DESMET
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium. e-mail: bart.desmet@hogent.be
PHILIP VAN OOSTEN
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium. e-mail: philip.vanoosten@hogent.be
MARTINE DE COCK
Affiliation:
Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium. e-mail: martine.decock@ugent.be
LIEVE MACKEN
Affiliation:
Faculty of Applied Language Studies, University College Ghent, Ghent, Belgium; Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium. e-mail: lieve.macken@hogent.be

Abstract

While human annotation is crucial for many natural language processing tasks, it is often very expensive and time-consuming. Inspired by previous work on crowdsourcing, we investigate the viability of using non-expert labels instead of gold standard annotations from experts for a machine learning approach to automatic readability prediction. To do so, we evaluate two different methodologies for assessing the readability of a wide variety of text material: a more traditional setup in which expert readers make readability judgments, and a crowdsourcing setup for users who are not necessarily experts. For this purpose, two assessment tools were implemented: a tool in which expert readers rank a batch of texts by readability, and a lightweight crowdsourcing tool that invites users to provide pairwise comparisons. To validate this approach, readability assessments were gathered for a corpus of written Dutch generic texts. By collecting multiple assessments per text, we explicitly wanted to level out readers' background knowledge and attitude. Our findings show that the assessments collected through both methodologies are highly consistent and that crowdsourcing is a viable alternative to expert labelling. This is good news, as crowdsourcing is more lightweight and can reach a much wider audience of potential annotators. By performing a set of basic machine learning experiments using a feature set that mainly encodes basic lexical and morpho-syntactic information, we further illustrate how the collected data can be used to compare texts or to assign an absolute readability score to an individual text. We do not focus on optimising the algorithms to achieve the best possible results for these learning tasks, but carry out the experiments to illustrate the various possibilities of our data sets. The results on different data sets nevertheless show that our system outperforms both readability formulas and a baseline language modelling approach. We conclude that readability assessment by comparing texts is a polyvalent methodology, which can be adapted to specific domains and target audiences if required.
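The pairwise-comparison setup described in the abstract lends itself to straightforward aggregation. As an illustration only (this sketch is not taken from the paper; the text identifiers and judgments are invented), the following Python snippet turns a set of crowdsourced "text A reads more easily than text B" judgments into a per-text score by computing the fraction of comparisons each text wins, with higher scores indicating texts judged easier more often.

from collections import defaultdict

# Hypothetical crowdsourced judgments: each pair (easier, harder) records that
# an annotator found the first text more readable than the second.
judgments = [
    ("text_1", "text_2"), ("text_1", "text_3"), ("text_2", "text_3"),
    ("text_1", "text_2"), ("text_3", "text_2"),
]

wins = defaultdict(int)          # comparisons a text was judged easier in
appearances = defaultdict(int)   # comparisons a text took part in

for easier, harder in judgments:
    wins[easier] += 1
    appearances[easier] += 1
    appearances[harder] += 1

# Win rate as a crude readability score: 1.0 = always judged easier.
scores = {text: wins[text] / appearances[text] for text in appearances}

for text, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{text}: {score:.2f}")

A more principled aggregation (for example, a Bradley-Terry model) would be needed for noisy and incomplete comparison sets; the snippet only illustrates the shape of the data such a crowdsourcing tool collects.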

Type
Articles
Copyright
Copyright © Cambridge University Press 2012 

