Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR

Conference paper in Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Abstract

We present CROC (Coreference Resolution for Oral Corpus), the first machine learning system for coreference resolution in French. A distinctive aspect of the system is that it was trained on data drawn exclusively from transcribed speech, namely ANCOR (ANaphora and Coreference in ORal corpus), the first large-scale French corpus annotated with anaphoric relations. In its current state, CROC requires pre-annotated mentions. We detail the features used by the learning algorithms and present a set of experiments with these features. The scores we obtain are close to those of state-of-the-art systems for written English.
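To make the setting concrete, here is a minimal sketch of how training instances are commonly built in the classic mention-pair approach when mentions are given in advance, as CROC requires. It is an illustration under stated assumptions, not CROC's actual pipeline: the `Mention` structure, the three features, and the short French pronoun list are hypothetical, and the instance-creation scheme (one positive pair with the closest coreferent antecedent, one negative pair per intervening mention) is the standard one from the literature rather than something the abstract specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """A pre-annotated mention (hypothetical structure, not ANCOR's actual format)."""
    index: int            # position of the mention in document order
    text: str             # surface form of the mention
    chain: Optional[str]  # gold coreference chain id, None for singletons


# Small illustrative subset of French pronouns (assumption, not exhaustive).
PRONOUNS = {"il", "elle", "ils", "elles", "le", "la", "les", "lui", "leur"}


def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    """A few illustrative mention-pair features (hypothetical, not CROC's feature set)."""
    return {
        "string_match": antecedent.text.lower() == anaphor.text.lower(),
        "mention_distance": anaphor.index - antecedent.index,
        "anaphor_is_pronoun": anaphor.text.lower() in PRONOUNS,
    }


def mention_pair_instances(mentions: list[Mention]) -> list[tuple[dict, bool]]:
    """For each anaphor, emit one positive pair with its closest coreferent
    antecedent and one negative pair with every mention in between."""
    instances = []
    for j, anaphor in enumerate(mentions):
        if anaphor.chain is None:
            continue  # singleton: no antecedent to learn from
        # Search backwards for the closest mention in the same chain.
        closest = None
        for i in range(j - 1, -1, -1):
            if mentions[i].chain == anaphor.chain:
                closest = i
                break
        if closest is None:
            continue  # first mention of its chain
        instances.append((pair_features(mentions[closest], anaphor), True))
        for k in range(closest + 1, j):
            instances.append((pair_features(mentions[k], anaphor), False))
    return instances


# Toy example: "la mairie ... le bus ... elle ... la mairie"
mentions = [
    Mention(0, "la mairie", "e1"),
    Mention(1, "le bus", "e2"),
    Mention(2, "elle", "e1"),
    Mention(3, "la mairie", "e1"),
]
pairs = mention_pair_instances(mentions)
for features, label in pairs:
    print(label, features)
```

On this toy sequence, "elle" yields one positive pair with "la mairie" and one negative pair with the intervening "le bus", while the second "la mairie" pairs positively with its closest antecedent, "elle". A classifier trained on such feature dictionaries, followed by a step that links the positively classified pairs into chains, would then produce the final coreference output.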

Acknowledgments

This work was supported by grant ANR-15-CE38-0008 ("DEMOCRAT" project) from the French National Research Agency (ANR), and by the Centre-Val de Loire region (APR, "ANCOR" project).

Author information

Corresponding author: Frédéric Landragin.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Désoyer, A., Landragin, F., Tellier, I., Lefeuvre, A., Antoine, J.Y., Dinarelli, M. (2018). Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol. 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_36

  • DOI: https://doi.org/10.1007/978-3-319-75477-2_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75476-5

  • Online ISBN: 978-3-319-75477-2

  • eBook Packages: Computer Science, Computer Science (R0)
