Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR

Désoyer, Adèle; Landragin, Frédéric; Tellier, Isabelle; Lefeuvre, Anaïs; Antoine, Jean-Yves; Dinarelli, Marco

doi:10.1007/978-3-319-75477-2_36

ORCID (Frédéric Landragin): orcid.org/0000-0002-0030-7200

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We present CROC (Coreference Resolution for Oral Corpus), the first machine learning system for coreference resolution in French. One specific aspect of the system is that it has been trained on data that come exclusively from transcribed speech, namely ANCOR (ANaphora and Coreference in ORal corpus), the first large-scale French corpus with anaphorical relation annotations. In its current state, the CROC system requires pre-annotated mentions. We detail the features used for the learning algorithms, and we present a set of experiments with these features. The scores we obtain are close to those of state-of-the-art systems for written English.

Acknowledgments

This work was supported by grant ANR-15-CE38-0008 (“DEMOCRAT” project) from the French National Research Agency (ANR), and by APR Centre-Val-de-Loire region (“ANCOR” project).

Author information

Authors and Affiliations

Lattice, CNRS, ENS, Paris, France
Adèle Désoyer, Frédéric Landragin, Isabelle Tellier & Marco Dinarelli
Université de Paris 3, Paris, France
Adèle Désoyer, Frédéric Landragin, Isabelle Tellier & Marco Dinarelli
Université Sorbonne Paris Cité, Paris, France
Adèle Désoyer, Frédéric Landragin, Isabelle Tellier & Marco Dinarelli
PSL Research University, Paris, France
Adèle Désoyer, Frédéric Landragin, Isabelle Tellier & Marco Dinarelli
Modyco, CNRS, Université Paris Ouest – Nanterre La Défense, Nanterre, France
Adèle Désoyer
LIFAT, CNRS, Université François Rabelais de Tours, Tours, France
Anaïs Lefeuvre & Jean-Yves Antoine
LIFO, Université d’Orléans, Orléans, France
Anaïs Lefeuvre

Corresponding author

Correspondence to Frédéric Landragin.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Désoyer, A., Landragin, F., Tellier, I., Lefeuvre, A., Antoine, J.-Y., Dinarelli, M. (2018). Coreference Resolution for French Oral Data: Machine Learning Experiments with ANCOR. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_36

DOI: https://doi.org/10.1007/978-3-319-75477-2_36
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)

Publish with us

Policies and ethics