Discovering Linguistic Patterns Using Sequence Mining

Béchet, Nicolas; Cellier, Peggy; Charnois, Thierry; Crémilleux, Bruno

doi:10.1007/978-3-642-28604-9_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

In this paper, we present a method based on data mining techniques to automatically discover linguistic patterns matching appositive qualifying phrases. We develop an algorithm that mines sequential patterns made of itemsets, under gap and linguistic constraints. The itemsets allow several kinds of information to be associated with one term. This yields linguistic patterns that are more expressive than the usual sequential patterns. In addition, the constraints make it possible to prune irrelevant patterns automatically. To manage the set of generated patterns, we propose a solution based on a partial ordering, so that a human user can easily validate them as relevant linguistic patterns. We illustrate the efficiency of our approach on two corpora drawn from a newspaper.
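
To make the idea of itemset sequences with a gap constraint concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only checks whether a given pattern occurs in a sentence and counts its support, and it does no mining or linguistic filtering. The toy corpus, the tag names, and the function names (`occurs`, `support`, `is_more_general`) are all assumptions made for illustration. Each word is represented as an itemset holding its form and a part-of-speech tag, and `max_gap` is taken to mean the maximum number of words that may be skipped between two consecutive matched itemsets.

```python
from typing import FrozenSet, List, Optional

Itemset = FrozenSet[str]
Sequence_ = List[Itemset]


def occurs(pattern: Sequence_, sequence: Sequence_, max_gap: int) -> bool:
    """Return True if every itemset of `pattern` is included in some itemset
    of `sequence`, in order, skipping at most `max_gap` positions between
    two consecutive matches."""

    def extend(p_idx: int, prev_pos: Optional[int]) -> bool:
        if p_idx == len(pattern):
            return True  # every pattern itemset has been matched
        if prev_pos is None:
            # The first itemset may match anywhere in the sequence.
            candidates = range(len(sequence))
        else:
            # Later itemsets must fall within the allowed gap window.
            last = min(len(sequence), prev_pos + max_gap + 2)
            candidates = range(prev_pos + 1, last)
        return any(
            pattern[p_idx] <= sequence[pos] and extend(p_idx + 1, pos)
            for pos in candidates
        )

    return extend(0, None)


def support(pattern: Sequence_, corpus: List[Sequence_], max_gap: int) -> int:
    """Number of sequences in `corpus` that contain `pattern`."""
    return sum(occurs(pattern, seq, max_gap) for seq in corpus)


def is_more_general(p: Sequence_, q: Sequence_) -> bool:
    """One possible partial order on patterns (an assumption, not taken from
    the paper): p is more general than q if p occurs in q with no gap limit."""
    return occurs(p, q, max_gap=len(q))


def word(*items: str) -> Itemset:
    return frozenset(items)


# Hypothetical toy corpus: each word is an itemset {form, POS tag}.
corpus = [
    [word("the", "DET"), word("minister", "NOUN"), word(",", "PUNCT"),
     word("very", "ADV"), word("confident", "ADJ"), word(",", "PUNCT"),
     word("spoke", "VERB")],
    [word("a", "DET"), word("player", "NOUN"), word(",", "PUNCT"),
     word("tired", "ADJ"), word(",", "PUNCT"), word("left", "VERB")],
]

# Pattern "noun, comma, adjective", expressed with POS tags only.
pattern = [word("NOUN"), word("PUNCT"), word("ADJ")]

support_gap0 = support(pattern, corpus, max_gap=0)
support_gap1 = support(pattern, corpus, max_gap=1)
```

With `max_gap=0` only the second sentence matches, because the adverb in the first sentence sits between the comma and the adjective; with `max_gap=1` both sentences match. Mixing items of different kinds in one itemset, for example `word("minister", "NOUN")`, restricts the pattern to a specific word form with a given tag, which is the extra expressiveness the abstract attributes to itemsets.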

Author information

Authors and Affiliations

GREYC Université de Caen Basse-Normandie, Campus II science 3, 14032, Caen CEDEX, France
Nicolas Béchet, Thierry Charnois & Bruno Crémilleux
INSA Rennes/IRISA, Campus de Beaulieu, 35042, Rennes cedex, France
Peggy Cellier

Editor information

Editors and Affiliations

Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Béchet, N., Cellier, P., Charnois, T., Crémilleux, B. (2012). Discovering Linguistic Patterns Using Sequence Mining. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_13

DOI: https://doi.org/10.1007/978-3-642-28604-9_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28603-2
Online ISBN: 978-3-642-28604-9
eBook Packages: Computer Science, Computer Science (R0)
