
Discovering Linguistic Patterns Using Sequence Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7181)

Abstract

In this paper, we present a method based on data mining techniques to automatically discover linguistic patterns that match appositive qualifying phrases. We develop an algorithm that mines sequential patterns made of itemsets under gap and linguistic constraints. Itemsets allow several kinds of information to be associated with a single term, so the extracted linguistic patterns are more expressive than the usual sequential patterns. In addition, the constraints make it possible to prune irrelevant patterns automatically. To manage the set of generated patterns, we propose a solution based on a partial ordering, so that a human user can easily validate them as relevant linguistic patterns. We illustrate the efficiency of our approach on two corpora drawn from a newspaper.
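
To make the notion of an itemset sequence under a gap constraint concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it only checks whether a given pattern occurs in a sentence and counts its support, and it does not mine patterns or apply the linguistic constraints. The item encoding (`form=`, `pos=`), the function names `occurs` and `support`, the `max_gap` parameter, and the toy English sentences are all assumptions made for illustration.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def occurs(pattern: Sequence[Itemset], sentence: Sequence[Itemset], max_gap: int) -> bool:
    """Return True if `pattern` embeds in `sentence`, in order, with at most
    `max_gap` skipped tokens between two consecutive matched itemsets.
    A pattern itemset matches a token if it is a subset of the token's items."""
    if not pattern:
        return True

    def extend(p_idx: int, last_pos: int) -> bool:
        # All pattern itemsets matched: the embedding is complete.
        if p_idx == len(pattern):
            return True
        # Candidate positions lie within max_gap skipped tokens of last_pos.
        stop = min(len(sentence), last_pos + max_gap + 2)
        for pos in range(last_pos + 1, stop):
            if pattern[p_idx] <= sentence[pos] and extend(p_idx + 1, pos):
                return True
        return False

    # The first itemset may match at any position of the sentence.
    return any(
        pattern[0] <= sentence[start] and extend(1, start)
        for start in range(len(sentence))
    )


def support(pattern: Sequence[Itemset], database: List[List[Itemset]], max_gap: int) -> int:
    """Number of sentences in `database` that contain `pattern`."""
    return sum(occurs(pattern, sentence, max_gap) for sentence in database)


def token(form: str, pos: str) -> Itemset:
    """Hypothetical encoding: each token carries its word form and POS tag."""
    return frozenset({f"form={form}", f"pos={pos}"})


# Toy database (invented sentences, not taken from the paper's corpora).
database = [
    [token("Obama", "NP"), token(",", "PUN"), token("the", "DET"), token("president", "NC")],
    [token("Merkel", "NP"), token(",", "PUN"), token("a", "DET"),
     token("cautious", "ADJ"), token("leader", "NC")],
    [token("the", "DET"), token("leader", "NC"), token("spoke", "V")],
]

# Pattern: proper noun, then punctuation, then common noun.
pattern = [frozenset({"pos=NP"}), frozenset({"pos=PUN"}), frozenset({"pos=NC"})]

for gap in (0, 1, 2):
    print(gap, support(pattern, database, gap))
```

Because each token is a set of items, a pattern can mix levels of description, for example a specific word form at one position and only a part-of-speech tag at the next. Tightening `max_gap` discards embeddings whose matched tokens are too far apart, which is one way a gap constraint can prune candidate patterns.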




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Béchet, N., Cellier, P., Charnois, T., Crémilleux, B. (2012). Discovering Linguistic Patterns Using Sequence Mining. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7181. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28604-9_13


  • DOI: https://doi.org/10.1007/978-3-642-28604-9_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28603-2

  • Online ISBN: 978-3-642-28604-9

  • eBook Packages: Computer Science, Computer Science (R0)
