Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles

  • Chapter
Natural Language Processing and Text Mining

Abstract

Some text mining tasks, such as text categorization, document clustering, and information retrieval, operate at the document level and rely on the so-called bag-of-words model. Others, such as document summarization, information extraction, and question answering, must operate at the sentence level to meet their specific requirements. Both groups of tasks are typically affected by data sparsity, but the problem is more pronounced for the second group. Thus, while the tasks of the first group can be tackled by statistical and machine learning methods based on a bag-of-words approach alone, those of the second group need natural language processing (NLP) at the sentence or paragraph level in order to produce more informative features.
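
The limitation of the bag-of-words model for sentence-level tasks can be illustrated with a minimal sketch. The two sentences below are hypothetical examples, not taken from the chapter, and the whitespace tokenizer is a deliberate simplification. The sketch shows only that two sentences with opposite meanings can map to the same bag of words, because word order is discarded.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded, which is the defining property of the model.
    """
    return Counter(text.lower().split())


# Hypothetical sentences (assumption, not from the chapter): the same
# words appear in both, but cause and effect are swapped.
sentence_a = "moisture caused the insulation fault"
sentence_b = "the insulation fault caused moisture"

# Both sentences yield the same token counts, so a bag-of-words
# representation cannot tell them apart.
same_bag = bag_of_words(sentence_a) == bag_of_words(sentence_b)
print(same_bag)
```

A document-level task such as categorization may not need this distinction, whereas a task that must identify which phrase plays which role in a sentence does, which is why sentence-level analysis is needed for the second group of tasks.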

Copyright information

© 2007 Springer-Verlag London Limited

About this chapter

Cite this chapter

Mustafaraj, E., Hoof, M., Freisleben, B. (2007). Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_4

  • DOI: https://doi.org/10.1007/978-1-84628-754-1_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-175-4

  • Online ISBN: 978-1-84628-754-1

  • eBook Packages: Computer Science (R0)