Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles

Mustafaraj, Eni; Hoof, Martin; Freisleben, Bernd

doi:10.1007/978-1-84628-754-1_4

Abstract

Several tasks approached by using text mining techniques, like text categorization, document clustering, or information retrieval, operate on the document level, making use of the so-called bag-of-words model. Other tasks, like document summarization, information extraction, or question answering, have to operate on the sentence level, in order to fulfill their specific requirements. While both groups of text mining tasks are typically affected by the problem of data sparsity, this is more accentuated for the latter group of tasks. Thus, while the tasks of the first group can be tackled by statistical and machine learning methods based on a bag-of-words approach alone, the tasks of the second group need natural language processing (NLP) at the sentence or paragraph level in order to produce more informative features.
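The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the chapter's method: the toy English sentences, the regex tokenizer, and the helper names `bag_of_words` and `sparsity` are all assumptions made for illustration. It shows two things: a bag-of-words representation cannot tell apart two sentences that use the same words in different roles, and the count matrix becomes sparser when the unit of analysis shrinks from documents to sentences.

```python
from collections import Counter
import re


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens, discarding word order (hypothetical helper)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def sparsity(units: list[str]) -> float:
    """Fraction of zero cells in the unit-by-vocabulary count matrix."""
    bags = [bag_of_words(u) for u in units]
    vocab = set().union(*bags)
    if not bags or not vocab:
        return 0.0
    nonzero = sum(len(b) for b in bags)  # distinct words present per unit
    return 1 - nonzero / (len(bags) * len(vocab))


# Toy data invented for this sketch; not taken from the chapter's corpus.
documents = [
    "The pump damaged the valve. The valve was replaced.",
    "The motor overheated. The motor was replaced.",
]
sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]

# Same words, opposite roles: the two bags are identical.
same_bag = bag_of_words("the pump damaged the valve") == bag_of_words(
    "the valve damaged the pump"
)

print("identical bags:", same_bag)
print("document-level sparsity:", sparsity(documents))
print("sentence-level sparsity:", sparsity(sentences))
```

On this toy input the sentence-level matrix has a larger share of empty cells than the document-level one, which is the sparsity effect the abstract describes; distinguishing which component caused the damage requires structure beyond word counts.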

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str., D-35032, Marburg, Germany
Eni Mustafaraj & Bernd Freisleben
Department of Electrical Engineering, FH Kaiserslautern, Morlauterer Str. 31, D-67657, Kaiserslautern, Germany
Martin Hoof

Editor information

Editors and Affiliations

Bellevue, WA, 98008, USA
Anne Kao BA, MA, MS, PhD & Stephen R. Poteet BA, MA, CPhil

About this chapter

Cite this chapter

Mustafaraj, E., Hoof, M., Freisleben, B. (2007). Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_4

DOI: https://doi.org/10.1007/978-1-84628-754-1_4
Publisher Name: Springer, London
Print ISBN: 978-1-84628-175-4
Online ISBN: 978-1-84628-754-1
eBook Packages: Computer Science, Computer Science (R0)
