Abstract
Several text mining tasks, such as text categorization, document clustering, and information retrieval, operate at the document level and rely on the so-called bag-of-words model. Other tasks, such as document summarization, information extraction, and question answering, must operate at the sentence level to meet their specific requirements. Both groups of tasks are typically affected by data sparsity, but the problem is more pronounced for the latter group. Thus, while the first group can be tackled by statistical and machine learning methods based on a bag-of-words approach alone, the second group needs natural language processing (NLP) at the sentence or paragraph level in order to produce more informative features.
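As a minimal illustration of why a bag-of-words representation can fall short at the sentence level, the following Python sketch counts tokens for two sentences. The sentences, the whitespace tokenizer, and the helper name `bag_of_words` are assumptions made for this example and are not taken from the chapter.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded, which is the defining property of the model.
    """
    return Counter(text.lower().split())


# Two hypothetical sentences: same words, but the roles of the
# two noun phrases are swapped.
a = "the insulation damaged the winding"
b = "the winding damaged the insulation"

bow_a = bag_of_words(a)
bow_b = bag_of_words(b)

# The two bags are identical although the sentences mean different things.
same_bag = bow_a == bow_b
print(same_bag)
```

Because both sentences map to the same counts, a purely count-based feature set cannot tell which entity plays which role. This is the kind of distinction for which sentence-level processing is meant to supply more informative features.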
Copyright information
© 2007 Springer-Verlag London Limited
About this chapter
Cite this chapter
Mustafaraj, E., Hoof, M., Freisleben, B. (2007). Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles. In: Kao, A., Poteet, S.R. (eds) Natural Language Processing and Text Mining. Springer, London. https://doi.org/10.1007/978-1-84628-754-1_4
DOI: https://doi.org/10.1007/978-1-84628-754-1_4
Publisher Name: Springer, London
Print ISBN: 978-1-84628-175-4
Online ISBN: 978-1-84628-754-1
eBook Packages: Computer Science (R0)