Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods

Authors:
Georgios Petasis

Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece

Alessandro Cucchiarelli

Istituto di Informatica, Università di Ancona, Via Brecce Bianche, Ancona

Paola Velardi

Dip. di Scienze dell'Informazione, Università di Roma 'La Sapienza', Via Salaria 113, Roma

Georgios Paliouras

Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece

Vangelis Karkaletsis

Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece

Constantine D. Spyropoulos

Software and Knowledge Engineering Laboratory, Institute of Informatics and Telecommunications, National Centre for Scientific Research 'Demokritos', 153 10 Ag. Paraskevi, Athens, Greece


SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, July 2000, Pages 128–135. https://doi.org/10.1145/345508.345563

Published: 01 July 2000

ABSTRACT

The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily upon the availability of large dictionaries of domain-specific Proper Nouns, and upon a certain amount of manual work for rule writing or manual tagging. Though relying on some existing PN dictionary is not a heavy requirement (these resources are often available on the web), its coverage of a domain corpus may be rather low in the absence of manual updating. In this paper we propose a technique for the automatic updating of a PN Dictionary through the cooperation of an inductive and a probabilistic classifier. In our experiments we show that, whenever an existing PN Dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.
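
The figures quoted in the abstract can be combined into a single overall recognition rate. The sketch below is only an illustration of that arithmetic, not part of the paper: the function name `combined_coverage` and its parameters are hypothetical, and it assumes that both percentages are measured over the same set of proper nouns in the corpus. Since the abstract says "about 90%", the result should be read as approximate.

```python
def combined_coverage(dictionary_coverage: float, recovery_rate: float) -> float:
    """Overall fraction of proper nouns recognized after dictionary adaptation.

    dictionary_coverage: fraction of the corpus's proper nouns already
        identified by the existing PN dictionary (0.50 in the abstract).
    recovery_rate: fraction of the *remaining* proper nouns recognized by
        the adaptation technique (about 0.90 in the abstract).
    Both names are illustrative assumptions, not terms from the paper.
    """
    if not (0.0 <= dictionary_coverage <= 1.0 and 0.0 <= recovery_rate <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    # Proper nouns the existing dictionary fails to identify.
    missed = 1.0 - dictionary_coverage
    # Dictionary hits plus the recovered share of the misses.
    return dictionary_coverage + recovery_rate * missed


if __name__ == "__main__":
    total = combined_coverage(0.50, 0.90)
    print(f"Overall recognition: {total:.0%}")
```

Under these assumptions, a dictionary covering 50% of the proper nouns plus recovery of about 90% of the rest corresponds to roughly 0.50 + 0.90 × 0.50 = 0.95, i.e. about 95% of the proper nouns in the corpus.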

Published in
SIGIR '00: Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
July 2000
396 pages
ISBN: 1581132263
DOI: 10.1145/345508
Chairmen:
Emmanuel Yannakoudakis, Athens Univ. of Economics and Business, Greece
Nicholas J. Belkin, Rutgers Univ.
Mun-Kew Leong, Kent Ridge Digital Labs
Peter Ingwersen, Royal School of Library and Information Science
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information extraction
machine learning and IR
natural language processing for IR
text data mining
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 792 of 3,983 submissions, 20%