Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique

Álvarez-Rodríguez, Jose María; Ordoñez de Pablos, Patricia; Vafopoulos, Michail; Labra-Gayo, José Emilio

doi:10.1007/978-3-319-03437-9_31

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 390)

Abstract

The present paper introduces a technique to deal with heterogeneous corporate names in the context of public procurement metadata. Public bodies currently face a major challenge in improving both the performance and the transparency of administrative processes. The e-Government and Open Linked Data initiatives have emerged as efforts to tackle existing interoperability and integration issues among ICT-based systems, but creating a truly transparent environment requires much more than simply publishing data and information in specific open formats; data and information quality is the next major step in the public sector. In the e-Procurement domain in particular, a vast amount of valuable metadata is already available through Internet protocols and formats and can be used to create new value-added services. Nevertheless, even simply extracting statistics or creating reports can involve extra tasks to clean, prepare and reconcile the data. Transparency has also become a major objective in public administrations and, in the case of public procurement, one of the most interesting services is tracking awarded contracts (mainly by type, location and supplier). Although this seems a basic reporting service, generating it can become a complex task because supplier names are not standardized and different descriptors are used for the type of contract. In this paper, a stepwise method based on natural language processing and semantics to address the unification of corporate names is defined and implemented. Moreover, a research study evaluating the precision and recall of the proposed technique is presented, using as a use case the public dataset of awarded public contracts in Australia during the period 2004-2012. Finally, some discussion, conclusions and future work are outlined.
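The abstract describes the unification method only at a high level. As a rough illustration of the kind of stepwise name normalization it refers to, the Python sketch below applies a few simple rules (accent stripping, case-folding, punctuation removal, dropping legal-form tokens such as "Pty Ltd"), groups supplier names that share the resulting key, and scores the grouping with pairwise precision and recall. The legal-form list, the function names `normalize`, `unify` and `pairwise_precision_recall`, and the pairwise evaluation scheme are all illustrative assumptions; this is not the authors' CORFU implementation, which the abstract says also draws on semantics.

```python
import re
import unicodedata
from collections import defaultdict
from itertools import combinations

# Hypothetical legal-form tokens to discard; an illustrative assumption,
# not the dictionaries used by CORFU.
LEGAL_FORMS = {"pty", "ltd", "limited", "inc", "corp", "corporation", "plc", "llc"}


def normalize(name: str) -> str:
    """Map a raw supplier name to a canonical key using simple stepwise rules."""
    # Step 1: decompose accented characters, drop the combining marks, case-fold.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).casefold()
    # Step 2: spell out "&" so "Smith & Sons" and "Smith and Sons" coincide.
    text = text.replace("&", " and ")
    # Step 3: turn any remaining punctuation into whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # Step 4: drop legal-form tokens and collapse runs of whitespace.
    return " ".join(t for t in text.split() if t not in LEGAL_FORMS)


def unify(names):
    """Group raw names that share the same normalized key."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return dict(groups)


def _same_entity_pairs(clusters):
    """All unordered pairs of names placed in the same cluster."""
    return {frozenset(pair) for members in clusters for pair in combinations(members, 2)}


def pairwise_precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted clusters against gold clusters."""
    pred = _same_entity_pairs(predicted)
    true = _same_entity_pairs(gold)
    hits = len(pred & true)
    # Convention (an assumption): an empty set of pairs scores 0.0.
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    raw = ["ACME Pty Ltd", "Acme Pty. Ltd.", "ACME PTY LIMITED",
           "Smith & Sons Pty Ltd", "Smith and Sons"]
    gold = [{"ACME Pty Ltd", "Acme Pty. Ltd.", "ACME PTY LIMITED"},
            {"Smith & Sons Pty Ltd", "Smith and Sons"}]
    predicted = list(unify(raw).values())
    print(unify(raw))
    print(pairwise_precision_recall(predicted, gold))
```

On this toy input every variant collapses to the keys "acme" and "smith and sons", so both scores are 1.0. Exact-key grouping of this kind cannot reconcile misspellings or abbreviations (e.g. "Intl" vs "International"); handling those would need further steps such as fuzzy matching or dictionary lookups, which this sketch omits.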

Author information

Authors and Affiliations

South East European Research Center, 54622, Thessaloniki, Greece
Jose María Álvarez-Rodríguez
WESO Research Group, Department of Computer Science, University of Oviedo, 33007, Oviedo, Spain
Patricia Ordoñez de Pablos & José Emilio Labra-Gayo
Multimedia Technology Laboratory, National Technical University of Athens, 15773, Athens, Greece
Michail Vafopoulos

Editor information

Editors and Affiliations

Alexander Technological Educational Institute of Thessaloniki, Macedonia, Greece
Emmanouel Garoufallou
School of Library and Information Science, University of North Carolina at Chapel Hill, 27599-3360, Chapel Hill, NC
Jane Greenberg

About this paper

Cite this paper

Álvarez-Rodríguez, J.M., Ordoñez de Pablos, P., Vafopoulos, M., Labra-Gayo, J.E. (2013). Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique. In: Garoufallou, E., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2013. Communications in Computer and Information Science, vol 390. Springer, Cham. https://doi.org/10.1007/978-3-319-03437-9_31

DOI: https://doi.org/10.1007/978-3-319-03437-9_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03436-2
Online ISBN: 978-3-319-03437-9
eBook Packages: Computer Science; Computer Science (R0)
