Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 390))

Abstract

The present paper introduces a technique to deal with corporate name heterogeneities in the context of public procurement metadata. Public bodies currently face a major challenge in trying to improve both the performance and the transparency of administrative processes. The e-Government and Open Linked Data initiatives have emerged as efforts to tackle existing interoperability and integration issues among ICT-based systems, but the creation of a truly transparent environment requires much more than the simple publication of data and information in specific open formats; data and information quality is the next major step in the public sector. More specifically, in the e-Procurement domain there is a vast amount of valuable metadata that is already available via Internet protocols and formats and can be used to create new added-value services. Nevertheless, even the simple extraction of statistics or creation of reports can imply extra tasks to clean, prepare and reconcile data. On the other hand, transparency has become a major objective in public administrations and, in the case of public procurement, one of the most interesting services lies in tracking awarded contracts (mainly type, location, and supplier). Although this seems a basic kind of reporting service, its generation can turn into a complex task due to a lack of standardization in supplier names or the use of different descriptors for the type of contract. In this paper, a stepwise method based on natural language processing and semantics to address the unification of corporate names is defined and implemented. Moreover, a research study to evaluate the precision and recall of the proposed technique, using as a use case the public dataset of awarded public contracts in Australia during the period 2004-2012, is also presented. Finally, some discussion, conclusions and future work are outlined.
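
The abstract does not list the individual steps of the method, so the following is only a minimal sketch of what unifying supplier names and scoring the result with precision and recall could look like. It is not the CORFU implementation. The legal-form token list, the function names `normalize`, `unify` and `precision_recall`, and the sample supplier names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form tokens to discard; not taken from the paper.
LEGAL_FORMS = {
    "pty", "ltd", "limited", "inc", "incorporated",
    "corp", "corporation", "co", "company",
}


def normalize(name: str) -> str:
    """Reduce a raw supplier name to a comparable key (illustrative only)."""
    text = name.lower().replace("&", " and ")
    # Replace punctuation and any other non-alphanumeric character with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Drop legal-form tokens and collapse repeated whitespace.
    tokens = [t for t in text.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def unify(names):
    """Group raw names that share the same normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)


def precision_recall(predicted, gold):
    """Precision and recall of predicted matches against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up supplier names, not drawn from the Australian dataset.
    suppliers = ["Acme Widgets Pty. Ltd.", "ACME WIDGETS PTY LTD", "Smith & Sons Ltd"]
    for key, variants in unify(suppliers).items():
        print(key, "<-", variants)
```

Exact matching on a normalized key is the simplest possible grouping rule; a fuller pipeline would presumably also need to handle misspellings, acronyms and reconciliation against an external source, which this sketch leaves out.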

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Álvarez-Rodríguez, J.M., Ordoñez de Pablos, P., Vafopoulos, M., Labra-Gayo, J.E. (2013). Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique. In: Garoufallou, E., Greenberg, J. (eds) Metadata and Semantics Research. MTSR 2013. Communications in Computer and Information Science, vol 390. Springer, Cham. https://doi.org/10.1007/978-3-319-03437-9_31

  • DOI: https://doi.org/10.1007/978-3-319-03437-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03436-2

  • Online ISBN: 978-3-319-03437-9

  • eBook Packages: Computer Science (R0)