ABSTRACT
Tagging has become a widespread tool for organising content, from photos and music to research papers and data visualisations. Organising tags in a taxonomy adds hierarchical structure and relationships, which helps both when finding tags and applying them to new content and when expanding queries during search. However, taxonomies can be very time-consuming to create and maintain. If a hierarchical taxonomy could be built automatically and adapted to a particular domain, the entry cost of using taxonomies to structure information would go down. Small and medium enterprises (SMEs) currently lack the resources to invest in Enterprise 2.0 technologies such as taxonomies, wikis or blogging because the entry cost is too high. The OrganiK project aims to make Enterprise 2.0 features available with low entry and maintenance costs.
In this paper, an algorithm and a methodology for automatically creating and maintaining taxonomies are presented. The approach analyses enterprise document corpora and uses background information from domain-specific data sources or from the Linked Open Data cloud to improve and contextualise the resulting SKOS taxonomy. Content created in a Drupal-based Enterprise 2.0 content management system is automatically categorised, and the automatically created taxonomy is extended when needed. The system has been tested with corpora of medical abstracts, computer science papers, and the Enron email collection, and is in production use.
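To make the role of a tag taxonomy in query expansion concrete, the following is a minimal illustrative sketch and not the OrganiK implementation. It assumes a small hand-written hierarchy in which each tag points to at most one broader tag, in the spirit of skos:broader. The tag names, the `http://example.org/tags/` base URI, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical tag hierarchy: child tag -> broader tag (cf. skos:broader).
BROADER = {
    "jazz": "music",
    "bebop": "jazz",
    "rock": "music",
    "portrait": "photo",
}


def narrower_index(broader):
    """Invert the child -> parent map into parent -> list of children."""
    index = defaultdict(list)
    for child, parent in broader.items():
        index[parent].append(child)
    return index


def expand_query(tag, broader):
    """Return the tag plus all of its transitive narrower tags, sorted."""
    narrower = narrower_index(broader)
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against accidental cycles in the hierarchy
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return sorted(seen)


def to_skos_turtle(broader, base="http://example.org/tags/"):
    """Serialise the hierarchy as Turtle triples using skos:broader."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."]
    for child, parent in sorted(broader.items()):
        lines.append(f"<{base}{child}> skos:broader <{base}{parent}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(expand_query("music", BROADER))
    print(to_skos_turtle(BROADER))
```

Under these assumptions, a search for "music" would also match content tagged only with "jazz", "bebop" or "rock", which is the benefit of hierarchical structure described above. An automatically built taxonomy would supply the `BROADER` map rather than a hand-written dictionary.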
Index Terms
- Using linked open data to bootstrap corporate knowledge management in the OrganiK project