ABSTRACT
Identifying upcoming topics in a news stream is a challenging and time-consuming task for editors, since they have to recognize suitable keywords, actively search with them, and browse the media assets they find. Our goal is therefore to enhance an existing newsroom environment so that it automatically detects upcoming global and regional topics and suggests them to editors for further work. To help editors understand the impact of a topic, we provide its evolution over time and its relations to other subjects as indicators. To achieve these goals, we designed and prototypically implemented an automatic, semantics-based workflow that relies heavily on unambiguous named entities extracted from the media assets. Further, we discuss the challenges encountered and point to suitable solutions for building an enterprise-scale, semantics-based application.
Index Terms
- Towards topics-based, semantics-assisted news search