ABSTRACT
Enterprise mashup scenarios often involve feeds derived from data created primarily for human consumption, such as email, news, calendars, blogs, and web feeds. These data sources can strain the capabilities of current data mashup products, because the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations.
Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
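The idea of an annotator feeding a mashup join can be illustrated with a minimal Python sketch. It is not AQL and does not use the SystemT or MashupHub APIs; the ticker pattern, the feed entries, the account table, and all function names below are hypothetical, chosen only to show how an attribute extracted from feed text can become a join key against structured data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A structured span extracted from unstructured feed text."""
    entry_id: str
    kind: str
    text: str
    begin: int
    end: int


# Hypothetical rule: a stock ticker written as "(NYSE: XYZ)".
TICKER_PATTERN = re.compile(r"\(NYSE:\s*([A-Z]{1,5})\)")


def annotate_tickers(entry_id, text):
    """Return one Annotation per ticker mention found in the text."""
    return [
        Annotation(entry_id, "Ticker", m.group(1), m.start(1), m.end(1))
        for m in TICKER_PATTERN.finditer(text)
    ]


def join_with_accounts(annotations, accounts):
    """Join extracted tickers with a structured account table on ticker."""
    by_ticker = {row["ticker"]: row for row in accounts}
    return [
        {
            "entry_id": ann.entry_id,
            "ticker": ann.text,
            "account_owner": by_ticker[ann.text]["owner"],
        }
        for ann in annotations
        if ann.text in by_ticker
    ]


def run_mashup(feed, accounts):
    """Annotate every feed entry, then join the results with accounts."""
    annotations = [
        ann
        for entry in feed
        for ann in annotate_tickers(entry["id"], entry["text"])
    ]
    return join_with_accounts(annotations, accounts)


# Made-up sample data standing in for an unstructured news feed
# and a structured enterprise table.
FEED = [
    {"id": "n1", "text": "Acme Corp (NYSE: ACM) announced a new product line."},
    {"id": "n2", "text": "Weather was mild across the region today."},
    {"id": "n3", "text": "Globex (NYSE: GBX) and Acme Corp (NYSE: ACM) signed a deal."},
]
ACCOUNTS = [
    {"ticker": "ACM", "owner": "alice"},
    {"ticker": "INI", "owner": "bob"},
]

if __name__ == "__main__":
    for row in run_mashup(FEED, ACCOUNTS):
        print(row)
```

In this sketch the join key does not exist as a field in the feed; it only becomes available once the annotator has pulled it out of the entry text, which is the role the abstract assigns to information extraction.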
Index Terms
- Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT