demonstration

Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

Authors:
David E. Simmen, Frederick Reiss, Yunyao Li, Suresh Thalamati

IBM Almaden Research Center, San Jose, CA, USA

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, June 2009, Pages 1123–1126. https://doi.org/10.1145/1559845.1559999

Published: 29 June 2009

ABSTRACT

Enterprise mashup scenarios often involve feeds derived from data created primarily for human readers, such as email, news, calendars, blogs, and web feeds. These data sources can strain the capabilities of current data mashup products, because the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios: annotators convert unstructured text into structured information that can facilitate mashup operations.

Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
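
To make the idea in the abstract concrete, the following is a minimal sketch of how an annotator can expose a join attribute hidden in feed text. It is written in plain Python with a regular expression and is not SystemT, AQL, or the MashupHub API; the feed items, the `ACCOUNTS` table, the `COMPANY` pattern, and the function names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical feed items: free text written for human readers.
FEED = [
    {"id": 1, "text": "Acme Corp announced a recall of its X200 heaters."},
    {"id": 2, "text": "Analysts expect Globex Inc to beat earnings."},
    {"id": 3, "text": "Weather looks fine for the weekend."},
]

# Hypothetical structured source keyed by company name.
ACCOUNTS = {
    "Acme Corp": {"owner": "alice", "region": "West"},
    "Globex Inc": {"owner": "bob", "region": "East"},
}

# Toy annotator rule (an assumption, not an AQL rule): a capitalized
# word followed by a corporate suffix.
COMPANY = re.compile(r"\b([A-Z][a-z]+ (?:Corp|Inc|Ltd))\b")


def annotate(item):
    """Return one structured record per company mention in the item text."""
    return [
        {"feed_id": item["id"], "company": m.group(1), "span": m.span(1)}
        for m in COMPANY.finditer(item["text"])
    ]


def join_with_accounts(feed, accounts):
    """Join extracted company mentions with the structured account table."""
    rows = []
    for item in feed:
        for ann in annotate(item):
            account = accounts.get(ann["company"])
            if account is not None:
                # Merge the annotation with the matching structured record.
                rows.append({**ann, **account})
    return rows


rows = join_with_accounts(FEED, ACCOUNTS)
for row in rows:
    print(row["feed_id"], row["company"], row["owner"], row["region"])
```

Feed items without a company mention produce no annotation and therefore drop out of the join, which is the behavior of an inner join on the extracted attribute. A production annotator would need far more robust rules than this single pattern.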

Index Terms

1. Information systems

Published in
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN: 9781605585512
DOI: 10.1145/1559845
Editors: Carsten Binnig, Benoit Dageville
General Chairs: Uğur Çetintemel (Brown University, USA), Stan Zdonik (Brown University, USA)
Program Chair: Donald Kossmann (ETH Zurich, Switzerland)
Copyright © 2009 Copyright is held by the owner/author(s)
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
feeds
information integration
mashups
text analytics
Qualifiers
- demonstration
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
