Building Mashups by Demonstration

Authors:
Rattapoom Tuchinda (National Electronics and Computer Technology Center),
Craig A. Knoblock (University of Southern California),
Pedro Szekely (University of Southern California)

ACM Transactions on the Web, Volume 5, Issue 3, Article No.: 16, pp 1–45
https://doi.org/10.1145/1993053.1993058

Published: 01 July 2011

Abstract

The latest generation of WWW tools and services enables Web users to generate applications that combine content from multiple sources. This type of Web application is referred to as a mashup. Many of the tools for constructing mashups rely on a widget paradigm, where users must select, customize, and connect widgets to build the desired application. While this approach does not require programming, users must still understand programming concepts to successfully create a mashup. As a result, they are put off by the time, effort, and expertise needed to build a mashup. In this article, we describe our programming-by-demonstration approach to building mashups by example. Instead of requiring a user to select and customize a set of widgets, the user simply demonstrates the integration task by example. Our approach addresses the problems of extracting data from Web sources, cleaning and modeling the extracted data, and integrating the data across sources. We implemented these ideas in a system called Karma, and evaluated Karma with a set of 23 users. The results show that, compared to other mashup construction tools, Karma allows more of the users to successfully build mashups and makes it possible to build these mashups significantly faster than with a widget-based approach.

Supplemental Material

Available for Download

a16-tuchinda_appendix.pdf (35.6 KB)

The proof is given in an electronic appendix, available online in the ACM Digital Library.

Index Terms

Building Mashups by Demonstration
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods
  2. Interaction design
    1. Interaction design process and methods
      1. User centered design

Published in

ACM Transactions on the Web Volume 5, Issue 3
July 2011
177 pages
ISSN:1559-1131
EISSN:1559-114X
DOI:10.1145/1993053
Issue’s Table of Contents

Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 July 2011
- Accepted: 1 December 2010
- Revised: 1 October 2010
- Received: 1 October 2008

Author Tags
Mashups
information integration
programming by demonstration
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 30
- Total Downloads: 777
- Downloads (Last 12 months): 7
- Downloads (Last 6 weeks): 0
