research-article

Building Mashups by Demonstration

Published: 01 July 2011

Abstract

The latest generation of WWW tools and services enables Web users to generate applications that combine content from multiple sources. This type of Web application is referred to as a mashup. Many of the tools for constructing mashups rely on a widget paradigm, where users must select, customize, and connect widgets to build the desired application. While this approach does not require programming, users must still understand programming concepts to create a mashup successfully. As a result, they are put off by the time, effort, and expertise needed to build one. In this article, we describe our programming-by-demonstration approach to building mashups by example. Instead of selecting and customizing a set of widgets, the user simply demonstrates the integration task by example. Our approach addresses the problems of extracting data from Web sources, cleaning and modeling the extracted data, and integrating the data across sources. We implemented these ideas in a system called Karma and evaluated it with a set of 23 users. The results show that, compared to other mashup construction tools, Karma allows more users to build mashups successfully and lets them build these mashups significantly faster than with a widget-based approach.

      • Published in

        ACM Transactions on the Web, Volume 5, Issue 3
        July 2011
        177 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/1993053

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 July 2011
        • Accepted: 1 December 2010
        • Revised: 1 October 2010
        • Received: 1 October 2008

        Qualifiers

        • research-article
        • Research
        • Refereed
