Abstract
The latest generation of WWW tools and services enables Web users to generate applications that combine content from multiple sources. This type of Web application is referred to as a mashup. Many of the tools for constructing mashups rely on a widget paradigm, where users must select, customize, and connect widgets to build the desired application. While this approach does not require programming, users must still understand programming concepts to successfully create a mashup. As a result, they are put off by the time, effort, and expertise needed to build one. In this article, we describe our programming-by-demonstration approach to building mashups by example. Instead of requiring a user to select and customize a set of widgets, the user simply demonstrates the integration task by example. Our approach addresses the problems of extracting data from Web sources, cleaning and modeling the extracted data, and integrating the data across sources. We implemented these ideas in a system called Karma, and evaluated Karma with a set of 23 users. The results show that, compared to other mashup construction tools, Karma allows more of the users to successfully build mashups and makes it possible to build these mashups significantly faster than with a widget-based approach.
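To make the three problems named above (extraction, cleaning, and integration) more concrete, the following is a minimal Python sketch of what "demonstrating by example" could look like on already-parsed records. It is not Karma's algorithm: the function names (normalize, column_from_example, extract_by_example, join_on), the record layout, and the simple matching rules are all assumptions made for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Cleaning step (assumed rule): collapse runs of whitespace and trim."""
    return " ".join(value.split())


def column_from_example(records: List[Record], example: str) -> str:
    """Find the field whose cleaned value equals the value the user pointed at."""
    target = normalize(example)
    for record in records:
        for field, value in record.items():
            if normalize(value) == target:
                return field
    raise ValueError(f"example {example!r} not found in any record")


def extract_by_example(records: List[Record], example: str) -> List[str]:
    """Extraction step: generalize one example to the same field of every record."""
    field = column_from_example(records, example)
    return [normalize(record[field]) for record in records]


def join_on(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Integration step: join two sources on a shared key, ignoring case and spacing."""
    index = {normalize(r[key]).lower(): r for r in right}
    merged = []
    for row in left:
        match = index.get(normalize(row[key]).lower())
        if match is not None:
            combined = {**match, **row}  # values from the left source win on conflict
            merged.append({f: normalize(v) for f, v in combined.items()})
    return merged
```

In this sketch the user supplies a single value, such as one restaurant name, and the code infers which field to pull from every record; a real system would also have to induce the extraction rules from the Web page itself and handle inexact matches between sources.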
Supplemental Material
The proof is given in an electronic appendix, available online in the ACM Digital Library.
Index Terms
- Building Mashups by Demonstration