Abstract
Schema integration is a long-standing challenge for the data-engineering community and has received steady attention over the past three decades. General-purpose integration approaches construct unified schemas that encompass all schema elements. Over the past decade, schema integration has been revisited in service-oriented computing, because the input/output data-types of service interfaces are heterogeneous XML schemas. However, service integration differs from the traditional integration problem: it should generalize schemas (i.e., mine abstract data-types) instead of unifying all schema elements. To mine well-formed abstract data-types, one should follow the fundamental Liskov Substitution Principle (LSP), which generally holds between abstract data-types and their subtypes. Due to the heterogeneity of service data-types, however, strict application of LSP is usually not feasible. Moreover, XML offers a rich type system in which data-types are defined by combining type patterns (e.g., composition, aggregation). Existing integration approaches have not addressed the challenge of defining a subtyping relation between XML type patterns. To address these challenges, we propose a relaxed version of LSP between XML type patterns and an automated generalization process for mining abstract XML data-types. We evaluate the effectiveness and efficiency of the process on the schemas of two datasets against two representative state-of-the-art approaches.
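To make the contrast between unifying and generalizing schemas concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's. A data-type is modeled as a nested dictionary of element names, and subtyping is reduced to structural (width) subtyping, where a subtype must contain every element of its supertype. The "relaxed" variant simply keeps an element when at least a `min_support` fraction of the input types contain it; this threshold rule is an illustrative stand-in, not the relaxed LSP proposed in this work, and it ignores XML type patterns such as composition and aggregation. The functions `unify`, `generalize`, `generalize_relaxed`, and `is_subtype` and the sample service types are all hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical model: a data-type is a nested dict mapping an element name
# to its child elements ({} for a leaf). Attributes, occurrence constraints,
# and XML type patterns (composition vs. aggregation) are ignored.


def unify(a: dict, b: dict) -> dict:
    """Union-style integration: keep every element found in either type."""
    return {name: unify(a.get(name, {}), b.get(name, {}))
            for name in a.keys() | b.keys()}


def generalize(a: dict, b: dict) -> dict:
    """Strict generalization: keep only the elements shared by both types."""
    return {name: generalize(a[name], b[name]) for name in a.keys() & b.keys()}


def is_subtype(sub: dict, sup: dict) -> bool:
    """Structural (width) subtyping: sub must contain every element of sup."""
    return all(name in sub and is_subtype(sub[name], sup[name]) for name in sup)


def generalize_relaxed(types: list, min_support: float) -> dict:
    """Illustrative relaxation (hypothetical): keep an element if at least a
    min_support fraction of the types contain it, then recurse into the
    children of the types that do contain it."""
    counts = Counter(name for t in types for name in t)
    result = {}
    for name, count in counts.items():
        if count / len(types) >= min_support:
            children = [t[name] for t in types if name in t]
            result[name] = generalize_relaxed(children, min_support)
    return result


# Hypothetical output data-types of three book services.
type_a = {"Book": {"title": {}, "isbn": {}, "author": {"name": {}}}}
type_b = {"Book": {"title": {}, "author": {"name": {}, "email": {}}, "price": {}}}
type_c = {"Book": {"title": {}, "isbn": {}, "price": {}}}

unified = unify(type_a, type_b)
abstract = reduce(generalize, [type_a, type_b])
relaxed = generalize_relaxed([type_a, type_b, type_c], min_support=0.6)

# Both inputs can substitute for the generalized type ...
print(is_subtype(type_a, abstract), is_subtype(type_b, abstract))  # True True
# ... but not for the unified one (type_a has no "price" element).
print(is_subtype(type_a, unified))  # False
```

With `min_support=1.0` the relaxed variant reduces to the strict intersection, which here keeps only `Book/title`. Lowering the threshold retains more elements, at the price that some inputs (here `type_c`, which lacks `author`) are no longer strict structural subtypes of the mined type; this is the kind of trade-off a relaxed substitution principle has to control.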
Index Terms
- Mining Abstract XML Data-Types