Generational analysis of variety in data structures: impact on automatic data integration and on the semantic web

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

We examine data definition languages (DDLs) from computing eras spanning almost 50 years. Using Zipf distributions of words, Zipf distributions of meanings, and information theory, we prove that contemporary DDLs are indistinguishable from older ones. None of them addresses the Law of Requisite Variety, which is necessary both for automatic data integration from autonomous heterogeneous data sources and for the realization of the Semantic Web. The lack of progress in developing DDLs suitable for these two goals hampers the growth of the entire computing industry. Our findings set the stage for the future development of a mathematically sound DDL better suited to these purposes.
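As a rough illustration of the kind of measurement the abstract names, the sketch below builds a rank-frequency (Zipf) table for the words in a schema fragment and computes the Shannon entropy of the word distribution. It is not the paper's procedure: the tokenization rule, the function names, and the sample schema are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(ddl_text):
    """Split a schema into lowercase alphabetic words.

    Assumption: any non-letter (underscore, digit, punctuation) separates words.
    """
    return re.findall(r"[a-z]+", ddl_text.lower())


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def shannon_entropy(tokens):
    """Shannon entropy, in bits per word, of the empirical word distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def zipf_slope(table):
    """Least-squares slope of log(count) against log(rank).

    Needs at least two distinct ranks; a slope near -1 is the classic Zipf shape.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical schema fragment, not taken from the paper.
sample = "CREATE TABLE customer (customer_id INT, customer_name CHAR, order_id INT)"
tokens = tokenize(sample)
table = rank_frequency(tokens)
print(table[:3])
print(round(shannon_entropy(tokens), 3), round(zipf_slope(table), 3))
```

Running the same functions over schemas written in different DDL generations would give comparable slopes and entropies, which is the sort of comparison the abstract describes. A realistic study would need far larger corpora than this toy fragment.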

Author information

Corresponding author

Correspondence to Eli Rohn.

About this article

Cite this article

Rohn, E. Generational analysis of variety in data structures: impact on automatic data integration and on the semantic web. Knowl Inf Syst 24, 283–304 (2010). https://doi.org/10.1007/s10115-009-0246-7

  • DOI: https://doi.org/10.1007/s10115-009-0246-7
