DOI: 10.1145/3365109.3368782
research-article

A Schema-First Formalism for Labeled Property Graph Databases: Enabling Structured Data Loading and Analytics

Published: 02 December 2019

ABSTRACT

Graph databases provide better support for highly interconnected datasets than relational databases. However, labeled property graph databases, which have become increasingly popular, are schema-optional, making them prone to data corruption, especially when new users switch from relational databases to graph databases. In this work, we provide a schema-driven formalism for graph databases. This formalism enables schema-driven loading of graph databases from other sources, such as relational databases. Also, this formalism enables schema-driven data analytics that allows for a more structured analysis of data stored in graph databases. Such analytics are based on a boilerplate approach allowing users who are not experts in the use of graph database query languages to carry out analytics efficiently. We showcase the utility of the proposed formalism by considering a case study from Airbnb for illustrating schema-based loading procedures. The proposed schema-driven analytics process is illustrated using another case study from an industrial cyber-physical systems standard. Overall, the schema-driven formalism provides several useful features, such as preventing both data corruption and long-term degradation of graph database structures.
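
The abstract does not spell out the formalism itself, so the following is only a minimal Python sketch of the general schema-first idea it describes: node and edge types are declared up front, and every record is checked against them before it is loaded. The class and method names (`GraphSchema`, `check_node`, `check_edge`) and the Airbnb-style labels (`Host`, `Listing`, `HOSTS`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    label: str
    properties: dict  # property name -> required Python type


@dataclass
class EdgeType:
    label: str
    source: str  # label required on the start node
    target: str  # label required on the end node


@dataclass
class GraphSchema:
    """Hypothetical schema: declared node and edge types, keyed by label."""
    node_types: dict = field(default_factory=dict)
    edge_types: dict = field(default_factory=dict)

    def add_node_type(self, label, **properties):
        self.node_types[label] = NodeType(label, properties)

    def add_edge_type(self, label, source, target):
        self.edge_types[label] = EdgeType(label, source, target)

    def check_node(self, label, props):
        """Return a list of violations; an empty list means the node conforms."""
        node_type = self.node_types.get(label)
        if node_type is None:
            return [f"unknown node label: {label}"]
        errors = []
        for name, expected in node_type.properties.items():
            if name not in props:
                errors.append(f"{label}: missing property {name}")
            elif not isinstance(props[name], expected):
                errors.append(f"{label}.{name}: expected {expected.__name__}")
        for name in props:
            if name not in node_type.properties:
                errors.append(f"{label}: undeclared property {name}")
        return errors

    def check_edge(self, label, source_label, target_label):
        """Return a list of violations for an edge between two labeled nodes."""
        edge_type = self.edge_types.get(label)
        if edge_type is None:
            return [f"unknown edge label: {label}"]
        if (source_label, target_label) != (edge_type.source, edge_type.target):
            return [f"{label}: {source_label}->{target_label} not allowed"]
        return []


# Assumed example schema, loosely inspired by the Airbnb case study.
schema = GraphSchema()
schema.add_node_type("Host", host_id=int, name=str)
schema.add_node_type("Listing", listing_id=int, price=float)
schema.add_edge_type("HOSTS", source="Host", target="Listing")

ok = schema.check_node("Host", {"host_id": 1, "name": "Ana"})
bad = schema.check_node("Listing", {"listing_id": "42"})
wrong_direction = schema.check_edge("HOSTS", "Listing", "Host")
```

A loader built on such a check would reject or quarantine any row for which the returned list is non-empty, which is one way a schema-optional store could be protected from malformed records during migration from a relational source.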

Published in

BDCAT '19: Proceedings of the 6th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
December 2019
174 pages
ISBN: 9781450370165
DOI: 10.1145/3365109

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall acceptance rate: 27 of 93 submissions, 29%
