ABSTRACT
Graph databases support highly interconnected datasets better than relational databases do. However, labeled property graph databases, which have become increasingly popular, are schema-optional. This makes them prone to data corruption, especially when new users move from relational databases to graph databases. In this work, we provide a schema-driven formalism for graph databases. The formalism enables schema-driven loading of graph databases from other sources, such as relational databases. It also enables schema-driven data analytics, which allows a more structured analysis of data stored in graph databases. These analytics follow a boilerplate approach that lets users who are not experts in graph database query languages carry out analytics efficiently. We demonstrate the formalism's utility with two case studies: one based on Airbnb data, which illustrates the schema-based loading procedures, and one based on an industrial cyber-physical systems standard, which illustrates the schema-driven analytics process. Overall, the schema-driven formalism provides several useful features, such as preventing both data corruption and long-term degradation of graph database structures.
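The idea of schema-driven loading can be illustrated with a minimal sketch. This is not the formalism defined in the paper: the classes `Schema`, `Graph`, and `SchemaViolation`, the function `load_listings`, and the `Host`/`Listing`/`OWNS` labels and column names are all hypothetical, chosen only to show how a declared schema could reject malformed rows before they reach an in-memory property graph.

```python
class SchemaViolation(ValueError):
    """Raised when a node or edge does not conform to the declared schema."""


class Schema:
    """Hypothetical schema: node labels with typed properties, typed edges."""

    def __init__(self):
        self.node_types = {}  # label -> {property name: required Python type}
        self.edge_types = {}  # label -> (source node label, target node label)

    def add_node_type(self, label, **properties):
        self.node_types[label] = properties

    def add_edge_type(self, label, source, target):
        self.edge_types[label] = (source, target)


class Graph:
    """In-memory labeled property graph that enforces a schema on every write."""

    def __init__(self, schema):
        self.schema = schema
        self.nodes = {}  # node id -> (label, properties)
        self.edges = []  # (source id, edge label, target id)

    def add_node(self, node_id, label, **props):
        expected = self.schema.node_types.get(label)
        if expected is None:
            raise SchemaViolation(f"unknown node label: {label}")
        if set(props) != set(expected):
            raise SchemaViolation(f"{label} requires exactly {sorted(expected)}")
        for name, required_type in expected.items():
            if not isinstance(props[name], required_type):
                raise SchemaViolation(f"{label}.{name} must be {required_type.__name__}")
        self.nodes[node_id] = (label, props)

    def add_edge(self, source_id, label, target_id):
        if label not in self.schema.edge_types:
            raise SchemaViolation(f"unknown edge label: {label}")
        if source_id not in self.nodes or target_id not in self.nodes:
            raise SchemaViolation("both endpoints must exist before adding an edge")
        endpoints = (self.nodes[source_id][0], self.nodes[target_id][0])
        if endpoints != self.schema.edge_types[label]:
            raise SchemaViolation(f"{label} cannot connect {endpoints}")
        self.edges.append((source_id, label, target_id))


def load_listings(graph, rows):
    """Load relational-style rows (dicts); return the rows the schema rejected."""
    rejected = []
    for row in rows:
        host_id = f"host:{row['host_id']}"
        listing_id = f"listing:{row['listing_id']}"
        try:
            # The listing is validated first, so a bad row adds nothing.
            graph.add_node(listing_id, "Listing", price=row["price"])
            if host_id not in graph.nodes:
                graph.add_node(host_id, "Host", name=row["host_name"])
            graph.add_edge(host_id, "OWNS", listing_id)
        except SchemaViolation as err:
            rejected.append((row, str(err)))
    return rejected


# Hypothetical schema and rows, loosely modeled on host/listing data.
schema = Schema()
schema.add_node_type("Host", name=str)
schema.add_node_type("Listing", price=float)
schema.add_edge_type("OWNS", "Host", "Listing")

graph = Graph(schema)
rows = [
    {"host_id": 1, "host_name": "Ana", "listing_id": 10, "price": 80.0},
    {"host_id": 2, "host_name": "Ben", "listing_id": 11, "price": "cheap"},
]
rejected = load_listings(graph, rows)
```

In this sketch the second row is rejected because its `price` is a string rather than a float, so the graph never stores a `Listing` node that violates the declared structure. A production system would enforce such rules inside the database or its loading pipeline rather than in application code.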
A Schema-First Formalism for Labeled Property Graph Databases: Enabling Structured Data Loading and Analytics