Abstract
The wave of interest in data-centric applications has spawned a wide variety of data models, making it extremely difficult to evaluate, integrate or access them in a uniform way. Moreover, many recent models are too specific to allow immediate comparison with others and do not easily support incremental model design. In this paper, we introduce GSMM, a meta-model based on a generic graph that can be instantiated to a concrete data model by simply providing values for a restricted set of parameters and some high-level constraints, themselves represented as graphs. In GSMM, the concept of data schema is replaced by that of constraint, which allows the designer to impose structural restrictions on data in a very flexible way. GSMM includes GSL, a graph-based language for expressing queries and constraints that, besides being applicable to data represented in GSMM, can in principle be specialised and used for existing models for which no language has been defined. We show some sample applications of GSMM for deriving and comparing classical data models such as the relational model, plain XML data, XML Schema, and time-varying semistructured data. We also show how GSMM can represent more recent modelling proposals: triple stores, the BigTable model and Neo4j, a graph-based model for NoSQL data. A prototype showing the potential of the approach is also described.
Notes
We say that data are semi-structured when, although some structure is present, it is not as strict, regular, or complete as that required by traditional database management systems [1].
BigTable is the model shared by popular NoSQL databases such as Apache HBase and Cassandra [13].
In the remainder of the paper we denote constants by means of lowercase words, whereas words denoting variables start with a capital letter.
The notation \(b_2\mid N'\) stands for the restriction of mapping \(b_2\) to the nodes in \(N'\).
Plain XML documents may also contain ENTITY nodes, not unlike macro calls that must be expanded before parsing. We do not consider ENTITY expansion in this paper.
For the sake of conciseness, Table 1 does not explicitly consider Base Types, because they may be very large.
An edge pointing to \(m_2\).
References
Abiteboul S (1997) Querying semi-structured data. In: Proceedings of the international conference on database theory, vol 1186. Lecture notes in computer science, pp 262–275
Angles R (2012) A comparison of current graph database models. In: Proceedings of the 2012 IEEE 28th international conference on data engineering workshops, ICDEW ’12. IEEE Computer Society, Washington, DC, pp 171–177
Atzeni P, Cappellari P, Torlone R, Bernstein PA, Gianforme G (2008) Model-independent schema translation. VLDB J 17(6):1347–1370
Atzeni P, Torlone R (2001) A unified framework for data translation over the web. In: Proceedings of the 2nd international conference on web information system engineering. IEEE Computer Society, pp 350–358
Bekiropoulos K, Keramopoulos E, Beza O, Mouratidis P (2010) A list of features that a graphical XML query language should support. Comput Syst Sci Eng 25(5):13–21
Benda S, Klímek J, Nečaský M (2013) Using Schematron as schema language in conceptual modeling for XML. In: Proceedings of the ninth Asia-Pacific conference on conceptual modelling, vol 143, APCCM ’13. Australian Computer Society, Inc., Darlinghurst, pp 31–40
Bernstein PA, Halevy AY, Pottinger RA (2000) A vision for management of complex models. SIGMOD Rec 29(4):55–63
Bernstein PA, Pottinger R (2003) Merging models based on given correspondences. Technical report UW-CSE-03-02-03. University of Washington
Bowers S, Delcambre L (2000) Representing and transforming model-based information. In: Proceedings of the international workshop on the semantic web at the 4th European conference on research and advanced technology for digital libraries (SemWeb)
Buneman P, Fan W, Siméon J, Weinstein S (2001) Constraints for semistructured data and XML. SIGMOD Rec 30:47–54
Buneman P, Fan W, Weinstein S (1998) Path constraints on semistructured and structured data. In: Proceedings of the 17th symposium on principles of database systems. ACM Press, pp 129–138
Cattell R (2011) Scalable SQL and NoSQL data stores. SIGMOD Rec 39(4):12–27
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2):4:1–4:26
Chawathe SS, Abiteboul S, Widom J (1998) Representing and querying changes in semistructured data. In: Proceedings of the fourteenth international conference on data engineering. IEEE Computer Society, pp 4–13
Chawathe SS, Abiteboul S, Widom J (1999) Managing historical semistructured data. Theory Pract Object Syst 5(3):143–162
Chen L, Oughtred R, Berman HM, Westbrook J (2004) TargetDB: a target registration database for structural genomics projects. Bioinform Appl Notes 20(16):2860–2862
Combi C, Oliboni B, Quintarelli E (2012) Modeling temporal dimensions of semistructured data. J Intell Inf Syst 38(3):601–644
Cortesi A, Dovier A, Quintarelli E, Tanca L (2002) Operational and abstract semantics of a query language for semi-structured information. Theor Comput Sci 275(1–2):521–560
Damiani E, Oliboni B, Quintarelli E, Tanca L (2003) Modeling semistructured data by using graph-based constraints. In: OTM workshops proceedings. Lecture notes in computer science. Springer, Berlin, pp 20–21
Damiani E, Tanca L (1997) Semantic approaches to structuring and querying web sites. In: Proceedings of the 7th IFIP working conference on database semantics (DS-97)
Fan W, Lu P (2017) Dependencies for graphs. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems, PODS ’17. ACM, pp 403–416
Indrawan-Santiago M (2012) Database research: are we at a crossroad? Reflection on NoSQL. In: Proceedings of the 2012 15th international conference on network-based information systems, NBIS ’12. IEEE Computer Society, Washington, DC, pp 45–51
Kaur K, Rani R (2013) Modeling and querying data in NoSQL databases. In: Proceedings of the IEEE international conference on Big Data, pp 1–7
Lee KK-Y, Tang W-C, Choi K-S (2013) Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage. Comput Methods Progr Biomed 110(1):99–109
Levy AY, Rajaraman A, Ordille JJ (1996) Querying heterogeneous information sources using source descriptions. In: Proceedings of the twenty-second international conference on very large databases. VLDB Endowment, Saratoga, Calif., Bombay, India, pp 251–262
Murata M, Lee D, Mani M, Kawaguchi K (2005) Taxonomy of XML schema languages using formal language theory. ACM Trans Internet Technol 5(4):660–704
McBrien P, Poulovassilis A (1999) A uniform approach to inter-model transformations. In: Conference on advanced information systems engineering, pp 333–348
Oliboni B, Quintarelli E, Tanca L (2001) Temporal aspects of semistructured data. In: Proceedings of the eighth international symposium on temporal representation and reasoning (TIME-01). IEEE Computer Society, pp 119–127
Papakonstantinou Y, Garcia-Molina H, Widom J (1995) Object exchange across heterogeneous information sources. In: Proceedings of the eleventh international conference on data engineering. IEEE Computer Society, pp 251–260
Paredaens J, Peelman P, Tanca L (1995) G-Log: a declarative graphical query language. IEEE Trans Knowl Data Eng 7(3):436–453
Vicknair C, Macias M, Zhao Z, Nan X, Chen Y, Wilkins D (2010) A comparison of a graph database and a relational database: a data provenance perspective. In: Proceedings of the 48th annual southeast regional conference, ACM SE ’10. ACM, New York, NY, pp 42:1–42:6
Virgilio RD, Maccioni A, Torlone R (2014) Graph-driven exploration of relational databases for efficient keyword search. In: Candan KS, Amer-Yahia S, Schweikardt N, Christophides V, Leroy V (eds) Proceedings of the workshops of the EDBT/ICDT 2014 joint conference (EDBT/ICDT 2014), Athens, Greece, March 28, 2014, Vol. 1133 of CEUR workshop proceedings, CEUR-WS.org, pp 208–215
W3C (1998) World wide web consortium. Extensible Markup Language (XML) 1.0. http://www.w3C.org/TR/REC-xml/
W3C (2001) World wide web consortium. XML schema. http://www.w3C.org/TR/xmlschema-1/
Zang T, Calinescu R, Kwiatkowska MZ (2011) Metamodel-driven SOA for collaborative e-science application. Comput Syst Sci Eng 26(3):215–226
Cite this article
Damiani, E., Oliboni, B., Quintarelli, E. et al. A graph-based meta-model for heterogeneous data management. Knowl Inf Syst 61, 107–136 (2019). https://doi.org/10.1007/s10115-018-1305-8
DOI: https://doi.org/10.1007/s10115-018-1305-8