A graph-based meta-model for heterogeneous data management

  • Regular Paper
Knowledge and Information Systems

Abstract

The wave of interest in data-centric applications has spawned a wide variety of data models, making it extremely difficult to evaluate, integrate or access them in a uniform way. Moreover, many recent models are too specific to allow immediate comparison with the others and do not easily support incremental model design. In this paper, we introduce GSMM, a meta-model based on a generic graph that can be instantiated to a concrete data model simply by providing values for a restricted set of parameters and some high-level constraints, themselves represented as graphs. In GSMM, the concept of data schema is replaced by that of constraint, which allows the designer to impose structural restrictions on data in a very flexible way. GSMM includes GSL, a graph-based language for expressing queries and constraints that, besides being applicable to data represented in GSMM, can in principle be specialised and used for existing models for which no language has been defined. We show some sample applications of GSMM for deriving and comparing classical data models such as the relational model, plain XML data, XML Schema, and time-varying semistructured data. We also show how GSMM can represent more recent modelling proposals: triple stores, the BigTable model and Neo4j, a graph-based model for NoSQL data. A prototype showing the potential of the approach is also described.
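
The idea of instantiating a generic graph through parameters and constraints can be illustrated with a minimal sketch. It is not GSMM's actual definition, which is given in the full paper: the names `Graph`, `ModelParams`, `conforms` and `satisfies`, the choice of label sets as parameters, and the one-edge constraint pattern are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Generic directed graph with labelled nodes and labelled edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node label
    edges: set = field(default_factory=set)    # {(source id, edge label, target id)}


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical parameters of a concrete model: the labels it admits."""
    node_labels: frozenset
    edge_labels: frozenset


def conforms(graph, params):
    """True if every label used in the graph is admitted by the parameters."""
    nodes_ok = all(label in params.node_labels for label in graph.nodes.values())
    edges_ok = all(label in params.edge_labels for _, label, _ in graph.edges)
    return nodes_ok and edges_ok


def satisfies(graph, pattern):
    """Check a one-edge pattern (source label, edge label, target label):
    every node carrying the source label needs a matching outgoing edge."""
    src_label, edge_label, dst_label = pattern
    for node, label in graph.nodes.items():
        if label != src_label:
            continue
        has_match = any(
            s == node and e == edge_label and graph.nodes.get(t) == dst_label
            for s, e, t in graph.edges
        )
        if not has_match:
            return False
    return True


# A relational-style instantiation (labels are illustrative only).
relational = ModelParams(
    node_labels=frozenset({"relation", "tuple", "value"}),
    edge_labels=frozenset({"has_tuple", "has_attr"}),
)

db = Graph(
    nodes={"r1": "relation", "t1": "tuple", "v1": "value"},
    edges={("r1", "has_tuple", "t1"), ("t1", "has_attr", "v1")},
)

# Constraint written as a tiny graph pattern: tuple --has_attr--> value.
every_tuple_has_value = ("tuple", "has_attr", "value")
```

Calling `conforms(db, relational)` checks the instance against the model's parameters, and `satisfies(db, every_tuple_has_value)` checks it against the constraint. A different model would be obtained by supplying other label sets and patterns over the same generic `Graph`.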

Notes

  1. We say that data are semi-structured when, although some structure is present, it is not as strict, regular, or complete as that required by traditional database management systems [1].

  2. BigTable is the model shared by popular NoSQL databases such as Apache HBase and Cassandra [13].

  3. In the remainder of the paper we denote constants by means of lowercase words, whereas words denoting variables start with a capital letter.

  4. The notation \(b_2\mid N'\) stands for the restriction of mapping \(b_2\) to the nodes in \(N'\).

  5. Plain XML documents may also contain ENTITY nodes, not unlike macro calls that must be expanded before parsing. We do not consider ENTITY expansion in this paper.

  6. For the sake of conciseness Table 1 does not explicitly consider Base Types, because they may be very large.

  7. An edge pointing to \(m_2\).
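
The restriction notation of note 4 can be illustrated with a short sketch. The mapping is modelled here as a Python dictionary; the node names and values are invented for the example, and `restrict` is a hypothetical helper, not part of the paper.

```python
def restrict(mapping, subset):
    """Return the restriction of `mapping` to the keys that lie in `subset`."""
    return {key: value for key, value in mapping.items() if key in subset}


# b2 maps nodes to values; n_prime plays the role of N', a subset of the nodes.
b2 = {"n1": "m1", "n2": "m2", "n3": "m3"}
n_prime = {"n1", "n3"}

# b2 | N' agrees with b2 on N' and is undefined elsewhere.
b2_restricted = restrict(b2, n_prime)
```

Here `b2_restricted` keeps only the entries for `n1` and `n3`, with the same values that `b2` assigns to them.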

References

  1. Abiteboul S (1997) Querying semi-structured data. In: Proceedings of the international conference on database theory, vol 1186. Lecture notes in computer science, pp 262–275

  2. Angles R (2012) A comparison of current graph database models. In: Proceedings of the 2012 IEEE 28th international conference on data engineering workshops, ICDEW ’12. IEEE Computer Society, Washington, DC, pp 171–177

  3. Atzeni P, Cappellari P, Torlone R, Bernstein PA, Gianforme G (2008) Model-independent schema translation. VLDB J 17(6):1347–1370

  4. Atzeni P, Torlone R (2001) A unified framework for data translation over the web. In: Proceedings of the 2nd international conference on web information system engineering. IEEE Computer Society, pp 350–358

  5. Bekiropoulos K, Keramopoulos E, Beza O, Mouratidis P (2010) A list of features that a graphical xml query language should support. Comput Syst Sci Eng 25(5):13–21

  6. Benda S, Klímek J, Nečaský M (2013) Using schematron as schema language in conceptual modeling for xml. In: Proceedings of the ninth Asia-Pacific conference on conceptual modelling, vol 143, APCCM ’13. Australian Computer Society, Inc., Darlinghurst, pp 31–40

  7. Bernstein PA, Halevy AY, Pottinger RA (2000) A vision for management of complex models. SIGMOD Rec 29(4):55–63

  8. Bernstein PA, Pottinger R (2003) Merging models based on given correspondences. Technical report UW-CSE-03-02-03. University of Washington

  9. Bowers S, Delcambre L (2000) Representing and transforming model-based information. In: Proceedings of International workshop on the semantic web at the 4th European conference on research and advanced technology for digital libraries (SemWeb)

  10. Buneman P, Fan W, Siméon J, Weinstein S (2001) Constraints for semistructured data and XML. SIGMOD Rec 30:47–54

  11. Buneman P, Fan W, Weinstein S (1998) Path constraints on semistructured and structured data. In: Proceedings of 17th symposium on principles of database system. ACM Press, pp 129–138

  12. Cattell R (2011) Scalable SQL and NoSQL data stores. SIGMOD Rec 39(4):12–27

  13. Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(2):4:1–4:26

  14. Chawathe SS, Abiteboul S, Widom J (1998) Representing and querying changes in semistructured data. In: Proceedings of the fourteenth international conference on data engineering. IEEE Computer Society, pp 4–13

  15. Chawathe SS, Abiteboul S, Widom J (1999) Managing historical semistructured data. Theory Pract Object Syst 5(3):143–162

  16. Chen L, Oughtred R, Berman HM, Westbrook J (2004) Targetdb: a target registration database for structural genomics projects. Bioinform Appl Notes 20(16):2860–2862

  17. Combi C, Oliboni B, Quintarelli E (2012) Modeling temporal dimensions of semistructured data. J Intell Inf Syst 38(3):601–644

  18. Cortesi A, Dovier A, Quintarelli E, Tanca L (2002) Operational and abstract semantics of a query language for semi-structured information. Theor Comput Sci 275(1–2):521–560

  19. Damiani E, Oliboni B, Quintarelli E, Tanca L (2003) Modeling semistructured data by using graph-based constraints. In: OTM workshops proceedings. Lecture notes in computer science. Springer, Berlin, pp 20–21

  20. Damiani E, Tanca L (1997) Semantic approaches to structuring and querying web sites. In: Proceedings of 7th IFIP working conference on database semantics (DS-97)

  21. Fan W, Lu P (2017) Dependencies for graphs. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems, PODS ’17. ACM, pp 403–416

  22. Indrawan-Santiago M (2012) Database research: Are we at a crossroad? reflection on NoSQL. In: Proceedings of the 2012 15th international conference on network-based information systems, NBIS ’12. IEEE Computer Society, Washington, DC, pp 45–51

  23. Kaur K, Rani R (2013) Modeling and querying data in NoSQL databases. In: Proceedings of the IEEE international conference on Big Data, pp 1–7

  24. Lee KK-Y, Tang W-C, Choi K-S (2013) Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage. Comput Methods Progr Biomed 110(1):99–109

  25. Levy AY, Rajaraman A, Ordille JJ (1996) Querying heterogeneous information sources using source descriptions. In: Proceedings of the twenty-second international conference on very large databases. VLDB Endowment, Saratoga, Calif., Bombay, India, pp 251–262

  26. Murata M, Lee D, Mani M, Kawaguchi K (2005) Taxonomy of XML schema languages using formal language theory. ACM Trans Internet Technol 5(4):660–704

  27. McBrien P, Poulovassilis A (1999) A uniform approach to inter-model transformations. In: Conference on advanced information systems engineering, pp 333–348

  28. Oliboni B, Quintarelli E, Tanca L (2001) Temporal aspects of semistructured data. In: Proceedings of the eighth international symposium on temporal representation and reasoning (TIME-01). IEEE Computer Society, pp 119–127

  29. Papakonstantinou Y, Garcia-Molina H, Widom J (1995) Object exchange across heterogeneous information sources. In: Proceedings of the eleventh international conference on data engineering. IEEE Computer Society, pp 251–260

  30. Paredaens J, Peelman P, Tanca L (1995) G-Log: a declarative graphical query language. IEEE Trans Knowl Data Eng 7(3):436–453

  31. Vicknair C, Macias M, Zhao Z, Nan X, Chen Y, Wilkins D (2010) A comparison of a graph database and a relational database: a data provenance perspective. In: Proceedings of the 48th annual southeast regional conference, ACM SE ’10. ACM, New York, NY, pp 42:1–42:6

  32. Virgilio RD, Maccioni A, Torlone R (2014) Graph-driven exploration of relational databases for efficient keyword search. In: Candan KS, Amer-Yahia S, Schweikardt N, Christophides V, Leroy V (eds) Proceedings of the workshops of the EDBT/ICDT 2014 joint conference (EDBT/ICDT 2014), Athens, Greece, March 28, 2014, Vol. 1133 of CEUR workshop proceedings, CEUR-WS.org, pp 208–215

  33. W3C (1998) World wide web consortium. Extensible Markup Language (XML) 1.0. http://www.w3C.org/TR/REC-xml/

  34. W3C (2001) World wide web consortium. XML schema. http://www.w3C.org/TR/xmlschema-1/

  35. Zang T, Calinescu R, Kwiatkowska MZ (2011) Metamodel-driven SOA for collaborative e-science application. Comput Syst Sci Eng 26(3):215–226

Author information

Corresponding author

Correspondence to Barbara Oliboni.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Damiani, E., Oliboni, B., Quintarelli, E. et al. A graph-based meta-model for heterogeneous data management. Knowl Inf Syst 61, 107–136 (2019). https://doi.org/10.1007/s10115-018-1305-8

  • DOI: https://doi.org/10.1007/s10115-018-1305-8
