Abstract
Being schemaless is a common feature of most NoSQL systems. It accommodates change and non-uniformity in stored data, and allows fast deployment of databases. However, the lack of database schemas makes it difficult to develop database applications and tools. Therefore, explicit schemas should be produced, inferred from NoSQL data, code, or both, to facilitate the work of developers and support the functionality of database tools. Published strategies for discovering NoSQL schemas focus on extracting entity types, whereas physical schemas have received very little attention. Our group recently presented an approach to infer logical schemas from aggregate-based NoSQL stores. Because the inferred schemas do not capture physical information about the underlying database, they cannot help with the implementation of some typical database tasks, such as database migration, optimization, and schema evolution. In this paper we extend our previous approach by proposing a physical metamodel targeted at MongoDB databases, which captures characteristics such as existing indexes, data organization, and statistical features (e.g., cardinality of values). We also explain the process of retrieving the physical model from an existing database, and the bidirectional transformations between logical and physical models.
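To make the kind of physical information mentioned above more concrete, the following is a minimal illustrative sketch, not the metamodel proposed in the paper. All class and function names (`IndexInfo`, `PhysicalCollection`, `build_physical_collection`) and the sample data are hypothetical. The `index_specs` dictionary is assumed to resemble the mapping that PyMongo's `Collection.index_information()` returns (index name mapped to a dictionary with a `key` list), but here it is written by hand so that the example runs without a database.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexInfo:
    name: str
    keys: list            # [(field_name, direction), ...]
    unique: bool = False


@dataclass
class PhysicalCollection:
    name: str
    indexes: list = field(default_factory=list)
    cardinality: dict = field(default_factory=dict)  # field -> number of distinct values


def build_physical_collection(name, index_specs, documents):
    """Summarize indexes and per-field value cardinality for one collection."""
    indexes = [
        IndexInfo(idx_name, list(spec["key"]), bool(spec.get("unique", False)))
        for idx_name, spec in sorted(index_specs.items())
    ]
    distinct = defaultdict(set)
    for doc in documents:
        for key, value in doc.items():
            # repr() lets unhashable values (arrays, nested documents) be counted.
            distinct[key].add(repr(value))
    cardinality = {key: len(values) for key, values in distinct.items()}
    return PhysicalCollection(name, indexes, cardinality)


# Hypothetical sample data, not taken from the paper.
index_specs = {
    "_id_": {"key": [("_id", 1)]},
    "email_1": {"key": [("email", 1)], "unique": True},
}
documents = [
    {"_id": 1, "email": "a@example.org", "country": "ES"},
    {"_id": 2, "email": "b@example.org", "country": "ES"},
    {"_id": 3, "email": "c@example.org", "country": "PT", "tags": ["x"]},
]
model = build_physical_collection("users", index_specs, documents)
```

In this sketch the index list and the cardinality table stand in for two of the characteristics named in the abstract; data organization and the transformations between logical and physical models are deliberately left out.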
This work has been funded by the Spanish Ministry of Science, Innovation and Universities (project grant TIN2017-86853-P).
Notes
1. MongoDB Webpage: www.mongodb.com.
2. Ecore Webpage: http://wiki.eclipse.org/ecore.
3. Fifth position in the DB-Engines ranking, July 2020: db-engines.com/en/ranking.
4. BSON specification: www.bsonspec.org.
5. MongoDB BSON types: docs.mongodb.com/manual/reference/bson-types.
6. Spark Webpage: http://spark.apache.org.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Muñoz-Sánchez, P.D., Fernández Candel, C.J., García Molina, J., Sevilla Ruiz, D. (2020). Managing Physical Schemas in MongoDB Stores. In: Grossmann, G., Ram, S. (eds) Advances in Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12584. Springer, Cham. https://doi.org/10.1007/978-3-030-65847-2_15
DOI: https://doi.org/10.1007/978-3-030-65847-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-65846-5
Online ISBN: 978-3-030-65847-2
eBook Packages: Computer Science (R0)