Abstract
Database schema design requires careful consideration of the application’s data model, workload, and target database technology to optimize for performance and data size. Traditional normalization schemes used in relational databases minimize data redundancy, whereas NoSQL document-oriented databases favor redundancy and optimize for horizontal scalability and performance.
Systematic NoSQL schema design involves multiple dimensions: in practice, a database designer must carefully consider (i) which data elements to copy and co-locate, (ii) which data elements to normalize, and (iii) how to encode data, while taking into account factors such as the workload and data model.
In this paper, we present a workload-driven document database schema recommender (DBSR), which takes a systematic, search-based approach to exploring the complex schema design space. The recommender takes as its main inputs the application’s data model and its read workload, and outputs (i) the suggested document schema (featuring secondary indexing), (ii) query plan recommendations, and (iii) a document utility matrix that encodes insights into the respective costs and relative utility of the documents. We evaluate the recommended schemas in MongoDB using YCSB and show significant benefits to read query performance.
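To make the idea of a workload-driven, search-based exploration concrete, the following is a minimal sketch and not the DBSR algorithm itself. It assumes a hypothetical auction-style data model, a hypothetical read workload, and a deliberately simple cost function (frequency-weighted document lookups plus a flat penalty per embedded relationship as a stand-in for redundancy). All names, frequencies, and weights are illustrative assumptions.

```python
from itertools import product

# Hypothetical data model: (parent, child) relationships that could be embedded.
RELATIONSHIPS = [("User", "Item"), ("Item", "Bid"), ("User", "Comment")]

# Hypothetical read workload: (chain of entities read together, relative frequency).
WORKLOAD = [
    (("Item", "Bid"), 0.6),
    (("User", "Item", "Bid"), 0.3),
    (("User", "Comment"), 0.1),
]

# Assumed flat penalty per embedded relationship, standing in for duplicated data.
EMBED_PENALTY = 0.25


def query_lookups(path, embedded):
    """Count document fetches: one for the root, plus one per referenced hop."""
    hops = zip(path, path[1:])
    return 1 + sum(1 for hop in hops if hop not in embedded)


def schema_cost(embedded):
    """Frequency-weighted lookups plus the redundancy penalty."""
    read_cost = sum(freq * query_lookups(path, embedded) for path, freq in WORKLOAD)
    return read_cost + EMBED_PENALTY * len(embedded)


def recommend():
    """Score every embed/reference assignment exhaustively; return the cheapest."""
    best = None
    for choice in product([False, True], repeat=len(RELATIONSHIPS)):
        embedded = frozenset(r for r, chosen in zip(RELATIONSHIPS, choice) if chosen)
        cost = schema_cost(embedded)
        if best is None or cost < best[1]:
            best = (embedded, cost)
    return best


embedded, cost = recommend()
print("Embed:", sorted(embedded), "cost:", round(cost, 2))
```

Under these assumed numbers, the frequently co-read relationships are embedded while the rarely read one stays normalized. A realistic recommender would need a far richer cost model (document sizes, indexes, cardinalities) and a search strategy that scales beyond exhaustive enumeration.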
Notes
1. DBSR framework repository: https://github.com/vreniers/DBSR
2. Benchmark repository: https://github.com/vreniers/YCSB-MongoDB-RUBiS
Acknowledgments
This work has been funded by the KU Leuven Research Fund.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Reniers, V., Van Landuyt, D., Rafique, A., Joosen, W. (2020). A Workload-Driven Document Database Schema Recommender (DBSR). In: Dobbie, G., Frank, U., Kappel, G., Liddle, S.W., Mayr, H.C. (eds) Conceptual Modeling. ER 2020. Lecture Notes in Computer Science, vol 12400. Springer, Cham. https://doi.org/10.1007/978-3-030-62522-1_35
DOI: https://doi.org/10.1007/978-3-030-62522-1_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62521-4
Online ISBN: 978-3-030-62522-1
eBook Packages: Computer Science, Computer Science (R0)