Abstract
Multimodal data, which integrates images, text, audio, and video, has become prevalent in the era of big data. However, benchmarks designed specifically for multimodal data are lacking: existing benchmarks focus primarily on traditional and multi-model databases and offer no comprehensive framework for evaluating systems that handle multimodal data. In this paper, we present MMDBench, a novel benchmark designed to evaluate the performance of multimodal databases that accommodate several data modalities, including structured data, images, and text. The MMDBench workload comprises eleven tasks inspired by real-world social network scenarios that involve multiple data modalities. Each task simulates a specific scenario that requires integrating at least two distinct data modalities. To demonstrate the effectiveness of MMDBench, we developed a hybrid database system to execute the workload and uncovered diverse characteristics of multimodal databases when executing hybrid queries.
Supported by National Key R&D Program of China (Grant No. 2022YFF0711600), National Key R&D Program of China (Grant No. 2021YFF0704200), and Informatization Plan of Chinese Academy of Sciences (Grant No. CAS-WX2022GC-02).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mao, A. et al. (2024). MMDBench: A Benchmark for Hybrid Query in Multimodal Database. In: Hunold, S., Xie, B., Shu, K. (eds) Benchmarking, Measuring, and Optimizing. Bench 2023. Lecture Notes in Computer Science, vol 14521. Springer, Singapore. https://doi.org/10.1007/978-981-97-0316-6_6
DOI: https://doi.org/10.1007/978-981-97-0316-6_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0315-9
Online ISBN: 978-981-97-0316-6
eBook Packages: Computer Science, Computer Science (R0)