Abstract
Resource Description Framework (RDF) is a widespread standard and a flexible method of data representation. RDF storage systems are actively used for storing, sharing, and publishing RDF data on the Internet. RDF models are used in business applications and by research teams who wish to share their data with the community. Most RDF stores are optimized for queries, but this is usually achieved at the cost of increased disk space consumption. Renting a dedicated server with a large volume of local storage is quite expensive, especially for small research teams and business startups, which makes disk space consumption an important factor in choosing a data store. In this study, we compared the disk space usage of four popular triple stores that can serve as SPARQL (an SQL-like query language) endpoints, depending on the amount and structure of the loaded RDF data. To the best of our knowledge, no previous work has compared the disk space occupied by triple stores. We found that all of the compared open-source solutions, namely Apache Jena Fuseki and Parliament, consume large amounts of hard disk space and should be used with caution in resource-limited environments. The data structure (one large graph or many smaller named graphs) strongly affected Parliament's disk space usage, so it should also be taken into account when selecting an RDF store. Free versions of commercial systems show adequate disk consumption and appear to be weakly dependent on data structure, but Ontotext GraphDB is deliberately limited in performance, and Stardog is limited in license term and may need additional manual maintenance.
The reported study was funded by RFBR, project number 20-07-00764.
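As an illustration of the kind of measurement the abstract describes, the following minimal Python sketch sums the sizes of all files under a triple store's data directory and converts the result to MiB. It is not the authors' measurement procedure, which is not shown on this page. The function names `dir_size_bytes` and `to_mib` are hypothetical, and the sketch reports apparent file sizes, which can differ from the blocks actually allocated on disk (for example, with sparse files).

```python
import os

MIB = 1024 ** 2  # bytes in one mebibyte


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files under `path`.

    Symbolic links are skipped so that linked files are not counted.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB
```

Calling `to_mib(dir_size_bytes(data_dir))` before and after loading a dataset, where `data_dir` is wherever a given store keeps its files (an assumption that depends on the installation), gives a rough estimate of the space the loaded data occupies.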
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
MiB: 1 mebibyte equals \(1024^2\) bytes.
- 7.
RDF serialization format: https://www.w3.org/TR/turtle.
- 8.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Prokudin, A., Denisov, M., Sychev, O. (2023). Disk Space Consumption by Triple Storage Systems. In: Krouska, A., Troussas, C., Caro, J. (eds) Novel & Intelligent Digital Systems: Proceedings of the 2nd International Conference (NiDS 2022). NiDS 2022. Lecture Notes in Networks and Systems, vol 556. Springer, Cham. https://doi.org/10.1007/978-3-031-17601-2_26
DOI: https://doi.org/10.1007/978-3-031-17601-2_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17600-5
Online ISBN: 978-3-031-17601-2
eBook Packages: Intelligent Technologies and Robotics (R0)