Abstract
To achieve query performance comparable to that of relational databases, diverse database technologies have to be adapted to confront the complexity posed by both Resource Description Framework (RDF) data and SPARQL queries. Database caching is one such technology: it improves database performance at a reasonable space cost by exploiting the principle of spatial, temporal, or semantic locality. However, existing caching schemes used in RDF stores are found to be ineffective for complex query semantics. Although semantic caching approaches work effectively in this case, little work has been done in this area. In this paper, we try to improve SPARQL query performance with two semantic caching approaches: SPARQL algebraic expression tree (AET) based caching and entity caching. Successive queries with multiple identical sub-queries and star-shaped joins can be efficiently evaluated with these two approaches. The approaches are implemented on a two-level storage structure. The main memory stores the most frequently accessed cache items, and items swapped out are stored on disk for possible future reuse. Evaluation results on three mainstream RDF benchmarks illustrate the effectiveness and efficiency of our approaches. Comparisons with previous research are also provided.
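The two-level storage idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class name `TwoLevelCache`, the use of a plain access counter to decide which item leaves memory, the pickle files used as the disk level, and the choice of a query string as cache key. The paper's actual data structures and replacement policy may differ.

```python
import hashlib
import pickle
from pathlib import Path


class TwoLevelCache:
    """Hypothetical two-level cache: hot items in memory, swapped-out items on disk."""

    def __init__(self, capacity, disk_dir):
        self.capacity = capacity        # max items held in memory (assumed >= 1)
        self.disk_dir = Path(disk_dir)  # directory that receives swapped-out items
        self.memory = {}                # key -> cached result
        self.hits = {}                  # key -> access count, kept across swaps

    def _path(self, key):
        # Hash the key so that any query string maps to a valid file name.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.disk_dir / (digest + ".pkl")

    def put(self, key, value):
        # Make room first if a new key would exceed the memory capacity.
        if key not in self.memory and len(self.memory) >= self.capacity:
            self._evict()
        self.memory[key] = value
        self.hits.setdefault(key, 0)

    def get(self, key):
        # Level 1: main memory.
        if key in self.memory:
            self.hits[key] += 1
            return self.memory[key]
        # Level 2: disk. A hit here promotes the item back into memory.
        path = self._path(key)
        if path.exists():
            value = pickle.loads(path.read_bytes())
            path.unlink()
            self.hits[key] = self.hits.get(key, 0) + 1
            self.put(key, value)
            return value
        return None  # miss on both levels

    def _evict(self):
        # Swap the least frequently accessed in-memory item out to disk.
        victim = min(self.memory, key=lambda k: self.hits[k])
        self._path(victim).write_bytes(pickle.dumps(self.memory.pop(victim)))
```

In this sketch a caller would store the result of a sub-query under some canonical string form of that sub-query and call `get` before evaluating it again; how such a canonical form is derived from an algebraic expression tree is outside the scope of the sketch.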
Additional information
Project supported by the National Natural Science Foundation of China (Nos. 60903010, 61025007, and 60933001), the National Basic Research Program (973) of China (No. 2011CB302206), the Natural Science Foundation of Jiangsu Province, China (No. BK2009268), the Fundamental Research Funds for the Central Universities (No. N110404013), and the Key Laboratory of Advanced Information Science and Network Technology of Beijing (No. XDXX1011).
About this article
Cite this article
Wu, G., Yang, Md. Improving SPARQL query performance with algebraic expression tree based caching and entity caching. J. Zhejiang Univ. - Sci. C 13, 281–294 (2012). https://doi.org/10.1631/jzus.C1101009
DOI: https://doi.org/10.1631/jzus.C1101009