
Advances, Systems and Applications

SSF-CDW: achieving scalable, secure, and fast OLAP query for encrypted cloud data warehouse

Abstract

Implementing a cloud-based data warehouse to store sensitive or critical strategic data presents challenges primarily related to the security of the stored information and the exchange of OLAP queries between the cloud server and users. Although encryption is a viable solution for safeguarding outsourced data, applying it to OLAP queries involving multidimensional data, measures, and Multidimensional Expressions (MDX) operations on encrypted data poses difficulties. Existing searchable encryption solutions are inadequate for handling such complex queries, which complicates the use of business intelligence tools that rely on efficient and secure data processing and analysis. This paper proposes a new privacy-preserving cloud data warehouse scheme called SSF-CDW, which provides a secure and scalable solution for an encrypted cloud data warehouse. SSF-CDW makes OLAP queries accessible only to authorized users, who alone can decrypt the query results, while delivering better query performance than traditional OLAP tools. The approach utilizes symmetric encryption and Ciphertext Policy Attribute-Based Encryption (CP-ABE) to protect the privacy of the dimension and fact data modeled in Multidimensional OLAP (MOLAP). To support efficient OLAP query execution, we propose a new data cube retrieval mechanism using a schema built on Redis, an in-memory database. This technique dynamically compiles queries by decomposing them into multiple levels and consolidates the results mapped to the corresponding encrypted data cube. Caching of dimensional and fact data associated with the encrypted cube is also implemented to improve the speed of frequently queried data. Experimental comparisons between our proposed indexed search strategy and other indexing schemes demonstrate that our approach surpasses alternative techniques in terms of search speed for both ad-hoc and repeated OLAP queries, all while preserving the privacy of the query results.

Introduction

Data warehouses (DWs) have become essential systems for facilitating data analysis and supporting business intelligence applications. These repositories gather information from diverse sources, process it through ETL (Extract, Transform, Load) tools, and then load the transformed data into the DW. The DW itself compiles aggregated factual data along with multiple dimensions of strategic information. Typically, the volume of data stored in the DW is extensive. OLAP (Online Analytical Processing) tools are introduced to harness data from the data warehouse and big data to aid decision-making. These tools commonly serve to model the structure of the data warehouse. In general, data warehouses can be modeled and implemented using two primary schemes: Relational OLAP (ROLAP) and Multidimensional OLAP (MOLAP). ROLAP stores data in columns and rows (referred to as relational tables) and retrieves information on demand through user-submitted queries. MOLAP, the classical OLAP approach, supports data analysis by employing a multidimensional data cube. In MOLAP, data undergoes pre-computation, summarization, and storage within the data warehouse. MOLAP enables users to explore data from various perspectives through a multidimensional view, but it requires storage for the precomputed cubes.

With the recent shift in the paradigm of system implementation and operation brought about by cloud computing, cloud data warehouse services have been introduced by providers like Azure [1], Amazon [2], Google [3], and Snowflake [4]. Many enterprises have also opted to outsource their data warehouses to the cloud, offering services to their users. Nevertheless, the use of outsourced data and queries over Cloud Data Warehouses (CDWs) presents challenges related to data security and privacy. The primary concern arising from storing data in the cloud is ensuring privacy and security. Enterprises must guarantee the protection of their data warehouse against potential security risks, securing sensitive data from unauthorized access, alteration, or loss through the implementation of various security measures.

Most Cloud Service Providers (CSPs) typically offer standard security measures, such as authentication options to ensure proper access by authorized users. However, given that the cloud is considered “honest-but-curious”, simple access control methods prove inadequate for ensuring privacy-preserving data requirements. Consequently, encryption emerges as a robust solution to guarantee the privacy of data stored in the cloud. It is imperative to encrypt the data before loading it into the CDW. However, OLAP tools encounter difficulties when dealing with encrypted data due to the inherent nature of OLAP queries, which require access to readable multidimensional data based on operations such as roll-up, drill-down, slice, dice, etc., as specified in the queries.

Existing security approaches for Cloud Data Warehouses (CDWs) predominantly focus on traditional encryption solutions for database encryption, such as row-based or column-based encryption, data masking, and access control [5,6,7,8]. However, these techniques may not be suitable for enterprise CDWs with a significant volume of users and OLAP query transactions. Specifically, encrypted data warehouses pose challenges to OLAP queries due to the intricate OLAP schema and queries, as well as aggregations over encrypted data [9]. The issues with current solutions, when implemented in the cloud, can be categorized into four main problems. Firstly, traditional Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) systems can only handle decrypted data before joining and returning results. This poses a security risk, as plain data is processed without protection during execution in the cloud. Additionally, the decryption and joining operations incur substantial processing time. Secondly, existing solutions often overlook the issue of repeated queries, commonly performed by the same group of users. Each OLAP query typically involves the cost of joining and retrieval, which can be expensive. Thirdly, existing secure data warehouse or OLAP systems focus on preserving the privacy of the queries made by the users. Essentially, any user who can access the OLAP system through some means of authentication can make any query. This is not practical when data owners want specific query results to be viewable only by certain users. Limiting privileges on dimension data is very difficult; hence, fine-grained access to query results or materialized views is generally not addressed by existing works. Fourthly, encryption schemes utilizing symmetric encryption and public key encryption in cloud computing are not directly applicable to data warehouses, because they introduce key management challenges and lack fine-grained access control for users.
Some cloud encryption solutions employ homomorphic encryption [10] and order-preserving encryption [11], allowing certain computation or aggregation operations to be performed over the ciphertexts. However, these methods are not suitable for multidimensional data, as they present protection issues and do not support fine-grained access control.

In this paper, we propose a fine-grained and privacy-preserving CDW solution that combines attribute-based encryption with an indexing technique for encrypted data cubes. In addition, we employ Redis, an in-memory NoSQL key/value store, primarily as an application cache for OLAP query results. The contributions of this paper are summarized as follows:

  • We propose a fine-grained and privacy-preserving access control scheme with fast OLAP query indexing for the cloud data warehouse. The encryption solution is based on ciphertext-policy attribute-based encryption and symmetric encryption, where an optimized and secure key distribution is introduced to enable practical deployment of the CDW. Our privacy-preserving solution fully supports fine-grained access control for authorized users who make a query: only users holding a key whose attributes satisfy the access policy can decrypt the encrypted cubes.

  • We propose a novel storage structure and indexing for encrypted data cubes in a Redis setting to optimize the search space and increase the speed and efficiency of encrypted data cube retrieval. By utilizing an in-memory database, frequently issued OLAP queries are served significantly faster. Also, we designed the system to handle intersection queries that filter the operations and identify the subsets matching the query’s criteria. The result of these intersections is stored in temporary Redis keys, uniquely generated for each set of operations.

  • We conducted a comparative analysis and experiments using the TPC-H dataset to demonstrate the efficiency and practicality of our proposed scheme.

Related work

This section describes the works dedicated to security and access control for big data DWs and OLAP systems.

In [9], the authors introduced a framework outlining the application of homomorphic encryption schemes to encrypt numeric OLAP measures. The proposed framework also delineates the processing of SUM-based aggregations in analytic queries over the encrypted DW. This method generates encrypted data that is indistinguishable, ensuring that encrypted values differ from each other. Moreover, it facilitates the execution of various operations such as joins between extensive fact tables and dimension tables, data aggregations, application of selection constraints, data groupings, and sorting operations over the encrypted dimensional data stored in the cloud. Furthermore, the authors present and elaborate on a system architecture designed for secure processing of the encrypted DW.

In [12], the authors proposed an encryption method to secure both the data warehouse and the associated OLAP system. The algorithm supports queries over encrypted DW data hosted in the cloud, utilizing encryption tasks based on the statistical properties of the target DW data. The authors conducted experiments to evaluate the performance of OLAP queries over the encrypted DW. However, practical implementation may incur significant costs due to the use of several encryption states.

In [13], the authors introduced an effective sensitivity analysis method using approximate query processing to classify documents and limit sensitive information leakage in cloud data warehouses. While this method evaluates approximate query processing, it does not offer a privacy-preserving solution or support access control for a DW hosted in the cloud.

In [14], the authors proposed the CloudWar system, utilizing a homomorphic encryption algorithm for securing and querying a data warehouse in the cloud. Despite converting cell values into perturbation values for homomorphic privacy and introducing a weighted value for answering range queries to reduce complexity, the system faces overheads in homomorphic key generation and encryption costs when accessed by a large number of users.

In [15], the authors presented a privacy-preserving OLAP query based on private cell retrieval from a data warehouse and the Paillier cryptosystem. While this scheme allows secure OLAP operations, it suffers from server dependency, and communication costs are high with a large number of decryption requests.

In [16], Ahmadian et al. studied sensitivity analysis regarding information leakage through the correlation of multiple documents in a database and cloud data warehouse. They introduced an effective sensitivity analysis method based on approximate query processing to classify documents and selectively provide disinformation to limit information leakage.

In [17], the authors proposed an approach for securing a Data Warehouse (DW) and its associated OLAP system by employing a distinct encryption technique. The suggested encryption algorithm enables the querying of DW data that has been encrypted using this method. The complexity of the algorithm lies in its multifaceted nature, as it undertakes various encryption tasks tailored to the statistical characteristics of the target DW data. The authors further validate the performance of the proposed OLAP system through multiple performance tests, specifically focusing on query processing efficiency.

To support OLAP privacy-preserving with the concern of query workload, Cuzzocrea [18, 19] proposed a series of sound theoretical findings within the realm of upper-bounds concerning both query and inference errors. These pertain to queries and query workloads that are to be assessed against a privacy-preserving OLAP data cube based on the query workload. Later, the author also proposed an OLAP cube compression algorithm for column-oriented Cloud/Edge data infrastructures, enabling queries to be performed on edge or mobile devices [20].

Recently, S. Fugkeaw et al. [21] introduced a privacy-preserving access control [22] and searchable encryption technique [23] for cloud data warehouses. For the privacy-preserving access control approach, their core solution is based on the combination of symmetric and attribute-based encryption with the proposed B+Tree indexing.

As outlined in Table 1, all the schemes under consideration prioritize privacy preservation for OLAP data warehousing at various levels. Notably, our scheme and [9, 12, 15, 21] employ encryption techniques to comprehensively secure OLAP queries, whereas [12, 16] utilize selective disclosure and data perturbation, respectively, applied to aggregated queries. It is worth mentioning that only our scheme and [12, 15] are implemented on cloud infrastructure.

Table 1 Feature comparison of privacy-preserving DW approaches

In terms of newly introduced indexing methods for accessing data warehouses or queries, only our scheme and [16, 21] incorporate an indexing technique to facilitate the accessibility of data or queries. Additionally, among the schemes, only ours supports query result caching, enhancing the efficiency of frequently accessed queries.

Recently, a few works [17, 20, 21] have focused on applying partial encryption over aggregation operations for big data or multidimensional data. Some works [21, 24,25,26,27] specifically support privacy preservation of multidimensional data queries in a specific context. For example, Olawoyin et al. [26] presented an “integrator” model for big data that incorporates both spatial and temporal elements using a bottom-up aggregation approach. Their investigation focused on the application of generative adversarial network (GAN) models for privacy preservation. In [21], Cai et al. introduced a correlated data trading framework for high-dimensional private data based on a perturbation mechanism, solving the optimal attribute clustering (OAC) problem to enhance the utility of traded data.

However, these papers only focus on the context of spatio-temporal or high-dimensional data and do not support privacy-preserving OLAP queries with fine-grained access control to the data warehouse.

Background

This section describes the background of materialized views, bilinear maps, and access trees used in our system model.

Materialized views in data warehouse

In a data warehouse, a materialized view (MV) is a pre-computed view result comprising aggregated and/or joined data from fact and possibly dimension tables. In MOLAP, a DW is modeled in the multidimensional space, where multiple dimensions are formed and associated with the measure attribute. The precomputed view can be calculated from the possible aggregation operations over the dimensions and measures in a cube.

Multidimensional space

Let \(\Omega\) be the space of all dimensions. For each dimension \(D_i\), there exists a set of levels denoted as levels(\(D_i\)). A dimension is a lattice \((H, \prec)\) of levels. Each path in the lattice of a dimension hierarchy, beginning from its least upper bound and ending in its greatest lower bound, is called a dimension path, for example, [day \(\prec\) week \(\prec\) month \(\prec\) year].

Dimensional level space

Let \(\Psi\) be the space of all dimension levels. We can find the dimension where a dimension level (DL) belongs to through the operator h: \(h(DL_i) = D\) if \(DL_i \in\) levels(D). For each dimension level, there is a set of values belonging to it (e.g., dimension level “city” has “Bangkok”, “Tokyo”, “London”, “New York” as values). We define dom(\(DL_i\)) as the set of all the values of a dimension level \(DL_i\).

Base cube

A base cube \(C_b\) is a 3-tuple \(<D, L, R>\) where:

  • \(D = <D_1, D_2, \ldots , D_n, M>\) is a list of dimensions (\(D_i, M \in \Omega\)). M is a measure of the cube.

  • \(L = <DL_1, DL_2, \ldots , DL_n, *ML>\) is a list of dimension levels (\(DL_i, *ML \in \Psi\)). ML is the dimension level of the measure of the cube.

  • R is a set of cell data formed as a tuple \(x = (x_1, x_2, \ldots , x_n, *m)\) where \(i \in [1, n]\), \(x_i \in\) dom(\(DL_i\)) and \(*m \in\) dom(\(*ML\)).

In our model, we assume that the materialized views represent all possible views of the base cube \(C_b\). Each view is computed from the set of aggregation operations including sum, avg, count, max, min, and rank(n). Each of these operations results in a new cube \(c'\), i.e., a materialized view (MV).
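To make these definitions concrete, the following minimal Python sketch (with hypothetical dimension names and cell data, not taken from the paper) models a base cube \(C_b\) and computes a SUM materialized view from it:

```python
from collections import namedtuple
from itertools import groupby

# A base cube C_b = <D, L, R>: dimensions (with measure M), dimension levels
# (with measure level *ML), and cell tuples (x_1, ..., x_n, *m).
BaseCube = namedtuple("BaseCube", ["dimensions", "levels", "cells"])

cube = BaseCube(
    dimensions=("Date", "Region", "Sales"),   # M = Sales is the measure
    levels=("Year", "City", "Amount"),        # *ML = Amount
    cells=[
        ("2001", "Bangkok", 120),
        ("2001", "Tokyo", 80),
        ("2002", "Bangkok", 150),
    ],
)

def materialize_sum(cube, group_level):
    """Compute a SUM materialized view c' grouped on one dimension level."""
    i = cube.levels.index(group_level)
    rows = sorted(cube.cells, key=lambda cell: cell[i])  # groupby needs sorted input
    return {k: sum(c[-1] for c in g)
            for k, g in groupby(rows, key=lambda cell: cell[i])}

mv = materialize_sum(cube, "Year")  # one MV of C_b: {"2001": 200, "2002": 150}
```

The other aggregation operations (avg, count, max, min) follow the same pattern with a different reducer in place of `sum`.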

Bilinear maps and access policy

Let \(G_0\) and \(G_1\) be two multiplicative cyclic groups of prime order p. Let g be a generator of \(G_0\) and e be a bilinear map \(e: G_0 \times G_0 \rightarrow G_1\). The bilinear map e has the following properties:

  • Bilinearity: \(\forall u, v \in G_0\) and \(a, b \in \mathbb {Z}_p\), \(e(u^a, v^b) = e(u, v)^{ab} = e(u^b, v^a)\)

  • Non-degeneracy: \(e(g, g) \ne 1\)

  • Computability: \(\forall u, v \in G_0\), an efficient computation of \(e(u, v)\) exists.

Access structure

Let a set \(\{P_1, P_2, \ldots , P_n\}\) of attributes be given. A collection \(A \subseteq 2^{\{P_1, P_2, \ldots , P_n\}}\) is monotone if \(\forall B, C\): \(B \in A\) and \(B \subseteq C\) imply \(C \in A\). An access structure is a monotone collection A of non-empty subsets of \(\{P_1, P_2, \ldots , P_n\}\), i.e., \(A \subseteq 2^{\{P_1, P_2, \ldots , P_n\}} \setminus \{\emptyset \}\).

Access tree T

Let T be a tree representing an access structure. Each non-leaf node of the tree represents a threshold gate described by its children and a threshold value. If \(num_x\) is the number of children of a node x and \(k_x\) is its threshold value, then \(0 < k_x \le num_x\). When \(k_x = 1\) the threshold gate is an OR gate, and when \(k_x = num_x\) it is an AND gate. Each leaf node x of the tree is described by an attribute and a threshold value \(k_x = 1\). If the k-of-n gate is allowed in T, in this case, \(k_x = k\) where k is the threshold value determined in the k-of-n gate. In our scheme, the access tree T is called the access control policy (ACP).
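The threshold-gate semantics above can be sketched as a simple recursive satisfiability check of a user's attribute set against T (this mirrors only the Boolean logic of the gates; in CP-ABE the policy is actually enforced cryptographically during decryption, and the attribute names below are illustrative):

```python
def satisfies(node, attributes):
    """Evaluate an access tree T: a leaf is an attribute string; a non-leaf
    is a pair (k_x, children). k_x = 1 acts as an OR gate, k_x = len(children)
    as an AND gate, and any other k_x as a k-of-n gate."""
    if isinstance(node, str):          # leaf node: attribute with k_x = 1
        return node in attributes
    k, children = node
    # A threshold gate is satisfied when at least k_x children are satisfied.
    return sum(satisfies(child, attributes) for child in children) >= k

# Example ACP: (Finance AND Manager) OR Auditor
acp = (1, [(2, ["Finance", "Manager"]), "Auditor"])
```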

Overview of Redis

Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. It is known for its fast performance because it stores data in RAM (Random Access Memory) and allows quick access to the data.

In Redis, keys and values are encoded into Redis Objects, often referred to as “robj” (Redis Object). The robj structure is a fundamental part of how Redis handles and represents data internally. Here’s a simplified example of how a Redis Object (robj) might be defined in C code:

figure a

Basically, the key is consistently represented as a string object. To fulfill the functional needs of a key-value (KV) store application, Redis accommodates diverse data models for the “Value” component. For instance, the “String” model represents the value as a series of characters indexed by a Hash Table alongside the key. Another model is the “Zset”, where the value constitutes non-repeating ordered collections of strings sorted based on a specified score. This versatility in data models allows Redis to effectively meet the requirements of various KV store applications.

In this paper, we applied Redis to store metadata of all MVs or precomputed cubes, and it is used to support the model of our proposed indexed search service where dimension level and their values and cube operations are modeled to efficiently compute and specify the target cube.

Our proposed scheme

System overview

This section presents the system overview and algorithmic process of our proposed SSF-CDW scheme and provides the details of its system components. Figure 1 represents our proposed system model.

Fig. 1
figure 1

SSF-CDW system model

The following entities constitute our system:

  • Data sources: refer to multiple sources of data that are heterogeneous in their formats, volume, and locations.

  • ETL tool: is a system responsible for normalizing the data by extracting data from sources, transforming the multiple data formats into a common schema able to be processed by the data warehouse and OLAP tool, and loading the data to be stored in the warehouse.

  • Private Cloud: stores the first stage of pre-computed views after the ETL process. In our system, we assume that the private cloud is an isolated environment where the access control is accessible by only one tenant or organization. We also locate the encryption service in the private cloud to support data encryption before it is sent to the public cloud for supporting OLAP query to the data users.

  • Public Cloud: stores the encrypted cubes, which connect to the OLAP interface and the blockchain, where query processing and access control are performed respectively.

  • Data Users: are the entities authorized to access and make a query over the data warehouse. Each user is assigned a decryption key to decrypt the encrypted query or data cube.

  • Query Engine: is the central hub of the system that interfaces directly with the user’s queries. It interprets the parameters of each query to determine what data is needed. The Query Engine then orchestrates the entire data retrieval process by communicating with the Indexed Search Service to find the relevant MOLAP cube identifiers based on the query’s dimensionality. After the search process is complete, it retrieves the encrypted cube data from the public cloud and provides the query results to the users.

  • Indexed Search Service: This service is designed for performance and leverages the speed of indexing to rapidly pinpoint the required data within the MOLAP cubes. When a query is received, this service determines which metadata attributes are relevant and utilizes the indexing system to locate these attributes efficiently. It manages complex queries that involve multiple dimensions and measures by coordinating with the metadata stored in Redis and MongoDB.

  • Metadata Search (Redis): Redis is specifically chosen for this component due to its ability to handle high-speed read operations and in-memory data storage, which is essential for quick metadata access. In the context of OLAP operations, Redis excels at performing SET operations and calculating intersections, which are crucial when retrieving common elements across different sets of cube dimensions. Redis enables the Indexed Search Service to quickly filter and retrieve cube identifiers that match the query’s dimensionality criteria.

  • Metadata Document (MongoDB): MongoDB complements Redis by providing a durable and scalable storage solution for the full documentation of metadata. While Redis is used for its performance, MongoDB offers a robust and flexible structure for storing more complex and detailed metadata documents. It can handle a variety of data types and complex queries, which makes it suitable for the structure and content of the MOLAP cubes.

Our proposed indexed search service

We designed and developed an indexing service to quickly compile and map the user query and calculate the results before searching through the corresponding encrypted cube ID stored in the public cloud. Indexed searching is implemented to optimize the speed and efficiency of data retrieval operations.

There are five major functions of this service including OLAP query reception, OLAP query translation, Redis query preparation, Redis intersection operations, and cube result generation. The workflow within the Indexed Search Service for processing OLAP queries can be summarized in the following steps:

  1.

    OLAP Query Reception: The system receives an OLAP query in a structured language such as MDX, which outlines the data analysis requirements including dimensions, dimension levels, measures, and other criteria.

  2.

    OLAP Query Translation: This query is then translated into a backend-friendly JSON format, which represents all the specified analytical parameters clearly. First, the translator extracts dimensions, levels, conditions, and measures from the MDX query and assigns them to a JSON object; it then returns the structured JSON representation of the MDX query.

  3.

    Redis Query Preparation: Utilizing the translated query, the system generates specific Redis commands and keys for the specified dimensions and measures, which are essential for data retrieval from Redis. The algorithm below shows how the Redis query is generated.

    figure b

    Algorithm 1 PrepareRedisCommands Function

    Algorithm 1 starts by initializing a new list for Redis commands. It then iterates over the dimensions and levels in the query, generating and storing Redis keys and intermediate results. Next, it processes each measure, creating commands for data aggregation. Finally, it returns the list of Redis commands.

  4.

    Redis Intersection Operations: The system executes intersection operations in Redis using a pipeline for batch processing. These operations filter the data to find matches for the query’s criteria and store the results in unique temporary Redis keys. Executing the series of Redis commands as a batch through the pipeline minimizes network latency and improves performance.

    figure c

    Algorithm 2 ExecuteRedisPipeline Function

    Algorithm 2 starts a Redis pipeline for batch command execution. It adds each command from the list to the pipeline, executes the pipeline, and retrieves the final result. After that, it cleans up the temporary keys used in the process and returns the final result for the given MDX query.

  5.

    Cube Result Compilation: The system gathers the results from the Redis intersections and extracts the cube ID from the temporary key. This ID corresponds to the data points that meet the user’s query specifications. The corresponding cube is then mapped to the encrypted cube ID, which is returned to the user as the query result. The process to retrieve the resulting cube is shown in Algorithm 3.

    figure d

    Algorithm 3 RetrieveCubeByDimensions
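Since the algorithm figures are not reproduced here, the sketch below illustrates the same preparation/pipeline flow under simplifying assumptions: the Redis keyspace and its SINTERSTORE/temporary-key-cleanup behavior are simulated with Python sets, and the key names and cube IDs are illustrative only:

```python
import uuid

# Simulated Redis keyspace: each metadata key maps to a SET of cube IDs.
index = {
    "dimension:Date:level:Year:2001": {"c1", "c2", "c3"},
    "dimension:Region:level:City:LA": {"c2", "c3"},
    "measure:Sales": {"c1", "c2"},
}

def prepare_redis_commands(query_keys):
    """Algorithm 1 sketch: build an intersection command targeting a
    uniquely generated temporary key."""
    tmp_key = f"tmp:{uuid.uuid4().hex}"
    return [("SINTERSTORE", tmp_key, query_keys)], tmp_key

def execute_redis_pipeline(commands, tmp_key):
    """Algorithm 2 sketch: run the batched commands, read the final result,
    then clean up the temporary key."""
    for op, dest, keys in commands:
        index[dest] = set.intersection(*(index[k] for k in keys))
    return index.pop(tmp_key)   # retrieve the result and delete the temp key

cmds, tmp = prepare_redis_commands([
    "dimension:Date:level:Year:2001",
    "dimension:Region:level:City:LA",
    "measure:Sales",
])
cube_ids = execute_redis_pipeline(cmds, tmp)   # -> {"c2"}
```

With a real Redis deployment, the same flow would map onto a client pipeline issuing `SINTERSTORE`, `SMEMBERS`, and `DEL` commands in a single batch.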

Redis SET

We used Redis to store key and value regarding the set of dimensions and measures queried by users. Figure 2 illustrates how Redis stores dimensional data.

Fig. 2
figure 2

Redis data structure

The diagram provides a visual representation of how Redis uses the SET data structure to manage metadata, employing a key-value pair system that is fundamental to its design. The keys are crafted to reflect a hierarchical multi-dimensional data model with dimensions, levels, and specific values or ranges, which facilitates structured data retrieval. Correspondingly, the values are not mere single data points but are themselves sets of cube IDs which act as references to the actual data stored within the database. This dual-layer structure, where keys map to sets of identifiers rather than individual entries, enables complex queries and data aggregation in an efficient manner.

Key Composition:

  • The base key encapsulates the dimension: "dimension:$DIMENSION"

  • Extending this, the level is added: "dimension:$DIMENSION:level:$LEVEL".

  • For a fixed value within a level: "dimension:$DIMENSION:level:$LEVEL:$VALUE"

  • For a range of values within a level: "dimension:$DIMENSION:level:$LEVEL:range:$VALUE1-$VALUE2"

Value Types:

  • fix: Refers to a specific integer or string value such as "1", "2", "LA", or "Thailand"

  • range: Denotes a span of integers e.g., "0-100" or "101-200"

The following are exemplifications of how keys may represent data within Redis:

  • Year within a Date dimension: "dimension:Date:level:Year:2001"

  • City within a Region dimension: "dimension:Region:level:City:LA"

  • Price range within a Product dimension: "dimension:Product:level:Price:range:5001-10000"
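A small helper (hypothetical, not the paper's implementation) shows how such keys can be assembled from the composition rules above:

```python
def redis_key(dimension, level=None, value=None, value2=None):
    """Compose a Redis SET key: a dimension, an optional level, and either
    a fixed value or a range of values within that level."""
    key = f"dimension:{dimension}"
    if level is not None:
        key += f":level:{level}"
    if value is not None and value2 is not None:   # range of values
        key += f":range:{value}-{value2}"
    elif value is not None:                        # fixed value
        key += f":{value}"
    return key
```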

Based on the structure defined, we propose the method of how dimensional data and value are stored in Redis as exemplified by Fig. 3.

Fig. 3
figure 3

Hierarchical organization of MOLAP data in redis

Figure 3 illustrates an example of the hierarchical organization of MOLAP data within a Redis database, which is crucial for understanding data storage and retrieval operations. For example, there are three primary Dimensions: Date, Product, and Region, each representing a different axis for data analysis. These dimensions are further divided into Levels such as Year, Quarter, and Month for Date; Code, Type, and Price for Product; and Country, State, and City for Region. At each level, specific Values are associated with sets of Cube IDs. These sets act as identifiers for the data cubes that contain the actual data points. Hence, the search space is reduced at each level of the dimension schema. The sets at the final level, specifying the instance of each dimensional level, are then intersected to yield the resulting cube.

Cryptographic model

In this section, we describe the cryptographic construct of our proposed model. Basically, the cryptographic model of our system is based on symmetric encryption (AES-256) and Ciphertext-Policy Attribute-Based Encryption (CP-ABE). There are four phases: Setup, Keygen, Encryption, and Decryption.

To ease the description of our proposed cryptographic algorithms, we present the notations used in our scheme in Table 2.

Table 2 Notation

Setup phase

CreateAuthenticatedAuthority(AA) \(\rightarrow\) PK_k, MK_k.

The setup algorithm takes security or system parameters as input and returns the public key PK_k and master key MK_k.

Keygen

There are two key types: symmetric key (AES Key) and user secret key (CP-ABE key) used in our system and each key type is generated through systemKeygen and duAttributeKeygen respectively.

  • systemKeygen(keyGen) \(\rightarrow\) SymKey This algorithm is run by the data owner by taking keyGen as an input where keyGen = CSPRNG.selectRandomKey() and keySize = 256. It returns the SymKey for AES-256. Then the SymKey is encrypted with the CP-ABE method.

  • duAttributeKeygen(PK_k, MK_k, SA) \(\rightarrow\) SK_uid This algorithm is run by the AA. It takes as input PK_k, MK_k, and the user's attribute set SA. The algorithm creates the SK of the DU by selecting a random r \(\in\) Z_p and, for each attribute j \(\in\) S, a random r_j \(\in\) Z_p, resulting in the following:

    $$\begin{aligned} SK\_uid = \left( D = g^{(\alpha + r)/\beta },\ \left\{ D_j = g^r \cdot H(j)^{r_j},\ D'_j = g^{r_j} \right\} _{j \in S} \right) \end{aligned}$$

    The AA then sends the SK_uid to the DUs.

Encryption

We perform dual encryption based on symmetric encryption AES-256 and CP-ABE which are done by our encryption service located in the private cloud. The details of the encryption step are presented as follows.

MVs Encryption

The algorithm takes a symmetric key SymKey and all cubes as inputs, and outputs the encrypted cubes Enc_Cube_uids.

$$\begin{aligned} \mathtt {Enc(Cubes, SymKey)}\ \rightarrow \ \mathtt {Enc\_Cube\_uids} \end{aligned}$$

Then the encrypted cubes are sent to be stored in the public cloud storage.
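The Enc step can be sketched as follows. The paper specifies only AES-256; the GCM mode, the JSON serialization, and the nonce-prefixed blob layout below are our assumptions, using the `cryptography` package's AESGCM API:

```python
import json
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def enc_cubes(cube: dict, sym_key: bytes) -> bytes:
    """Sketch of Enc(Cubes, SymKey): serialize one cube and encrypt it
    with AES-256-GCM. The nonce is prepended so the DU can decrypt later."""
    nonce = secrets.token_bytes(12)  # fresh 96-bit nonce per cube
    ciphertext = AESGCM(sym_key).encrypt(nonce, json.dumps(cube).encode("utf-8"), None)
    return nonce + ciphertext        # Enc_Cube blob sent to public cloud storage

sym_key = AESGCM.generate_key(bit_length=256)
blob = enc_cubes({"dim": "region", "measure": [125, 300]}, sym_key)
```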

symKey Encryption

This algorithm takes AA’s public key PK_k, the symmetric key SymKey, and access control policy ACP as inputs. Then it outputs the ciphertext of the encrypted symmetric key CT_k.

$$\begin{aligned} \mathtt {Enc-CP-ABE(PK\_k, SymKey, ACP)}\ \rightarrow \ \mathtt {CT\_k} \end{aligned}$$

Then CT_k is forwarded to store in the public cloud where the authorized data user can download it when they make a request to access data via query.

Decryption

In this phase, after the DU issues a query, the query engine and indexed search services perform the operations described in the previous section. Once the Redis operations have fully executed all functions and obtained the cube ID, the resulting cube is matched against the corresponding address of the Enc_Cube_id before being sent to the DU as an encrypted query result. The DU then performs decryption as follows.

SymKey Decryption

The DU uses its secret key to decrypt the CT_k through the following decryption function.

$$\begin{aligned} \mathtt {Dec-CP-ABE(SK\_uid, CT\_k)}\ \rightarrow \ \texttt{SymKey} \end{aligned}$$

Then the SymKey is obtained for use in the final decryption.

Symmetric Decryption

This algorithm is run by the DU. It takes a symmetric key SymKey to decrypt the encrypted cube Enc_Cube_id and outputs the Cube_id. The decryption function is defined as:

$$\begin{aligned} \mathtt {Dec(SymKey, Enc\_Cube\_id)}\ \rightarrow \ \mathtt {Cube\_id} \end{aligned}$$

Then the resulting Cube_id, which is an OLAP query result, is obtained.
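A self-contained sketch of the DU-side Dec step, under the assumption (ours, not the paper's) that cubes were encrypted with AES-256-GCM and the nonce prepended to the ciphertext:

```python
import json
import secrets
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def dec_cube(enc_cube_blob: bytes, sym_key: bytes) -> dict:
    """Sketch of Dec(SymKey, Enc_Cube_id): split off the nonce and decrypt.
    Assumes an AES-256-GCM, nonce-prefixed layout; the paper states only AES-256."""
    nonce, ciphertext = enc_cube_blob[:12], enc_cube_blob[12:]
    return json.loads(AESGCM(sym_key).decrypt(nonce, ciphertext, None))

# Round trip under the assumed format:
sym_key = AESGCM.generate_key(bit_length=256)
nonce = secrets.token_bytes(12)
blob = nonce + AESGCM(sym_key).encrypt(nonce, json.dumps({"cube_id": 42}).encode("utf-8"), None)
print(dec_cube(blob, sym_key))  # {'cube_id': 42}
```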

Security analysis

This section discusses the security model and security properties of our proposed system.

Security model

In our model, we assume that the Attribute Authority (AA) is a trusted entity, while private cloud storage is only accessible by the data owner. Conversely, the public cloud is considered semi-trusted. Within our system, all pre-computed cubes are encrypted using AES encryption and stored on the public cloud. To maintain the confidentiality of the AES key, it is encrypted with the CP-ABE method and stored in the cloud. Only authorized users possessing a secret key issued by the AA can decrypt the encrypted AES key and access the query results.

The security model of our scheme is defined through a game-based approach, focusing on compromising the CP-ABE key to gain access to the encrypted symmetric key. The interaction between an adversary (A) and a challenger (C) is outlined as follows:

Setup: For uncorrupted authorities (AA), the challenger (C) runs the CreateAttributeAuthority algorithm and sends the public keys (PK_k) to the adversary (A). For corrupted authorities (AA’), the challenger sends both the public key (PK_k) and secret key (SK) to the adversary (A).

Phase 1: The adversary (A) submits secret key queries for sets of attributes issued by an uncorrupted authority (AA_k). The challenger (C) responds with the corresponding secret key (SK).

Challenge: The adversary (A) sends two challenge messages, \(m_1\) and \(m_2\), to the simulator. The simulator flips a fair binary coin \(\nu\) and returns an encryption of \(m_\nu\). In this game, the ciphertext (CT_k), representing the symmetric key encrypted by the CP-ABE method, is computed as follows:

$$\begin{aligned} CT\_k & = \{ T, \hat{C} = m_\nu \cdot z, C = h^s, \\ & \forall y \in Y : C_y = g^{q_y(0)}, \hat{C}_y = H(att(y))^{q_y(0)} \} \end{aligned}$$

where Y denotes the set of leaf nodes of the access tree T. If \(\nu = 0\), then \(z = e(g, g)^{\alpha s}\) and the ciphertext CT_k is a valid encryption of message \(m_\nu\). Otherwise, if \(\nu = 1\), then z is a random element of \(G_1\). Consequently, \(\hat{C} = m_\nu \cdot z\) appears as a random element of \(G_1\) from the adversary's perspective, revealing no information about \(m_\nu\).

Phase 2: The simulator repeats the actions of Phase 1.

Guess: The adversary (A) attempts to guess \(\nu '\) of \(\nu\). The advantage of A in this game is defined as:

$$\begin{aligned} \text {ADV}_A = \Pr [\nu = \nu '] - \frac{1}{2} \end{aligned}$$

Definition 3: Our proposed scheme is secure if all polynomial-time adversaries have at most a negligible advantage in the above game.

Theorem 1: Assuming no polynomial-time adversary can break the security of CP-ABE with non-negligible advantage, it follows that no polynomial-time adversary can compromise our cryptosystem with non-negligible advantage.

Proof: We demonstrate that if adversary A has a non-negligible advantage against our scheme, then a similar adversary B can be constructed to break the CP-ABE scheme with non-negligible advantage. Adversary B can engage in a similar game with the CP-ABE scheme, making private queries to obtain private keys during the game.

Initialization: Adversary B takes the public key (PK_k) of authority k, \(PK\_k' = \{G_0, g, h = g^\beta , f = g^{1/\beta }, e(g, g)^\alpha \}\), with the corresponding secret key \((\beta , g^\alpha )\) unknown to the adversary.

Setup: Adversary B obtains the public parameters from \(PK\_k'\) as \(PK\_k = \{G_0, g, h = g^\beta , f = g^{1/\beta }, e(g, g)^\alpha \}\) and sends the public key (PK_k) to the adversary.

Phase 1: B addresses private key queries. Suppose the adversary is given a secret key query for a set of attributes (S) that does not satisfy T. Here, B makes a query for obtaining SK for the same set S twice. B then obtains two different SKs as follows:

$$\begin{aligned} SK_k = \{ D=g^{(\alpha _k+r)/\beta _k}, \forall i \in S : D_i=g^{r} \cdot H(i)^{r_i}, D'_i=g^{r_i} \} \end{aligned}$$
$$\begin{aligned} SK'_k = \{ D=g^{(\alpha _k+r')/\beta _k}, \forall i \in S : D_i=g^{r'} \cdot H(i)^{r'_i}, D'_i=g^{r'_i} \} \end{aligned}$$

where the i's are attributes from S, and \(r, r', r_i, r'_i\) are random numbers in \(Z_p\). With \(SK_k\) and \(SK'_k\), B can derive \(g^{(r-r')/\beta }\). B then chooses random numbers \(t_i, t_{i,j} \in Z_p\), lets \(r^* = t_i - r_i\) and \(r^{\prime \prime }_i = t_{i,j} - r'_i\), and derives the SK requested by A as:

$$\begin{aligned} SK^* = \{ D=g^{(\alpha _k+r')/\beta _k}, \forall i \in S : D_i=g^{r^*} \cdot H(i)^{r^{\prime \prime }_i}, D'_i=g^{r^{\prime \prime }_i} \} \end{aligned}$$

Then, the SK is returned to adversary A.

Challenge: When A concludes Phase 1, it outputs an access policy (T) and two messages (\(m_1\) and \(m_2\)) for the challenge. B passes these messages to the challenger and receives the challenge ciphertext (CT_k). B then computes the challenge ciphertext for A from CT_k as \(CT^*_k\), which is returned to adversary A.

Phase 2: A issues queries not addressed in Phase 1, and B responds as in Phase 1.

Guess: Finally, A guesses \(\nu ' \in \{1, 0\}\), and B concludes its game by generating \(\nu '\). According to the above security model, the advantage of adversary B is:

$$\begin{aligned} \text {ADV}_A = \Pr [\nu = \nu '] - \frac{1}{2} = \text {ADV}_B \end{aligned}$$

Thus, B has a non-negligible advantage against the CP-ABE, completing the proof of the theorem.

Experimental evaluation

This section presents the computation cost analysis of our proposed SSF-CDW scheme alongside similar approaches referenced as [14,15,16], which provide data encryption solutions in data warehousing. Furthermore, we conducted comparative experiments to measure the efficiency of encryption, decryption, and OLAP queries of our scheme and related works.

Computational cost analysis

This section examines the computational expenses associated with encryption, decryption, and querying/searching the encrypted query results within a data warehouse. Table 3 presents a comparative analysis of the computational costs between our method and similar studies. To clarify the representation of the computational costs for each approach, the following notations are utilized.

  • G0: Exponential operation in group G0

  • G1: Exponential operation in group G1

  • E: Bilinear pairing operation

  • |AP|: Number of attributes in access policy

  • |UA|: Number of attributes in user secret key

  • AESEnc1: AES encryption operation of 128 bits

  • AESEnc2: AES encryption operation of 256 bits

  • AESDec1: AES decryption operation of 128 bits

  • AESDec2: AES decryption operation of 256 bits

  • PCREnc: Paillier cryptosystem encrypted operation

  • PCRDec: Paillier cryptosystem decrypted operation

  • Gm: Multiplicative arithmetic operation in group G0

  • XOR: XOR operation in 128 bits

  • ||: Concatenation operation on 128-bit values

  • |DC|: Number of dimensions in cube

  • |DL|: Number of dimension levels for each DC

  • |NC|: Number of generated cubes

  • |NC2|: Number of pairwise search operations over data collections containing correlated key pairs

Table 3 Comparison of computational cost

Our approach implements a 2-step encryption procedure employing both the AES and CP-ABE algorithms to encrypt the materialized view and the symmetric key respectively. Given the size of the view or cube, the utilization of the AES algorithm ensures rapid data encryption. Subsequently, the CP-ABE method encrypts the AES-256 key. Concerning decryption, the computational expense depends on the number of attributes in the policy and the number of attributes within the user secret key, in addition to the exponential operation of the prime order group G1 and the bilinear pairing operation. All encrypted materialized views or cubes are indexed in Redis, where the search operation on encrypted cubes incurs a scan cost over dimensions and dimension levels. Each lookup costs O(1) and is executed through the intersection of the values of all dimension levels associated with the cubes. In essence, our search cost does not involve the prime order group as in other approaches, so our scheme has the shortest query time compared to related works.
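This intersection-based lookup can be illustrated with a small sketch. Plain Python sets stand in for Redis sets queried with SINTER; the index keys and cube IDs shown are hypothetical:

```python
# Hypothetical index layout: one set of cube IDs per dimension-level value,
# mirroring Redis sets that would be intersected with SINTER.
index = {
    "time:year:2023":    {"cube_01", "cube_02", "cube_07"},
    "region:country:TH": {"cube_02", "cube_05", "cube_07"},
    "product:cat:food":  {"cube_02", "cube_07", "cube_09"},
}

def lookup_cubes(levels):
    """Intersect the cube-ID sets of all requested dimension levels,
    analogous to redis.sinter(*levels)."""
    sets = [index[lv] for lv in levels]
    return set.intersection(*sets)

print(sorted(lookup_cubes(["time:year:2023", "region:country:TH", "product:cat:food"])))
# ['cube_02', 'cube_07']
```

Only the surviving cube IDs are then fetched as encrypted blobs, so the per-query work is set intersections rather than group operations.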

In scheme [14], the encryption phase involves multiple XOR operations necessary for performing 128-bit AES encryption within the exponential operation of G0 and the concatenation operation of the prime order group G1. The decryption phase incurs a lower computational cost than the encryption phase since it does not require multiple XOR operations and certain operations are constructed when conducting searches for pairs of values. However, the query cost is notably high due to multiple searches for key pair values in N documents, which consume NC2 operations and necessitate repeated searches for other sub-key pairs across two different documents. Additionally, 128-bit AES offers a weaker security margin than the AES-256 used in our scheme.

In scheme [15], the encryption phase employs a homomorphic function that involves arithmetic operations within the exponential operation of the prime order group G1. Regarding decryption, it incurs a higher cost compared to ours due to the cryptographic construct of the Chinese remainder theorem and the exponential operations in groups G0 and G1. The search operation on encrypted cubes in this scheme is quite intensive, involving several arithmetic operations and result comparisons.

In scheme [16], both encryption and decryption are conducted using arithmetic operations within group G1, primarily utilizing the Paillier cryptosystem. This results in a higher computational cost compared to AES-256 encryption. Furthermore, the query method in this scheme requires the generation and retrieval of responses for specific encrypted cubes.

Performance analysis

We conducted experiments to measure the encryption and decryption processing times of our scheme and related works [14,15,16]. Furthermore, we measured the OLAP query performance and throughput of our proposed scheme against the open-source TinyOLAP [28].

The implementation was done with Python's cryptography library [29], and we used the Pairing-Based Cryptography library and the Advanced Crypto Software Collection [30, 31] to simulate the cryptographic operations of our scheme. For scheme [16], we used the pycryptodomex [32] and pycryptodome [33] libraries, and for schemes [14, 15] we used numpy and the sympy [34] library for partially homomorphic encryption (PHE). We also used further libraries [35,36,37,38,39,40,41,42] to generate TPC-H benchmark datasets of different sizes according to various scale factors.

The experiments were run on a server with an Intel(R) Xeon(R) E-2336 CPU @ 2.9 GHz and 16 GB of RAM, running the Ubuntu 20.04 operating system. All functions were implemented in Python on the Ubuntu server. We provide the source code of our proposed system at [43].

Encryption and decryption performance

To measure the encryption and decryption performance, we compare the time used to encrypt and decrypt the cube, dimension, and measure data of our scheme and schemes [14], [15], and [16]. The computation time was measured by varying the number of generated values per cube/report while maintaining a constant data size of 250 KB for all executions of all implemented algorithms. Figures 4 and 5 depict the overall encryption and decryption costs for all schemes.

Fig. 4
figure 4

Total encryption cost

Fig. 5
figure 5

Total decryption cost

In the analysis presented in Fig. 4, our methodology demonstrates a consistent and low encryption cost across various increments of generated values per cube/report, distinguishing itself from alternative approaches. This efficiency is achieved through the implementation of AES encryption for all the constant-size MVs and the employment of CP-ABE for the encryption of the symmetric key. In contrast, the encryption process in scheme [14] relies on a homomorphic technique whose computational expense is influenced by the constant data size, resulting in costs approximately triple those of our method. Scheme [16] incorporates a 128-bit AES algorithm with a fixed deterministic function, leading to higher expenses than both our scheme and scheme [14]. Among the evaluated schemes, scheme [15] incurs the highest encryption costs due to its utilization of Partially Homomorphic Encryption (PHE), marking it as the most resource-intensive option.

Figure 5 compares the computational time of each scheme against ours, including both the decryption cost and the search cost over encrypted MVs. Each scheme employs distinct decryption algorithms and encrypted MV retrieval strategies, impacting overall performance. Our analysis reveals that despite a higher computational overhead at the initial stages, attributed to lower generated value volumes, our approach maintains a constant decryption time. This efficiency becomes more pronounced with larger generated value quantities, as our optimized search mechanism leveraging Redis memory-based data retrieval significantly reduces processing times compared to the alternatives. Scheme [16] demonstrates the lowest initial decryption and search costs, facilitated by 128-bit AES with a deterministic function. Yet as value volumes escalate, its processing time increases markedly, a consequence of multiple encrypted data searches categorized into four distinct types. Scheme [14] experiences decryption costs driven by the operation of the CRT and homomorphic decryption techniques alongside comparative search operations. Scheme [15] incurs higher computational demands due to its reliance on the Paillier cryptosystem and a complex request-response algorithm for data retrieval. Compared to schemes [14, 15], our method exhibits a superior balance of decryption efficiency and search cost optimization, underscoring its effectiveness for handling large-scale encrypted multi-dimensional data processing.

OLAP query performance

In this part, we conducted two comparisons of query performance: one between Redis and B+Tree, and another between Redis and non-Redis configurations. Each comparison uses a different measurement, namely the data size in GB and the number of queries. To generate and utilize the TPC-H dataset as a multi-dimensional data cube for performance evaluation, we started by generating the dataset using the dbgen tool with a specific scale factor (e.g., "./dbgen -s 1" for 1 GB). The generated files were moved to a designated directory ("mv -f *.tbl ./tpch-dbgen/"). The dataset includes tables such as customer, orders, lineitem, supplier, part, partsupp, nation, and region, which were loaded into pandas DataFrames for preprocessing. These tables were then structured into a multi-dimensional data cube using TOLAP, focusing on key dimensions like customer, product, time, and region. For efficient storage and retrieval, each table was serialized and compressed using zlib.compress(json.dumps(data).encode('utf-8')) and stored in Redis. Additionally, the data was stored in a B+Tree structure for comparative analysis. Queries over these data cubes involve complex filtering, aggregation, and analytical operations, executed using Redis pipelining and in-memory storage for rapid data processing. This process showcases the efficiency and scalability of the multi-dimensional data cube in a cloud data warehouse environment, ensuring high performance for extensive data analysis and reporting needs.
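The serialization step above can be sketched with the standard library alone. A plain dict stands in for the Redis instance (with redis-py, the store call would be r.set(key, blob)); the table rows shown are illustrative:

```python
import json
import zlib

store = {}  # stand-in for Redis; with redis-py: r = redis.Redis(); r.set(key, blob)

def store_table(key: str, data) -> None:
    """Serialize a table to JSON, compress it, and store it under `key`,
    mirroring zlib.compress(json.dumps(data).encode('utf-8')) from the text."""
    store[key] = zlib.compress(json.dumps(data).encode("utf-8"))

def load_table(key: str):
    """Reverse the pipeline: decompress, then deserialize."""
    return json.loads(zlib.decompress(store[key]).decode("utf-8"))

rows = [{"c_custkey": 1, "c_nation": "THAILAND"}, {"c_custkey": 2, "c_nation": "JAPAN"}]
store_table("tpch:customer", rows)
print(load_table("tpch:customer") == rows)  # True
```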

Query Performance Comparison of Redis and B+Tree

In this section, we compared the query performance of our Redis-based approach with TinyOLAP [28] integrated with a B+Tree for fast data retrieval. Figure 6 illustrates the performance details.

From Fig. 6, we can see that at the initial stage Redis may incur a higher computational cost in terms of query time in milliseconds. However, once the size of the TPC-H dataset grows beyond 32 GB, the performance of Redis gradually overtakes that of the B+Tree alone. The graph shows that when the dataset reaches 64 GB, Redis completes a query in 17,638.24 ms, whereas the B+Tree takes 17,806.264 ms to complete the whole query process. Therefore, Redis supports faster data retrieval across multiple data types than the B+Tree, which is only capable of range and hierarchical search.

Fig. 6
figure 6

Query performance between redis and B+tree

Query Performance Comparison of Redis and Non-Redis

In this section, we compared the query performance of the Redis and non-Redis configurations, measuring query time against a varying number of queries. This experiment was conducted with a fixed data size of 500 MB for the TPC-H dataset, varying the number of queries from 1 to 5,000. Figure 7 illustrates the performance details.

The outcomes depicted in Fig. 7 demonstrate that our proposed framework exhibits efficient query performance compared to existing works that do not integrate Redis. Redis supports fast data retrieval through its memory-based database capabilities, particularly with the TPC-H dataset. In the initial stages, our Redis-integrated system showed slower query performance than systems without Redis, due to the time required for data processing and retrieval in cases of repeated queries or memory capacity constraints. However, when the number of queries reached 50 or more, the performance of Redis began to surpass that of non-Redis systems. For example, with 1,000 queries, the Redis system took 1,188.21 ms to query the TPC-H dataset, whereas the non-Redis system took approximately 2,131.81 ms to perform the same tasks. Therefore, our system demonstrates superior performance under varying query loads with a constant data size, sustaining its advantage as the number of queries increases.

Fig. 7
figure 7

Number of queries

Data access (decryption) throughput

In our experimental analysis, we evaluated the decryption throughput of our scheme to assess its capacity for handling data access transactions. Throughput measurements were based on the generation of concurrent multi-threaded requests to facilitate data access, with the search operations performed by Redis. For these tests, we standardized the cube/report size at 250 KB alongside a 5-attribute policy and a 5-attribute secret key. We escalated the volume of decryption requests to a ceiling of 100,000, documenting the throughput as illustrated in Fig. 8.
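A minimal sketch of this kind of throughput measurement is shown below. The decrypt_request stub is a placeholder of our own for the actual Redis search plus cube decryption; only the concurrent-request harness is illustrated:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def decrypt_request(i: int) -> int:
    """Placeholder for one data-access transaction (Redis search + decryption
    of a 250 KB cube in the actual experiment)."""
    return i

def measure_throughput(n_requests: int, workers: int = 16) -> float:
    """Issue n_requests concurrently and report completed requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(decrypt_request, range(n_requests)))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed
```

Sweeping n_requests upward (e.g., 1,000 to 100,000) and recording the returned rate yields the throughput curve reported in Fig. 8.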

Fig. 8
figure 8

Number of decryption requests

The outcomes depicted in Fig. 8 demonstrate that our proposed framework achieved a peak throughput of 820 requests per second while accommodating 1,000-3,000 concurrent decryption requests. The system exhibited robust support for request volumes ranging from 1,000 to 15,000 before experiencing a significant drop-off beyond the 15,000-request threshold. Notably, throughput performance is inherently linked to the hardware specifications of the transactional platform. In a real-world cloud environment, our scheme is projected to deliver enhanced throughput owing to the dynamic resource provisioning, superior computational capabilities, and load balancing inherent to cloud computing infrastructures. The integration of optimized memory-based data retrieval via Redis and CP-ABE decryption of a small symmetric key underpins the scheme's high throughput and efficient computational resource utilization.

Conclusion

We have introduced a sophisticated access control system that maintains privacy while allowing fine-grained access in cloud data warehousing. Our approach incorporates rapid OLAP query indexing for efficient performance. Encryption methods include ciphertext-policy attribute-based encryption and symmetric encryption, complemented by an optimized and secure key distribution mechanism for practical deployment in CDW environments. Furthermore, we have devised a unique storage structure and indexing strategy tailored for encrypted data cubes within the Redis framework. This streamlines the search process, enhancing the speed and efficiency of encrypted data cube retrieval. Additionally, our system is equipped to handle intersection queries, enabling filtering operations and the identification of subsets that match the query's parameters. The results of these intersections are stored in temporary Redis keys, each generated uniquely for every set of operations, thus enabling high query performance. Finally, we conducted experiments to substantiate that our proposed scheme achieves feasible cryptographic operations and improved query performance compared to related works.

For future work, we will consider the following major aspects. First, we will investigate the interplay between in-memory database optimization and strategies for distributing query execution across multiple CPU cores or nodes. This involves exploring advanced parallel processing techniques that can efficiently handle large datasets and complex queries. Caching optimization techniques, such as configuring Redis to evict the least recently used keys when memory is full or dynamically setting time-to-live (TTL) values to free up memory, are also worth examining. In addition, we will consider data cube compression techniques suitable for the encryption methods used in our solution. This will involve examining various compression techniques, such as dictionary encoding and run-length encoding, to determine which methods are most compatible with our encryption scheme. Finally, we will compare the query performance of our scheme with existing cloud data warehousing solutions such as Amazon Redshift, Google BigQuery, or Snowflake.

Availability of data and materials

No datasets were generated or analysed during the current study.

References

  1. Azure Synapse Analytics (n.a.) [Online]. https://azure.microsoft.com/en-us/blog/azure-sql-data-warehouse-is-now-azure-synapse-analytics/. Accessed 5 Feb 2024

  2. Amazon Redshift (n.a.) [Online]. https://aws.amazon.com/redshift/. Accessed 5 Feb 2024

  3. Google BigQuery (n.a.) [Online]. https://cloud.google.com/bigquery/. Accessed 5 Feb 2024

  4. Snowflake Data Cloud (n.a.) [Online]. https://www.snowflake.com/en/data-cloud/workloads/data-warehouse/. Accessed 5 Feb 2024

  5. Blanco C, Fernández-Medina E, Trujillo J, Piattini M (2009) Data Warehouse Security. In: Liu L, Özsu MT (eds) Encyclopedia of Database Systems. Springer, Boston. https://doi.org/10.1007/978-0-387-39940-9_333

  6. Fernández-Medina E, Trujillo J, Villarroel R, Piattini M (2006) Access control and audit model for the multidimensional modeling of data warehouses. Decis Support Syst 42(3):1270–1289

  7. Singh MP, Sural S, Vaidya J, Atluri V (2019) Managing attribute-based access control policies in a unified framework using data warehousing and in-memory database. Comput Secur 86:183–205. https://doi.org/10.1016/j.cose.2019.06.001

  8. Fugkeaw S, Sato H (2015) Privacy-preserving access control model for big data cloud. In: 2015 International Computer Science and Engineering Conference (ICSEC). pp 1–6. https://doi.org/10.1109/ICSEC.2015.7401416

  9. Lopes CC, Times VC (2015) A framework for investigating the performance of sum aggregations over encrypted data warehouses. In: Proc. ACM SAC. Association for Computing Machinery, New York, pp 1000–1007. https://doi.org/10.1145/2695664.2695805

  10. Gentry C (2010) Computing arbitrary functions of encrypted data. Commun ACM 53(3):97–105

  11. Ahmadian M, Paya A, Marinescu DC (2014) Security of Applications Involving Multiple Organizations and Order Preserving Encryption in Hybrid Cloud Environments. In: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops. Phoenix, pp 894–903. https://doi.org/10.1109/IPDPSW.2014.102

  12. Lopes CC, Times VC, Matwin S, Ciferri RR, Ciferri CDdA (2014) Processing OLAP Queries over an Encrypted Data Warehouse Stored in the Cloud. In: Bellatreche L, Mohania MK (eds) Data Warehousing and Knowledge Discovery. DaWaK 2014. Lecture Notes in Computer Science, vol 8646. Springer, Cham. https://doi.org/10.1007/978-3-319-10160-6_18

  13. Shang X et al (2022) One Stone, Three Birds: Finer-Grained Encryption with Apache Parquet @ Large Scale. In: 2022 IEEE International Conference on Big Data (Big Data). Osaka, pp 5802–5811. https://doi.org/10.1109/BigData55660.2022.10020987

  14. Karkouda K, Nabli A, Gargouri F (2018) CloudWar: A new schema for securing and querying data warehouse hosted in the cloud. In: 2018 28th Int. Conf. Comput. Theory Appl. (ICCTA). pp 6–12. https://doi.org/10.1109/ICCTA45985.2018.9499193

  15. Yi X, Paulet R, Bertino E, Xu G (2016) Private Cell Retrieval From Data Warehouses. IEEE Trans Inf Forensic Secur 11(6):1346–1361. https://doi.org/10.1109/TIFS.2016.2527620

  16. Ahmadian M, Marinescu DC (2020) Information Leakage in Cloud Data Warehouses. IEEE Trans Sustain Comput 5(2):192–203. https://doi.org/10.1109/TSUSC.2018.2838520

  17. Zhang X, Qi L, Dou W, He Q, Leckie C, Kotagiri R, Salcic Z (2022) MR-Mondrian: Scalable Multidimensional Anonymisation for Big Data Privacy Preservation. IEEE Trans Big Data 8(1):125–139

  18. Cuzzocrea A, De Maio V, Fadda E (2020) Experimenting and Assessing a Distributed Privacy-Preserving OLAP over Big Data Framework: Principles Practice and Experiences. In: 44th IEEE Annu. Comput. Softw. Appl. Conf. pp 1344–1350

  19. Cuzzocrea A (2023) Privacy-Preserving OLAP via Modeling and Analysis of Query Workloads: Innovative Theories and Theorems. In: Proc. 35th Int. Conf. Sci. Stat. Database Manage., Article no. 6. pp 1–12. https://doi.org/10.1145/3603719.3603735

  20. Cuzzocrea A (2023) Big OLAP Data Cube Compression Algorithms in Column-Oriented Cloud/Edge Data Infrastructures. In: 2023 IEEE Ninth Multimedia Big Data (BigMM). Laguna Hills, pp 1–2. https://doi.org/10.1109/BigMM59094.2023.00020

  21. Cai H, Yang Y, Fan W, Xiao F, Zhu Y (2023) Towards Correlated Data Trading for High-Dimensional Private Data. IEEE Trans Parallel Distrib Syst 34(3):1047–1059. https://doi.org/10.1109/TPDS.2023.3237691

  22. Fugkeaw S, Hak L (2024) PPAC-CDW: A Privacy-Preserving Access Control Scheme With Fast OLAP Query and Efficient Revocation for Cloud Data Warehouse. IEEE Access 12:78743–78758. https://doi.org/10.1109/ACCESS.2024.3408221

  23. Fugkeaw S, Hak L, Theeramunkong T (2024) Achieving Secure, Verifiable, and Efficient Boolean Keyword Searchable Encryption for Cloud Data Warehouse. IEEE Access 12:49848–49864. https://doi.org/10.1109/ACCESS.2024.3383320

  24. Liu Z, Cao Z, Dong X, Zhao X, Liu T, Bao H, Shen J (2022) EPMDA-FED: Efficient and Privacy-Preserving Multidimensional Data Aggregation Scheme with Fast Error Detection in Smart Grid. IEEE Internet Things J 9(9):6922–6933

  25. Jiang R, Lu R, Choo K-KR (2018) Achieving High Performance and Privacy-Preserving Query over Encrypted Multidimensional Big Metering Data. Future Gener Comput Syst 78:392–401

  26. Olawoyin AM, Leung CK, Cuzzocrea A (2023) Privacy Preservation of Big Spatio-Temporal Co-occurrence Data. In: 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC). Torino, pp 1331–1336. https://doi.org/10.1109/COMPSAC57700.2023.00202

  27. Tong Q, Li X, Miao Y, Liu X, Weng J, Deng RH (2023) Privacy-Preserving Boolean Range Query With Temporal Access Control in Mobile Computing. IEEE Trans Knowl Data Eng 35(5):5159–5172. https://doi.org/10.1109/TKDE.2022.3152168

  28. Zeutschler T (n.a.) TinyOlap. GitHub. [Online]. https://github.com/Zeutschler/tinyolap. Accessed 5 Feb 2024

  29. Python Cryptographic Authority (2022) Pyca/Cryptography. GitHub. [Online]. https://github.com/pyca/cryptography. Accessed 7 Nov 2023

  30. Bethencourt J et al (2006) Advanced Crypto Software Collection. ACSC, University of Texas. [Online]. https://acsc.cs.utexas.edu/cpabe/. Accessed 7 Nov 2023

  31. PBC (Pairing-Based Cryptography) library. https://crypto.stanford.edu/pbc/. Accessed 22 Nov 2023

  32. Eijs H (2023) pycryptodomex: Cryptographic library for Python. PyPI. [Online]. https://pypi.org/project/pycryptodomex/. Accessed 22 Nov 2023

  33. Eijs H (2023) pycryptodome: Cryptographic library for Python. PyPI. [Online]. https://pypi.org/project/pycryptodome/. Accessed 22 Jun 2023

  34. SymPy Development Team (2023) SymPy: Python library for symbolic mathematics. SymPy. [Online]. https://www.sympy.org/. Accessed 23 Jun 2024

  35. Pandas Development Team (2023) pandas: powerful Python data analysis toolkit. pandas. [Online]. https://pandas.pydata.org/. Accessed 24 Jun 2024

  36. Redis (2023) Redis: In-memory data structure store. Redis. [Online]. https://redis.io/. Accessed 24 Jun 2024

  37. Python Software Foundation (2023) timeit - Measure execution time of small code snippets. Python Documentation. [Online]. https://docs.python.org/3/library/timeit.html. Accessed 24 Jun 2024

  38. Python Software Foundation (2023) json - JSON encoder and decoder. Python Documentation. [Online]. https://docs.python.org/3/library/json.html. Accessed 24 Jun 2024

  39. Python Software Foundation (2023) os - Miscellaneous operating system interfaces. Python Documentation. [Online]. https://docs.python.org/3/library/os.html. Accessed 24 Jun 2024

  40. Transaction Processing Performance Council (2023) TPC-H dbgen: Database population tool. TPC-H. [Online]. http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp. Accessed 24 Jun 2024

  41. Transaction Processing Performance Council (2023) TPC-H Benchmark Scale Factor. TPC-H. [Online]. http://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp. Accessed 24 Jun 2024

  42. Transaction Processing Performance Council (2023) TPC-H: A decision support benchmark. TPC-H. [Online]. http://www.tpc.org/tpch/. Accessed 24 Jun 2024

  43. Monster22real (2024) SSF-CDW: A Scalable, Secure, and Fast OLAP Query for Encrypted Cloud Data Warehouse. GitHub. [Online]. https://github.com/monster22real/SSF-CDW. Accessed 24 Jun 2024


Funding

This work (Grant No. RGNS 65-110) was supported by the Office of the Permanent Secretary Ministry of Higher Education, Science, Research and Innovation (OPS MHESI), Thailand Science Research and Innovation (TSRI), and Thammasat University.

Author information

Authors and Affiliations

Authors

Contributions

SF wrote, reviewed, and edited the manuscript, investigated the methodologies, performed the literature review, and designed and developed the solution. PS wrote the original draft, investigated techniques, and developed the concept. LH wrote and revised the manuscript, developed the concept, and conducted the experiments.

Corresponding author

Correspondence to Somchart Fugkeaw.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article


Cite this article

Fugkeaw, S., Suksai, P. & Hak, L. SSF-CDW: achieving scalable, secure, and fast OLAP query for encrypted cloud data warehouse. J Cloud Comp 13, 129 (2024). https://doi.org/10.1186/s13677-024-00692-y


Keywords