Open AccessArticle

Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme

Xixi Yan

Pengyu Cheng

Yongli Tang

and

Jing Zhang

School of Software, Henan Polytechnic University, Jiaozuo 454000, China

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(24), 11740; https://doi.org/10.3390/app142411740

Submission received: 3 November 2024 / Revised: 7 December 2024 / Accepted: 13 December 2024 / Published: 16 December 2024

Download

Browse Figures

Versions Notes

Abstract

Searchable encryption (SE) allows users to efficiently retrieve data from encrypted cloud data, but most of the existing SE solutions only support precise keyword search. Fuzzy searchable encryption agrees with practical situations well in the cloud environment, as search keywords that are misspelled to some extent can still generate search trapdoors that are as effective as correct keywords. In scenarios where multiple users can search for ciphertext, most fuzzy searchable encryption schemes ignore the security issues associated with malicious cloud services and are inflexible in multi-user scenarios. For example, in medical application scenarios where malicious cloud servers may exist, diverse types of files need to correspond to doctors in the corresponding departments, and there is a lack of fine-grained access control for sharing decryption keys for different types of files. In the application of medical cloud storage, malicious cloud servers may return incorrect ciphertext files. Since diverse types of files need to be guaranteed to be accessible by doctors in the corresponding departments, sharing decryption keys with the corresponding doctors for different types of files is an issue. To solve these problems, a verifiable fuzzy searchable encryption with blockchain-assisted multi-user scenarios is proposed. Locality-sensitive hashing and bloom filters are used to realize multi-keyword fuzzy search, and the bigram segmentation algorithm is optimized for keyword conversion to improve search accuracy. To realize fine-grained access control in multi-user scenarios, ciphertext-policy attribute-based encryption (CP-ABE) is used to distribute the shared keys. In response to the possibility of malicious servers tampering with or falsifying users’ search results, the scheme leverages the blockchain’s technical features of decentralization, non-tamperability, and traceability, and uses smart contracts as a trusted third party to carry out the search work, which not only prevents keyword-guessing attacks within the cloud server, but also solves the verification work of search results. The security analysis leads to the conclusion that the scheme is secure under the adaptively chosen-keyword attack.

Keywords:

multi-user fuzzy search; smart contract; access control

1. Introduction

Due to the convenience of cloud services, individuals and organizations are inclined to upload their files to the cloud for storage, it is crucial to encrypt sensitive files before storing on the cloud. However, another problem arises as to how to perform search operations on the ciphertext files in cloud services. Searchable encryption technology [1] perfectly solves the problem of searching ciphertext files directly and has a wide range of applications, such as in the smart healthcare [2], smart city, and IoT fields [3].

Since the pioneering work on searchable encryption, scholars have devoted themselves to studying some practical and feasible searchable encryption schemes. For example, a scheme [4] supports conjunctive, subset and range queries on encrypted data, a scheme [5] supports spatial keyword queries, and another scheme [6] supports Boolean queries. In practice, when users perform a search operation, they often misspell the search keywords or simply enter the approximate stem of the keyword or other forms to start performing the search operation. To address this situation, fuzzy search [7] realizes a certain degree of approximate keyword search matching, which can generate the same or similar keyword trapdoor for users to perform search matching. However, a few existing fuzzy search schemes still have limitations. For example, the schemes in [7] and the scheme in [8] only support single keyword search. Although there exists a multi-user scenario implemented in the scheme in [9] and the scheme in [10], where multiple users can retrieve the ciphertext through a single cloud server. In the case that multiple users have search permissions, it is necessary to ensure that people with different search permissions can search for the corresponding ciphertext files. Therefore, fine-grained access control needs to be ensured. There is the problem that the encryption and decryption keys are difficult to share, the online auditing and authorization by administrators for users applying for access are not friendly when it comes to the actual situation of smart healthcare, and do not satisfy the fine-grained access control of multi-user application scenarios. Although the scheme [11] can search for ciphertext files with corresponding permissions in multi-user scenarios, current fuzzy search schemes seldom consider fine-grained access control on shared keys for different types of files. Therefore, it is essential to provide multi-keyword fuzzy search encryption schemes with fine-grained control under multi-user conditions.

In the scenario of smart healthcare, doctors in different departments need to search for the ciphertext of case files. The case information of different patients will be first classified in a certain way, and then the medical data center will store the patient’s case and other private information through encryption in the medical database or cloud server, which can only be accessed by authorized doctors. In practice, doctors in different departments have access rights to different files, e.g., doctors in the cardiology department can only access cases of cardiac patients. Although some fuzzy search schemes support multiple user scenarios, there is the problem of difficulty in sharing encryption and decryption keys, and it is unfriendly in the actual situation in smart healthcare to review and authorize the users applying for access online through administrators.

In addition, some scheme [12,13,14,15] models serve as honest but curious entities, which are not realistic in practice. Cloud servers have too much power and are not conducive to the data owner’s control over the data. Some malicious cloud servers even seriously harm the data owner’s interests, such as returning wrong or incomplete files. This greatly affects the practical application of fuzzy search encryption schemes. Therefore, a decentralized fuzzy search encryption scheme is necessary and feasible to solve the possible problems of malicious cloud servers.

Aiming at the above problems, the Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme is proposed, which has more comprehensive functions and efficient performance. And it is more adaptable to the ciphertext search needs under new network services such as smart healthcare. The main innovations are as follows.

For the existence of malicious cloud servers in fuzzy search schemes, semi-decentralization is used to weaken the power of cloud servers, a smart contract is constructed to guarantee fairness between users and servers, and blockchain is introduced to assist in achieving integrity verification of results.
Aiming at the lack of fine-grained access control for users in one-to-many application scenarios, the is used to distribute the shared key with a tree structure, and only authorized users who meet the access results can decrypt the shared key.
By improving the index tree structure and bigram participle algorithm, the proposed scheme not only optimizes the search efficiency and search precision but also enables users to check the search results locally for a second time.
Through security analysis, it is proved that the scheme is secure under the adaptive selection of keyword attacks. Functional comparison and experimental analysis show that the search efficiency of the scheme has been improved under the premise of supporting multi-functionalities such as multiple keywords, fuzzy search, dynamic updating, and result validation. In particular, the search time of our scheme is more advantageous with the increase in search keywords and searched files.

The rest of this paper is arranged as follows. Related work on fuzzy search and blockchain-based searchable encryption is in Section 2. Some preliminaries used in the scheme are given in Section 3. In Section 4, the definitions of algorithm and security model are provided. Details of the program will be presented in Section 5. Section 6 and Section 7 provide the program analysis and efficiency analysis, respectively. Section 8 summarizes the work and gives conclusions.

2. Related Work

We present research work on fuzzy searchable encryption and blockchain-based searchable encryption in this section.

2.1. Fuzzy Searchable Encryption

Since the fuzzy search encryption scheme has a broader application prospect, it attracts scholars to study it continuously. Li et al. [7] first formalized the fuzzy keyword search problem on encrypted cloud data and designed a wildcard-based secure index construction scheme, which first constructs a wildcard-based fuzzy keyword set, each element in the set is encrypted, and when searching, the user first generates the same encrypted fuzzy keyword set, then the cloud server finds the corresponding keywords, and finally returns the corresponding files. Subsequently, Wang et al. [16] improved on the scheme in [7] and proposed a searchable encryption scheme for fuzzy keyword search based on the index tree structure of keywords, which improves the search efficiency, but it only supports single keyword search. Although [17] designed a dual ranking function that combines keyword weights and keyword morphological similarity to rank search results, it still cannot avoid the limitations of single-keyword search.

Kuzu et al. [18] scaled by minhash to transform keyword vectors and determine the similarity of keywords. Subsequently, Wang et al. [19] implemented a multi-keyword search based on scheme [18]. Fu et al. [14] improved the search accuracy with the Bloom filter and location-sensitive hash function but still did not support the dynamic update function. Zhong et al. [15] proposed a scheme to support efficient dynamic updating, which improves the search accuracy and supports encrypted index tree based on the top-k sorting of encrypted files. Tong et al. [20] designed a dual bloom filter to store and mask the keywords contained in a file and realized the verification of correctness and completeness, but the verification is more complicated. Li et al. [21] designed an improved bi-gram participle algorithm to enhance fuzzy search accuracy, the scheme assumes cloud server is malicious but fails to construct an effective solution for the malicious behavior of malicious servers.

2.2. Blockchain-Based Searchable Encryption

The cloud server has too much right in traditional searchable encryption schemes, and all the search processes are dependent on the cloud server. Scholars have introduced the blockchain fairness mechanism to realize fairness due to the non-tamperable characteristics, which guarantees that if a user and cloud server follow the agreement honestly, the cloud server gets the corresponding service fee, while users get the correct results. Once a cloud server is detected with dishonest behavior, it will not get the service fee and will be punished with the loss of margin.

Cai et al. [22] realized dynamic keyword search in distributed storage combined with blockchain technology for fair search. Wang et al. [23] constructed a decentralized privacy-preserving search scheme. Data owners can be assured of receiving correct results without having to worry about malicious server behavior. Hu et al. [24] combined blockchain technology to achieve fair payment and constructed a file-sharing scheme in one-to-one as well as one-to-many user scenarios. Smart contracts are used to store secure indexes and execute searches, ensuring that users get correct results if they pay for the transaction. Chen et al. [25] constructed a blockchain-enabled public key encryption scheme with multi-keyword search, which uses smart contract to ensure the fairness without introducing a third party. Guo et al. [26] constructed an authentication-enabled search encryption scheme on blockchain and implement forward security. It is evident that the combination of blockchain and searchable encryption cryptography to achieve decentralization, build fair transactions, and authentication features is effective.

3. Preliminaries

We introduce the technical knowledge used in fuzzy search in this section.

3.1. Bloom Filter (BF)

The Bloom Filter is a space-saving probabilistic data structure that determines whether an element is in a collection or not. Mapping two or more different elements to the same position in a bit array via a hash function has a false positive rate.

Suppose the bloom filter initializes a p-bit array, k random hash functions, and the element set

\partial = {α_{1}, α_{2}, α_{3}, \dots α_{n}}

For each element in the set, it is mapped to the

k

positions of the

p - bit

bloom filter by

k

independent hash functions and corresponding positions are set to 1. When determining whether

α_{x}

is in

\partial

, it uses the same approach as described above to map

α_{i}

to the

k

positions of the bloom filter. If one of the

k

positions has a value of 0, then

α_{x} \notin \partial

; if all

k

positions have a value of 1, then

α_{x} \in \partial

or there is a false positive.

3.2. Locality Sensitive Hashing (LSH)

In the scheme, the LSH determines whether the keyword is in the file or not by mapping keywords into a hash bucket. A positional hash function is defined as follows: A family of hash functions

F

is a mapping from Euclidean space

S_{2}

to hash coding space

U

. The hash function

F

is said to satisfy

(r_{1}, r_{2}, p_{1}, p_{2}) - ness

when

P r_{F} [h (q) = h (p)] \geq p_{1}, p \in B (q, r_{1})

and

P r_{F} [h (q) = h (p)] \leq p_{2}, p \in B (q, r_{2})

are satisfied, where

B

is the space centered at

q

with radii

r_{1}

r_{2}

. The p-stable position-sensitive hash function is calculated as

h_{a, b} (v) = [\frac{a \cdot v + b}{n}]

, where the parameters

a

and

b

are a

d - dimensional

vector and a random number between

[0, n]

, respectively, and each element in the vector

v

satisfies

p - stable

distribution. As shown in Figure 1,

a \cdot v

means that vector

a

is mapped onto an axis with vector

v

as the base vector, and this axis is divided into

n

equal parts of

m

. The labelled value of the interval corresponding to the value of

a \cdot v

is taken as the hash value of vector

a

. We can determine whether a keyword can be mapped to a bucket by calculating the probability that the values are equal by using LSH.

4. Definitions of Algorithm and Security Model

Before describing our scheme in detail, the basic model, algorithmic framework, and security model are first given.

4.1. System Model

The model is based on the following roles, as shown in Figure 2.

(1): Data Owner $(D O)$ . $D O$ extends beyond individual entities to data centers that are responsible for the integration of file information owned by multiple data owners. This implementation realizes the application of a multi-owner scenario from another perspective.
(2): Trusted Authority $(T A)$ . As an authenticating entity, it generates attribute private keys for data users and transmits system public parameters and other relevant information to the users securely.
(3): Data Users $(D U)$ . $D U$ need to apply for their attribute private keys based on their attributes as well as reliable information, then generate and send search trapdoor to the smart contract account for searching transactions through query requirements.
(4): Smart Contract $(S C)$ . As a decentralized component of the system, $S C$ stores secure index tree $I n d e x_{t r e e}$ and verification information of data owners, mitigating the excessive and uncontrolled authority of the cloud server. The search contract performs the initial segment of the entire search transaction and ensures the verification process for $D U$ .
(5): Cloud Server $(C S)$ . $C S$ has powerful storage as well as computational capabilities, and stores the ciphertext data files outsourced by $D O$ , and provides $D U$ with the second part of the search service based on the transaction initiated, it may have the malicious behavior of tampering with the user’s search results.

The workflow of the system is as follows.

D O

transmits parameters, public keys, etc., to

T A

initially (Step 1), firstly categorizes all large collections of files, encrypts classified file collection

D = {F_{1}, F_{2}, \dots, F_{n}}

and outsources the ciphertext set

C = {C_{1}, C_{2}, \dots, C_{n}}

C S

(Step 2). To achieve access control and efficient search,

D O

constructs a file index tree and sends the index tree to the contract storage address (Step 3).

D U

send a registration request to

T A

(Step 4) and receives attribute private keys and search keys (Step 5). Then the

D U

generates a search token locally based on query keywords and algorithms disclosed by

T A

and sends it through a transaction to

S C

(Step 6).

S C

executes a search to obtain a search list and sends the information to the user (Step 7). The ciphertext of file identifiers (

F I D

), the ciphertext set

C = {C_{1}, C_{2}, \dots, C_{n}}

and transaction information is sent to

D U

(Step 8). Then

C S

returns to

D U

the ciphertext data of the search transaction (step 9).

D U

can verify the search results locally, and only users whose attribute private key satisfies the access control can finally decrypt the file and malicious servers will receive penalties. (Step 10).

4.2. Algorithm Framework

4.2.1. Definition

Before giving the definition of the algorithm, the definitions of the symbols used in the scheme are given in Table 1.

4.2.2. Algorithm Definition

The scheme has nine main algorithms.

(1): $S e t u p (1^{λ}) \to (p k, m k)$ is a system initialization algorithm. The Data Owner $D O$ inputs random security parameter λ, generates system public key $p k$ and the master private key $m k$ .
(2): $K e y G e n (1^{λ}, l) \to K S$ is a key generation algorithm. $D O$ inputs random security parameter λ and the length $l$ of file index, outputs the key set $K S = (M_{1}, M_{2}, S)$ .
(3): $E n c (p k, K_{1}, K_{2}, D, K W, K S, Γ) \to (F I D, C, C_{K_{1}}, I, M A C_{c_{i}}, I n d e x_{t r e e})$ is an encryption algorithm. $D O$ takes public key $p k$ , plaintext set $D$ , keyword set $K W$ , access policy $Γ$ , key $K_{i}$ of the corresponding type of files, authentication key $K_{2}$ , and index encryption key $K S$ as input. Then, for outputs, the ciphertext set $C$ , encrypted file identifier $F I D$ , security index $I$ , the key cipher containing access policy $C_{K_{i}}$ , ciphertext file message authentication code $M A C_{c_{i}}$ , and security index tree $I n d e x_{t r e e}$ .
(4): $U s e r S i g n (p k, m k, S_{u}) \to (S K_{u}, S K_{s}, u s e r_i d)$ is a user registration algorithm run by $T A$ , it takes $p k$ , $m k$ , and $S_{u}$ as input, outputs attribute private key $S K_{u}$ , the user search key $S K_{s}$ , and authenticated identifier $u s e r_i d$ .
(5): $T o k e n G e n (p k, S K_{s}, W_{q}) \to T_{t}$ is a trapdoor generation algorithm. Data User takes the public key $p k$ , a query keyword collection $W_{q}$ and attribute private key $S K_{s}$ as input, outputs the trapdoor with time stamp $T_{t}$ .
(6): $S e a r c h (I n d e x_{t r e e}, T_{t}) \to (C_{κ_{i}}, F I D, M A C_{c_{i}})$ is a search-matching algorithm run by Smart Contract $S C$ . This algorithm takes $T_{t}$ and $I n d e x_{t r e e}$ as input, outputs $C_{K_{i}}$ , $M A C_{c_{i}}$ and $F I D$ .
(7): $V e r i f y (C_{i}, M A C_{C_{i}}, K_{2}) \to 1 / 0$ is a verification algorithm. The Data User takes the verification key $K_{2}$ , the ciphertext file $C_{i}$ and the ciphertext message authentication code $M A C_{C_{i}}$ as input. It is determined whether a preliminarily correct result is obtained by outputting the result.
(8): $D e c (S K_{u}, C_{K_{i}}, C_{w_{q}}) \to (D_{k w} / ⊥)$ is a decryption algorithm run by the Data User. It takes the attribute private key $S K_{u}$ , the key cipher containing access policy $C_{K_{i}}$ , and file set cipher $C_{w_{q}}$ , which are based on the query keyword set $w_{q}$ as input. If $S K_{u}$ satisfies the access structure $Γ$ defined by the Data Owner $D O$ , then it outputs the plaintext files containing the query keyword set, otherwise, the output is $⊥$ .
(9): $U p d a t e (U p t y p e, I_{T_{c}}) \to I n d e x_{t r e e_n e w}$ is an index tree update algorithm run by the smart contract. It takes the encrypted subtree $I_{T_{c}}$ sent by the Data Owner as input, and outputs the new index tree $I n d e x_{t r e e_n e w}$ . The detailed process is described in Section 5.2.4.

4.3. Security Model

The

C S

is considered an entity that may have malicious behavior and is not fully trustworthy, it may return untrue results to save computation and bandwidth resources, which can seriously harm the

D U

. In addition,

D U

uploads its own set of attributes honestly to obtain

u s e r_i d

, all communication with

T A

is secure.

Regarding privacy requirements, this article considers two threat models with different adversary capabilities.

Known Ciphertext Model: The cloud server can only observe the encrypted file set, the encrypted index, the submitted tokens, and the search results. It has no prior knowledge of the file set or the search keywords.

Known Background Model: the cloud server is also able to obtain some additional background information like statistical information about the files, the queried keywords and their corresponding tokens, and the frequency of searched keywords.

A simulation-based game between adversary

A

and a stateful simulator

B

using an allowed leakage access model and a search model are used to prove security. Two leakage functions

L = (L_{1}, L_{2})

are defined, where

L_{1}

and

L_{2}

are expressed formally as follows.

L_{1} (D) = {| D |, n, {| F_{i} |, i n d (F_{i})}_{i \in [1, n]}}

and takes the plaintext file collection

D

as input and outputs the plaintext collection size

| D |

, the number of files

n

, the size of each file

| F_{i} |

, and file identifiers

i n d {(F_{i})}_{i \in [1, n]}

L_{2} (D, {W_{q}}^{*}) = {P ({W_{q}}^{*}), T^{*}}

, it takes the plaintext file collection

D

and the search keyword collection as input and outputs the keyword access patterns

P ({W_{q}}^{*})

and search trapdoor

T^{*}

. Here, suppose

A

is an adversary,

B

is a stateful simulator, and

G

acts as a challenger. The interaction game between

A

B

, and

G

is as follows.

$R e a l_{A} (λ)$ . $G$ initializes system and runs $K e y G e n (1^{λ}, m) \to K S$ , then sends a collection of files $D$ to $G$ . $G$ generates encrypted file $C$ and index $I$ with a secure symmetric encryption algorithm and index generation algorithm, then sends them to $A$ . $A$ generates a polynomial number of keywords sets and launches an adaptive search $Q_{1} = (w_{1}, w_{2}, \dots, w_{q})$ . Then the search keywords in set are transformed into a set of vectors to generate search trapdoor by $G$ , which is sent to $A$ . Eventually, $A$ outputs $a n s e r \in {0, 1}$ , ending the game.
$I d e a l_{A} (λ)$ . Given the leakage functions $L_{1}$ and $L_{2}$ , $G$ inputs plaintext file $D$ . Simulator $B$ generates a simulated index $I^{*}$ and ciphertext $C^{*}$ , sends $I^{*}$ and $C^{*}$ to $A$ . $B$ chooses a polynomial number of keywords $Q_{1} = (w_{1}, w_{2}, \dots, w_{q})$ for adaptive interrogation, then outputs search trapdoor ${T_{t}}^{*}$ based on $L_{2}$ . Finally, $A$ outputs the result $a n s e r' \in {0, 1}$ , ending the game.

5. Proposed Scheme

We describe the main ideas of fuzzy search and detail the algorithms in this section.

5.1. Main Idea

The scheme implements fuzzy search based on LSH and BF. Assuming that

D O

has

n

plaintext files

D = {F_{1}, F_{2}, \dots, F_{n}}

, a

p

bit bloom filter

B_{i}

is generated as an index vector, which contains all the keywords in

F_{i}

. To map the keywords in

F_{i}

B_{i}

, the keywords need to be converted into bigram vectors, then the bigram vectors are mapped to

B_{i}

by LSH. Trapdoor generation similarly requires transforming the query keyword

W_{q}

into a bigram vector and generating an

m - bit

bloom filter

B_{w_{q}}

, which maps bigram vector on

B_{w_{q}}

using the same LSH to generate index vector. The inner product of

B_{i}

and

B_{w_{q}}

represents the relevance of file

F_{i}

and query

w_{q}

. As shown in Figure 3.

5.2. Concrete Construction

We describe the various algorithms and talk about them in four broad modules, namely, the preliminary work, the encryption work, the search work, and the subsequent work.

5.2.1. Preliminary Work

The preliminary work includes

S e t u p (1^{λ})

K e y G e n (1^{λ}, m)

and

U s e r S i g n (p k, m k, S_{u})

S e t u p (1^{λ})

D O

defines two multiplicative cyclic groups

G_{0}, G_{1}

ℤ_{p}

, the bilinear map

e : G_{0} \times G_{0} \to G_{1}

, randomly selects anti-collision hash function

H_{1} : {0, 1}^{*} \to G_{0}

, and two pseudo-randomized functions

H_{2} : {0, 1}^{l} \times {0, 1}^{*} \to {0, 1}^{k}

and

H_{3} : {0, 1}^{l} \times {0, 1}^{*} \to {0, 1}^{k}

g

is the generating element of

G_{0}

p

is a large secure prime, and

e (g, g)

is the value of the bilinear mapping in the swarm.

T A

selects

α

β

randomly as input, outputs the master private key

m k = (α, β)

, and

p k = (G_{0}, G_{1}, p, g, h = g^{β}, e {(g, g)}^{α}, H_{1}, H_{2}, H_{3})

K e y G e n (1^{λ}, l)

D O

inputs a random security parameter

λ

and the index length

l

to generate

K S = (M_{1}, M_{2}, S)

to encrypt the trapdoor and index, where

(M_{1}, M_{2}) \in R^{l \times l}

S \in {0, 1}^{l}

is a vector that prevents brute force decryption and

M_{1}, M_{2}

are invertible matrices.

U s e r S i g n (p k, m k, S_{u})

: When a user sends an enrollment request,

T A

checks the user’s attribute set

S_{u}

, then confer

u s e r_i d

and search key

S K_{s}

to the user.

5.2.2. Encryption Work

The encryption work mainly includes key encryption, file index vector generation, index tree generation, and trapdoor generation.

A.: Key Encryption

E n c (p k, K_{1}, K_{2}, D, K W, K S, Γ)

: Assume that the file set has

t

types,

D O

has

n

plaintext files

D = {F_{1}, F_{2}, \dots, F_{n}}

. The files are categorized according to their types and the corresponding symmetric encryption key

K_{i} (1 \leq i \leq t)

is generated for different types of files., e.g., in a medical database, ophthalmology case files are classified as one type and case files from different sections are categorized into different types.

D O

parses keyword sets

K W

from D and

K W_{F_{i}}

from

F_{i}

respectively, then encrypts file identifiers of

D = {F_{1}, F_{2}, \dots, F_{n}}

K_{2}

to obtain the ciphertext

F I D = {F I D_{1}, F I D, \dots, F I D_{n}}

D O

sets the access control structure tree Γ for different types of files with key

K_{1}

based on actual application requirements. For example, the core files of a medical department can be decrypted only by users who meet the conditions of important attributes, while for the general files of the department only the basic conditions of access to the files by the department need to be met.

D O

assign a polynomial

q_{x}

for each node

x

from root

r

, where

q_{x}

in leaf nodes are constant. Randomly select

s

from

ℤ_{p}

which

q_{r} (0) = s

\deg (q_{x}) = d_{x} = k_{x} - 1

, then

d e g (q_{r})

random coefficients are selected from

ℤ_{p}

to determine

q_{x}

Y

denotes the leaf nodes and

k_{x}

is the threshold value of

x

. Eventually gets

C_{K_{i}} = {Γ, \bar{C} = K_{1} e {(g, g)}^{α s}, C = h^{s}, {C_{y}, C_{y}'} \forall y \in Y}

where

C_{y} = g^{q_{y} (0)}, C_{y}' = H_{1} {(a t t (y))}^{q_{y} (0)}

D O

encrypts plaintext

D = {F_{1}, F_{2}, \dots, F_{n}}

K_{i}

to obtain ciphertext set

C = (C_{1}, C_{2} \dots, C_{n})

. Eventually,

C

and

F I D

are sent to

C S

C_{K_{i}}

will be stored in the index tree described subsequently.

B.: File Index Vector Generation

The generation of file index vectors is carried out by mapping the keywords on the bloom filter of the file. The keywords transform into vectors so that they can be mapped on the bloom filter and the improved keyword bigram disambiguation algorithm from scheme [21] is used to transform the keywords into vectors.

For the keyword “happy”,

D O

first converts it to a bigram set

{h a_{1}, a p_{1}, p p_{1}, p y_{1}}

. Considering the existence of the same bigram elements, the vector is expanded to

2 \times 26^{2}

, where

26 \times 26

represents possible binary letters. If the bigram set of the keyword exists in the corresponding position of the vector, the corresponding position is set to 1, otherwise it is set to 0, different keywords are converted into vectors with the same length.

D O

constructs and initializes an

l - bit

bloom filter

B_{i}

for file

F_{i}

in the file collection

D

k

LSH.

I^{'}

and

I^{″}

are

l - bit

null vectors. The keyword set

K W_{F_{i}}

is subjected by the porter stemming algorithm to obtain the stemmed set

K W_{S T_{F_{i}}}

. If the file

F_{i}

has

x

keywords, the bigram set of

K W_{S T_{F_{i}}}

finally converts to a vector set

B v_{S T_{F_{i}}}

b v_{i}

in the vector set

B v_{S T_{F_{i}}}

are mapped onto

B_{i}

k

LSH. Set the corresponding position of the mapping to

T F_{i, j} / k

according to the importance of the file according to the importance of the keyword to the file, where

T F_{i, j} = 1 + | k w_{j} | / | F_{i} |

is the frequency of keyword

k w_{j}

F_{i}

| k w_{j} |

denotes the number of occurrences of

k w_{j}

F_{i}

, and

| F_{i} |

denotes the number of keywords in

F_{i}

The index vector construction for

F_{i}

is demonstrated by an example, as shown in the left side of Figure 3. Suppose

F_{i}

has three keywords “happily, teachers, likely”, where

S = [1, 0, 1, 0, 0, 1, 1, 0, 0, 1]

l = 10

M_{1}, M_{2}

are invertible matrices and

M_{1}, M_{2} \in R^{10 \times 10}

D O

constructs an

l - bit

bloom filter

B_{i}

for

F_{i}

, then converts the set of keywords to the stem set

S T_{k w} = {h a p p y, t e a c h, l i k e}

. The stem keyword “happy” is first converted to

B v_{1} = {h a_{1}, a p_{1}, p p_{1}, p y_{1}}

and then converted to vector

b v_{1}

. For ease of representation, Figure 3 is represented by a string of 0 and 1 vectors. Then

B v_{2}, B v_{3}

and

b v_{2}, b v_{3}

are constructed for the rest stem keywords in the same way, finally,

b v_{1}

b v_{2}

, and

b v_{3}

are mapped as inputs to

k

LSHs on

B_{i}

, respectively, and the corresponding position of

B_{i}

is set to

T F_{i, j} / k

. Assuming three keywords have the same weight

T F_{i, j} / k

of 0.5, the final index vector

B_{i}

of the file

F_{i}

is obtained.

C.: Index Tree Generation

Denote

S

K S

and bloom filter

B_{i}

of file

F_{i}

S = (s_{1}, s_{2}, \dots s_{m})

and

B_{i} = (b_{1}, b_{2}, \dots b_{m})

, respectively, where

B_{i}

and

S

have the same structure. For each

b_{i}

B_{i}

, let

B_{i}' = (b_{1}', b_{2}', \dots, b_{m}'), B_{i} ″ = (b_{1} ″, b_{2} ″, \dots, b_{m} ″)

. If

s_{i} = 1

, then

b_{i} = b_{i}' = b_{i} ″

; If

s_{i} = 0

, then

b_{i}' = b_{i} / 2 + r

b_{i} ″ = b_{i} / 2 - r

, where

r

is a random parameter and

r \in R

. Finally, the encrypted index

I_{i} = (I_{i}', I_{i} ″)

of the file

F_{i}

is obtained, where

I_{i}' = (M_{1}^{T} \cdot B_{i}'), I_{i} ″ = (M_{2}^{T} \cdot B_{i} ″)

The index tree construction of the scheme in [15] is used to construct the

I n d e x_{t r e e}

and the storage structure of the leaf nodes is improved on this basis. Leaf nodes store file identifiers and vectors in addition to filing authentication codes and key ciphertexts with access control structures. The internal nodes store the optimal vectors of their leaf node index vectors and the internal node vectors are encrypted in accordance with the index encryption method of

F_{i}

. Finally, the

I n d e x_{t r e e}

is sent to

S C

, the

I n d e x_{t r e e}

is shown in Figure 4.

D.: Trapdoor Generation

T o k e n G e n (p k, S K_{s}, W_{q})

D U

first converts the set of keywords

W_{q}

into a bigram set by the bigram segmentation algorithm, then the bigram set is transformed into a binary vector set

B v

using the same method as for index generation. Each vector in

B v

is used as an input to the

k

LSHs, respectively, to map into an

l - bit

bloom filter, the value of the mapped location is set to 1. Finally, the vector

B_{t}

of the keywords is obtained.

The bloom filter used to generate the search trapdoor is denoted as

B_{t} = (t_{1}, t_{2}, \dots t_{m})

S

K S

is denoted as

S = {s_{1}, s_{2}, \dots, s_{m}}

S

and

B_{t}

have the same structure as

B_{t}' = (t_{1}', t_{2}', \dots t_{m}')

and

B_{t} ″ = (t_{1} ″, t_{2} ″, \dots t_{m} ″)

. For each

t_{i}

B_{t}

, if

s_{i} = 1

, then

t_{i}' = t_{i} / 2 + r'

t_{i} ″ = t_{i} / 2 - r'

; if

s_{i} = 0

, then

t_{i} = t_{i}' = t_{i}^{″}

. Then, encrypt

B_{t}'

B_{t} ″

to get

T^{'} = ({M_{1}}^{- 1} \cdot B_{t}')

T' = ({M_{2}}^{- 1} \cdot B_{t} ″)

. Eventually, the secure search trapdoor

T = (T^{'}, T ″)

T_{t} = (T | | L_{t})

, and

u s e r_i d

are sent to

S C

through transaction.

5.2.3. Search Match

S e a r c h (I n d e x_{t r e e}, T_{t})

D O

defines a reasonable search unit price

$ s e a r c h

, then sets

$ p r o_D U

as the deposit of

D U

stored temporarily in

S C

to prevent

D U

from quitting in the middle.

$ A l l

represents the total price that

D U

should pay for each search transaction and

$ s e a r c h_p r i c e

represents the overhead of calling the search algorithm.

g a s_l i m

and

g a s_p r i c e

respectively represent the maximum gas consumption and gas unit price in the

S C

, and the product of the two represents the maximum price that the

S C

is allowed to accept.

S C

obtains the user’s search trapdoor and

u s e r_i d

through the transaction. After authenticating whether the identity is correct, it checks whether the pre-deposit of

D U

satisfies the search transaction, if it does, a search match will be performed, otherwise, the transaction is canceled.

The index tree will be traversed from top to bottom. Meanwhile,

S C

calculates

T^{T} \cdot I_{i}

, which represents the correlation score

S c o r e

of the node and search keywords, where

T \cdot I_{i} = {{M_{1}}^{- 1} B_{t}' \cdot M_{1}^{T} B_{i}' + {M_{2}}^{- 1} B_{t} ″ \cdot M_{2}^{T} B_{i} ″}

. If the

S c o r e

of the parent node vector and the search trapdoor is less than the threshold

T H

, the traversal of the node’s children is stopped. The entire index tree is traversed in this way.

S C

defines a list

S c l i s t

, which consists of at most

k_{n}

records. Each entry contains a pair

< S c o r e_{F_{i}}, v a l u e >

, where

v a l u e = {F I D | | C_{K_{i}} | | M A C_{C_{i}}}

S c o r e = {T * I_{i}}

. When traversing to the leaf node,

S C

stores the information of leaf node in

S c l i s t

. After traversing the index tree,

S C

sorts and eliminates the

k_{n}

entries of

S c l i s t

based on

S c o r e_{F_{i}}

. Finally the

k

entries that match the user’s search are obtained, then

S C

sends the

v a l u e

R L i s t

D U

, sends the

F I D

and transaction information in

v a l u e

to the

C S

. After receiving the message,

C S

packages the ciphertext set

C

and sends it to the

D U

S C

uses GDFS (Greedy Depth-First Search) for searching, the notation and Algorithm 1 used are described as follows.

S c o r e_{n d}

: correlation score between index vector

I_{i}

of node

n d

and search vector

T

n d . h c l d

: child nodes of node

n d

with high query relevance.

n d . l c l d

: child nodes of node

n d

with low query relevance.

Algorithm 1 Search Matching Algorithm

Input: the index tree

I n d e x_{t r e e}

Output:

S c l i s t

1: if

n d

is not a leaf node then

2: if

S c o r e_{n d}

T H

then

3: GDFS (

n d . h c l d

)

4: GDFS (

n d . l c l d

)

5: else

6: return

7: end if

8: else

9: if

S c o r e_{F_{i}}

T H

S c l i s t

is full

10: delete the lowest relevance node from

S c l i s t

11: insert the new node

< S c o r e_{F_{i}}, v a l u e >

and sort all the node by

S c o r e_{F_{i}}

12: else if

S c o r e_{F_{i}}

T H

S c l i s t

is not full

13: insert the new node

< S c o r e, v a l u e >

S c l i s t

14: end if

15: return

16: end if

As the child nodes are overwritten by parent nodes, only part of the tree needs to search. As in Figure 4, the file collection

D = {F_{1}, F_{2}, \dots, F_{7}}

, the threshold

T H

is set to 1.5, and the trapdoor

T = (0, 1, 0, 1, 0)

is sent to

S C

by the user. Assume that 1 most relevant file for the search is selected.

S C

traverses the nodes from top to bottom and computes

S c o r e

for each node. The children of

L_{11}

do not need to traverse due to

S c o r e_{R o o t} = 2.5 > T H

S c o r e_{L_{11}} = 1 < T H

S c o r e_{R_{12}} = 2 > T H

S c o r e_{R_{23}} = 1.8 > T H

S c o r e_{F_{5}} = 1 < T H

S c o r e_{F_{6}} = 1.6 > T H

and

S c o r e_{F_{7}} = 1.5 > T H

. The information of

F_{6}

and

F_{7}

is finally added to

S c l i s t

S C

ranks

F_{6}

and

F_{7}

S c l i s t

according to

S c o r e

, then

F_{7}

is eliminated.

5.2.4. The Subsequent Work

In this section, we focus on verification, decryption of files, and updating the index tree.

A.: Verification

V erify (C_{i}, M A C_{C_{i}}, K_{2})

: Suppose

k

encrypted identifiers

F I D = {F I D_{1}, F I D_{2} \dots, F I D_{k}}

are returned in a search result,

D U

first determines whether

F I D

from

C S

is the same as

F I D

from

S C

. Then, the

M A C_{C_{i}}

of the file is calculated locally by

k_{2}

to determine whether the

C S

has tampered with search information.

S C

sends

$ s e a r c h

D O

, sends

$ s e a r c h_p r i c e * $ g a s_p r i c e

to the staff member who executed the transaction, and refunds the transaction deposit from

D U

B.: Decryption

D e c (S K_{u}, C_{K_{i}}, C_{w_{q}})

: After getting the key ciphertext

C_{K_{i}}

, the user checks whether the

S K_{u}

matches with the ciphertext access policy, if it matches,

D U

can get

A = e {(g, g)}^{r s}

and recover the

K_{i}

, otherwise,

D U

has no access to the file. The decryption of

K_{i}

is shown in the Decryption Correctness section.

C.: Index tree update

U p d a t e (U p t y p e, I_{T_{c}})

: The updating of the index tree is the process of replacing the old subtree with the new encrypted one.

D O

retains the unencrypted index tree locally and denotes

T_{c}

as the set of nodes that may be changed during the update process. As an example, in Figure 4, if the file

F_{1}

is deleted in an update operation, then

T_{c} = {L_{11}, L_{21}, R o o t, F_{1}}

When

U p t y p e = d e l e t e

, delete the leaf node

F_{i}

and recompute the other node index vectors in

T_{c}

, encrypt node index vectors in

T_{c}

K S

. If the operation destroys the balance of the tree, replace it with a pseudo node, then a new indexed subtree

I_{T_{c}}

is generated.

D O

sends

I_{T_{c}}

and

U p t y p e = d e l e t e

S C

, which replaces the original indexed subtree with the help of

I_{T_{c}}

.When

U p t y p e = a d d

, the leaf node

F_{i}

needs to store the key ciphertext

C_{K_{i}}

, the file ciphertext message authentication code

M A C_{F_{i}}

, encrypted identifier

F I D_{F_{i}}

and the other operations are the same as the deletion operation. The index update process will not destroy the balance of the tree because the new nodes will be added to the pseudo nodes.

6. Program Analysis

We analyze the correctness and security of the scheme in this section.

6.1. Correctness Analysis

6.1.1. Fuzzy Search Correctness

We denote

I_{i} = (I_{i}', I_{i} ″)

and trapdoor

T = (T^{'}, T^{″})

for the multi-keyword set

w_{q}

. Assuming

n

file indexes are matched with

T

in the search,

S c o r e

is the inner product of

I_{i}

and

T

, denote each element in

S c o r e

P_{x, y}

, which

S c o r e = I_{i}' \cdot T^{'} + I_{i}^{″} \cdot T^{″} = M_{1}^{T} \cdot B_{i}' \cdot {M_{1}}^{- 1} \cdot B_{t}' + M_{2}^{T} \cdot B_{i} ″ \cdot {M_{2}}^{- 1} \cdot B_{t} ″

P = {P_{x, y} | 1 \leq x \leq n, 1 \leq y \leq n}

and satisfy (1).

\{\begin{cases} r_{x, y} = b_{x, y}^{'} \cdot t_{y}^{'} + b_{x, y}^{''} \cdot t_{y}^{''} = b_{x, y} \cdot (t_{y} / 2 + r^{'}) + b_{x, y} \cdot (t_{y} / 2 - r^{'}) = b_{x, y} \cdot t_{y}, s_{y} = 1 \\ r_{x, y} = b_{x, y}^{'} \cdot t_{y}^{'} + b_{x, y}^{''} \cdot t_{y}^{''} = b_{x, y} \cdot (t_{y} / 2 + r^{'}) + b_{x, y} \cdot (t_{y} / 2 - r^{'}) = b_{x, y} \cdot t_{y}, s_{y} = 0 \end{cases}

(1)

From the above calculations, it can be concluded that the random numbers

r

and

r'

are not linked to the result, so the correlation scores of the fuzzy search when the trapdoor is matched with the index inner product are not affected by the random numbers.

6.1.2. Decryption Correctness

After

D U

receives the key ciphertext

C_{k_{1}}

containing the access structure,

D U

uses attribute private key

S_{u}

to determine whether the access structure is satisfied or not, and only if the access structure is satisfied can

K_{i}

be recovered, the process is in (2).

K_{i} = \frac{\bar{C}}{\frac{e (C, E)}{A}} = \frac{K_{i} e {(g, g)}^{α s}}{\frac{e (h^{s}, g^{\frac{α + r}{β}})}{e {(g, g)}^{r s}}} = \frac{K_{i} e {(g, g)}^{α s} e {(g, g)}^{r s}}{e {(g, g)}^{s (α + r)}}

(2)

6.2. Security Analysis

Theorem 1.

The scheme is secure under the adaptive selection of keyword attacks when

C S

and external adversaries can only divulge what they are allowed to.

Proof.

For adversary

A

, a stateful simulator

B

satisfies the condition

| P r [R e a l_{A} (λ)] - P r [I d e a l_{A, B} (λ)] | \leq n e g l^{*} (λ)

, where

n e g l^{*} (λ)

is a negligible probability. From the proof equivalence in the scheme in [27], the simulation-based game proofs are equivalent to indistinguishable game proofs.

A

only needs to analyze the differentiation between

B

to win the game,

P r [I n d_{A} (λ) = 1] - 1 / 2 \leq n e g l^{*} (λ)

is used to prove the scheme is secure.

B

simulates the process of generating ciphertexts, indexes, and trapdoors as follows.

(1): Generate ciphertext $C^{*}$ by simulation.

According to

L_{1} (D) = {| D |, n, {| F_{i} |, i n d (F_{i})}_{i \in [1, n]}}

B

generates

n

ciphertexts of

| F_{i} |_{i \in [1, n]} - bit

uniformly at random to simulate the encrypted file set

C^{*} = {C_{1}, C_{2}, \dots, C_{n}}

, which is encrypted by AES. According to the semantic security of symmetric encryption, the ciphertext

C

generated by

R e l_{A} (λ)

and the ciphertext

C^{*}

generated by

I d e a l_{A, B} (λ)

are computationally indistinguishable, i.e., the advantage of the adversary’s victory can be ignored, i.e.,

∣ P r [E n c r y p t (F, K_{i}) \to C] - P r [S i m \to C^{*}] ∣ \leq n e g {l_{1}}^{*} (λ)

(2): Generate search trapdoor $T^{*}$ by simulation.

According to

L_{2} (D, {W_{q}}^{*}) = {P ({W_{q}}^{*}), T^{*}}

B

maps each keyword in

{W_{q}}^{*} = (w_{1}, w_{2}, \dots, w_{q})

to the bloom filter, its search trapdoor is denoted as

T_{Q} = (T_{Q}', T_{Q} ″)

, where

T_{Q}' = ({M_{1}}^{- 1} \cdot B_{Q}')

T_{Q} ″ = ({M_{2}}^{- 1} \cdot B_{Q} ″)

B_{Q}' = (t_{1}^{'}, t_{2} ″, \dots, t_{m}^{'})

t_{j}^{'} = t_{j}^{'} / 2 + r^{'}

, and

t_{j} ″ = t_{j} ″ / 2 - r^{'}

. The probability of

B

simulating the search token

T_{Q}

is negligible when the

K S

is unknown. Due to the introduction of the random value

r^{'}

in the encrypted query and the pseudo-random nature of the key set

K S

, the simulator

B

randomly selects

K S^{*}

and

r^{*}

to construct the feasible search token with probability

P r [T_{Q} = T^{*}] \approx P r [r^{'} = r^{*}] \approx P r [K S = K S^{*}] \approx 1 / 2^{k} \leq n e {g_{2}}^{*} (λ)

, thus the advantage of

B

to simulate the feasible search trapdoor is negligible.

(3): Simulated construction of index $I^{*}$ .

In the process of generating the secure index based on the leakage function

ℒ_{1}

L_{2}

, simulator

B

picks a random value to replace the random number used in the real case. According to the secure

K N N

encryption algorithm, it is known that the secure

K N N

algorithm using random numbers is indistinguishable. Meanwhile, the matrix picked for the key set

K S

is a sufficiently large invertible matrix and

K S

is not artificially compromised. Thus, the advantage of winning for the polynomial adversary satisfies (3), which is negligible.

| P r [B u i l d I n d e x (S K, F_{i}, K W_{F_{i}}) \to I_{r e a l}] - P r [S i m \to I^{*}] | \leq n e g {l_{3}}^{*} (λ)

(3)

Since the adversary wins this indistinguishability game by analyzing ciphertexts, indexes and search credentials, the advantages of the adversary in distinguishing between real ciphertexts and simulated ciphertexts, real indexes and simulated ciphertexts, and real search tokens and simulated search tokens, respectively, are denoted as

A d v_{A} (C, C^{*})

A d v_{A} (I_{r e a l}, I^{*})

and

A d v_{A} (T_{Q}, T^{*})

. Then we get (4).

\begin{array}{l} P r [I n d_{A} (λ) = 1] = \frac{1}{2} + A d v_{A} (C, C^{*}) + A d v_{A} (I_{r e a l}, I^{*}) + A d v_{A} (T_{Q}, T^{*}) \\ = \frac{1}{2} + | P r [E n c r y p t (F_{i}, K_{i}) \to C] - P r [S i m \to C^{*}] | \\ + | P r [B u i l d I n d e x (K S, F_{i}, K W_{F_{i}}) \to I_{r e a l}] - P r [S i m \to I^{*}] | \\ + | P r [T r a p d o o r (K S, {W_{q}}^{*}) \to T_{Q}] - P r [S i m \to T^{*}] | \\ \leq \frac{1}{2} + n e g {l_{1}}^{*} (λ) + n e {g_{2}}^{*} (λ) + n e g {l_{3}}^{*} (λ) \end{array}

(4)

In summary, the outputs of

R e l_{A} (λ)

and

I d e a l_{A, B} (λ)

obtained for any polynomial adversary are indistinguishable. □

Theorem 2.

If the blockchain is tamper-proof, the scheme ensures that user transactions are fair.

Proof.

If the

C S

is dishonest, it will return incorrect file numbers or falsify search results. According to the fair-trade rules set by

S C

C S

will not receive service fees and will be penalized with loss of deposit. If

C S

enforces the contract rules honestly,

D U

gets the correct result and

C S

gets the corresponding service fee. In addition, the transaction specifies a limit time

L_{t}

, when the transaction time is greater than

L_{t}

, the user’s deposit will be refunded, and the transaction will be ended. □

Theorem 3.

For malicious cloud servers and other external adversaries, no information can be learned about plaintext files except for ciphertext files.

Proof.

In the scheme, the file is encrypted with the corresponding symmetric key

K_{i}

before it is uploaded to the

C S

, then

D O

encrypts

K_{i}

C_{K_{i}}

based on

CP - ABE

and stores it in the index tree

I n d e x_{t r e e}

. Even though the

C S

and external adversaries eavesdrop on the ciphertext files in the channel, the difficulty of trying to get the plaintext information is equal to the difficulty of decrypting

K_{i}

. Since

\bar{C} = k_{i} e {(g, g)}^{α s}

C_{K_{i}}

e {(g, g)}^{α s}

must be computed to decrypt

K_{i}

. Under the discrete logarithm problem, the external adversary and

C S

to compute

e {(g, g)}^{α s}

from

C = h^{s}

and

e {(g, g)}^{α}

is difficult. Based on the

D i f f e - H e l l m a n

computational problem, the adversary cannot use the Lagrange interpolation formula to recursively compute

e {(g, g)}^{r s}

without the attribute key

S K_{u}

that conforms to the access policy and obtain

K_{i}

K_{i}

and the plaintext file can be decrypted when

S K_{u}

satisfies the access structure. Since

S K_{u}

is privately stored by the

D U

, it is not possible for

C S

and external adversaries to learn any plaintext information other than the ciphertext file. □

7. Efficiency Analysis

The scheme will be analyzed in terms of function, complexity, and performance experimentation in this section.

7.1. Functionality Comparison

The functions of the proposed scheme with the related schemes [15,21,25,28,29] are completed in this section. As shown in Table 2, where the symbol “✓” indicates that the corresponding functionality is satisfied and “✗” indicates that it is not satisfied.

Both the scheme in [15] and the scheme in [29] are about multi-keyword fuzzy searchable encryption. The scheme in [15] realizes dynamic updating by constructing a tree structure but does not support verification of the search result. The scheme in [29] only considers the problem of ciphertext result correctness and does not support dynamic updating. The scheme in [25] utilizes the blockchain mechanism to protect the index, but it does not support fuzzy search and relevance sorting of files, and it cannot maximize the return of the set of files that the user is interested in. The scheme in [28] utilizes authentication and homomorphic encryption to detect misbehavior occurring in cloud servers, and the search results are outsourced to edge computing servers, which can verify the correctness and integrity, but it would impose additional overhead. Although the scheme [21] supports all the functions, the scheme does not indicate how the process of dynamic updating takes place. It also does not take measures against the possible malicious behavior of the cloud server and does not effectively limit the privileges of the cloud server.

The proposed scheme makes up for the shortcomings of the above schemes, and it not only realizes fuzzy search, search result relevance sorting, and dynamic updating but also supports fine-grained access control, search result correctness, and integrity verification. At the same time, smart contracts are used to constrain malicious behaviors that may occur in the search process of cloud servers and protect the rights and interests of users.

7.2. Complexity Theoretical Analysis

In this section, the performance will be analyzed from three aspects: index tree generation, trapdoor generation, and search matching.

Index tree generation. The cost of file index generation includes the generation of a plaintext index vector for each file and the encryption of the plaintext index vector. Where the generation of the index is mainly linked to the number of keywords, the encryption of the plaintext vector mainly consists of the factorization operation of the vector and multiplication with a matrix of order $l \times l$ . The time overhead of single file index generation is $O (l^{2})$ . The $I n d e x_{t r e e}$ generates $(n - 1)$ additional nodes at the internal nodes when the number of files is $n$ . Therefore, the cost of the $I n d e x_{t r e e}$ generation is a little higher than the other schemes, but it is generated by the $D U$ locally and upload to the smart contract only once.
Trapdoor Generation. The cost of trapdoor generation mainly consists of generating an $l$ bit plaintext trapdoor and performing matrix multiplication operations on the plaintext trapdoor, so the cost of trapdoor generation is $O (l^{2})$ .
Search Match. The Search and Match includes traversing the $I n d e x_{t r e e}$ , the calculation of the inner product of trapdoors and index vectors, and sorting the files to filter out rejections based on relevance. In this scheme, only the $n_{k}$ most relevant ciphertext files are searched. These files and child nodes share the same access path, so the traversal of the $I n d e x_{t r e e}$ only touches a portion of $n$ nodes i.e., on $r (r < < n)$ paths from the root to the leaf nodes instead of traversing the child nodes repeatedly, hence the time complexity is $O (r \log n)$ .

In Table 3, the efficiency is compared in terms of index generation, trapdoor generation, and search time. Here,

n

is the number of files,

Q

is the number of exact keywords,

Z

is the maximum number of fuzzy keywords, and

l

denotes the index length in the proposed scheme. In the scheme,

l

can be treated as a constant and

n < < Q < < Z

7.3. Experimental Performance

In this section, simulation experiments are conducted. The performance of the scheme is evaluated using Java language in the Windows 10 Home Chinese version. The hardware system is a 64-bit Windows operating system Intel(R) Core (TM) i7-8750H CPU @2.20 GHz with 16 GB of RAM and the experimental dataset is RFC (Request For Comments). Smart contracts are deployed through the solidity language, running in an Ethereum virtual machine. The blockchain implementation is based on the official Ethereum test network Rinkeby and the Ethereum wallet MetaMask is used to function as an account. The numbers of LSH are chosen as k = 30, the constructed index and trapdoor vector is l = 8000, the file is encrypted by AES-256, and the number of keywords in a file is 80–113.

The scheme excludes keywords with little semantic distinction such as the, where, and a. Simulation experiments and comparisons of our scheme with scheme [15], scheme [28], and scheme [29] are conducted in terms of file index generation, search matching time, and search accuracy.

Individual File Index Generation and Encryption. In Figure 5a, the cost of generating a plaintext index vector for a single file index is gradually increasing with the number of file keywords. Meanwhile, the encryption time of the plaintext index vector is constant, which fluctuates slightly at 40 ms.
Search Match. In Figure 5b, as the number of files gradually increases, all 4 schemes basically show an increasing trend. When the number of files is no more than 1000, all the schemes increase at a relatively small level. The time cost and growth rate required for searching in the proposed scheme are lower than those of the schemes in [25,29], and as the number of files increases, the scheme follows up with a slight increase compared to the beginning. Although the search time of the scheme in [21] also consistently increases at a lower level, the search time of our scheme is still superior to it. When the number of files is 2000, the search times of the schemes in [28,29] are 7.42 s and 4.42 s, respectively, while the time cost of our scheme is only 2.21 s. In Figure 5c, the search matching costs of the schemes in [21,28,29] tend to stabilize, while the time cost in our scheme shows a small linear increase. When the number of search keywords is 9, the time costs of the scheme in [28,29], and [21] are 7.42 s, 4.51 s, and 2.65 s, respectively, while the time cost of our scheme is only about 1.85 s.
Fuzzy Search Accuracy. Since our scheme and the scheme in [15] have similarities in index construction which uses $u n i g r a m$ to transform keywords, the scheme in [15] is chosen to compare fuzzy search accuracy. The fuzzy keywords are constructed by replacing one letter in the keyword with another letter randomly, and the check accuracy is used to test the search results. The accuracy test method $Δ P = p^{'} / p$ from the scheme in [32] is used in our scheme, where $p$ indicates the number of search files returned and $p^{'}$ indicates the number of real files returned.

As seen in Figure 5d, the search accuracy increases gradually with the increase in the number of search keywords, which makes the scheme more capable of distinguishing the files that satisfy the user’s search conditions. The search accuracy of the scheme in [15] is around 85%, while the search accuracy of our scheme can reach 91% with the increase in valid search keywords.

8. Conclusions

In this paper, an effective semi-decentralized multi-keyword fuzzy searchable encryption scheme is proposed. We improved the index tree storage to support fine-grained access control without losing search efficiency. In addition,

CP - ABE

is introduced into the scheme to support multi-user scenarios. Experimental results show that our scheme has certain advantages in efficiency compared to similar schemes. In a word, the proposed scheme is rich with additional functions such as verification, updating, etc., while at the same time presenting a high performance.

Author Contributions

Conceptualization, X.Y. and P.C.; methodology, X.Y. and P.C.; software, X.Y.; validation, X.Y. and P.C.; formal analysis, X.Y., P.C. and J.Z.; investigation, P.C.; resources, X.Y. and Y.T.; data curation, P.C.; writing—original draft preparation, X.Y. and P.C.; writing—review and editing, X.Y. and P.C.; visualization, X.Y. and P.C.; supervision, X.Y. and Y.T.; project administration, X.Y. and Y.T.; funding acquisition, X.Y. and Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by [Programs for Science and Technology Development of Henan Province] grant number [242102210152], [the Fundamental Research Funds for the Universities of Henan Province] grant number [NSFRF240620], [Key Scientific Research Project of Henan Higher Education Institutions] grant number [24A520015] and [National Natural Science Foundation of China] grant number [62472144]. And The APC was funded by [National Natural Science Foundation of China] grant number [62472144].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on Security and Privacy, S&P 2000, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55. [Google Scholar]
Xu, G.; Li, H.; Ren, H.; Lin, X.; Shen, X. DNA Similarity Search With Access Control Over Encrypted Cloud Data. IEEE Trans. Cloud Comput. 2022, 10, 1233–1252. [Google Scholar] [CrossRef]
Tong, Q.; Miao, Y.; Liu, X.; Choo, K.-K.R.; Deng, R.H.; Li, H. VPSL: Verifiable Privacy-Preserving Data Search for Cloud-Assisted Internet of Things. IEEE Trans. Cloud Comput. 2022, 10, 2964–2976. [Google Scholar] [CrossRef]
Boneh, D.; Waters, B. Conjunctive, Subset, and Range Queries on Encrypted Data. In Theory of Cryptography; Vadhan, S.P., Ed.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4392, pp. 535–554. [Google Scholar] [CrossRef]
Chen, L.; Cong, G.; Jensen, C.S.; Wu, D. Spatial Keyword Query Processing: An Experimental Evaluation. In Proceedings of the VLDB Endowment, Riva del Garda, Trento, Italy, 26–30 August 2013. [Google Scholar]
Tong, Q.; Li, X.; Miao, Y.; Liu, X.; Weng, J.; Deng, R. Privacy-Preserving Boolean Range Query with Temporal Access Control in Mobile Computing. IEEE Trans. Knowl. Data Eng. 2022, 35, 5159–5172. [Google Scholar] [CrossRef]
Li, J.; Wang, Q.; Wang, C.; Cao, N.; Ren, K.; Lou, W. Fuzzy Keyword Search over Encrypted Data in Cloud Computing. In Proceedings of the 2010 Proceedings IEEE INFOCOM, San Diego, CA, USA, 14–19 March 2010; pp. 1–5. [Google Scholar] [CrossRef]
Chuah, M.; Hu, W. Privacy-Aware BedTree Based Solution for Fuzzy Multi-keyword Search over Encrypted Data. In Proceedings of the 2011 31st International Conference on Distributed Computing Systems Workshops, Minneapolis, MN, USA, 20–24 June 2011; pp. 273–281. [Google Scholar] [CrossRef]
Wang, B.; Yu, S.; Lou, W.; Hou, Y.T. Inverted index based multi-keyword public-key searchable encryption with strong privacy guarantee. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May2015; pp. 2092–2100. [Google Scholar]
Chen, J.; He, K.; Deng, L.; Yuan, Q.; Du, R.; Xiang, Y. EliMFS: Achieving Efficient, Leakage-Resilient, and Multi-Keyword Fuzzy Search on Encrypted Cloud Data. IEEE Trans. Serv. Comput. 2020, 13, 1072–1085. [Google Scholar] [CrossRef]
Shan, B.; Yao, Y.; Li, W.; Zuo, X.; Yu, N. Fuzzy Keyword Search over Encrypted Cloud Data with Dynamic Fine-grained Access Control. In Proceedings of the 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Wuhan, China, 9–11 December 2022; pp. 1340–1347. [Google Scholar] [CrossRef]
Xia, Z.; Wang, X.; Sun, X.; Wang, Q. A Secure and Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 340–352. [Google Scholar] [CrossRef]
Fu, Z.; Wu, X.; Wang, Q.; Ren, K. Enabling Central Keyword-Based Semantic Extension Search Over Encrypted Outsourced Data. IEEE Trans. Inform. Forensics Secur. 2017, 12, 2986–2997. [Google Scholar] [CrossRef]
Fu, Z.; Wu, X.; Guan, C.; Sun, X.; Ren, K. Toward Efficient Multi-Keyword Fuzzy Search Over Encrypted Outsourced Data With Accuracy Improvement. IEEE Trans. Inform. Forensic Secur. 2016, 11, 2706–2716. [Google Scholar] [CrossRef]
Zhong, H.; Li, Z.; Cui, J.; Sun, Y.; Liu, L. Efficient dynamic multi-keyword fuzzy search over encrypted cloud data. J. Netw. Comput. Appl. 2020, 149, 102469. [Google Scholar] [CrossRef]
Wang, C.; Ren, K.; Yu, S.; Urs, K.M.R. Achieving usable and privacy-assured similarity search over outsourced cloud data. In Proceedings of the 2012 Proceedings IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 451–459. [Google Scholar] [CrossRef]
Zhang, H.; Zhao, S.; Guo, Z.; Wen, Q.; Li, W.; Gao, F. Scalable Fuzzy Keyword Ranked Search Over Encrypted Data on Hybrid Clouds. IEEE Trans. Cloud Comput. 2023, 11, 308–323. [Google Scholar] [CrossRef]
Kuzu, M.; Islam, M.S.; Kantarcioglu, M. Efficient Similarity Search over Encrypted Data. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012; pp. 1156–1167. [Google Scholar] [CrossRef]
Wang, B.; Yu, S.; Lou, W.; Hou, Y.T. Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud. In Proceedings of the IEEE INFOCOM 2014—IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 2112–2120. [Google Scholar] [CrossRef]
Tong, Q.; Miao, Y.; Weng, J.; Liu, X.; Choo, K.-K.R.; Deng, R. Verifiable Fuzzy Multi-keyword Search over Encrypted Data with Adaptive Security. IEEE Trans. Knowl. Data Eng. 2022, 35, 5386–5399. [Google Scholar] [CrossRef]
Li, X.; Tong, Q.; Zhao, J.; Miao, Y.; Ma, S.; Weng, J. VRFMS: Verifiable Ranked Fuzzy Multi-keyword Search over Encrypted Data. IEEE Trans. Serv. Comput. 2022, 16, 698–710. [Google Scholar] [CrossRef]
Cai, C.; Weng, J.; Yuan, X.; Wang, C. Enabling Reliable Keyword Search in Encrypted Decentralized Storage with Fairness. IEEE Trans. Dependable Secur. Comput. 2021, 18, 131–144. [Google Scholar] [CrossRef]
Wang, S.; Zhang, Y.; Zhang, Y. A Blockchain-Based Framework for Data Sharing With Fine-Grained Access Control in Decentralized Storage Systems. IEEE Access 2018, 6, 38437–38450. [Google Scholar] [CrossRef]
Hu, S.; Cai, C.; Wang, Q.; Wang, C.; Luo, X.; Ren, K. Searching an Encrypted Cloud Meets Blockchain: A Decentralized, Reliable and Fair Realization. In Proceedings of the IEEE INFOCOM 2018—IEEE Conference on Computer Communications, Honolulu, HI, USA, 15–19 April 2018; pp. 792–800. [Google Scholar] [CrossRef]
Chen, Z.; Wu, A.; Li, Y.; Xing, Q.; Geng, S. Blockchain-Enabled Public Key Encryption with Multi-Keyword Search in Cloud Computing. Secur. Commun. Netw. 2021, 2021, 6619689. [Google Scholar] [CrossRef]
Guo, Y.; Zhang, C.; Wang, C.; Jia, X. Towards Public Verifiable and Forward-Privacy Encrypted Search by Using Blockchain. IEEE Trans. Dependable Secur. Comput. 2022, 20, 2111–2126. [Google Scholar] [CrossRef]
Chai, Q.; Gong, G. Verifiable symmetric searchable encryption for semi-honest-but-curious cloud servers. In Proceedings of the 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, Canada, 10–15 June 2012; pp. 917–922. [Google Scholar] [CrossRef]
Li, J.; Ma, J.; Miao, Y.; Chen, L.; Wang, Y.; Liu, X. Verifiable Semantic-Aware Ranked Keyword Search in Cloud-Assisted Edge Computing. IEEE Trans. Serv. Comput. 2022, 15, 3591–3605. [Google Scholar] [CrossRef]
Fu, S.; Zhang, Q.; Jia, N.; Xu, M. A Privacy-preserving Fuzzy Search Scheme Supporting Logic Query over Encrypted Cloud Data. Mob. Netw. Appl. 2021, 26, 1574–1585. [Google Scholar] [CrossRef]
Ge, X.; Yu, J.; Hu, C.; Zhang, H.; Hao, R. Enabling Efficient Verifiable Fuzzy Keyword Search Over Encrypted Data in Cloud Computing. IEEE Access 2018, 6, 45725–45739. [Google Scholar] [CrossRef]
Huang, R.; Li, Z.; Wu, G. A Verifiable Encryption Scheme Supporting Fuzzy Search. In Security, Privacy, and Anonymity in Computation, Communication, and Storage; Wang, G., Feng, J., Bhuiyan, M.Z.A., Lu, R., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 11611, pp. 397–411. [Google Scholar] [CrossRef]
Xiangyang, Z.; Hua, D.; Xun, Y.; Geng, Y.; Xiao, L. MUSE: An Efficient and Accurate Verifiable Privacy-Preserving Multikeyword Text Search over Encrypted Cloud Data. Secur. Commun. Netw. 2017, 2017, 1923476. [Google Scholar] [CrossRef]

Figure 1. p-stable element mapping.

Figure 2. System model.

Figure 3. LSH-Bloom Architectural Diagram.

Figure 4. The index tree.

Figure 5. (a) Index generation time. (b) The search matching time when search keyword number is 3. (c) The search matching time when the file number is 2000. (d) The fuzzy search accuracy. Zhong, H. et al (2020) refers to the scheme in ref. [15], Li, X. et al (2022) refers to the scheme in ref. [21], Li, J. et al (2022) refers to the scheme in ref. [28] and Fu, S. et al (2021) refers to the scheme in ref. [29].

Table 1. Description of Symbols.

Notation	Meaning
$D = {F_{1}, F_{2}, \dots, F_{n}}$	the file set of $D O$
$C = {C_{1}, C_{2}, \dots, C_{n}}$	the ciphertext set
$p k$	the public key
$Γ$	the access policy
$K_{i}$	the symmetric key for the corresponding type of files.
$C_{K_{i}}$	the key cipher containing access policy
$K_{2}$	the verification key
$S_{u}$	the user attributes collection
$W_{q}$	a query keyword collection
$K S$	the index encryption key
$S K_{u}$	attribute private key for users
$S K_{s}$	the user search key
$T_{t}$	a trapdoor with time stamp
$K W$	a collection of keywords for a file set
$K W_{F_{i}}$	the keyword set for file $F_{i}$
$S T_{k w}$	the stemming of the keyword $k w$
$S T_{K W_{F_{i}}}$	the keyword stemming set for file $F_{i}$
$M A C_{c_{i}}$	the ciphertext message authentication code
$K S = (M_{1}, M_{2}, S)$	the key set for the encrypted trapdoor and index

Table 2. Functionality Comparison of Different Schemes.

Scheme	Scheme [15]	Scheme [28]	Scheme [25]	Scheme [29]	Scheme [21]	Ours
Multi-Keyword	✓	✓	✓	✓	✓	✓
Dynamic Updates	✓	✘	✓	✘	✓	✓
Fuzzy Search	✓	✘	✘	✓	✓	✓
Integrity Verification	✘	✓	✓	✘	✓	✓
Correctness Verification	✘	✓	✓	✓	✓	✓
Related Sorting	✓	✓	✘	✓	✓	✓

Table 3. Time Complexity of Different Schemes.

Scheme	Scheme [30]	Scheme [31]	Scheme [21]	Ours
Index Generation	$O (Z Q)$	$O (Z Q)$	$O (n l^{2})$	$O ((2 n - 1) l^{2})$
Trapdoor Generation	$O (Z)$	$O (Z)$	$O (l^{2})$	$O (l^{2})$
Search Cost	$O (Z)$	$O (Z \log Q)$	$O (n l)$	$O (r \log n)$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Yan, X.; Cheng, P.; Tang, Y.; Zhang, J. Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme. Appl. Sci. 2024, 14, 11740. https://doi.org/10.3390/app142411740

AMA Style

Yan X, Cheng P, Tang Y, Zhang J. Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme. Applied Sciences. 2024; 14(24):11740. https://doi.org/10.3390/app142411740

Chicago/Turabian Style

Yan, Xixi, Pengyu Cheng, Yongli Tang, and Jing Zhang. 2024. "Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme" Applied Sciences 14, no. 24: 11740. https://doi.org/10.3390/app142411740

APA Style

Yan, X., Cheng, P., Tang, Y., & Zhang, J. (2024). Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme. Applied Sciences, 14(24), 11740. https://doi.org/10.3390/app142411740

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Blockchain-Assisted Verifiable and Multi-User Fuzzy Search Encryption Scheme

Abstract

1. Introduction

2. Related Work

2.1. Fuzzy Searchable Encryption

2.2. Blockchain-Based Searchable Encryption

3. Preliminaries

3.1. Bloom Filter (BF)

3.2. Locality Sensitive Hashing (LSH)

4. Definitions of Algorithm and Security Model

4.1. System Model

4.2. Algorithm Framework

4.2.1. Definition

4.2.2. Algorithm Definition

4.3. Security Model

5. Proposed Scheme

5.1. Main Idea

5.2. Concrete Construction

5.2.1. Preliminary Work

5.2.2. Encryption Work

5.2.3. Search Match

5.2.4. The Subsequent Work

6. Program Analysis

6.1. Correctness Analysis

6.1.1. Fuzzy Search Correctness

6.1.2. Decryption Correctness

6.2. Security Analysis

7. Efficiency Analysis

7.1. Functionality Comparison

7.2. Complexity Theoretical Analysis

7.3. Experimental Performance

8. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI