In this work, we introduce the Proof of Timely-Retrievability (PoTR) mechanism: a storage proof that aims to assess whether a storage node can access stored data with a latency smaller than a given threshold δ. The auditor can select the value of the δ parameter based on the specific requirements of a given application, but it is typically small and, in some cases, can only be satisfied if the audited (edge) node keeps the data locally. With the goal of applying the proof in edge computing scenarios, we assume small values of δ, in the order of the time required to read a data block from the local disk. By supporting such strict values of δ, PoTR is capable of distinguishing the case in which the fog node stores the data locally (Figure 1(a)) from the case in which it keeps the data in some remote node or in the cloud (Figure 1(b)).
An alternative could be to ask clients to report the latency they observe and use this information to perform the audit. However, this approach raises serious privacy challenges, as an attacker could infer a client's location from its latency to a fog node. If a PoTR can be extracted independently of the auditor's location, then we avoid these vulnerabilities.
4.2 Design of the Challenge
The challenge requires the untrusted part of the fog node to access a given number of data objects, in a certain sequence, and return, at the end, a value related to these data objects. The delay the fog node takes to read these data blocks and to compute the final value is used by the auditor to estimate the reading delay observed at the audited node and to check if it matches the target threshold δ.
Each challenge (implicitly) specifies a sequence of files that must be accessed by the audited node, and each file is uniquely identified by a set index. For efficiency reasons, for each file, the fog node reads a single data block of size \(s_b\) instead of the entire file. In practice, the block size should be a multiple of the block size used by the fog node's file system.
In each challenge \(c\), the fog node has to read a pseudorandom and unpredictable sequence of \(N\) data blocks (each of size \(s_b\)) and return a cryptographic hash of the concatenation of all data blocks accessed. The number \(N\) of data blocks is a configuration parameter that influences the accuracy and efficiency of the challenge: the higher the value of \(N\), the more accurate but less efficient the proof. The approach to configure this parameter is explained through PoATR in Section 5, and if the user is unable to configure \(s_b\), then it is still possible to execute our challenge by defining only \(N\), as discussed in Section 6.3.1.
The pseudorandom sequence of the data blocks is determined by a nonce \(\eta ^c\) (unique per challenge and generated by the auditor) and the content of the data blocks. For each challenge, the auditor sends the nonce (encrypted with a symmetric key), the number \(N\) of data blocks, and the size \(s_b\) of a block to the enclave. To ensure that the nonce \(\eta ^c\) is not disclosed to the untrusted part, the nonce never leaves the enclave. The untrusted part only has access to a cryptographic hash of the nonce; we denote this hash as #index. With the #index, the untrusted part is able to determine the set index of the first file to be accessed. The set index is obtained by applying the modulo function (mod) to the hash and the number of files in the set, i.e., the set index is the remainder of the division of the hash by the number of files.
Since the fog node has to read a data block of size \(s_b\), and not the entire file, the auditor sends to the enclave a second nonce \(\eta ^c_b\) that determines a data block inside the file identified by the computed set index. The enclave, in turn, computes the cryptographic hash of this second nonce and forwards the result to the untrusted part. Both nonces are hashed by the enclave to ensure that the untrusted part does not know them in plain text; otherwise, the untrusted part could maliciously predetermine the \(N\) data blocks. With the hashed \(\eta ^c_b\) and the total size of the file to be accessed, the untrusted part can then determine, again by applying the modulo function, the data block of size \(s_b\) inside the file that must be retrieved.
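To make the index derivation concrete, the following is a minimal sketch in Python. It assumes SHA-256 as the hash function and uses illustrative names and sizes (eta_c, eta_c_b, num_files, file_size); neither the hash function nor these values are fixed by the description above.

```python
import hashlib
import os

# Hypothetical values, for illustration; the auditor generates the nonces.
eta_c   = os.urandom(32)   # per-challenge nonce (never leaves the enclave)
eta_c_b = os.urandom(32)   # second nonce, selects the block inside the file

num_files = 1024           # total number of stored files
file_size = 4 * 2**20      # size of the selected file (4 MiB)
s_b       = 4096           # block size, a multiple of the file-system block size

# The enclave releases only hashes of the nonces to the untrusted part.
hash_index = hashlib.sha256(eta_c).digest()    # the "#index"
hash_block = hashlib.sha256(eta_c_b).digest()

# The untrusted part applies the modulo function to derive both indices.
set_idx   = int.from_bytes(hash_index, "big") % num_files
block_idx = int.from_bytes(hash_block, "big") % (file_size // s_b)
```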
When the untrusted part receives the first #index from the enclave, it proceeds to determine and read the corresponding initial data block. It then calculates a cryptographic hash of the data block’s content, combined with the #index, resulting in result_hash. Subsequently, the untrusted part returns the value of result_hash to the enclave. The enclave, in turn, computes a new #index by hashing the response result_hash together with the nonce \(\eta ^c\). This newly generated #index is then forwarded to the untrusted part, which uses it to identify the next file.
For each subsequent hash, the untrusted part repeats the execution of the modulo function to obtain the next set index and the index of the data block within the identified file. It reads the corresponding data block and calculates its cryptographic hash together with the file #index, yielding result_hash, which is returned to the enclave. The enclave computes two new hashes (block and file index), and this process continues until all \(N\) data blocks are read. Note that each new #index is computed using the hash of the previously retrieved data blocks. Finally, the enclave computes a final hash using \(\eta ^c\) and sends the result back to the auditor as the conclusive proof. The auditor, upon receiving the final hash, repeats all the aforementioned computations to construct a verifiable version and check the correctness of the proof. If both computations match, then the issued proof is deemed correct. In Figure 4, we present in detail all the steps of our communication protocol in the execution of our PoTR challenge. We separate the steps on the audited node's side between the untrusted part and the enclave, where the sensitive parts of our challenge, such as the nonce, are always securely stored inside the enclave.
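The sketch below condenses the challenge loop described above into a single routine, again assuming SHA-256 and a hypothetical read_block helper; in the real protocol the #index computations run inside the enclave and the block reads run in the untrusted part.

```python
import hashlib

def sha256(*parts: bytes) -> bytes:
    # Hash of the concatenation of all parts.
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def run_challenge(eta_c, eta_c_b, N, s_b, file_sizes, read_block):
    """Sketch of one PoTR challenge. `file_sizes[f]` is the size of file f;
    `read_block(f, b, s_b)` returns block b (of size s_b) of file f."""
    hash_index = sha256(eta_c)      # enclave: first #index
    hash_block = sha256(eta_c_b)    # enclave: hashed block nonce
    result_hash = b""
    for _ in range(N):
        # Untrusted part: modulo picks the file and the block inside it.
        f = int.from_bytes(hash_index, "big") % len(file_sizes)
        b = int.from_bytes(hash_block, "big") % (file_sizes[f] // s_b)
        data = read_block(f, b, s_b)
        result_hash = sha256(data, hash_index)    # returned to the enclave
        # Enclave: two new hashes, chained with the secret nonces.
        hash_index = sha256(result_hash, eta_c)
        hash_block = sha256(result_hash, eta_c_b)
    return sha256(result_hash, eta_c)             # final proof for the auditor

# Toy usage: four 64 KiB files held in memory.
files = [bytes([i]) * 65536 for i in range(4)]
proof = run_challenge(b"\x01" * 32, b"\x02" * 32, N=16, s_b=4096,
                      file_sizes=[len(f) for f in files],
                      read_block=lambda f, b, s: files[f][b * s:(b + 1) * s])
```

Since the auditor knows both nonces and the stored files, it can rerun the same computation to reconstruct the expected proof and compare it against the value it receives.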
Security Guarantees: Each challenge requires two unique nonces that determine, in a sequential and pseudorandom fashion, which \(N\) data blocks the untrusted part must read. This interaction with the enclave ensures that the fog node is unable to retrieve the \(N\) data blocks in parallel from neighboring nodes. Instead, it must retrieve one block at a time to determine the next block it needs. Additionally, the dependency between data blocks, since the next block index depends on the content of the previous one, ensures that the fog node cannot reuse outputs from previous challenges, thus maintaining challenge freshness. The constant exchange between execution environments (enclave to untrusted part and vice versa) guarantees that the fog node cannot rely on a remote machine for the entire proof computation; the proof is computed solely on the expected machine [28]. It is important to note that, through the verification of proof correctness, the auditor can also assess the integrity of the stored documents: any modification to a file will render the proof invalid.
4.3 Reading Delay δ at the Audited Node
In addition to checking proof correctness, the auditor has to check the proof timeliness. When the challenge is sent to the enclave of the audited node, the auditor starts a timer that is stopped when the proof is received. A proof is valid if it is correct and the reading delay estimation, for a single data block, is acceptable to the auditor.
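A minimal sketch of the auditor-side measurement, with hypothetical send_challenge and receive_proof helpers standing in for the actual communication layer:

```python
import time

def audit_once(send_challenge, receive_proof, challenge):
    # Start the timer when the challenge is sent to the enclave of the
    # audited node; stop it when the proof arrives back at the auditor.
    start = time.monotonic()
    send_challenge(challenge)
    proof = receive_proof()
    T_i = time.monotonic() - start
    return proof, T_i
```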
Having \(T_i\) as the time elapsed between the auditor sending challenge \(i\) and receiving the response from the fog node, \(T_i\) can be decomposed into different factors, namely:

\[ T_i = {\it rtt}_i + \sum_{j=1}^{N} \left( \alpha_i^j + \delta_i^j \right) + \epsilon, \]

where \({\it rtt}_i\) is the network round-trip time for the challenge-response messages, \(N\) is the number of data blocks to be accessed, \(\alpha_i^j\) is the delay observed at the fog node to compute the cryptographic hash of a given data block \(j\) and to compute the index of the next block, \(\delta_i^j\) is the delay observed to read a data block \(j\), and \(\epsilon\) is a small constant delay that accounts for the cold start of the challenge. Note that for sufficiently large values of \(N\), we will have the following:

\[ \frac{1}{N} \sum_{j=1}^{N} \alpha_i^j \approx \overline{\alpha}_i \qquad \text{and} \qquad \frac{1}{N} \sum_{j=1}^{N} \delta_i^j \approx \overline{\delta}_i, \]

and thus \(T_i \approx {\it rtt}_i + N(\overline{\alpha}_i + \overline{\delta}_i) + \epsilon\).
However, the auditor is unable to measure accurate values for \({\it rtt}_i\), \(\overline{\alpha }_i\), and \(\overline{\delta }_i\), as they are different and variable in each challenge \(i\), and the auditor is only able to measure the total delay \(T_i\). Therefore, to estimate \(\overline{\delta }\), i.e., the actual mean delay a fog node takes to read a data object, in our work, we resort to mean values:

\[ \overline{\delta}_{\text{estimate}} = \frac{T_i - \overline{\it rtt} - \epsilon}{N} - \overline{\alpha}. \]
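As an illustration with hypothetical numbers (all values below are assumptions, not measurements from the paper), the estimate follows directly from the measured \(T_i\):

```python
# Hypothetical measurements, for illustration only.
T_i        = 2.80     # total measured delay for challenge i (s)
mean_rtt   = 0.040    # expected round-trip time, known a priori (s)
mean_alpha = 2e-5     # mean per-block hashing/indexing delay (s)
epsilon    = 0.010    # cold-start constant (s)
N          = 10_000   # number of data blocks in the challenge

# Estimate of the mean per-block reading delay (cf. the equation above).
delta_estimate = (T_i - mean_rtt - epsilon) / N - mean_alpha
print(f"estimated mean reading delay: {delta_estimate * 1e6:.0f} us/block")
# -> 255 us/block; the proof is timely if this is below the threshold delta.
```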
Therefore, the estimation error for a given challenge \(i\) will depend on how far the observed values are from the mean values. As noted in Section 3.1, we assume that \(\overline{rtt}\) is known a priori by the auditor, based on the expected location of the audited node. Experimentally, we verified that the \(\alpha\) values are subject to a negligibly small variance, whereby we assume \(\overline{\alpha }_i = \overline{\alpha }\), i.e., we ignore the error introduced by \(\alpha\) sampling. Therefore, the error of the estimate \(\overline{\delta }_{\text{estimate}}\) depends on two factors: (i) the reading error \(\varepsilon ^\delta\) made when estimating \(\overline{\delta }\) (which depends on the sample size, i.e., the number of data blocks accessed) and (ii) the difference between the observed network round-trip time (\(rtt_i\)) and the expected mean value (\(\overline{rtt}\)), divided by the number \(N\) of data blocks, as Equation (5) describes:

\[ \overline{\delta}_{\text{estimate}} - \overline{\delta} = \varepsilon^\delta + \frac{{\it rtt}_i - \overline{\it rtt}}{N}. \tag{5} \]
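A small numeric illustration of the second term of Equation (5), with an assumed 10 ms deviation of the observed RTT from its mean: the deviation is amortized over the \(N\) blocks, so the induced per-block error shrinks as \(N\) grows.

```python
# Second term of Equation (5): (rtt_i - mean_rtt) / N, with assumed values.
rtt_i, mean_rtt = 0.050, 0.040   # observed vs. expected RTT (s)
for N in (100, 1_000, 10_000):
    error = (rtt_i - mean_rtt) / N
    print(f"N={N:>6}: rtt-induced error = {error * 1e6:.2f} us/block")
# N=   100: 100.00 us/block;  N= 1000: 10.00 us/block;  N= 10000: 1.00 us/block
```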