D.2 Proof of Proposition 1
The sampling algorithm and its subroutine are formally defined in Algorithms
2 and
3. In the following, we assume that: (i) a given individual
\(i~\) \(\in\) \(\mathcal {V}\) makes
\(O(t_{\text{max}})\) visits to sites
\(\mathcal {S}\) over the horizon
\([0, t_{\text{max}})\); (ii) the mobility model is
sparse, i.e., every individual
\(i \in \mathcal {V}\) has
\(O(1)\) unique contact persons; and, (iii) there are no containment measures. This implies that there are a total number of
\(O(t_{\text{max}})\) contact windows of
i with all other individuals
\(j \in \mathcal {V}\). Following our implementation [
57], we assume that the mobility traces
\(P_{i,k}(t)\) are given as an unsorted list of time intervals
\([t_0, t_1]\), where each time interval indicates a visit of an individual
\(i \in \mathcal {V}\) to a site
\(k \in \mathcal {S}\) during the simulated time period.
Event queue. In any possible trajectory of the epidemiological state variables, there is a constant number of events not concerning exposure that can be pushed to the event queue Q per individual. This is because every individual transitions through at most a finite set of states. In addition, since by assumption (ii) the mobility model is sparse, there is a constant number of exposure events caused by and thus pushed per individual. Thus, the overall number of events pushed to the event queue throughout the simulation is \(O(|\mathcal {V} |)\). This is an upper bound on the size of the queue at any point in the simulation. Using the standard heap implementation of a priority queue, pushing to and popping from the temporally-sorted event queue Q hence incur cost \(O(\log (|\mathcal {V} |))\) in the worst case.
Preprocessing of contacts. Sampling exposures caused by an infectious individual
i relies on querying the contacts with other individuals
j by checking their overlapping visits to sites
\(\mathcal {S}\). To do this efficiently, the mobility traces are preprocessed into efficient interval data structures called
interval trees.
5 For this, we initialize two dictionaries that store two kinds of interval trees, visits by individuals and visits to sites, respectively. Both dictionaries are populated by iterating once over all
\(O(t_{\text{max}}|\mathcal {V} |)\) site visits in the simulated period. For each visit, its time interval is inserted both into the tree of visits by the corresponding individual
\(i \in \mathcal {V}\) as well as the tree associated to the site
\(k \in \mathcal {S}\). Then, by assumption (i), the interval trees stratified by
individual have size
\(O(t_{\text{max}})\) and intervals do not overlap by construction. Moreover, the interval trees stratified by
site contain
\(O(t_{\text{max}}|\mathcal {V} |)\) visits and, by assumption (ii), any interval overlaps with
\(O(1)\) others. Thus, the total time incurred for constructing all visit interval trees is
\(O(t_{\text{max}}|\mathcal {V} | \log (t_{\text{max}}|\mathcal {V} |))\).
Using these two sets of
visit interval trees, we build a collection of
\(O(1)\) contact interval trees for each individual
\(i \in \mathcal {V}\). These contain the contact windows from
i to each of its unique contact persons
j.
6 To generate the contact trees for
i, we iterate over all
\(O(t_{\text{max}})\) visits of
i. For each visit, we query the interval tree of the visited site in time
\(O(\log (t_{\text{max}}|\mathcal {V} |))\) to retrieve the
\(O(1)\) contact persons
j during that visit. Given this, we update the individual contact interval tree from
i to
j in time
\(O(\log (t_{\text{max}}))\). Like individual visit traces, the contact intervals do not overlap by construction. The overall preprocessing cost remains
\(O(t_{\text{max}}|\mathcal {V} | \log (t_{\text{max}}|\mathcal {V} |))\).
Handling events. The backbone of the sampling procedure in Algorithm
2 consists of processing state transition events in the temporal order. All generic state transitions in the model,
i.e., those
not transitioning to an infectious state, consist of updating the correct indicator variables of the corresponding individual
i or discarding events that became invalid due to thinning in constant time. In addition, we push the next state transition of
i to
Q, which takes time
\(O(\log (|\mathcal {V} |))\). Since there are
\(O(|\mathcal {V} |)\) generic events in the worst case, handling all of them takes an overall time of
\(O(|\mathcal {V} | \log (|\mathcal {V} |))\).
When an individual
i first transitions to an
infectious state, i.e., the presymptomatic
\(I^p_i=1\) or asymptomatic
\(I^a_i=1\) state, an additional time cost is incurred because we sample the times of the exposure events
caused by
i to all of its unique contact persons
j in the future. This corresponds to calls of Algorithm
3, where we continually iterate over all
\(O(t_{\text{max}})\) contact windows
i has with
j after some time
t until the first valid exposure event is sampled. Specifically, we sample a next time
t as
\(t \leftarrow t + \tau\) with
\(\tau \sim \text{Expo}(\lambda _{\max })\). If
i is still in contact with
j at time
t, and if the event is not rejected using thinning due to a lower site-specific exposure rate
\(\beta _k\) or asymptomatic infectiousness
\(\gamma\), the exposure time is valid and we push the event to
Q in time
\(O(\log (|\mathcal {V} |))\). Otherwise, we repeat. Since there are at most
\(O(t_{\text{max}})\) contact windows of
i with
j, each query to
InContact as formalized in Algorithm
3 incurs time
\(O(\log (t_{\text{max}}))\) using the interval tree.
Let
U be the random variable representing the runtime of Algorithm
3 incurred by
one contact window from
i to
j. In addition, let
\(q \in (0,1)\) be the probability that a given thinning sample gets accepted. By the memoryless property of thinning, the expected value of
U is given by
In the worst case, thinning is done for all
\(O(t_{\text{max}})\) contact windows from
i to
j until an exposure event time gets accepted. Overall, Algorithm
3 is called
\(O(1)\) times per individual. Thus, the processing of state transitions to infectious states of all
\(O(|\mathcal {V} |)\) infectious individuals incurs an additional overall cost of
\(O(|\mathcal {V} | t_{\text{max}}(\log (|\mathcal {V} |) + \tfrac{1}{q} t_{\text{max}}\log (t_{\text{max}})))\). This also accounts for the cost of sampling household exposures, which can be viewed as visits to an additional site with an additional set of
\(O(1)\) household contacts. We note that
q is a constant that depends only on the exposure rate of the epidemiological model, and any lower bound thereof across sites and individuals suffices for Equation (
18).
Expected runtime. Combining the preprocessing cost, the handling of all generic state transitions, and the handling of transitions to infectious states, Algorithm
2 has a total expected runtime of
□