Abstract
Processes in our world are of a temporal probabilistic relational nature. An epidemic is an example of such a process. This dissertation abstract uses the scenario of an epidemic to illustrate the lifted dynamic junction tree algorithm (LDJT), a temporal probabilistic relational inference algorithm. More specifically, we argue that existing propositional temporal probabilistic inference algorithms are not suited to model an epidemic, i.e., without accounting for the relational part, and present how LDJT exploits the relational aspect. Additionally, we illustrate how LDJT preserves groups of indistinguishable objects over time and examine LDJT from a theoretical perspective.
1 Introduction
Our world is inherently uncertain. With probabilistic inference, we can answer probability queries given a (probabilistic) model. But our world is not only uncertain. It also proceeds in time and is relational. To illustrate such a temporal probabilistic relational scenario, let us have a look at an epidemic. Overall, the number of people being sick with a certain illness determines the probability of an epidemic. In such a scenario, there are uncertainties w.r.t. someone being sick, e.g., there often is an incubation time before symptoms start, or individuals simply do not test. Further, people traveling might increase the number of infections. Also, there is the possibility to treat people with medications. Additionally, an epidemic evolves over time. So in this scenario, we have people or individuals in relation to each other. Additionally, for an epidemic it does not matter which individuals of a population are infected, only how many are infected, and the number of sick persons might change over time. Overall, we have a temporal probabilistic relational setting.
Let us now use the scenario of an epidemic to determine challenges in such a temporal probabilistic relational setting. To be able to perform probabilistic inference, a representation or encoding of the scenario is needed. One widely used representation for temporal probabilistic inference is a dynamic Bayesian network (DBN) [10]. A DBN uses random variables, which are set into relation to each other, to model scenarios. Unfortunately, a DBN is a propositional formalism. Therefore, in our scenario, there would be random variables for each individual from a population. A random variable could, for example, be about whether a person is sick. Thus, a DBN would need millions of random variables to model an epidemic. Having so many random variables is inefficient and, in practice, infeasible. Hence, we need a way to compactly encode such a scenario using the underlying relational structure.
Once we have such a representation, there are also challenges w.r.t. performing inference. In the scenario, the number of sick persons determines the probability of an epidemic. So to answer queries w.r.t. an epidemic, we are interested in groups of individuals and not a particular individual. Thus, not only a way to compactly model these groups of indistinguishable individuals is needed, but also a way to efficiently reason over those.
Here, my dissertation comes into play, which presents parameterised probabilistic dynamic models (PDMs) to model temporal probabilistic relational scenarios and the lifted dynamic junction tree algorithm (LDJT) to perform efficient exact inference on PDMs [3]. PDMs efficiently model random variables parameterised with logical variables that have a domain of constants [5]. In the epidemic scenario, one domain would be a population and another the possible medications or treatments. Having a way to represent temporal probabilistic relational scenarios, we still need a way to efficiently perform inference on them. To efficiently handle the relational aspect of our scenario, LDJT treats individuals as indistinguishable unless proven otherwise and only performs inference for representatives of indistinguishable groups instead of for each individual on its own. By using this exchangeability, LDJT allows for tractable inference w.r.t. domain sizes, i.e., inference in polynomial time w.r.t. domain sizes [11]. Overall, LDJT offers a wide range of query types, such as temporal probability queries [5, 8], most probable explanation (MPE) and maximum a posteriori (MAP) queries [7], as well as decision making support [6].
As indicated, LDJT achieves tractability through exchangeability. Unfortunately, conditional probability queries in particular can slowly turn groups of indistinguishable individuals into distinguishable individuals over time. However, often these distinguishable individuals are just barely distinguishable. In the worst case, i.e., having only distinguishable individuals, LDJT would not be able to benefit from exchangeability anymore and would fall back to the propositional interface algorithm [10]. To counteract this problem, LDJT efficiently approximates groups of indistinguishable individuals over time [9]. Thereby, LDJT tames exact inference in temporal probabilistic relational models.
In the following, we illustrate all parts of LDJT using the epidemic example. Additionally, we outline new theoretical insights w.r.t. lifted inference originating from temporal lifted inference.
2 Inference in Temporal Probabilistic Relational Settings
In this section, we use our epidemic example to illustrate how such a scenario can be modelled efficiently. Afterwards, we illustrate how inference can be performed in an epidemic.
2.1 Parameterised Probabilistic Dynamic Models
A PDM is specified by a model for an initial time step as well as a temporal copy pattern. In Fig. 1, we see the temporal copy pattern of a PDM for our epidemic example. The ellipses correspond to parameterised random variables (PRVs) and the boxes to parametric factors (parfactors). The PRVs are composed of random variable names, which are parameterised with logical variable names. For example, we have \(Treat_t(X,M)\). Here, the random variable name is Treat and the names of the logical variables are X, for people, and M, for medications. Each logical variable has a domain. X has a constant for each individual in its domain and M a constant for each medication that is available. The PRV Epid is not parameterised with any logical variable. Thus, Epid is a propositional random variable. The PRVs are set into relation using parfactors. A parfactor maps the possible range values of its PRVs to a potential, i.e., a value from \(\mathbb {R}^+\). In a parfactor, at least one potential has to be greater than 0. Assuming all our PRVs are Boolean,Footnote 1 \(g^0\) and \(g^1\) contain 8 mappings and \(g^{ie}\) and \(g^{is}\) contain 4 mappings for each time step. With these mappings, one can model that if someone is sick, the probability of traveling decreases and the probability of an epidemic increases.
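To make the mapping concrete, a parfactor over three Boolean PRVs is a table with \(2^3 = 8\) entries, each mapping one range assignment to a positive potential. The following minimal Python sketch builds such a table; the PRV names follow the example, but the potential values are purely illustrative and not taken from the original model:

```python
from itertools import product

# Hypothetical parfactor over the Boolean PRVs Epid_t, Sick_t(X), Treat_t(X,M).
# The potential values are illustrative placeholders.
prvs = ["Epid_t", "Sick_t(X)", "Treat_t(X,M)"]
potentials = [9.0, 1.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0]  # one value per range assignment

g1 = {assignment: phi
      for assignment, phi in zip(product([False, True], repeat=len(prvs)), potentials)}

assert len(g1) == 8                          # 2^3 mappings for three Boolean PRVs
assert any(phi > 0 for phi in g1.values())   # at least one potential must be > 0
```

The inter-slice parfactors \(g^{ie}\) and \(g^{is}\) would analogously be tables over two Boolean PRVs, hence 4 mappings each.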
PDMs have two underlying assumptions, namely a stationary process and the first-order Markov assumption. In the temporal copy pattern, we see the very same model once for time slice \(t-1\) and once for time slice t. Having a stationary process, the parfactors map to the very same potentials in each time step, i.e., an individual being sick influences the probabilities in the same way independent of the time step. Further, the parfactors \(g^{ie}\) and \(g^{is}\) model the temporal behaviour of our scenario. Similar to the mapping within one time slice, if one is sick in one time step, there is a chance that one recovers in the next, but it is also likely that one remains sick. Here, we also see the first-order Markov assumption, as the temporal copy pattern only has predecessors from the previous time slice.Footnote 2 As for all factorised temporal approaches, the semantics is defined by unrolling the model for a given number of time steps T, i.e., instantiating it for T time steps, and then multiplying all factors into a full joint probability distribution.
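Written out for our example, the unrolling semantics can be sketched as follows, assuming \(g^0\) and \(g^1\) denote the intra-slice parfactors, \(g^{ie}\) and \(g^{is}\) the inter-slice parfactors, and Z the normalisation constant: \(P_{0:T} = \frac{1}{Z}\, \prod_{t=0}^{T} g^0_t \cdot g^1_t \cdot \prod_{t=1}^{T} g^{ie}_t \cdot g^{is}_t\), where each parfactor stands for the product of all its ground instances.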
The interesting aspect of PDMs in comparison to, for example, DBNs is that PDMs model processes on a first-order level using logical variables. In case the size of the population changes, from a model perspective all that changes is the domain size of the logical variable X. In a DBN of the epidemic scenario, which is a propositional model, increasing the population by one results in adding \(2 + m\) random variables and \(1+m\) factors to the model, where m is the domain size of M. Hence, modelling an epidemic with possibly millions of people can only be done compactly on a first-order level.
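The growth described above can be checked with a small back-of-the-envelope computation. Under the stated assumption that each new person adds \(2+m\) random variables and \(1+m\) factors per time step (the single Epid variable in the count below is an illustrative baseline), the ground model grows linearly in the population while the PDM stays fixed:

```python
def dbn_size_per_step(n_people: int, m_medications: int) -> tuple[int, int]:
    """Illustrative per-time-step counts for the ground (propositional) model:
    each person contributes 2 + m random variables and 1 + m factors,
    plus the single propositional Epid variable."""
    variables = 1 + n_people * (2 + m_medications)
    factors = n_people * (1 + m_medications)
    return variables, factors

# Adding one person with m = 2 medications adds 2 + m = 4 variables, 1 + m = 3 factors.
assert dbn_size_per_step(1, 2)[0] - dbn_size_per_step(0, 2)[0] == 4
assert dbn_size_per_step(1, 2)[1] - dbn_size_per_step(0, 2)[1] == 3

# A population of one million already needs millions of variables per time step.
variables, factors = dbn_size_per_step(1_000_000, 2)
```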
Next, we have a look at how LDJT exploits the model for efficient query answering.
2.2 Lifted Dynamic Junction Tree Algorithm
After having lifted the model representation to the first-order level, we now focus on lifting inference to the first-order level. To that end, we present LDJT. Assume we can observe at every time step whether people are sick. Given the number of sick people, we could be interested in whether there currently is an epidemic. Thus, we have at least as many queries as time steps. However, it would be really cumbersome to always consider the complete past as a whole. Therefore, LDJT uses conditional independences between time steps to make two time steps conditionally independent of each other. So information about some PRVs of the previous time step suffices to answer the queries in the current time step.
In the following, we have a look at how LDJT and all its components help us to efficiently perform inference and help us to determine whether there is an epidemic. To do so, we have a look at several questions:
- (i) How does LDJT use conditional independences within and between time steps,
- (ii) how does LDJT efficiently proceed in time,
- (iii) what types of queries are supported,
- (iv) why are multiple queries crucial,
- (v) how does LDJT handle the relational aspect of an epidemic,
- (vi) how does LDJT restore symmetries in our representation once events have broken these symmetries, and
- (vii) what advantages does LDJT have over propositional temporal inference in terms of run time complexity?
How can we use conditional independences within and between time steps?
LDJT uses conditional independences between time steps to be able to proceed in time efficiently and conditional independences within time steps to efficiently answer multiple queries. To exploit conditional independences, LDJT uses a helper structure, a so-called first-order junction tree (FO jtree). Basically, an FO jtree clusters a model into submodels using conditional independences, which then allows inference algorithms to answer queries on these smaller submodels. Before we start to illustrate LDJT using the epidemic scenario, we have a look at FO jtrees and how to include conditional independences between time steps in them on a more general level.
To use conditional independences within time steps, LDJT uses the static, i.e., non-temporal, lifted junction tree algorithm (LJT) [2]. LJT also uses an FO jtree as a helper structure. To be able to answer queries on a cluster, LJT propagates local information of clusters to all other clusters via so-called message passing. After the message passing, each cluster can answer queries about its PRVs. Thereby, LJT answers queries on submodels, which leads to efficient multiple query answering.
LDJT uses LJT as a subroutine to exploit conditional independences within time steps and to propagate local information via message passing. Given that LDJT is a temporal algorithm, it constructs an FO jtree structure, which is then instantiated for a given time step. To construct the FO jtree structure from a PDM, LDJT uses all the parfactors that are necessary to answer queries for time step t given PRVs that make time steps \(t-1\) and t conditionally independent. Therefore, LDJT constructs the FO jtree structure using all parfactors from the temporal copy pattern belonging to time slice t, here \(g^0_{t}, g^{ie}_{t}, g^{is}_{t},\) and \(g^1_{t}\). In Fig. 2, we see on the left the FO jtree that LDJT constructs for time step three, and on the right the FO jtree for time step four.
To be able to use conditional independences between time steps, LDJT has to ensure that these independences are represented in the FO jtree structure. Let us call the PRVs that make two time steps conditionally independent interface PRVs, here \(\{Sick_t(X),\) \(Epid_t\}\). During the FO jtree structure construction, LDJT ensures that the interface PRVs for time step \(t-1\) are in one cluster, which we call in-cluster. LDJT also ensures that the interface PRVs for time step t are in one cluster, which we call out-cluster. During query answering, LDJT instantiates the FO jtree structure for each queried time step. In Fig. 2, between the FO jtrees for time steps three and four, we see a so-called \(\alpha\)-message over the interface PRVs, which LDJT uses for the conditional independences between time steps.
How does LDJT exploit temporal conditional independences?
Given the two assumptions of PDMs, there is a set of PRVs that makes one time step conditionally independent from the next. The temporal conditional independence allows LDJT to answer queries for one time step at a time, which in turn allows LDJT to use LJT as a subroutine to answer queries for a given time step. Once all queries are answered for a given time step, LDJT passes the current state of the interface PRVs to the next time step as a message. To compute the \(\alpha\)-message, similar to messages in LJT, LDJT asks a conjunctive distribution query over the interface PRVs without normalising the result. The \(\alpha\)-message is all that LDJT needs to answer queries in the next time step, due to the temporal conditional independence.
Let us have a closer look at how LDJT computes the \(\alpha\)-message in the epidemic example. To compute the \(\alpha\)-message, LDJT uses the out-cluster, as here the interface PRVs are guaranteed to be in the submodel. Thus, to compute \(\alpha _t\), LDJT uses the parfactors at \(\textbf{C}^2_t\). Here, the parfactors are \(g_t^1\) as well as the parfactor(s) from the message that \(\textbf{C}^2_t\) has received from \(\textbf{C}^1_t\) during message passing. To compute \(\alpha _t\), LDJT eliminates all non-interface PRVs from \(\textbf{C}^2_t\). Thus, LDJT eliminates \(Treat_t(X,M)\). To eliminate a PRV, LDJT uses the so-called lifted summing out, which has some preconditions [12]. Here, the preconditions hold. Roughly speaking, lifted summing out performs a summing out operation for a representative and then accounts for the number of ground, i.e., propositional, instances or eliminations that the representative actually stands for. Assume, for example, that there are two medications available. Then, in the ground model each individual has two factors, one for the first and another for the second medication. Thus, the lifted summing out has to account for two eliminations in the ground case and raise the result of the representative to the power of 2. Hence, LDJT uses the relational structure to avoid redundant computations. If we had 100 medications, there would be 99 redundant computations for each individual if performed on a ground level. LDJT only performs the computation for one representative and then efficiently accounts for the relational structure.
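The exponentiation step can be illustrated with a toy computation: summing out a Boolean PRV once for a representative and raising the result to the number of ground instances (here, the domain size of M) yields the same value as performing the elimination for every grounding separately. The potentials below are illustrative placeholders, not values from the model:

```python
# Illustrative potentials of a factor over one Boolean PRV Treat_t(x, m)
# for a representative grounding; identical for every medication m.
phi = {False: 0.4, True: 0.7}
m_domain_size = 2  # two medications available

# Ground elimination: sum out Treat_t(x, m) for each medication, then multiply.
ground_result = 1.0
for _ in range(m_domain_size):
    ground_result *= sum(phi.values())

# Lifted elimination: sum out once for a representative, then exponentiate.
lifted_result = sum(phi.values()) ** m_domain_size

assert abs(ground_result - lifted_result) < 1e-12
```

With 100 medications, the ground loop would run 100 times per individual, while the lifted computation stays a single summing out followed by one exponentiation.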
What types of queries can be answered?
Answering queries is similar to computing messages between clusters. Here, LDJT also has to eliminate all non-query terms, but LDJT has to return a probability distribution. Therefore, unlike for messages, the result of a query is normalised in the end. Given that LDJT is a temporal inference algorithm, it supports the default temporal query types, which are hindsight, filtering, and prediction queries [5, 8]. Formally, the problem of answering a query \(P(A^i_\pi |\textbf{E}_{0:t})\), where \(A^i_\pi\) is a ground query term and \(\textbf{E}_{0:t}\) our observations or events up until time step t, w.r.t. the model is called hindsight for \(\pi < t\), filtering for \(\pi = t\), and prediction for \(\pi > t\). In addition to hindsight, filtering, and prediction queries, LDJT also efficiently answers conjunctive queries with query terms from different time steps [4]. Other query types that LDJT supports are lifted temporal MPE and MAP queries. As sum-product and max-product operations are not commutative, LDJT only guarantees to efficiently answer MAP queries over complete time steps, e.g., summing out the first n time steps and then answering the corresponding MPE query [7]. Further, it is possible to add action and utility nodes to PDMs and use LDJT to support lifted decision making [6].
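The classification of a temporal probability query by the queried time step \(\pi\) relative to the evidence horizon t can be written down directly (a trivial helper for illustration, not part of LDJT's interface):

```python
def query_type(pi: int, t: int) -> str:
    """Classify a temporal probability query P(A_pi | E_{0:t})."""
    if pi < t:
        return "hindsight"   # asking about the past given later evidence
    if pi == t:
        return "filtering"   # asking about the current time step
    return "prediction"      # asking about a future time step

assert query_type(3, 5) == "hindsight"
assert query_type(5, 5) == "filtering"
assert query_type(8, 5) == "prediction"
```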
Why are multiple queries important?
Let us now have a look at whether it is really necessary for a temporal algorithm to support multiple queries. One reasonable assumption is that LDJT is provided a set of template queries that is answered in each time step. In our example, one could be interested in the probability of an epidemic now as well as in prediction queries for an epidemic in, let's say, 5 and 10 time steps. Thus, our set of query terms would look like \(\{epid_t, epid_{t+5}, epid_{t+10}\}\).Footnote 3 LDJT does not only prepare the FO jtree via message passing to be able to answer filtering queries, but also to proceed in time, which can be seen as another query. Here, we already have two queries that LDJT has to answer, and in reality it would most likely have to answer even more. The complexity of message passing is roughly equivalent to answering two queries on the complete model for a time step. Hence, starting from around the third query, such a helper structure with the then necessary message passing pays off. For prediction queries, LDJT has to propagate the information to the out-cluster anyhow and can then directly use the corresponding out-cluster to answer the prediction queries on a smaller cluster without any additional overhead. Therefore, answering multiple queries efficiently is beneficial for temporal inference.
Why is the relational part so important (in an epidemic)?
So let us now have a closer look at how PDMs and LDJT help us answer queries when it comes to an epidemic. First of all, PDMs allow us to efficiently model the temporal and relational structure of the underlying process. Therefore, PDMs allow modelling an epidemic with millions of individuals, which would not be feasible with a propositional formalism. LDJT also efficiently uses the temporal and relational structure of the underlying process. By performing lifted inference, LDJT reasons over representatives for each group of indistinguishable objects and thereby allows for tractable inference w.r.t. domain sizes, i.e., inference in polynomial time w.r.t. domain sizes. Hence, LDJT can perform inference in a scenario such as an epidemic with millions of individuals. In such a scenario, the runtime of a propositional algorithm would not be reasonable. Actually, most likely a propositional algorithm would not even be able to compute an answer to a single query. Thus, LDJT handles huge domain sizes efficiently.
In an epidemic, the number of sick people influences the probability of an epidemic. One might not only be interested in the probability of an epidemic, but also in how many people are currently most likely sick. By asking a MAP query on the PRV Sick, LDJT returns exactly the most likely state assignment for the Sick PRV, including how many people are most likely sick and how many people are most likely healthy. With a propositional algorithm, one would somehow have to perform crude aggregations over millions of random variables to achieve the same. Overall, LDJT efficiently answers, for example, the probability of an epidemic and, with MAP queries, also the most likely state assignment.
Can we make distinguishable individuals indistinguishable again?
LDJT treats individuals as indistinguishable unless proven otherwise. Evidence from conditional queries can make indistinguishable individuals distinguishable. In case 1000 individuals report that they tested positive, then these 1000 individuals become distinguishable from the other individuals. Nonetheless, LDJT still reasons here over one group with 1000 individuals and another group with the remaining individuals. However, a group of people does not always test positive together at the exact same time step. Thereby, with slightly different events for individuals, the model can slowly ground over time. With a completely ground model, LDJT would fall back to being only as efficient as the propositional interface algorithm [10]. However, does it really matter whether an individual tested positive 90, 91, or 99 days ago? These individuals can be treated as indistinguishable again.
To keep the reasoning of LDJT polynomial, LDJT performs temporal approximate merging to tame temporal probabilistic relational inference. Roughly speaking, temporal approximate merging identifies similar objects and groups them together. Similar objects are objects that contribute approximately in the very same way to the full joint probability distribution. In this way, LDJT again clusters individuals that, for example, tested positive a long time ago, as by now they are healthy again and contribute to the full joint distribution of the model in approximately the same way, even though they were once distinguishable given events at different time steps. Thereby, the number of indistinguishable individuals increases again and the reasoning of LDJT remains polynomial w.r.t. domain sizes. The approximation induces a minimal error, and the approximation error remains bounded indefinitely [9].
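As a rough sketch of the grouping idea, individuals whose evidence has become practically irrelevant can be bucketed together again, e.g., everyone who tested positive more than some horizon ago. The bucketing criterion and the horizon below are purely illustrative; the actual merging in [9] compares contributions to the full joint distribution rather than raw evidence ages:

```python
from collections import defaultdict

# Illustrative evidence: days since a positive test; None means never tested positive.
individuals = {"alice": 90, "bob": 91, "carol": 99, "dave": None, "eve": 2}
HORIZON = 30  # assumed merging horizon, for illustration only

groups = defaultdict(list)
for person, days in individuals.items():
    if days is None:
        key = "no_evidence"
    elif days > HORIZON:
        key = "merged_long_ago"   # barely distinguishable: treated as one group again
    else:
        key = f"recent_{days}"    # still distinguishable due to recent evidence
    groups[key].append(person)

# Individuals who tested positive 90, 91, or 99 days ago land in one group again.
assert sorted(groups["merged_long_ago"]) == ["alice", "bob", "carol"]
```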
What kind of advantages do we obtain from a complexity perspective?
For the complexity comparison of lifted inference vs. propositional inference, we already hinted that lifted inference replaces many redundant computations by raising the result of a representative computation to the power of a domain size. If we fix the domain size of M to 2, then adding one constant to X adds 3 factors to each time step of the propositional model of our example. In case the domain size of M is set to 100, this would add 101 new factors for each new individual. The number of factors in a model is a linear term in the complexity of the interface algorithm. In the lifted case, increasing domain sizes only increases a logarithmic term in the complexity of LDJT. Thus, from a theoretical point of view, we can already see one benefit of lifted inference here.
But using the example, we can show that in the complexity of the interface algorithm [10] not only a linear term but also an exponential term increases when the number of individuals increases. The tree width, which is an exponential term, is determined by the size of the largest cluster in a propositional model. The interface algorithm also ensures that all interface random variables are in one cluster. Thus, by increasing the number of individuals, the number of interface random variables increases, as each grounding of Sick(X) is in the interface. Hence, the propositional tree width increases. So far, this problem has been dealt with by approximating the interface and dividing the conjunctive query into smaller subqueries. The approximation yields an exact result if the subqueries are independent [1]. In the epidemic scenario, all groundings of Sick(X) are connected to Epid and therefore not independent. Therefore, approximating the interface would not yield an exact solution. For LDJT, the lifted width remains the same if the number of individuals increases, and LDJT computes an exact solution. Thus, the lifted width is smaller than the tree width in case there is more than one person in the domain of X. For static lifted inference, the lifted width is only smaller than the tree width in the presence of so-called countings [13, 14]. However, in the temporal case, the lifted width is also smaller than the tree width without countings. Overall, modelling an epidemic, which includes a huge number of individuals, is not feasible with a propositional approach. PDMs allow to efficiently model such a scenario and LDJT efficiently tackles the inference problems.
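The difference can be made tangible by comparing how the interface grows. Under the assumptions above, every grounding of Sick(X) joins Epid in the propositional interface cluster, so the cluster size (and with it the tree width term) grows with the population, while the lifted interface stays at the two PRVs Sick(X) and Epid regardless of domain size. A toy count:

```python
def ground_interface_size(n_people: int) -> int:
    # Epid plus one Sick(x) random variable per person in the interface cluster.
    return 1 + n_people

def lifted_interface_size(n_people: int) -> int:
    # Two interface PRVs, Sick(X) and Epid, independent of the domain size of X.
    return 2

assert ground_interface_size(10) == 11
assert ground_interface_size(1_000_000) == 1_000_001  # exponential term explodes
assert lifted_interface_size(1_000_000) == 2          # lifted width stays constant
```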
3 Conclusion
This article illustrates all parts of LDJT, an efficient temporal probabilistic relational inference algorithm, in an epidemic scenario. Existing propositional temporal inference algorithms are not suitable to model and perform inference in such a scenario as they ignore the relational structure. By lifting the representation to the first-order level, LDJT reasons over groups of indistinguishable objects and thereby achieves tractability through exchangeability, i.e., it only reasons over representatives and thereby achieves tractable inference w.r.t. domain sizes. Thus, instead of being in the worst case exponential w.r.t. domain sizes, LDJT is in the worst case only polynomial w.r.t. domain sizes. For the epidemic scenario, we show that increasing domain sizes increases an exponential term in the complexity of the propositional interface algorithm, while in the complexity of LDJT only a logarithmic term increases. Unfortunately, conditional queries can slowly turn groups of indistinguishable objects into distinguishable objects. To keep the benefits of tractable inference, we also present how LDJT approximates groups of indistinguishable objects over time and thereby tames temporal probabilistic relational inference.
Interesting future directions are to relax the assumptions of LDJT w.r.t. the first-order Markov assumption and stationary processes. Another interesting direction in general, and for health care in particular, is how lifting and logical variables can help to preserve privacy, especially in the temporal setting. Additionally, moving from a correlation-based formalism to a causality-based formalism might be interesting to generate even better explanations of, for example, suggested action plans.
Notes
Actually any discrete range is possible.
Easily extendable to k-th-order Markov assumption.
We can also have queries w.r.t. any other PRV.
References
Boyen X, Koller D (1998) Tractable inference for complex stochastic processes. In: Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp 33–42. Morgan Kaufmann Publishers Inc, Massachusetts
Braun T, Möller R (2016) Lifted junction tree algorithm. In: Proceedings of KI 2016: Advances in Artificial Intelligence, pp 30–42. Springer, New York
Gehrke M (2021) Taming Exact Inference in Temporal Probabilistic Relational Models. Ph.D. thesis, University of Lübeck
Gehrke M, Braun T, Möller R (2018) Answering multiple conjunctive queries with the lifted dynamic junction tree algorithm. In: Proceedings of the AI 2018: Advances in Artificial Intelligence, pp 543–555. Springer, New York
Gehrke M, Braun T, Möller R (2018) Lifted dynamic junction tree algorithm. In: Proceedings of the 23rd International Conference on Conceptual Structures, pp 55–69. Springer, New York
Gehrke M, Braun T, Möller R (2019) Lifted temporal maximum expected utility. In: Proceedings of the 32nd Canadian Conference on Artificial Intelligence, Canadian AI 2019, pp 380–386. Springer, New York
Gehrke M, Braun T, Möller R (2019) Lifted temporal most probable explanation. In: Proceedings of the 24th International Conference on Conceptual Structures, pp 72–85. Springer, New York
Gehrke M, Braun T, Möller R (2019) Relational forward backward algorithm for multiple queries. In: Proceedings of the 32nd International Florida Artificial Intelligence Research Society Conference (FLAIRS-32), pp 464–469. AAAI Press, New York
Gehrke M, Möller R, Braun T (2020) Taming reasoning in temporal probabilistic relational models. In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI 2020), pp 2592–2599. IOS Press, New York
Murphy KP (2002) Dynamic Bayesian networks: representation, inference and learning. Ph.D. thesis, University of California, Berkeley
Niepert M, Van den Broeck G (2014) Tractability through exchangeability: a new perspective on efficient probabilistic inference. In: AAAI14 Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp 2467–2475. AAAI Press, New York
Taghipour N (2013) Lifted probabilistic inference by variable elimination. Ph.D. thesis, KU Leuven
Taghipour N, Davis J, Blockeel H (2013) First-order decomposition trees. In: NIPS13 Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, pp 1052–1060. Curran Associates Inc
Taghipour N, Davis J, Blockeel H (2013) Generalized counting for lifted variable elimination. In: International Conference on Inductive Logic Programming, pp 107–122. Springer, New York
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Gehrke, M. Dissertation Abstract: Taming Exact Inference in Temporal Probabilistic Relational Models. Künstl Intell 38, 219–224 (2024). https://doi.org/10.1007/s13218-023-00813-w