PROUD: a probabilistic approach to processing similarity queries over uncertain data streams
Proceedings of the 12th international conference on extending database …, 2009•dl.acm.org
We present PROUD--A PRObabilistic approach to processing similarity queries over
Uncertain Data streams, where the data streams here are mainly time series streams. In
contrast to data with certainty, an uncertain series is an ordered sequence of random
variables. The distance between two uncertain series is also a random variable. We use a
general uncertain data model, where only the mean and the deviation of each random
variable at each timestamp are available. We derive mathematical conditions for …
Uncertain Data streams, where the data streams here are mainly time series streams. In
contrast to data with certainty, an uncertain series is an ordered sequence of random
variables. The distance between two uncertain series is also a random variable. We use a
general uncertain data model, where only the mean and the deviation of each random
variable at each timestamp are available. We derive mathematical conditions for …
We present PROUD -- A PRObabilistic approach to processing similarity queries over Uncertain Data streams, where the data streams here are mainly time series streams. In contrast to data with certainty, an uncertain series is an ordered sequence of random variables. The distance between two uncertain series is also a random variable. We use a general uncertain data model, where only the mean and the deviation of each random variable at each timestamp are available. We derive mathematical conditions for progressively pruning candidates to reduce the computation cost. We then apply PROUD to a streaming environment where only sketches of streams, like wavelet synopses, are available. Extensive experiments are conducted to evaluate the effectiveness of PROUD and compare it with Det, a deterministic approach that directly processes data without considering uncertainty. The results show that, compared with Det, PROUD offers a flexible trade-off between false positives and false negatives by controlling a threshold, while maintaining a similar computation cost. In contrast, Det does not provide such flexibility. This trade-off is important as in some applications false negatives are more costly, while in others, it is more critical to keep the false positives low.
ACM Digital Library