Punctuated data streams

January 2005

Author:
Peter A. Tucker
OGI School of Science & Engineering
,
Supervisor:
David Maier
OGI School of Science & Engineering

Publisher:

OGI School of Science & Engineering

ISBN:978-0-542-25737-7

Order Number:AAI3184366

Pages:

280

Purchase on ProQuest

Bibliometrics

Abstract

As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or unbounded data streams. Unbounded stateful operators maintain state with no upper bound on its size, and so eventually run out of memory. Blocking operators read the entire input before emitting a single output, and so might never produce a result. We believe that a priori semantic knowledge of a data stream can permit the use of such operators in some cases. We explore a kind of stream semantics called punctuated streams . Punctuations in a stream mark the end of substreams, allowing us to view a non-terminating stream as a mixture of terminating streams. We introduce three kinds of invariants to specify the proper behavior of query operators in the presence of punctuation. Pass invariants unblock blocking operators by defining when such an operator can pass results on. Keep invariants define what must be kept in local state to continue successful operation. Propagation invariants define when an operator can pass punctuation on. We then present a strategy for proving that implementations of these invariants are faithful to their finite table counterparts.

In practice, it is important to answer the following question: “How much additional overhead is required when using punctuations__ __” We use the scenario of a monitoring system for an online auction. Streams of bids, new items, and new users are sent to an online auction system. There are many interesting queries that can be posed over these auction streams. We define queries for this scenario, and execute them with different kinds and amounts of punctuations embedded in the input streams. We show that, for a reasonable ratio of punctuations to data items, the overhead is minimal. Additionally, we compare the behavior of a query using punctuations with the behavior of the same query using slack over data streams with disorder.

Clearly, not all punctuations are useful to a particular query, and it would be useful to make a determination of when they are. That is, we would like to answer the question “Can stream query Q benefit from a particular set of punctuations__ __” To that end, we first define punctuation schemes to specify the collection of punctuations that will be presented to a query on a particular data stream. We show how both punctuations and query operators induce groupings over the items in the domain of the input(s). We show that a query benefits from an input punctuation scheme (in terms of being able to produce a given output scheme), if each set in the groupings induced by the operators of the query is covered by a finite number of punctuations in the scheme—a kind of compactness .

We conclude with discussion on possible future directions of research related to punctuations and data streams. These directions focus on a variety of questions, ranging from issues in query optimization to other possible semantics that can be expressed using punctuations.

Cited By

Contributors

Peter A Tucker
Whitworth College
- Publication Years2003 - 2017
- Publication counts7
- Citation count537
- Available for Download2
- Downloads (cumulative)2,539
- Downloads (12 months)69
- Downloads (6 weeks)7
- Average Downloads per Article1,270
- Average Citation per Article77
View Full Profile
David Maier
Portland State University
- Publication Years1978 - 2024
- Publication counts197
- Citation count7,934
- Available for Download106
- Downloads (cumulative)76,014
- Downloads (12 months)7,916
- Downloads (6 weeks)964
- Average Downloads per Article717
- Average Citation per Article40
View Full Profile

Index Terms

Punctuated data streams

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Recommendations

Exploiting Punctuation Semantics in Continuous Data Streams

As most current query processing architectures are already pipelined, it seems logical to apply them to data streams. However, two classes of query operators are impractical for processing long or infinite data streams. Unbounded stateful operators ...
Safety guarantee of continuous join queries over punctuated data streams
VLDB '06: Proceedings of the 32nd international conference on Very large data bases

Continuous join queries (CJQ) are needed for correlating data from multiple streams. One fundamental problem for processing such queries is that since the data streams are infinite, this would require the join operator to store infinite states and ...
Sketching distributed sliding-window data streams

While traditional data management systems focus on evaluating single, ad hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is widely ...

Browse Theses

Sections

Cited By

Index Terms

Exploiting Punctuation Semantics in Continuous Data Streams

Safety guarantee of continuous join queries over punctuated data streams

Sketching distributed sliding-window data streams

Sections

Cited By

Save to Binder

Index Terms

Recommendations

Exploiting Punctuation Semantics in Continuous Data Streams

Safety guarantee of continuous join queries over punctuated data streams

Sketching distributed sliding-window data streams