DOI: 10.1145/1376616.1376729

SPADE: The System S Declarative Stream Processing Engine

Published: 09 June 2008

Abstract

In this paper, we present Spade, the System S declarative stream processing engine. System S is a large-scale, distributed data stream processing middleware under development at the IBM T. J. Watson Research Center. As a front-end for rapid application development for System S, Spade provides (1) an intermediate language for flexible composition of parallel and distributed data-flow graphs, (2) a toolkit of type-generic, built-in stream processing operators that support scalar as well as vectorized processing and can seamlessly interoperate with user-defined operators, and (3) a rich set of stream adapters to ingest data from and publish data to outside sources. More importantly, Spade automatically brings performance optimization and scalability to System S applications. To that end, Spade employs a code generation framework to create highly optimized applications that run natively on the Stream Processing Core (SPC), the execution and communication substrate of System S, and take full advantage of other System S services. Spade allows developers to construct their applications from fine-grained stream operators without worrying about the performance implications that might exist, even in a distributed system. Spade's optimizing compiler automatically maps applications into appropriately sized execution units in order to minimize communication overhead while exploiting available parallelism. By virtue of the scalability of the System S runtime and Spade's effective code generation and optimization, we can scale applications to a large number of nodes. Currently, we can run Spade jobs on ≈ 500 processors across more than 100 physical nodes in a tightly connected cluster environment. Spade has been in use at IBM Research to create real-world streaming applications, ranging from monitoring financial market feeds to radio telescopes to semiconductor fabrication lines.
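
The idea of mapping fine-grained operators into appropriately sized execution units can be illustrated with a small sketch. The Python below is not Spade's language, API, or optimizer: the `Op` class, the integer `cost` estimates, the `capacity` parameter, and the greedy `partition` rule are all assumptions made for illustration. Operators grouped into the same unit are chained by direct calls, while each unit boundary materializes the stream as a stand-in for a communication hop.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Op:
    """A stream operator with a hypothetical per-tuple cost estimate."""
    name: str
    fn: Callable[[Iterable[dict]], Iterator[dict]]
    cost: int  # assumed cost units; not a real Spade metric


def select(pred: Callable[[dict], bool]):
    """Build an operator that keeps only tuples satisfying pred."""
    def op(stream):
        for t in stream:
            if pred(t):
                yield t
    return op


def transform(fn: Callable[[dict], dict]):
    """Build an operator that maps each tuple to a new tuple."""
    def op(stream):
        for t in stream:
            yield fn(t)
    return op


def partition(ops: List[Op], capacity: int) -> List[List[Op]]:
    """Greedily pack a linear operator chain into execution units.

    A new unit is started whenever adding the next operator would push
    the unit's total cost above capacity (an illustrative rule only).
    """
    units, current, load = [], [], 0
    for op in ops:
        if current and load + op.cost > capacity:
            units.append(current)
            current, load = [], 0
        current.append(op)
        load += op.cost
    if current:
        units.append(current)
    return units


def run(units: List[List[Op]], source: Iterable[dict]) -> List[dict]:
    """Run the units in order over a finite input."""
    stream = list(source)
    for unit in units:
        it = iter(stream)
        for op in unit:
            it = op.fn(it)      # inside a unit: direct generator chaining
        stream = list(it)       # unit boundary: stands in for a network hop
    return stream


# Hypothetical financial-feed example.
OPS = [
    Op("only_ibm", select(lambda t: t["sym"] == "IBM"), cost=2),
    Op("notional", transform(lambda t: {"sym": t["sym"],
                                        "notional": t["price"] * t["vol"]}), cost=5),
    Op("large", select(lambda t: t["notional"] > 1000), cost=2),
]
TRADES = [
    {"sym": "IBM", "price": 100.0, "vol": 20},
    {"sym": "XYZ", "price": 50.0, "vol": 100},
    {"sym": "IBM", "price": 10.0, "vol": 5},
]

if __name__ == "__main__":
    for capacity in (2, 8, 10):
        units = partition(OPS, capacity)
        print(capacity, [[op.name for op in u] for u in units], run(units, TRADES))
```

Whatever the capacity, the result is the same; only the number of unit boundaries, and hence the modeled communication, changes. A real optimizer would also have to weigh parallelism against communication, which this sketch ignores.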

Information & Contributors

Information

Published In

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. distributed data stream processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '08

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Cited By

  • (2024) Data-Aware Adaptive Compression for Stream Processing. IEEE Transactions on Knowledge and Data Engineering 36:9 (4531-4549). DOI: 10.1109/TKDE.2024.3377710. Online publication date: Sep-2024
  • (2023) CompressStreamDB: Fine-Grained Adaptive Stream Processing without Decompression. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (408-422). DOI: 10.1109/ICDE55515.2023.00038. Online publication date: Apr-2023
  • (2022) Stream processing with dependency-guided synchronization. Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (1-16). DOI: 10.1145/3503221.3508413. Online publication date: 2-Apr-2022
  • (2022) Multi-Query Optimization of Incrementally Evaluated Sliding-Window Aggregations. IEEE Transactions on Knowledge and Data Engineering 34:8 (3899-3911). DOI: 10.1109/TKDE.2020.3029770. Online publication date: 1-Aug-2022
  • (2022) Distributed-Swarm: A Real-Time Pattern Detection Model Based on Density Clustering. IEEE Access 10 (59832-59842). DOI: 10.1109/ACCESS.2022.3179367. Online publication date: 2022
  • (2022) Targeting a light-weight and multi-channel approach for distributed stream processing. Journal of Parallel and Distributed Computing 167 (77-96). DOI: 10.1016/j.jpdc.2022.04.022. Online publication date: Sep-2022
  • (2022) Adaptive SQL Query Optimization in Distributed Stream Processing: A Preliminary Study. Software Foundations for Data Interoperability (96-109). DOI: 10.1007/978-3-030-93849-9_7. Online publication date: 19-Jan-2022
  • (2021) Railgun. Proceedings of the VLDB Endowment 14:12 (3069-3082). DOI: 10.14778/3476311.3476384. Online publication date: 28-Oct-2021
  • (2021) Ananke. Proceedings of the VLDB Endowment 14:3 (391-403). DOI: 10.14778/3430915.3430928. Online publication date: 9-Dec-2021
  • (2021) Distributed Stream KNN Join. Proceedings of the 2021 International Conference on Management of Data (1597-1609). DOI: 10.1145/3448016.3457269. Online publication date: 9-Jun-2021
