DOI: 10.5555/2874551.2874557
Article

Scalable linked data stream processing via network-aware workload scheduling

Published: 21 October 2013

Abstract

To cope with ever-increasing data volumes, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach raises the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy (a uniform distribution of computation load among the available machines), typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication.
In this paper we propose a graph-partitioning-based approach to workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluated its communication behavior using two real-world datasets. We show that applying graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even when using very limited data statistics. We also find that processing RDF data one triple at a time, rather than as graph fragments containing multiple triples, may decrease throughput, indicating the usefulness of semantics.
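
The contrast between uniform scheduling and communication-aware placement can be illustrated with a toy sketch. It is not the authors' implementation: the abstract only says that graph partitioning algorithms are applied, so the greedy heuristic below is a simplified stand-in, and the task names, edge weights, and function names are all hypothetical. Tasks are the nodes of a graph, edge weights stand for the volume of messages exchanged between two tasks, and the cost of a placement is the total weight of edges that cross machine boundaries.

```python
from collections import defaultdict


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints sit on different machines."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


def round_robin(tasks, k):
    """Uniform scheduling: spread tasks evenly, ignoring communication."""
    return {task: i % k for i, task in enumerate(tasks)}


def greedy_partition(tasks, edges, k):
    """Put each task on the machine it communicates with most,
    never exceeding an equal share of tasks per machine.
    (Illustrative heuristic only, not a multilevel partitioner.)"""
    capacity = -(-len(tasks) // k)  # ceiling division
    neighbors = defaultdict(dict)
    for (u, v), w in edges.items():
        neighbors[u][v] = w
        neighbors[v][u] = w

    assignment = {}
    load = [0] * k
    for task in tasks:
        # Communication volume towards tasks already placed on each machine.
        gain = [0] * k
        for other, w in neighbors[task].items():
            if other in assignment:
                gain[assignment[other]] += w
        candidates = [m for m in range(k) if load[m] < capacity]
        best = max(candidates, key=lambda m: (gain[m], -load[m]))
        assignment[task] = best
        load[best] += 1
    return assignment


# Hypothetical workload: two tightly coupled groups joined by one light edge.
tasks = ["a", "b", "c", "d", "e", "f"]
edges = {
    ("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 10,
    ("d", "e"): 10, ("e", "f"): 10, ("d", "f"): 10,
    ("c", "d"): 1,
}

uniform = round_robin(tasks, 2)
partitioned = greedy_partition(tasks, edges, 2)

print("uniform cut:", cut_weight(edges, uniform))
print("partitioned cut:", cut_weight(edges, partitioned))
```

Both placements put three tasks on each machine, so the computation load stays even; only the amount of traffic crossing machines differs. The numbers apply to this made-up graph alone and say nothing about the reductions reported in the paper.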




    Published In

    SSWS'13: Proceedings of the 9th International Conference on Scalable Semantic Web Knowledge Base Systems - Volume 1046
    October 2013
    96 pages

    Publisher

    CEUR-WS.org

    Aachen, Germany


    Author Tags

    1. complex event processing
    2. graph partitioning
    3. linked data
    4. semantic flow processing
    5. stream processing
    6. workload scheduling


    Cited By

    • (2017) Strider. Proceedings of the VLDB Endowment 10:12 (1905-1908). DOI: 10.14778/3137765.3137805. Online publication date: 1-Aug-2017
    • (2017) Optimal Operator Replication and Placement for Distributed Stream Processing Systems. ACM SIGMETRICS Performance Evaluation Review 44:4 (11-22). DOI: 10.1145/3092819.3092823. Online publication date: 10-May-2017
    • (2017) Semantic access to streaming and static data at Siemens. Web Semantics: Science, Services and Agents on the World Wide Web 44:C (54-74). DOI: 10.1016/j.websem.2017.02.001. Online publication date: 1-May-2017
    • (2016) Optimal operator placement for distributed stream processing applications. Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems (69-80). DOI: 10.1145/2933267.2933312. Online publication date: 13-Jun-2016
    • (2016) Ontology-Based Integration of Streaming and Static Relational Data with Optique. Proceedings of the 2016 International Conference on Management of Data (2109-2112). DOI: 10.1145/2882903.2899385. Online publication date: 26-Jun-2016
