DOI: 10.1145/3332186.3332256
research-article

VAStream: A Visual Analytics System for Fast Data Streams

Published: 28 July 2019

Abstract

Processing high-volume, high-velocity data streams is an important big data problem in many science, engineering, and technology domains. Many open-source distributed stream processing and cloud platforms offer low-latency stream processing at scale, but their visualization and user-interaction components are limited to displaying the results of stream processing. Visual analysis is a newer form of analysis that gives the user more control and interactivity: the user can dynamically change the visualization, the analytics, or the data management processes. VAStream provides an environment for big data stream processing together with interactive visualization capabilities. The system environment consists of hardware and software modules that optimize the streaming data workflow, which includes data ingest, pre-processing, analytics, visualization, and collaboration components. The environment is evaluated on two real-time streaming applications. The first, real-time event detection on social media streams, uses text data arriving from sources such as Twitter to detect emerging events of interest. The second, real-time river sensor network analysis, applies unsupervised classification methods to sensor streams arriving from the US river network to detect water quality problems. We discuss implementation details and compare the performance of individual stream processing operations for both applications.
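To make the staged workflow named in the abstract (ingest, pre-processing, analytics, visualization) more concrete, the following is a minimal single-process sketch in Python. It is not the VAStream implementation, which the abstract does not detail. Every name in it (`ingest`, `preprocess`, `windowed_term_counts`, `top_terms`, `run_pipeline`) is hypothetical, and the sliding-window term count is only an assumed stand-in for the kind of analytics that text-stream event detection might use.

```python
import re
from collections import Counter, deque
from typing import Dict, Iterable, Iterator, List


def ingest(messages: Iterable[str]) -> Iterator[str]:
    """Ingest stage: a stand-in for a live source such as a message queue."""
    for message in messages:
        yield message


def preprocess(stream: Iterable[str]) -> Iterator[List[str]]:
    """Pre-processing stage: lowercase each message and split it into word tokens."""
    for message in stream:
        yield re.findall(r"[a-z]+", message.lower())


def windowed_term_counts(
    stream: Iterable[List[str]], window_size: int
) -> Iterator[Dict[str, int]]:
    """Analytics stage: term counts over the most recent `window_size` messages."""
    window = deque()
    counts = Counter()
    for tokens in stream:
        window.append(tokens)
        counts.update(tokens)
        if len(window) > window_size:
            # Evict the oldest message and remove its contribution to the counts.
            for term in window.popleft():
                counts[term] -= 1
                if counts[term] == 0:
                    del counts[term]
        yield dict(counts)


def top_terms(snapshot: Dict[str, int], k: int) -> List[str]:
    """Visualization hand-off: the k most frequent terms a front end could render."""
    return sorted(snapshot, key=lambda term: (-snapshot[term], term))[:k]


def run_pipeline(messages: Iterable[str], window_size: int = 2) -> List[Dict[str, int]]:
    """Chain the stages and collect one count snapshot per incoming message."""
    return list(windowed_term_counts(preprocess(ingest(messages)), window_size))


if __name__ == "__main__":
    sample = [
        "Flood warning downtown",
        "flood on Main Street",
        "sunny day",
        "nice weather",
    ]
    for snapshot in run_pipeline(sample, window_size=2):
        print(top_terms(snapshot, 3))
```

Each stage is a generator, so messages flow through one at a time rather than in batches. A distributed engine would replace these functions with its own operators, but the stage boundaries would stay the same.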

Cited By

  • (2022) A Visual Analytics Approach for Hardware System Monitoring with Streaming Functional Data Analysis. IEEE Transactions on Visualization and Computer Graphics, 28(6):2338-2349. DOI: 10.1109/TVCG.2022.3165348. Online publication date: 1 June 2022.

Published In

PEARC '19: Practice and Experience in Advanced Research Computing 2019: Rise of the Machines (learning)
July 2019
775 pages
ISBN:9781450372275
DOI:10.1145/3332186
  • General Chair: Tom Furlani

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Visual analytics
  2. big data infrastructure
  3. machine learning
  4. stream computing
  5. visualization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '19

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
