Abstract
Data has become too big to see. Yet visualization is a central way people understand data. We need new approaches to data visualization that scale up and out for large data, so that people can explore their data visually and interactively in real time as a means of understanding it. The five V’s of big data (value, volume, variety, velocity, and veracity) each highlight the challenges of this endeavor.
We present these challenges and a system, Skydive, that we are developing to meet them. Skydive tightly couples a database back-end with a visualization front-end in order to scale up and out. We show how hierarchical aggregation can be used to drive this coupling, and the powerful types of interactive visual presentation that it can support. We are preparing for the day, coming soon, when visualization becomes the sixth V of big data.
Notes
- 1.
This includes U.S.A. with 319 million, Mexico with 122 million, and Canada with 35 million, as of 2013.
- 2.
Though they are cognizant of the need, and are working toward addressing this.
- 3.
We shall also show ways that categorical data as measures can be accommodated.
- 4.
We use the same number of divisions—power of two—along each of the dimensions, without loss of generality. It is trivial to allow for different “aspect” ratios with different numbers of divisions for different dimensions, however.
- 5.
For simplicity, we shall refer to strata \(s_0\), ..., \(s_l\), from the top to the bottom, respectively, forgoing the minus sign when understood in context.
- 6.
At least not standard versions of these.
- 7.
- 8.
- 9.
Or vice versa: the bins of the t-pyramid are then hierarchically aggregated by x,y. This is commutative.
- 10.
Also called Morton order [22]. This is a one-dimensional, linear ordering for any multi-dimensional data.
- 11.
“Bins” into which no base data aggregates—“empty bins”—are never created. These numbers in the Z-order are simply skipped over.
- 12.
This is sometimes referred to as a linear quadtree (for 2-D) [10].
- 13.
The dataset is over three dimensions—\(X\), \(Y\), and \(T\)—so assume \(B = 2^{3d}\) for some \(d\), without loss of generality.
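Notes 4 and 10 to 13 together describe a sparse layout of bins numbered in Z-order. The following minimal Python sketch illustrates that idea for the 2-D case. It is not taken from Skydive: the function names (`z_order`, `bin_points`, `roll_up`), the convention of placing x bits in the even positions, the unit-square domain, and the use of counts as the aggregate are all assumptions made for this example.

```python
from collections import Counter


def z_order(x: int, y: int, depth: int) -> int:
    """Z-order (Morton) code of cell (x, y) on a 2^depth by 2^depth grid."""
    code = 0
    for i in range(depth):
        code |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions (assumed convention)
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


def bin_points(points, depth: int) -> Counter:
    """Count points per bottom-stratum bin, keyed by Z-order code.

    Points are assumed to lie in the unit square [0, 1) x [0, 1).
    Only bins that receive at least one point are ever created.
    """
    side = 1 << depth  # same power-of-two number of divisions on each dimension
    counts = Counter()
    for px, py in points:
        counts[z_order(int(px * side), int(py * side), depth)] += 1
    return counts


def roll_up(counts: Counter) -> Counter:
    """Aggregate one stratum upward by dropping the two low-order bits of each code."""
    parent = Counter()
    for code, n in counts.items():
        parent[code >> 2] += n
    return parent


if __name__ == "__main__":
    bottom = bin_points([(0.1, 0.1), (0.15, 0.2), (0.9, 0.9)], depth=2)
    print(sorted(bottom.items()))           # non-empty bins only, in Z-order
    print(sorted(roll_up(bottom).items()))  # the stratum above
```

Rolling up by a two-bit shift works because the four children of a quadtree cell occupy consecutive Z-order codes. For a dataset over three dimensions, as in note 13, the same scheme would interleave three bits per level and shift by three.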
References
Andrienko, N., Andrienko, G.: Exploratory analysis of spatial and temporal data: a systematic approach. Springer Science and Business Media, Heidelberg (2006)
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., et al.: Spark SQL: relational data processing in Spark. In: Proceedings of SIGMOD, pp. 1383–1394. ACM (2015)
Battle, L., Stonebraker, M., Chang, R.: Dynamic reduction of query result sets for interactive visualization. In: Proceedings of the International Conference on Big Data, Santa Clara, CA, USA, pp. 1–8 (2013)
Bertin, J.: Semiology of Graphics. University of Wisconsin Press, Madison (1983)
Beyer, M.A., Laney, D.: The importance of “big data”: a definition. Gartner report (2015)
Cable, D.: The racial dot map. Demographics Research Group, Weldon Cooper Center for Public Service, University of Virginia, July 2013. www.coopercenter.org/demographics/Racial-Dot-Map
Dijcks, J.P.: Oracle: Big data for the enterprise. Oracle White Paper (2012)
Elmqvist, N., Fekete, J.D.: Hierarchical aggregation for information visualization: overview, techniques, and design guidelines. IEEE Trans. Vis. Comput. Graph. 16(3), 439–454 (2010)
Erickson, J.: Private correspondence, conveyed along with permission to use by Tilmann Rabl, May 2015
Gargantini, I.: An effective way to represent quadtrees. Commun. ACM 25(12), 905–910 (1982)
Godfrey, P., Gryz, J., Lasek, P., Razavi, N.: Skydive: an interactive data visualization engine. In: IEEE Symposium on Large Data Analytics and Visualization, Chicago, USA, October 25–26, pp. 129–130 (2015)
Godfrey, P., Gryz, J., Lasek, P.: Interactive visualization of large data sets. Technical report EECS-2015-03, York University, March 2015
Godfrey, P., Gryz, J., Lasek, P., Razavi, N.: Visualization through inductive aggregation. In: Proceedings of EDBT, March 2016
Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M., Pellow, F., Pirahesh, H.: Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min. Knowl. Disc. 1(1), 29–53 (1997)
Hausenblas, M., Nadeau, J.: Apache Drill: interactive ad-hoc analysis at scale. Big Data 1(2), 100–104 (2013)
Jugel, U., Jerzak, Z., Hackenbroich, G., Markl, V.: Faster visual analytics through pixel-perfect aggregation. Proc. VLDB Endowment 7(13), 1705–1708 (2014)
Jugel, U., Jerzak, Z., Hackenbroich, G., Markl, V.: M4: a visualization-oriented time series data aggregation. Proc. VLDB Endowment 7(10), 797–808 (2014)
Laney, D.: Meta Group Res Note 6. META (2001)
Liu, Z., Jiang, B., Heer, J.: imMens: real-time visual querying of big data. Comput. Graph. Forum 32(3), 421–430 (2013)
Magdy, A., Aly, A.M., Mokbel, M.F., Elnikety, S., He, Y., Nath, S.: Mars: real-time spatio-temporal queries on microblogs. In: ICDE, pp. 1238–1241 (2014)
Magdy, A., Mokbel, M.F., Elnikety, S., Nath, S., He, Y.: Mercury: a memory-constrained spatio-temporal real-time search on microblogs. In: ICDE, pp. 172–183. IEEE (2014)
Morton, G.M.: A Computer Oriented Geodetic Data Base and A New Technique in File Sequencing. International Business Machines Company, New York (1966)
Sallam, R.L., Hostmann, B., Schlegel, K., Tapadinhas, J., Parenteau, J., Oestreich, T.W.: Magic quadrant for business intelligence and analytics platforms. Gartner report (2015)
Samet, H.: The quadtree and related hierarchical data structures. ACM Comput. Surv. (CSUR) 16(2), 187–260 (1984)
Samet, H.: Applications of Spatial Data Structures: Computer Graphics, Image Processing, and GIS. Addison-Wesley Longman Publishing Co., Inc., Boston (1990)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)
Schroeck, M., Shockley, R., Smart, J., Romero-Morales, D., Tufano, P.: Analytics: The Real-World Use of Big Data. IBM Global Business Services, Somers (2012)
Shneiderman, B.: The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the 1996 IEEE Symposium on Visual Languages, pp. 336–343. IEEE (1996)
Shneiderman, B.: Extreme visualization: squeezing a billion records into a million pixels. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 3–12. ACM (2008)
Stolte, C., Tang, D., Hanrahan, P.: Polaris: a system for query, analysis, and visualization of multidimensional relational databases. IEEE Trans. Vis. Comput. Graph. 8(1), 52–65 (2002)
Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endowment 2(2), 1626–1629 (2009)
Tigani, J., Naidu, S.: Google BigQuery Analytics. John Wiley & Sons, Hoboken (2014)
Tufte, E.: Envisioning Information. Graphics Press, Cheshire (1990)
Wesley, R., Eldridge, M., Terlecki, P.T.: An analytic data engine for visualization in Tableau. In: Proceedings of SIGMOD, pp. 1185–1194. ACM (2011)
Wesley, R.M.G., Terlecki, P.: Leveraging compression in the Tableau data engine. In: Proceedings of SIGMOD, pp. 563–573. ACM (2014)
White, T.: Hadoop: The definitive guide. O’Reilly Media Inc, Sebastopol (2012)
Wu, E., Battle, L., Madden, S.R.: The case for data visualization management systems: vision paper. Proc. VLDB Endowment 7(10), 903–906 (2014)
Zikopoulos, P.C., Eaton, C., DeRoos, D., Deutsch, T., Lapis, G.: Understanding Big Data. McGraw-Hill, New York (2012)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Godfrey, P., Gryz, J., Lasek, P., Razavi, N. (2016). Interactive Visualization of Big Data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_1
DOI: https://doi.org/10.1007/978-3-319-34099-9_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)