Real-Time Analytics: Benefits, Limitations, and Tradeoffs

Programming and Computer Software

Abstract

Real-time analytics is a relatively new branch of analytics. It is commonly defined as analyzing the most recent data available as quickly as possible. This definition captures the essence of what users need, but it is too vague to serve as a specific requirement for the corresponding software systems. As a result, vendors of analytical data-management systems and researchers apply the term "real-time analytics" to very different systems, which vary in architecture, functionality, and even timing characteristics. The purpose of this article is to analyze the different approaches to providing real-time analytics, their advantages and disadvantages, and the tradeoffs that both designers and users of such systems inevitably have to make.

ACKNOWLEDGMENTS

This article is based on the materials of a report at the seventh international conference “Actual Problems of System and Software Engineering” (APSSE 2021).

Author information

Corresponding authors

Correspondence to S. D. Kuznetsov, P. E. Velikhov or Q. Fu.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Kuznetsov, S.D., Velikhov, P.E. & Fu, Q. Real-Time Analytics: Benefits, Limitations, and Tradeoffs. Program Comput Soft 49, 1–25 (2023). https://doi.org/10.1134/S036176882301005X

  • DOI: https://doi.org/10.1134/S036176882301005X
