Abstract
Real-time analytics is a relatively new branch of analytics. It is commonly defined as analyzing the most recent data available as quickly as possible. This definition captures the essence of what users fundamentally need, but it is too vague to serve as a specific requirement for the corresponding software systems. As a result, vendors of analytical data management systems and researchers apply the term "real-time analytics" to systems that differ widely in architecture, functionality, and even timing. The purpose of this article is to analyze the different approaches to providing real-time analytics, their advantages and disadvantages, and the tradeoffs that both designers and users of such systems inevitably have to make.
ACKNOWLEDGMENTS
This article is based on the materials of a report at the seventh international conference “Actual Problems of System and Software Engineering” (APSSE 2021).
Ethics declarations
The authors declare that they have no conflicts of interest.
Cite this article
Kuznetsov, S.D., Velikhov, P.E. & Fu, Q. Real-Time Analytics: Benefits, Limitations, and Tradeoffs. Program Comput Soft 49, 1–25 (2023). https://doi.org/10.1134/S036176882301005X
DOI: https://doi.org/10.1134/S036176882301005X