Configurable Distributed Data Management for the Internet of the Things †
<p>Anatomy of the Distributed Data Analytics Engine (see [<a href="#B16-information-10-00360" class="html-bibr">16</a>]).</p> "> Figure 2
<p>Configurable Data Preprocessing at the Edge.</p> "> Figure 3
<p>Elements of an APM Schema.</p> "> Figure 4
<p>Root of the FAR-EDGE Data Models.</p> "> Figure 5
<p>Data Definitions and Manifests Subschemas. (<b>a</b>) Factory Data Definitions and Manifests Subschemas (<b>b</b>) metadata Definitions and Manifests Subschemas.</p> "> Figure 6
<p>Concept of CIR Approach as implemented in the PROPHESY Project.</p> "> Figure 7
<p>Dashboard for managing the EA-Engine Implementation.</p> ">
Abstract
:1. Introduction
- High-performance and low-latency operations, i.e., low overhead, near real-time processing of data streams.
- Scalability, i.e., ability to process an arbitrarily large number of streams, with only marginal upgrades to the underlying hardware infrastructure.
- Dynamicity, i.e., ability to dynamically update the results of the analytics functions, upon changes in their configuration, which is essential in cases of in volatile industrial environments where data sources may join or leave dynamically.
- Stream handling, i.e., effective processing of streaming data in addition to transactional static or semi-static data (i.e., data at rest).
- Configurability, i.e., flexible adaptation to different business and factory automation requirements, such as the calculation of various KPIs (Key Performance Indicators) for production processes involving various data sources on diverse industrial environments that may even leverage different streaming middleware platforms and toolkits.
- First, it introduces some of the foundational concepts of the DDA, such as the DSL for describing analytics pipelines. The latter DSL is also illustrated through a practical example.
- Second, it also introduces the concept of the Common Interoperability Registry (CIR) for linking data sources that are described based on other schemes, different that the project’s DSLs. The CIR concept was not part of the conference paper.
- Third, it provides more details and richer information about the architecture of the DDA and its use in the scope of cloud/edge deployments.
2. Architecture of the Configurable Data Analytics Engine
2.1. DDA Overview
2.2. Edge Analytics Engine (EA-Engine)
- Preprocessors, which preprocess (e.g., filtering) data streams and prepare them for analysis by other processors. Preprocessors acquire streaming data through the DR&P component and produce new streams that are made accessible to other processors and applications through the Data Bus of the infrastructure.
- Storage processors, which store streams to some repository such as a data bus, a data store or a database. They provide the persistence functions, which is a key element of any data analytics pipeline.
- Analytics processors, which execute analytics processing functions over data streams ranging from simple statistical computations (e.g., average or a standard deviation) to more complex machine learning tasks (e.g., execution of a classification function). Like preprocessors, analytics processors can access and persist data to the Data Bus.
2.3. Distributed Analytics Engine (DA-Engine)
2.4. Open API for Analytics
- CRUD (Create Update and Delete) operations for AMs, which enable the management of AMs.
- Access to and management of information about deployed instances of analytics workflows, including access to information about AMs and their status.
- Management of a specified AM, including starting, stopping, posing and resuming the execution of an analytics instance expressed in the AM DSL.
3. Digital Modelling and Common Interoperability Registry
3.1. Overview
3.2. Plant Data and Metadata
- Data Source Definition (DSD): Defines the properties of a data source in the shop floor, such as a data stream from a sensor or an automation device.
- Data Interface Specification (DI): It is associated with a data source and provides the information needed to connect to it and access its data (e.g., network protocol, port, network address).
- Data Kind (DK): This specifies the semantics of the data of the data source. It can be used to define virtually any type of data in an extensible way.
- Data Source Manifest (DSM): Specifies a specific instance of a data source in line with its DSD, DI and DK specifications. Multiple manifests are therefore used to represent the data sources that are available in the factory.
- Data Consumer Manifest (DCM): Models an instance of a data consumer, i.e., any application that accesses a data source.
- Data Channel Descriptor (DCD): Models the association between an instance of a consumer and an instance of a data source. Keeps track of the established connections and associations between data sources and data consumers.
- LiveDataSet: Models the actual dataset that stems from an instance of a data source that is represented through a DSM. It is t is associated with a timestamp and keeps track of the location of the data source in case it is associated with a mobile edge node. In principle, the data source comprises a set of name–value pairs, which adhere to different data types in line with the DK of the DSM.
- Edge Gateway: Models an edge gateway of an edge computing deployment. Data sources are associated with an edge gateway, which usually implies not only a logical association, but also a physical association as well.
3.3. Data Analytics Data and Metadata
- Analytics Processor Definition (APD): Specifies processing functions applied on one or more data sources. As outlined, three types of processing functions are supported, including data preprocessing, data storage, and data analytics functions. These three types of functions can be combined in various configurations over the data sources in order to define analytics workflows.
- Analytics Processor Manifest (APM): Represents an instance of a processor that is defined through an APD. Each instance specifies the type of processor and its actual logic through linking to an implementation function (like a Java class).
- Analytics orchestrator Manifest (AM): Represents an analytics workflow as a combination of analytics processor instances (i.e., APMs). It is likely to span multiple edge gateways and to operate over their data sources.
3.4. Common Interoperability Registry
- A database of the digital models presented in the previous paragraphs i.e., models representing diverse data sources such as sensors and automation devices deployed in the field.
- The CIR, which provides the infrastructure for linking objects’ data that resides in different databases. In this way, the CIR enriches of the datasets of the core digital models with additional data and metadata residing in “linked” databases.
- Discoverability: CIR enables the discoverability of all registered objects and helps third party applications to combine the information provided from different systems and databases. To this end, a global unique identifier (in a Universally Unique Identifier (UUID) format) is assigned to each registered object.
- Extensibility: CIR facilitates the flexible extension of the digital models’ infrastructure with information (data/metadata) stemming for additional repositories and databases. This requires however that each new repository extends the data and metadata of the digital models and links them to other models through the CIR at the time of their deployment.
- Manageability: Figure 4 depicts a “Data Management Console” as a core element of the infrastructure. This console is aimed at enabling the configuration of the CIR and the databases of the infrastructure through a single point access.
4. Implementation and Validation
4.1. Open Source Implementation
4.2. Complete Analytics Modelling Scenarios in a Pilot Production Line: Validation in Power Consumption Analytics
- The first scenario utilizes only one Edge Gateway with the Edge Analytics, and
- The second utilizes the full-fledged deployment and of the DDA infrastructure i.e., operates across multiple edge gateways.
- Step 1 (IB Modelling): One Edge Gateway is deployed/bind with each Infrastructure Box. The Infrastructure Box is modelled using the following constructs of the data models infrastructure: (i) The Data Interface (DI) (Table 1); (ii) The Data Source (DSD) (Table 2); (iii) The Data Kind (DK) (Table 3). This information is stored in the Data Model repository of the DDA, which is deployed in the cloud.
- Step 2 (IB Instantiation/Registration): The data models of the IB are used to generate the Data Source Manifest (DSM) (Table 4) and register it to each Edge Gateway.
- Step 3 (Edge Analytics Modelling): The required processors are modelled with the help of an Analytics Processor Definition (APD) object (Table 5). They include: (i) A processor for hourly average calculation from a single data stream; (ii) A processor for persisting results in a nonSQL repository (i.e., a MongoDB in our case). This information is also stored at the data model repository of the DDA.
- Step 4 (Edge Analytics Installation/Registration): Based on the data models specified in the previous steps, it is possible to generate the Analytics Processor Manifest (APM) for each required processor which is registered to the Edge Gateway, in particular: (i) The processor for hourly average calculation from the TotalRealPower data stream; (ii) The processor for hourly average calculation from the TotalRealEnergy data stream; (iii) The processor for persisting results in a database (i.e., MongoDB in our case) of the Edge Gateway, which is to be used by the edge analytics; (iv) The processor for persisting results in the cloud, which is to based used for distributed analytics. Furthermore, a, Analytics Manifest (AM) (Table 6) is generated for orchestrating the instantiated processors. The AM is registered and started through the Edge Gateway Analytics Engine API.
- Step 5 (Distributed Analytics Modelling): In this step the required processors are modelled with the help of an Analytics Processor Definition (APD) object. In particular, the following processors are modelled: (i) A processor for hourly average calculation for values from a database; (ii) A processor for persisting results in the database. The above models are stored in the data model repository at the cloud.
- Step 6 (Distributed Analytics Installation/Registration): The previously specified data models can be used to generate the Analytics Processor Manifest (APM) for each required Processor which is registered to the cloud. These processors include: (i) A for Processor hourly average calculation for TotalRealPower for all Infrastructure Boxes from the global database (i.e., MongoDB in our implementation); (ii) A processor for hourly average calculation from TotalRealEnergy for all Infrastructure Boxes from the global database (i.e., MongoDB in our implementation); (iii) A processor for persisting results in the global database (i.e., MongoDB in our implementation). Moreover, an Analytics Manifest (AM) is modelled and generated for orchestrating the instantiated processors. The AM is registered and started through the Distributed Data Analytics Engine API.
4.3. Validation in Real Production Lines
4.1.1. Validation in Mass Customization
4.1.2. Validation in Predictive Maintenance
5. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Abadi, D.; Carney, D.; Çetintemel, U.; Cherniack, M.; Convey, C.; Lee, S.; Stonebraker, M.; Tatbul, N.; Zdonik, S. Aurora: A new model and architecture for data stream management. VLDB J. 2003, 12, 120–139. [Google Scholar] [CrossRef]
- Chandrasekaran, S.; Cooper, O.; Deshpande, A.; Franklin, M.; Hellerstein, J.; Hong, W.; Krishnamurthy, S.; Madden; Reiss, F.; Shah, M. TelegraphCQ: Continuous dataflow processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD ′03), New York, NY, USA, 9–12 June 2003; p. 668. [Google Scholar]
- Arasu, A.; Babu, S.; Widom, J. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 2006, 15, 121–142. [Google Scholar] [CrossRef]
- Ahmad, Y.; Berg, B.; Cetintemrel, U.; Humphrey, M.; Hwang, J.; Jhingran, A.; Maskey, A.; Papaemmanouil, O.; Rasin, A.; Tatbul, N.; et al. Distributed operation in the Borealis stream processing engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ′05), New York, NY, USA, 14–16 June 2005; pp. 882–884. [Google Scholar]
- Biem, A.; Bouillet, E.; Feng, H.; Ranganathan, A.; Riabov, A.; Verscheure, O.; Koutsopoulos, H.; Moran, C. IBM infosphere streams for scalable, real-time, intelligent transportation services. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD ′10), New York, NY, USA, 6–10 June 2010; pp. 1093–1104. [Google Scholar]
- Gulisano, V.; Jiménez-Peris, R.; Patiño-Martínez, M.; Soriente, C.; Valduriez, P. StreamCloud: An Elastic and Scalable Data Streaming System. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 2351–2365. [Google Scholar] [CrossRef]
- Zaharia, M.; Xin, R.; Wendell, P.; Das, T.; Armbrust, M.; Dave, A.; Meng, X.; Rosen, J.; Venkataraman, S.; Franklin, M.; et al. Apache Spark: A Unified Engine for Big Data Processing. Commun. ACM 2016, 59, 56–65. [Google Scholar] [CrossRef]
- Carbone, P.; Katsifodimos, A.; Ewen, S.; Markl, V.; Haridi, S.; Tzoumas, K.K. Apache Flink™: Stream and Batch Processing in a Single Engine. IEEE Data Eng. Bull. 2015, 38, 28–38. [Google Scholar]
- Noghabi, S.; Paramasivam, K.; Pan, Y.; Ramesh, N.; Bringhurst, J.; Gupta, I.; Campbell, R. Samza: Stateful scalable stream processing at Linked. Proc. VLDB Endow. 2017, 10, 1634–1645. [Google Scholar] [CrossRef]
- Murray, D.; McSherry, F.; Isaacs, R.; Isard, M.; Barham, P.; Abadi, M. Naiad: A timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ′13), New York, NY, USA, 3–6 November 2013; pp. 439–455. [Google Scholar]
- Kreps, J.; Narkhede, N.; Rao, J. Kafka: A Distributed Messaging System for Log Processing. In Proceedings of the 6th International Workshop on Networking Meets Databases (NetDB), Athens, Greece, 12–16 June 2011. [Google Scholar]
- Isaja, M.; Soldatos, J.; Gezer, V. Combining Edge Computing and Blockchains for Flexibility and Performance in Industrial Automation. In Proceedings of the Eleventh International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM), Barcelona, Spain, 12–16 November 2017; pp. 159–164. [Google Scholar]
- Kosar, T.; Bohrab, S.; Mernika, M. Domain-Specific Languages: A Systematic Mapping Study. Inf. Softw. Technol. 2016, 71, 77–91. [Google Scholar] [CrossRef]
- Mernik, M.; Heering, J.; Sloane, A. When and how to develop domain-specific languages. ACM Comput. Surv. 2005, 37, 316–344. [Google Scholar] [CrossRef]
- Anagnostopoulos, A.; Soldatos, J.; Michalakos, S. REFiLL: A lightweight programmable middleware platform for cost effective RFID application development. Pervasive Mob. Comput. 2009, 5, 49–63. [Google Scholar] [CrossRef]
- Kefalakis, N.; Soldatos, J.; Konstantinou, N.; Prasad, N. APDL: A reference XML schema for process-centered definition of RFID solutions. J. Syst. Softw. 2011, 84, 1244–1259. [Google Scholar] [CrossRef]
- Kosar, T.; López, P.M.; Barrientos, P.; Mernik, M. A preliminary study on various implementation approaches of domain-specific language. Inf. Softw. Technol. 2008, 50, 390–405. [Google Scholar] [CrossRef]
- Johanson, A.N.; Hasselbring, W. Effectiveness and efficiency of a domain-specific language for high-performance marine ecosystem simulation: A controlled experiment. Empir. Softw. Eng. 2017, 22, 2206–2236. [Google Scholar] [CrossRef]
- Kosar, T.; Gaberc, S.; Carver, J.; Mernik, M. Program comprehension of domain-specific and general-purpose languages: Replication of a family of experiments using integrated development environments. Empir. Softw. Eng. 2018, 23, 2734–2763. [Google Scholar] [CrossRef]
- Silva, A. Model-driven engineering: A survey supported by a unified conceptual model. Comput. Lang. Syst. Struct. 2015, 43, 139–155. [Google Scholar]
- Lelandais, B.; Oudot, M.; Combemale, B. Applying Model-Driven Engineering to High-Performance Computing: Experience Report, Lessons Learned, and Remaining Challenges. J. Comput. Lang. 2019, 55, 1–19. [Google Scholar] [CrossRef]
- Petrali, P.; Isaja, M.; Soldatos, J. Edge Computing and Distributed Ledger Technologies for Flexible Production Lines: A White-Appliances Industry Case; In IFAC: Geneva, Switzerland, 2018; Volume 51, pp. 388–392. [Google Scholar]
- Soldatos, J.; Lazaro, O.; Cavadini, F. (Eds.) The Digital Shopfloor: Industrial Automation in the Industry 4.0 Era Forthcoming Performance Analysis and Applications. River Publishers Series in Automation, Control and Robotics, ISBN 9788770220415, e-ISBN 9788770220408. February 2019. Available online: https://www.riverpublishers.com/book_details.php?book_id=676 (accessed on 18 November 2019).
- Kefalakis, N.; Roukounaki, A.; Soldatos, J. A Configurable Distributed Data Analytics Infrastructure for the Industrial Internet of things. In Proceedings of the DCOSS, Santorini Island, Greece, 29–31 May 2019; pp. 179–181. [Google Scholar]
- Mathew, A.; Zhang, L.; Zhang, S.; Ma, L. A Review of the MIMOSA OSA-EAI Database for Condition Monitoring Systems. In Engineering Asset Management; Mathew, J., Kennedy, J., Ma, L., Tan, A., Anderson, D., Eds.; Springer: London, UK, 2006. [Google Scholar]
- Soldatos, J. Introduction to Industry 4.0 and the Digital Shopfloor Vision. In The Digital Shopfloor: Industrial Automation in the Industry 4.0 Era; River Publishers: Delft, The Netherlands, 2019; ISBN 108770220417. [Google Scholar]
{ |
"_id": "c64709bb-6565-41fe-8256-f6802ad9a538", |
"name": "MQTT topic", |
"communicationProtocol": "MQTT", |
"parameters": { |
"parameter": [ |
{ |
"name": "host", |
"description": "The host where the MQTT broker runs.", |
"dataType": "string", |
"defaultValue": "localhost" |
}, |
{ |
"name": "port", |
"description": "The port where the MQTT broker listens.", |
"dataType": "int", |
"defaultValue": 1883 |
}, |
{ |
"name": "username", |
"description": "The username to use to connect to the MQTT broker.", |
"dataType": "string" |
}, |
{ |
"name": "password", |
"description": "The password to use to connect to the MQTT broker.", |
"dataType": "string" |
}, |
{ |
"name": "topic", |
"description": "The MQTT topic.", |
"dataType": "string" |
} |
] |
} |
} |
{ |
"_id": "e810e6cb-f93e-4c1d-8365-2dda74b63bb8", |
"name": "Total real energy in JSON format over MQTT", |
"dataInterfaceReferenceID": "c64709bb-6565-41fe-8256-f6802ad9a538", |
"dataKindReferenceIDs": { |
"dataKindReferenceID": [ |
"5071a52f-be67-4d21-bddb-24de8b6144a7" |
] |
} |
} |
{ |
"_id": "5071a52f-be67-4d21-bddb-24de8b6144a7", |
"name": "Total real energy in plain text", |
"description": "Total real energy values (in KWh) in plain text.", |
"format": "Plain text", |
"quantityKind": "Energy" |
} |
{ |
"id": "7efa7286-f689-4f51-980c-8e270a1c6d7d", |
"macAddress": "8c:85:90:96:40:cd", |
"dataSourceDefinitionReferenceID": "e810e6cb-f93e-4c1d-8365-2dda74b63bb8", |
"dataSourceDefinitionInterfaceParameters": { |
"parameter": [ |
{ |
"key": "host", |
"value": "192.168.253.240" |
}, |
{ |
"key": "port", |
"value": 1883 |
}, |
{ |
"key": "username", |
"value": "faredge" |
}, |
{ |
"key": "password", |
"value": "*********" |
}, |
{ |
"key": "topic", |
"value": "PhoenixIB/TotalRealEnergy" |
} |
] |
} |
} |
{ |
"_id": "d5dbd123-0cd4-42bf-aa4f-6c459b4f3060", |
"name": "Average calculator", |
"description": "Processors that read values and calculate their average.", |
"processorType": "average", |
"version": "1.0.0" |
} |
{ |
"edgeGatewayReferenceID": "da2282f0-4a13-4f68-a1a3-fe05da03c704", |
"analyticsProcessors": { |
"apm": [ |
{ |
"id": "950354ff-7ce7-4601-a7ea-11f5175d086c", |
"analyticsProcessorDefinitionReferenceID": |
"d5dbd123-0cd4-42bf-aa4f-6c459b4f3060", |
"dataSources": { |
"dataSource": [ |
{ |
"dataSourceManifestReferenceID": |
"2fae2556-d516-464d-8b18-aeb2b473e759" |
} |
] |
}, |
"dataSink": { |
"dataSourceManifestReferenceID": "f006a60b-c40f-4489-a7f7-c73d10d7d0eb" |
} |
} |
] |
} |
} |
{ |
"observation": [{ |
"id": "84190868-8dbb-41ea-8b90-ea8f4699f43a", |
"name": "PhoenixIB-TotalRealEnergy", |
"dataKindReferenceID": "c4b0ecec-0086-46e8-9812-04bde8be2f08", |
"timestamp": "2018-05-24 14:59:14.924", |
"value": 210 |
}], |
"id": "1aed50c9-d507-4e16-b7ca-a44321bc37ac", |
"dataSourceManifestReferenceID": "2fae2556-d516-464d-8b18-aeb2b473e759", |
"mobile": false, |
"timestamp": "2018-05-24 14:59:14.924" |
}{ |
"observation": [{ |
"id": "1dd0666c-b61e-494d-91bc-71ab4a4717c3", |
"name": " PhoenixIB-TotalRealEnergy", |
"dataKindReferenceID": "c4b0ecec-0086-46e8-9812-04bde8be2f08", |
"timestamp": "2018-05-24 15:00:14.931", |
"value": 215 |
}], |
"id": "ba98ea74-646d-4519-bb9e-57a439acb82b", |
"dataSourceManifestReferenceID": "2fae2556-d516-464d-8b18-aeb2b473e759", |
"mobile": false, |
"timestamp": "2018-05-24 15:00:14.931" |
}{ |
"observation": [{ |
"id": "14358864-c768-4bc2-bdb3-19aa88d482e1", |
"name": " PhoenixIB-TotalRealEnergy", |
"dataKindReferenceID": "c4b0ecec-0086-46e8-9812-04bde8be2f08", |
"timestamp": "2018-05-24 15:01:14.95", |
"value": 205 |
}], |
"id": "f2454dad-a6b9-43c1-a741-fbeeebf96b5b", |
"dataSourceManifestReferenceID": "2fae2556-d516-464d-8b18-aeb2b473e759", |
"mobile": false, |
"timestamp": "2018-05-24 15:01:14.95" |
} |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kefalakis, N.; Roukounaki, A.; Soldatos, J. Configurable Distributed Data Management for the Internet of the Things. Information 2019, 10, 360. https://doi.org/10.3390/info10120360
Kefalakis N, Roukounaki A, Soldatos J. Configurable Distributed Data Management for the Internet of the Things. Information. 2019; 10(12):360. https://doi.org/10.3390/info10120360
Chicago/Turabian StyleKefalakis, Nikos, Aikaterini Roukounaki, and John Soldatos. 2019. "Configurable Distributed Data Management for the Internet of the Things" Information 10, no. 12: 360. https://doi.org/10.3390/info10120360
APA StyleKefalakis, N., Roukounaki, A., & Soldatos, J. (2019). Configurable Distributed Data Management for the Internet of the Things. Information, 10(12), 360. https://doi.org/10.3390/info10120360