Big Data Cogn. Comput., Volume 5, Issue 1 (March 2021) – 15 articles

Cover Story: Traditional IoT using Wi-Fi connectivity has inherent compatibility issues. Seamless integration among IoT devices is required to offer smart data-driven sensor controls and insightful user decisions. When information collected by one device is shared with others non-intrusively and intelligently, user acceptance becomes achievable for a smart automation of the future. This research work factors in the optimisation considerations of big data and machine learning approaches to propose a novel methodology for modelling a non-intrusive smart automation system. To validate it, we developed a prototype of our model to uniquely combine personalisation using an IoT hub implementation in a contemporary home environment. A real-time smart home automation use case was demonstrated by employing our model in big data processing and smart analytics via frameworks such as Apache Spark, Apache NiFi and FB-Prophet in [...]
16 pages, 882 KiB  
Article
ParlTech: Transformation Framework for the Digital Parliament
by Dimitris Koryzis, Apostolos Dalas, Dimitris Spiliotopoulos and Fotios Fitsilis
Big Data Cogn. Comput. 2021, 5(1), 15; https://doi.org/10.3390/bdcc5010015 - 15 Mar 2021
Cited by 24 | Viewed by 9841
Abstract
Societies are entering the age of technological disruption, which also impacts governance institutions such as parliamentary organizations. Thus, parliaments need to adjust swiftly by incorporating innovative methods into their organizational culture and novel technologies into their working procedures. Inter-Parliamentary Union World e-Parliament Reports capture digital transformation trends towards open data production, standardized and knowledge-driven business processes, and the implementation of inclusive and participatory schemes. Nevertheless, there is still a limited consensus on how these trends will materialize into specific tools, products, and services, with added value for parliamentary and societal stakeholders. This article outlines the rapid evolution of the digital parliament from the user perspective. In doing so, it describes a transformational framework based on the evaluation of empirical data by an expert survey of parliamentarians and parliamentary administrators. Basic sets of tools and technologies that are perceived as vital for future parliamentary use by intra-parliamentary stakeholders, such as systems and processes for information and knowledge sharing, are analyzed. Moreover, boundary conditions for development and implementation of parliamentary technologies are set and highlighted. Concluding recommendations regarding the expected investments, interdisciplinary research, and cross-sector collaboration within the defined framework are presented. Full article
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
Figure 1. ParlTech hype cycle for year 2020.
Figure 2. Digital parliament transformation framework.
20 pages, 3775 KiB  
Article
Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks
by Christos Makris and Georgios Pispirigos
Big Data Cogn. Comput. 2021, 5(1), 14; https://doi.org/10.3390/bdcc5010014 - 12 Mar 2021
Cited by 5 | Viewed by 4712
Abstract
Nowadays, due to the extensive use of information networks in a broad range of fields, e.g., bio-informatics, sociology, digital marketing, computer science, etc., graph theory applications have attracted significant scientific interest. Due to its apparent abstraction, community detection has become one of the most thoroughly studied graph partitioning problems. However, the existing algorithms principally propose iterative solutions of high polynomial order that repetitively require exhaustive analysis. These methods can undoubtedly be considered resource-wise overdemanding, unscalable, and inapplicable in big data graphs, such as today’s social networks. In this article, a novel, near-linear, and highly scalable community prediction methodology is introduced. Specifically, using a distributed, stacking-based model, which is built on plain network topology characteristics of bootstrap sampled subgraphs, the underlying community hierarchy of any given social network is efficiently extracted in spite of its size and density. The effectiveness of the proposed methodology has diligently been examined on numerous real-life social networks and proven superior to various similar approaches in terms of performance, stability, and accuracy.
Figure 1. The flowchart and the diagram of the stacking ensemble prediction methodology.
Figure 2. Prediction performance metrics per different community prediction methodology per distinct social graph: (a) Accuracy; (b) Recall; (c) Precision; (d) Specificity; (e) F1-Score.
Figure 3. Extracted Community Structure per different community prediction methodology for Facebook [20] graph: (a) Louvain algorithm’s [7]; (b) Distributed LR model [15] methodology’s returned; (c) Distributed bagging ensemble [16] methodology’s; (d) Distributed stacking ensemble methodology’s.
Figure 4. Network analysis comparison of the proposed stacking methodology against the Louvain [7,36] and the Girvan-Newman [8,37] community detection algorithms: (a) Coverage; (b) Performance; (c) Modularity.
Figure 5. The average execution time of the proposed community prediction methodology and the classic community detection algorithms per distinct social graph in seconds.
16 pages, 2518 KiB  
Article
From Data Processing to Knowledge Processing: Working with Operational Schemas by Autopoietic Machines
by Mark Burgin and Rao Mikkilineni
Big Data Cogn. Comput. 2021, 5(1), 13; https://doi.org/10.3390/bdcc5010013 - 10 Mar 2021
Cited by 11 | Viewed by 6365
Abstract
Knowledge processing is an important feature of intelligence in general and artificial intelligence in particular. To develop computing systems working with knowledge, it is necessary to elaborate the means of working with knowledge representations (as opposed to data), because knowledge is an abstract structure. There are different forms of knowledge representations derived from data. One of the basic forms is called a schema, which can belong to one of three classes: operational, descriptive, and representation schemas. The goal of this paper is the development of theoretical and practical tools for processing operational schemas. To achieve this goal, we use schema representations elaborated in the mathematical theory of schemas and use structural machines as a powerful theoretical tool for modeling parallel and concurrent computational processes. We describe the schema of autopoietic machines as physical realizations of structural machines. An autopoietic machine is a technical system capable of regenerating, reproducing, and maintaining itself by production, transformation, and destruction of its components and the networks of processes downstream contained in them. We present the theory and practice of designing and implementing autopoietic machines as information processing structures integrating both symbolic computing and neural networks. Autopoietic machines use knowledge structures containing the behavioral evolution of the system and its interactions with the environment to maintain stability by counteracting fluctuations. Full article
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
Figure 1. A schema of transducer hardware.
Figure 2. A schema of parallel information processing device hardware.
Figure 3. A schema of a Turing machine.
Figure 4. A schema of an inductive Turing machine.
Figure 5. A schema of the transducer hardware.
Figure 6. Data and the program stored in the computer memory are processed by the CPU in the information processor.
Figure 7. The schema with a triadic automaton represents a knowledge structure containing various object, inter-object, and intra-object relationships and behaviors, which emerge when an event occurs, changing the objects or their relationships.
Figure 8. A knowledge structure modeling intra-object and inter-object behaviors.
Figure 9. Schema managing infware, hardware, and software for the deployment, configuring, monitoring, and managing distributed application workloads on cloud resources.
18 pages, 1958 KiB  
Article
Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19
by Otmane Azeroual and Renaud Fabre
Big Data Cogn. Comput. 2021, 5(1), 12; https://doi.org/10.3390/bdcc5010012 - 9 Mar 2021
Cited by 26 | Viewed by 11099
Abstract
Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of global organizations and threaten their capacity for strategic forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle the incoming large amounts of data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having briefly recalled the strategic advantages of big data solutions in the introductory remarks, in the first part of this paper, we focus on the advantages of big data solutions in the currently difficult time of the COVID-19 pandemic. We characterize it as an endemic heterogeneous data context; we then outline the advantages of technologies such as Hadoop and its IT suitability in this context. In the second part, we identify two specific advantages of Hadoop solutions, globality combined with flexibility, and we notice that they are at work with a “Hadoop Fusion Approach” that we describe as an optimal response to the context. In the third part, we justify selected qualifications of globality and flexibility by the fact that Hadoop solutions enable comparable returns in opposite contexts of models of partial submodels and of models of final exact systems. In part four, we remark that in both these opposite contexts, Hadoop’s solutions allow a large range of needs to be fulfilled, which fits with requirements previously identified as the current heterogeneous data structure of COVID-19 information. In the final part, we propose a framework of strategic data processing conditions. To the best of our knowledge, they appear to be the most suitable to overcome COVID-19 massive information challenges. Full article
(This article belongs to the Special Issue Big Data Analytics and Cloud Data Management)
Figure 1. Bipartite Graph.
Figure 2. Determine the augmentation route.
Figure 3. Augmentation way.
Figure 4. Optimal matching.
Figure 5. Schematic process of MapReduce.
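For readers unfamiliar with the MapReduce model that underlies Hadoop and that this article's Figure 5 depicts schematically, the following is a minimal single-process sketch in Python. It is not taken from the article and does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample records are assumptions chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit one (word, 1) pair per word in an input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input records, invented for this example.
records = ["cases rise", "cases fall", "Cases rise again"]
counts = map_reduce(records)
```

In a real Hadoop deployment the map and reduce steps would run in parallel on different nodes and the shuffle would move data across the cluster; the sketch only shows the data flow between the three stages.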
17 pages, 23087 KiB  
Article
A Network-Based Analysis of a Worksite Canteen Dataset
by Vincenza Carchiolo, Marco Grassia, Alessandro Longheu, Michele Malgeri and Giuseppe Mangioni
Big Data Cogn. Comput. 2021, 5(1), 11; https://doi.org/10.3390/bdcc5010011 - 8 Mar 2021
Cited by 5 | Viewed by 6352
Abstract
The provision of wellness in workplaces has gained interest in recent decades. A factor that contributes significantly to workers’ health is their diet, especially when provided by canteen services. The assessment of such a service involves questions such as food cost, its sustainability, quality, nutritional facts and variety, as well as employees’ health and disease prevention, productivity increase, and economic convenience vs. eating satisfaction when using canteen services. Even if food habits have already been studied using traditional statistical approaches, here we adopt an approach based on Network Science that allows us to deeply study, for instance, the interconnections among people, company and meals and that can be easily used for further analysis. In particular, this work concerns a multi-company dataset of workers and dishes they chose at a canteen worksite. We study eating habits and health consequences, also considering the presence of different companies and the corresponding contact network among workers. The macro-nutrient content and caloric values assessment is carried out both for dishes and for employees, in order to establish when food is balanced and healthy. Moreover, network analysis lets us discover hidden correlations among people and the environment, such as communities that usually cannot be inferred with traditional methods since they are not known a priori. Finally, we represent the dataset as a tripartite network to investigate relationships between companies, people, and dishes. In particular, the so-called network projections can be extracted, each one being a network among a specific kind of node; further community analysis tools will provide hidden information about people and their food habits.
In summary, the contribution of the paper is twofold: it provides a study of a real dataset spanning several years that gives a new interesting point of view on food habits and healthcare, and it also proposes a new approach based on Network Science. Results prove that this kind of analysis can provide significant information that complements other traditional methodologies.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
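The abstract above mentions network projections and, in its figure captions, the degree and strength of people nodes. A minimal Python sketch of those two ideas follows. It is an illustration only, not the authors' code: the records, the names `strength_and_degree` and `project_people`, and the choice of "number of shared distinct dishes" as the projection weight are all assumptions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, company, dish) records, invented for this example.
records = [
    ("ann", "acme", "pasta"),
    ("bob", "acme", "pasta"),
    ("bob", "acme", "salad"),
    ("cid", "zeta", "salad"),
    ("ann", "acme", "pasta"),
]


def strength_and_degree(records):
    """Per person: strength = number of meals, degree = distinct dishes."""
    meals = defaultdict(int)
    dishes = defaultdict(set)
    for person, _company, dish in records:
        meals[person] += 1
        dishes[person].add(dish)
    return {p: (meals[p], len(dishes[p])) for p in meals}


def project_people(records):
    """People-people projection: weight = number of distinct dishes shared."""
    eaters = defaultdict(set)
    for person, _company, dish in records:
        eaters[dish].add(person)
    weights = defaultdict(int)
    for people in eaters.values():
        # Every pair of people who chose the same dish gets one more link.
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


stats = strength_and_degree(records)
people_net = project_people(records)
```

The same grouping step, keyed on company instead of dish, would give a projection that links colleagues; a community detection tool could then be run on either projected network.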
Figure 1. Total number of meals chosen by workers, grouped by category.
Figure 2. Tripartite network representation of the dataset. Each colored line connects the employee (center column) with chosen dish (right column) and the company he/she belongs to (left column).
Figure 3. Number of ingredients used in dishes preparation. The graph shows how many dishes (y-axis) use k ingredients (k ranging from 1 to 11 on x-axis).
Figure 4. Distribution of the proteins, lipids, carbohydrates and kilocalories across categories (1: Bread and pizza, 2: Cold cuts, 3: First course, 4: Main course, 5: Salads).
Figure 5. Degree and Strength distributions in the People network. The degree (a) is the number of unique different dishes consumed by a given number of people (x-axis). The strength (b) is the number of times that persons had a meal.
Figure 6. Dishes degree and strength distributions. The degree (a) is the number of different people that ordered a given number of dishes at least once (x-axis). The strength (b) is the number of times meals have been ordered.
Figure 7. Dishes category distribution. How many people (y-axis) choose food categories (1: Bread and pizza, 2: Cold cuts, 3: First course, 4: Main course, 5: Salads).
Figure 8. Distribution of caloric values among served dishes.
Figure 9. Division of the dishes (a) and diet (b) of employees in balanced (True) and unbalanced (False) according to [42,43] guidelines.
Figure 10. Distribution of the unbalance of macro-nutrients (Carbohydrates, Lipids, Proteins) among dishes. Caloric value of most dishes comes from an excess of lipids at the expense of carbohydrates.
Figure 11. Distribution of overall and macro-nutrients’ caloric value of customers’ diet.
Figure 12. Distribution of the unbalance of macro-nutrients of the diet of customers. Unbalance comes from an abundance of lipids at the expense of carbohydrates.
Figure 13. Communities of people. Three communities (identified by a numeric id on the right) were discovered with the Louvain algorithm variant implemented in Pajek [48].
Figure 14. Communities of dishes. Three communities (identified by a numeric id on the right) were discovered with the Louvain algorithm variant implemented in Pajek [48].
Figure 15. Different category of dishes inside each community.
21 pages, 7004 KiB  
Article
Automatic Defects Segmentation and Identification by Deep Learning Algorithm with Pulsed Thermography: Synthetic and Experimental Data
by Qiang Fang, Clemente Ibarra-Castanedo and Xavier Maldague
Big Data Cogn. Comput. 2021, 5(1), 9; https://doi.org/10.3390/bdcc5010009 - 26 Feb 2021
Cited by 39 | Viewed by 6044
Abstract
In quality evaluation (QE) of the industrial production field, infrared thermography (IRT) is one of the most crucial techniques used for evaluating composite materials due to the properties of low cost, fast inspection of large surfaces, and safety. The application of deep neural networks tends to be a prominent direction in IRT Non-Destructive Testing (NDT). During the training of the neural network, the Achilles heel is the necessity of a large database. The collection of huge amounts of training data is a high-expense task. In NDT with deep learning, synthetic data contributing to training in infrared thermography remains relatively unexplored. In this paper, synthetic data from the standard Finite Element Models are combined with experimental data to build repositories with Mask Region based Convolutional Neural Networks (Mask-RCNN) to strengthen the neural network, learning the essential features of objects of interest and achieving defect segmentation automatically. These results indicate the possibility of adapting inexpensive synthetic data merging with a certain amount of the experimental database for training the neural networks in order to achieve compelling performance from a limited collection of the annotated experimental data of a real-world practical thermography experiment.
(This article belongs to the Special Issue Machine Learning and Data Analysis for Image Processing)
Figure 1. Pulsed thermography experimental setup optical excitation.
Figure 2. Proposed segmentation strategy.
Figure 3. Mask-RCNN processing architecture [6].
Figure 4. (a) Finite Element Modeling (FEM) 3D model; (b) Simulated thermogram at t = 106.5 s; (c) Real experimental data t = 106.5 s.
Figure 5. Proposed workflow to train with a deep learning model based on the data generation by Finite Element Modeling.
Figure 6. Scheme of preprocessing stage.
Figure 7. Labels for preprocessed sample image.
Figure 8. The best obtained validation results of Mask-RCNN segmentation on different training databases. From left to right: original images, training on the preprocessed raw images database, training on the mixed database (preprocessed data from synthetic and raw images). From the first three rows to the last two rows: plexiglass (a–c), carbon fiber-reinforced polymer (CFRP) (d,e).
Figure 9. The average learning loss for two types of specimens: plexiglass (a,b); CFRP (c,d).
Figure 10. Different detection results with four groups of datasets (two types of materials).
Figure 11. Probability of distribution curve of different methods for processing on CFRP samples (a)/PLEXI samples (b) (confidence score = 0.75).
Figure 12. The total performance of accuracy with Mask-RCNN on CFRP and PLEXI samples with/without synthetic data.
Figure 13. Detection results on a representative CFRP specimen provided by different object detection algorithms or scenarios: (a) Mask-RCNN without synthetic data; (b) Mask-RCNN with synthetic data; (c) YOLO-V3; (d) Faster-RCNN.
Figure 14. Probability of distribution of different deep learning methods on CFRP databases (confidence score = 0.75).
40 pages, 4247 KiB  
Review
IoT Technologies for Livestock Management: A Review of Present Status, Opportunities, and Future Trends
by Bernard Ijesunor Akhigbe, Kamran Munir, Olugbenga Akinade, Lukman Akanbi and Lukumon O. Oyedele
Big Data Cogn. Comput. 2021, 5(1), 10; https://doi.org/10.3390/bdcc5010010 - 26 Feb 2021
Cited by 69 | Viewed by 18910
Abstract
The world population currently stands at about 7 billion amidst an expected increase in 2030 from 9.4 billion to around 10 billion in 2050. This burgeoning population has continued to influence the upward demand for animal food. Moreover, the management of finite resources such as land, the need to reduce livestock contribution to greenhouse gases, and the need to manage inherent complex, highly contextual, and repetitive day-to-day livestock management (LsM) routines are some examples of challenges to overcome in livestock production. The Internet of Things (IoT)’s usefulness in other vertical industries (OVI) shows that its role will be significant in LsM. This work uses the systematic review methodology of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to guide a review of existing literature on IoT in OVI. The goal is to identify the IoT’s ecosystem, architecture, and its technicalities—present status, opportunities, and expected future trends—regarding its role in LsM. Among identified IoT roles in LsM, the authors found that data will be its main contributor. The traditional approach of reactive data processing will give way to the proactive approach of augmented analytics to provide insights about animal processes. This will undoubtedly free LsM from the drudgery of repetitive tasks with opportunities for improved productivity. Full article
Figure 1. Illustration of a multi-layer smart Internet of Things (IoT) approach for livestock management.
Figure 2. (a) A typical IoT ecosystem with (b) architecture.
Figure 3. Data as a common denominator and possible integrator with sundry domains.
Figure 4. Different types of actuators.
Figure 5. The systematic review procedure using PRISMA.
Figure 6. Showing IoT characteristics (i.e., role) that integrate AI by way of being helped to take the management responsibility in LsM.
Figure 7. Standard principles for assessing animal welfare.
Figure 8. A typical mobile-edge computing (MEC) architecture.
20 pages, 750 KiB  
Article
NLA-Bit: A Basic Structure for Storing Big Data with Complexity O(1)
by Krasimira Borislavova Ivanova
Big Data Cogn. Comput. 2021, 5(1), 8; https://doi.org/10.3390/bdcc5010008 - 24 Feb 2021
Cited by 4 | Viewed by 4330
Abstract
This paper introduces a novel approach for storing Resource Description Framework (RDF) data based on the possibilities of Natural Language Addressing (NLA) and on a special NLA basic structure for storing Big Data, called “NLA-bit”, which is intended to support middle-size or large distributed RDF triple or quadruple stores with time complexity O(1). The main idea of NLA is to use letter codes as coordinates (addresses) for data storing. This avoids indexing and provides high-speed direct access to the data with time complexity O(1). NLA-bit is a structured set of all RDF instances with the same “Subject”. An example based on a document system, where every document is stored as NLA-bit, which contains all data connected to it by metadata links, is discussed. The NLA-bits open up a wide field for research and practical implementations in the field of large databases with dynamic semi-structured data (Big Data). Important advantages of the approach are as follows: (1) The reduction of the amount of occupied memory due to the complete absence of additional indexes, absolute addresses, pointers, and additional files; (2) reduction of processing time due to the complete lack of demand—the data are stored/extracted to/from a direct address.
Figure 1. Natural Language Addressing (NLA)-bits with same relations (layers).
Figure A1. Results for recording and retrieving (a) 1000 documents and (b) 10,000 documents.
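The addressing idea in the NLA-bit abstract above (letter codes used as coordinates, with all data for one "Subject" kept together) can be illustrated with a small Python sketch. This is a toy, trie-like model built on nested dictionaries and is not the storage engine described in the article; the class name `NLAStore`, its methods, and the sample triples are hypothetical. The point it shows is that the work per lookup depends on the length of the key, not on how many subjects are stored.

```python
class NLAStore:
    """Toy store addressed by the character codes of a natural-language key."""

    def __init__(self):
        self._root = {}

    @staticmethod
    def _path(key):
        """Turn a key into a sequence of letter codes used as coordinates."""
        return [ord(ch) for ch in key]

    def put(self, subject, relation, obj):
        """Store one (subject, relation, object) triple under the subject."""
        node = self._root
        for code in self._path(subject):
            node = node.setdefault(code, {})
        # The None slot holds every relation of this subject in one place.
        node.setdefault(None, {})[relation] = obj

    def get(self, subject, relation):
        """Follow the letter codes of the subject; return None if absent."""
        node = self._root
        for code in self._path(subject):
            node = node.get(code)
            if node is None:
                return None
        return node.get(None, {}).get(relation)


# Hypothetical triples, invented for this example.
store = NLAStore()
store.put("doc1", "author", "Ivanova")
store.put("doc1", "year", "2021")
author = store.get("doc1", "author")
missing = store.get("doc2", "author")
```

No separate index is built here: the key itself is the path to the data, which is the property the abstract emphasizes.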
15 pages, 4355 KiB  
Article
The Potential of the SP System in Machine Learning and Data Analysis for Image Processing
by J. Gerard Wolff
Big Data Cogn. Comput. 2021, 5(1), 7; https://doi.org/10.3390/bdcc5010007 - 23 Feb 2021
Cited by 2 | Viewed by 3924
Abstract
This paper aims to describe how pattern recognition and scene analysis may with advantage be viewed from the perspective of the SP system (meaning the SP theory of intelligence and its realisation in the SP computer model (SPCM), both described in an appendix), and the strengths and potential of the system in those areas. In keeping with evidence for the importance of information compression (IC) in human learning, perception, and cognition, IC is central in the structure and workings of the SPCM. Most of that IC is achieved via the powerful concept of SP-multiple-alignment, which is largely responsible for the AI-related versatility of the system. With examples from the SPCM, the paper describes: how syntactic parsing and pattern recognition may be achieved, with corresponding potential for visual parsing and scene analysis; how those processes are robust in the face of errors in input data; how in keeping with what people do, the SP system can “see” things in its data that are not objectively present; the system can recognise things at multiple levels of abstraction and via part-whole hierarchies, and via an integration of the two; the system also has potential for the creation of a 3D construct from pictures of a 3D object from different viewpoints, and for the recognition of 3D entities. Full article
Figure 1
<p>The best SP-multiple-alignment created by the SP computer model (SPCM) that achieves the effect of parsing a sentence (“<tt>t h e a p p l e s a r e s w e e t</tt>”), as described in the text. Reproduced from Figure 4 in [<a href="#B16-BDCC-05-00007" class="html-bibr">16</a>], with permission.</p>
Figure 2
<p>The best SPMA created by the SPCM that achieves the effect of parsing a sentence (“<tt>t h e a p p l e s a r e s w e e t</tt>”), as described in the text.</p>
Figure 3
<p>The best SPMA created by the SPCM, with a set of new SP-patterns (in column 0) that describe some features of an unknown plant, and a set of old SP-patterns, including those shown in columns 1 to 6, that describe different categories of plant, with their parts and sub-parts, and other attributes. Reproduced with permission from Figure 16 in [<a href="#B10-BDCC-05-00007" class="html-bibr">10</a>].</p>
Figure 4
<p>Plan view of a 3D object, with five views of it as seen from above, as described in the text. Adapted from Figure 11 in [<a href="#B18-BDCC-05-00007" class="html-bibr">18</a>], with permission.</p>
Figure A1
<p>Schematic representation of the SP system from an “input” perspective. Reproduced, with permission, from <a href="#BDCC-05-00007-f001" class="html-fig">Figure 1</a> in [<a href="#B10-BDCC-05-00007" class="html-bibr">10</a>].</p>
Figure A2
<p>The best SPMA created by the SPCM that achieves the effect of parsing a sentence (“<tt>t h e a p p l e s a r e s w e e t</tt>”), as described in the text.</p>
Figure A3
<p>The best SPMA created by the SPCM with a new SP-pattern “<tt>t h e b l a c k c a t w a l k s</tt>” and an old SP-pattern “<tt>&lt; 1 t h e c a t w a l k s &gt;</tt>”.</p>
Figure A4
<p>Schematic representation of the development and application of the SP machine. Reproduced from Figure 2 in [<a href="#B10-BDCC-05-00007" class="html-bibr">10</a>], with permission.</p>
21 pages, 3920 KiB  
Article
Big Data and Personalisation for Non-Intrusive Smart Home Automation
by Suriya Priya R. Asaithambi, Sitalakshmi Venkatraman and Ramanathan Venkatraman
Big Data Cogn. Comput. 2021, 5(1), 6; https://doi.org/10.3390/bdcc5010006 - 30 Jan 2021
Cited by 30 | Viewed by 9264
Abstract
With the advent of the Internet of Things (IoT), many different smart home technologies are commercially available. However, the adoption of such technologies is slow as many of them are not cost-effective and focus on specific functions such as energy efficiency. Recently, IoT devices and sensors have been designed to enhance the quality of personal life by generating continuous data streams that users can monitor and draw inferences from. While smart home devices connect to the home Wi-Fi network, there are still compatibility issues between devices from different manufacturers. Smart devices get even smarter when they can communicate with and control each other. The information collected by one device can be shared with others to enhance the automation of their operations. This paper proposes a non-intrusive approach to integrating and collecting data from open standard IoT devices for personalised smart home automation using big data analytics and machine learning. We demonstrate the implementation of our proposed novel technology instantiation approach for achieving non-intrusive IoT based big data analytics with a use case of a smart home environment. We employ open-source frameworks such as Apache Spark, Apache NiFi and FB-Prophet along with popular vendor tech-stacks such as Azure and DataBricks.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
Figure 1
<p>Generic IoT architecture with cloud and big data processing extensions.</p>
Figure 2
<p>Logical architecture of modelling big data and personalisation.</p>
Figure 3
<p>High level physical big data architecture.</p>
Figure 4
<p>Physical layout of a smart home prototype with IoT deployment.</p>
Figure 5
<p>A typical smart home hub in a star topology.</p>
Figure 6
<p>Software components employed in smart home hub, cloud and big data processing.</p>
Figure 7
<p>Analytics components using Azure Databricks tools.</p>
Figure 8
<p>Apache NiFi data flow design.</p>
Figure 9
<p>Interactions among a smart home and Azure Databricks.</p>
Figure 10
<p>Spark Stream processing pipeline for activity patterns.</p>
21 pages, 8633 KiB  
Article
An Exploratory Study of COVID-19 Information on Twitter in the Greater Region
by Ninghan Chen, Zhiqiang Zhong and Jun Pang
Big Data Cogn. Comput. 2021, 5(1), 5; https://doi.org/10.3390/bdcc5010005 - 28 Jan 2021
Cited by 8 | Viewed by 6169
Abstract
The outbreak of COVID-19 led to a burst of information in major online social networks (OSNs). In this constantly changing situation, OSNs have become an essential platform for people expressing opinions and seeking up-to-the-minute information. Thus, discussions on OSNs may become a reflection of reality. This paper aims to determine how Twitter users in the Greater Region (GR) and related countries react differently over time by conducting a data-driven exploratory study of COVID-19 information using machine learning and representation learning methods. We find that tweet volume and COVID-19 cases in GR and related countries are correlated, but this correlation only exists in a particular period of the pandemic. Moreover, we plot the change of topics in each country and region from 22 January 2020 to 5 June 2020, identifying the main differences between GR and related countries.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
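The correlation between tweet volume and daily cases mentioned in this abstract can be illustrated with a lagged Pearson correlation. The following is a minimal sketch only: the function names, the direction of the lag (tweets leading cases), and the toy numbers are assumptions for illustration and are not taken from the article.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def lagged_pearson(tweets, cases, max_lag):
    """Correlate tweets[t] with cases[t + lag] for lag = 0..max_lag."""
    return {
        lag: pearson(tweets[: len(tweets) - lag], cases[lag:])
        for lag in range(max_lag + 1)
    }

# Made-up daily series: cases repeat the tweet curve two days later.
tweets = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
cases = [0, 0] + tweets[:-2]

by_lag = lagged_pearson(tweets, cases, max_lag=3)
best_lag = max(by_lag, key=by_lag.get)
for lag, r in by_lag.items():
    print(f"lag={lag} r={r:.3f}")
print("best lag:", best_lag)
```

Scanning several lags rather than only lag 0 matters because a relationship that is offset in time can look weak when the two series are compared day for day.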
Figure 1
<p>User location heatmap of the Greater Region (GR) and the related countries.</p>
Figure 2
<p>Daily tweet volume and COVID-19 new cases (on 3 June, France published a revision of data that led to a negative number of new cases, see [<a href="#B42-BDCC-05-00005" class="html-bibr">42</a>] for the original news).</p>
Figure 3
<p>Effective reproductive rate (<math display="inline"><semantics> <mrow> <mi>R</mi> <mo>(</mo> <mi>t</mi> <mo>)</mo> </mrow> </semantics></math>).</p>
Figure 4
<p>Total days for each pandemic period.</p>
Figure 5
<p>PC (Pearson’s correlation) between tweet volume and COVID-19 daily cases with different lags.</p>
Figure 6
<p>Workflow of topic modelling and classification.</p>
Figure 7
<p>A sample of Uniform Manifold Approximation and Projection (UMAP) clustering results.</p>
Figure 8
<p>Topic categories in the GR and related countries. 1: Wuhan and China; 2: Measures; 3: Local news; 4: International news; 5: Policy and daily life; 6: Racism; 7: Other.</p>
Figure 9
<p>Distribution of proportion of tweets on ‘policy and daily life’ and ‘local news’ during Free-contagious and Measures periods.</p>
Figure 10
<p>Word cloud of Luxembourg Tweets from 22 January 2020 to 1 March 2020.</p>
16 pages, 1054 KiB  
Article
NLP-Based Customer Loyalty Improvement Recommender System (CLIRS2)
by Katarzyna Anna Tarnowska and Zbigniew Ras
Big Data Cogn. Comput. 2021, 5(1), 4; https://doi.org/10.3390/bdcc5010004 - 19 Jan 2021
Cited by 23 | Viewed by 6919
Abstract
Structured data on customer feedback is becoming more costly and time-consuming to collect and organize. On the other hand, unstructured opinionated data, e.g., in the form of free-text comments, is proliferating and available on public websites, such as social media websites, blogs, forums, and websites that provide recommendations. This research proposes a novel method to develop a knowledge-based recommender system from unstructured (text) data. The method is based on applying an opinion mining algorithm, extracting aspect-based sentiment score per text item, and transforming text into a structured form. An action rule mining algorithm is applied to the data table constructed from sentiment mining. The method is applied to the problem of improving customer satisfaction ratings. The results obtained from the dataset of customer comments related to the repair services were evaluated with accuracy and coverage. Further, the results were incorporated into the framework of a web-based user-friendly recommender system to advise the business on how to maximally increase their profits by introducing minimal sets of changes in their service. Experiments and evaluation results from comparing the structured data-based version of the system CLIRS (Customer Loyalty Improvement Recommender System) with the unstructured data-based version of the system (CLIRS2) are provided.
(This article belongs to the Special Issue Big Data and Cognitive Computing: Feature Papers 2020)
Figure 1
<p>The original procedure for generating data-driven recommendations for customer loyalty improvement based on quantitative and qualitative customer feedback.</p>
Figure 2
<p>Interactive web-based visualization showing results of the hierarchical clustering procedure—expanding the current client’s dataset with its semantic neighbors. The semantic neighbors of the current company (here, <span class="html-italic">Client9</span>) are color-coded and numbered in ascending order of similarity measure.</p>
Figure 3
<p>Web-based interactive visualization for displaying the system’s recommendations, which are placed on a two-dimensional chart and color-coded according to their attractiveness, as determined by their feasibility and NPS impact.</p>
Figure 4
<p>The new format of a short mobile-based customer survey with mostly open-ended questions.</p>
Figure 5
<p>Sentiment analysis based procedure for generating recommendations for customer loyalty improvement. The primary step is text mining which is used to transform text data into a structured form, used for further data mining.</p>
Figure 6
<p>A hierarchy for aspect words in the service domain.</p>
Figure 7
<p>A hierarchy for aspect words in the equipment parts domain.</p>
Figure 8
<p>CLIRS2-Customer Loyalty Improvement Recommender System adopted to generate recommendations based on qualitative customer feedback.</p>
Figure 9
<p>CLIRS2—step of inspecting the most impactful recommendation and comments associated with the recommendation.</p>
2 pages, 177 KiB  
Editorial
Acknowledgment to Reviewers of Big Data and Cognitive Computing in 2020
by Big Data and Cognitive Computing Editorial Office
Big Data Cogn. Comput. 2021, 5(1), 3; https://doi.org/10.3390/bdcc5010003 - 14 Jan 2021
Cited by 1 | Viewed by 3717
Abstract
Rigorous peer-review is the corner-stone of high-quality academic publishing [...]
24 pages, 1556 KiB  
Review
Forecasting Plant and Crop Disease: An Explorative Study on Current Algorithms
by Gianni Fenu and Francesca Maridina Malloci
Big Data Cogn. Comput. 2021, 5(1), 2; https://doi.org/10.3390/bdcc5010002 - 12 Jan 2021
Cited by 95 | Viewed by 15391
Abstract
Every year, plant diseases cause a significant loss of valuable food crops around the world. The plant and crop disease management practices implemented to mitigate damage have changed considerably. Today, through the application of new information and communication technologies, it is possible to predict the onset or change in the severity of diseases using modern big data analysis techniques. In this paper, we present an analysis and classification of research studies conducted over the past decade that forecast the onset of disease at a pre-symptomatic stage (i.e., symptoms not visible to the naked eye) or at an early stage. We examine the specific approaches and methods adopted, pre-processing techniques and data used, performance metrics, and expected results, highlighting the issues encountered. The results of the study reveal that this practice is still in its infancy and that many barriers need to be overcome.
Figure 1
<p>Disease triangle.</p>
Figure 2
<p>An illustration of how climate, crop growth, and disease models can be combined to produce projections of crop growth stages and disease incidence/severity for different climate change scenarios. Source: [<a href="#B6-BDCC-05-00002" class="html-bibr">6</a>].</p>
Figure 3
<p>(<b>a</b>) Number of research publications per year (2010–2020) related to plant and crop disease prediction, which predicted the onset of the disease in a pre-symptomatic (i.e., symptoms not visible to the naked eye) or early stage, recovered by adopting the methodology described in <a href="#sec2dot1-BDCC-05-00002" class="html-sec">Section 2.1</a>; (<b>b</b>) number of citations for each year considered; (<b>c</b>) number of citations for each paper, which have been grouped according to the journal’s H-index.</p>
Figure 4
<p>(<b>a</b>) Number of publications based on crop and disease examined; (<b>b</b>) current state of crops and plants explored during the last 10 years in terms of percentage of research papers.</p>
Figure 5
<p>Techniques popularly explored in the domain of plant and crop disease prediction models.</p>
24 pages, 2168 KiB  
Review
A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams
by Omar Alghushairy, Raed Alsini, Terence Soule and Xiaogang Ma
Big Data Cogn. Comput. 2021, 5(1), 1; https://doi.org/10.3390/bdcc5010001 - 29 Dec 2020
Cited by 147 | Viewed by 20507
Abstract
Outlier detection is a statistical procedure that aims to find suspicious events or items that are different from the normal form of a dataset. It has drawn considerable interest in the field of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset, but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for a static data environment; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient and better algorithms need to be developed that can effectively analyze the high velocity of data streams to detect local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
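The distinction between global and local outliers drawn in this abstract can be made concrete with a small brute-force LOF computation on static data. This is a sketch under stated assumptions, not code from the reviewed article: the function name `lof_scores`, the toy points, and the choice k = 3 are illustrative, and each neighbourhood is simplified to exactly k points (ties at the k-distance are not expanded, and duplicate points are assumed absent).

```python
import math

def lof_scores(points, k):
    """Brute-force Local Outlier Factor for a small static dataset."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (itself excluded) and its k-distance.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[o] for o in neighbours[i]) / (k * lrd[i]) for i in range(n)]

# A dense cluster, a sparse cluster, and one point just outside the dense one.
dense = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (0.05, 0.05)]
sparse = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5)]
points = dense + sparse + [(0.6, 0.6)]

scores = lof_scores(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

The last point lies about as far from its nearest neighbour as the sparse-cluster points lie from theirs, so a single global distance threshold would not separate it; its LOF is high only because its neighbours are much denser than it is. The sketch recomputes everything from the full dataset, which is why, as the abstract notes, static LOF does not carry over directly to data streams.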
Figure 1
<p>The types of outliers, where grey points are global outliers, and the red point is a local outlier.</p>
Figure 2
<p>The search strategy flowchart for selecting articles.</p>
Figure 3
<p>Summary of references used in the literature review for static data [<a href="#B53-BDCC-05-00001" class="html-bibr">53</a>,<a href="#B54-BDCC-05-00001" class="html-bibr">54</a>,<a href="#B55-BDCC-05-00001" class="html-bibr">55</a>,<a href="#B56-BDCC-05-00001" class="html-bibr">56</a>,<a href="#B57-BDCC-05-00001" class="html-bibr">57</a>,<a href="#B58-BDCC-05-00001" class="html-bibr">58</a>,<a href="#B59-BDCC-05-00001" class="html-bibr">59</a>,<a href="#B60-BDCC-05-00001" class="html-bibr">60</a>,<a href="#B61-BDCC-05-00001" class="html-bibr">61</a>,<a href="#B62-BDCC-05-00001" class="html-bibr">62</a>,<a href="#B63-BDCC-05-00001" class="html-bibr">63</a>,<a href="#B64-BDCC-05-00001" class="html-bibr">64</a>,<a href="#B65-BDCC-05-00001" class="html-bibr">65</a>,<a href="#B66-BDCC-05-00001" class="html-bibr">66</a>,<a href="#B67-BDCC-05-00001" class="html-bibr">67</a>,<a href="#B68-BDCC-05-00001" class="html-bibr">68</a>,<a href="#B69-BDCC-05-00001" class="html-bibr">69</a>,<a href="#B70-BDCC-05-00001" class="html-bibr">70</a>,<a href="#B71-BDCC-05-00001" class="html-bibr">71</a>,<a href="#B72-BDCC-05-00001" class="html-bibr">72</a>,<a href="#B73-BDCC-05-00001" class="html-bibr">73</a>,<a href="#B74-BDCC-05-00001" class="html-bibr">74</a>].</p>
Figure 4
<p>The reachability distance for different data points <span class="html-italic">p</span> with regard to <span class="html-italic">o</span>, when <span class="html-italic">k</span> equals 5.</p>
Figure 5
<p>Comparison between the LOF and the INFLO. For the red data point, LOF will consider the data points in the blue area to be neighbors, which will result in a high outlier value. By contrast, the INFLO will take into account the green data points so that the value of the red data point will be more reasonable and will be more likely to be considered an outlier.</p>
Figure 6
<p>Summary of references used in the literature review for streaming data [<a href="#B75-BDCC-05-00001" class="html-bibr">75</a>,<a href="#B76-BDCC-05-00001" class="html-bibr">76</a>,<a href="#B77-BDCC-05-00001" class="html-bibr">77</a>,<a href="#B78-BDCC-05-00001" class="html-bibr">78</a>,<a href="#B79-BDCC-05-00001" class="html-bibr">79</a>,<a href="#B80-BDCC-05-00001" class="html-bibr">80</a>,<a href="#B81-BDCC-05-00001" class="html-bibr">81</a>,<a href="#B82-BDCC-05-00001" class="html-bibr">82</a>,<a href="#B83-BDCC-05-00001" class="html-bibr">83</a>,<a href="#B84-BDCC-05-00001" class="html-bibr">84</a>,<a href="#B85-BDCC-05-00001" class="html-bibr">85</a>,<a href="#B64-BDCC-05-00001" class="html-bibr">86</a>,<a href="#B87-BDCC-05-00001" class="html-bibr">87</a>,<a href="#B88-BDCC-05-00001" class="html-bibr">88</a>,<a href="#B89-BDCC-05-00001" class="html-bibr">89</a>,<a href="#B90-BDCC-05-00001" class="html-bibr">90</a>,<a href="#B91-BDCC-05-00001" class="html-bibr">91</a>,<a href="#B92-BDCC-05-00001" class="html-bibr">92</a>,<a href="#B93-BDCC-05-00001" class="html-bibr">93</a>,<a href="#B94-BDCC-05-00001" class="html-bibr">94</a>,<a href="#B95-BDCC-05-00001" class="html-bibr">95</a>,<a href="#B96-BDCC-05-00001" class="html-bibr">96</a>,<a href="#B97-BDCC-05-00001" class="html-bibr">97</a>,<a href="#B98-BDCC-05-00001" class="html-bibr">98</a>,<a href="#B99-BDCC-05-00001" class="html-bibr">99</a>].</p>
Figure 7
<p>Diagram of the overall design and workflow for the methodology.</p>