Big Data and Cognitive Computing

8 pages, 748 KiB

Open Access Article

A Deep Learning Model of Perception in Color-Letter Synesthesia

by Joel R. Bock

Big Data Cogn. Comput. 2018, 2(1), 8; https://doi.org/10.3390/bdcc2010008 - 13 Mar 2018

Cited by 1 | Viewed by 8106

Synesthesia is a psychological phenomenon where sensory signals become mixed. Input to one sensory modality produces an experience in a second, unstimulated modality. In “grapheme-color synesthesia”, viewed letters and numbers evoke mental imagery of colors. The study of this condition has implications for increasing our understanding of brain architecture and function, language, memory and semantics, and the nature of consciousness. In this work, we propose a novel application of deep learning to model perception in grapheme-color synesthesia. Achromatic letter images, taken from a database of handwritten characters, are used to train the model and to induce computational synesthesia. Results show the model learns to accurately create a colored version of the inducing stimulus, according to a statistical distribution from experiments on a sample population of grapheme-color synesthetes. To the author’s knowledge, this work represents the first model that accurately produces spontaneous, creative mental imagery characteristic of the synesthetic perceptual experience. Experiments in cognitive science have contributed to our understanding of some of the observable behavioral effects of synesthesia, and previous models have outlined neural mechanisms that may account for these observations. A model of synesthesia that generates testable predictions on brain activity and behavior is needed to complement large-scale data collection efforts in neuroscience, especially when articulating simple descriptions of cause (stimulus) and effect (behavior). The research and modeling approach reported here provides a framework that begins to address this need.

(This article belongs to the Special Issue Learning with Big Data: Scalable Algorithms and Novel Applications)

15 pages, 523 KiB

Open Access Article

A Multi-Modality Deep Network for Cold-Start Recommendation

by Mingxuan Sun, Fei Li and Jian Zhang

Big Data Cogn. Comput. 2018, 2(1), 7; https://doi.org/10.3390/bdcc2010007 - 5 Mar 2018

Cited by 19 | Viewed by 6696

Abstract

Collaborative filtering (CF) approaches, which provide recommendations based on ratings or purchase history, perform well for users and items with sufficient interactions. However, CF approaches suffer from the cold-start problem for users and items with few ratings. Hybrid recommender systems that combine collaborative filtering and content-based approaches have proven effective in alleviating the cold-start issue. Integrating content from multiple heterogeneous data sources such as reviews and product images is challenging for two reasons. First, mapping content in different modalities from the original feature space to a joint lower-dimensional space is difficult, since the modalities have intrinsically different characteristics and statistical properties, such as sparse texts and dense images. Second, most algorithms use content features only as prior knowledge to improve the estimation of user and item profiles, while the ratings do not directly provide feedback to guide feature extraction. To tackle these challenges, we propose a tightly-coupled deep network model for fusing heterogeneous modalities, to avoid tedious feature extraction in specific domains, and to enable two-way information propagation from both content and rating information. Experiments on large-scale Amazon product data in book and movie domains demonstrate the effectiveness of the proposed model for cold-start recommendation.

(This article belongs to the Special Issue Learning with Big Data: Scalable Algorithms and Novel Applications)

19 pages, 797 KiB

Open Access Article

A Rule Extraction Study from SVM on Sentiment Analysis

by Guido Bologna and Yoichi Hayashi

Big Data Cogn. Comput. 2018, 2(1), 6; https://doi.org/10.3390/bdcc2010006 - 2 Mar 2018

Cited by 18 | Viewed by 4884

Abstract

A natural way to determine the knowledge embedded within connectionist models is to generate symbolic rules. Nevertheless, extracting rules from Multi Layer Perceptrons (MLPs) is NP-hard. With the advent of social networks, interest in Sentiment Analysis techniques has grown, but rule extraction from connectionist models in this context has rarely been performed because of the very high dimensionality of the input space. To fill this gap, we present a case study on rule extraction from ensembles of Neural Networks and Support Vector Machines (SVMs), the purpose being the characterization of the complexity of the rules on two particular Sentiment Analysis problems. Our rule extraction method is based on a special Multi Layer Perceptron architecture for which axis-parallel hyperplanes are precisely located. Two datasets representing movie reviews are transformed into Bag-of-Words vectors and learned by ensembles of neural networks and SVMs. Rules generated from ensembles of MLPs are less accurate and less complex than those extracted from SVMs. Moreover, a clear trade-off appears between rules’ accuracy, complexity and covering. For instance, if rules are too complex, less complex rules can be re-extracted by sacrificing some of their accuracy. Finally, rules can be viewed as feature detectors in which very often only one word must be present and a longer list of words must be absent.
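The rule structure described in this abstract (one word present, several words absent) and the complexity measure given in the article's Figure 1 caption (average number of rules times average number of antecedents per rule) can be illustrated with a small sketch. This is not the authors' extraction method: the `Rule` class, the example words, and the `average_complexity` helper are hypothetical names introduced here only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Rule:
    """Hypothetical propositional rule over Bag-of-Words features."""
    present: frozenset  # words that must occur in the review
    absent: frozenset   # words that must not occur in the review
    label: str          # class predicted when the rule fires

    def fires(self, words):
        # The rule fires when every required word occurs
        # and none of the forbidden words occurs.
        return self.present <= words and not (self.absent & words)

    @property
    def antecedents(self):
        # Each present/absent test counts as one antecedent.
        return len(self.present) + len(self.absent)


def average_complexity(rulesets):
    """Average number of rules times average antecedents per rule."""
    avg_rules = mean(len(rs) for rs in rulesets)
    avg_antecedents = mean(r.antecedents for rs in rulesets for r in rs)
    return avg_rules * avg_antecedents


# Made-up example rules; the words are illustrative, not from the paper.
ruleset = [
    Rule(frozenset({"excellent"}), frozenset({"boring", "dull"}), "positive"),
    Rule(frozenset({"awful"}), frozenset(), "negative"),
]
review = set("an excellent and moving film".split())
```

Here `ruleset[0].fires(review)` holds because "excellent" occurs and neither forbidden word does, which matches the "feature detector" reading of a rule.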

(This article belongs to the Special Issue Big Data Analytic: From Accuracy to Interpretability)

Figure 1. Plot of average complexity of rules versus average fidelity (RT-2k problem). Average complexity is the product of average number of rules by average number of antecedents per rule.

Figure 2. Plot of average complexity of rules versus average fidelity (RT-s problem).

Figure 3. A DIMLP network creating two discriminative hyperplanes. The activation function of neurons $h_1$ and $h_2$ is a step function, while for output neuron $y_1$ it is a sigmoid.

Figure 4. Transparency of DIMLP ensembles by majority voting, linear combinations and non-linear combinations.

Figure 5. A QSVM network with Gaussian kernel.

15 pages, 595 KiB

Open Access Article

A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization

by Dixian Zhu, Changjie Cai, Tianbao Yang and Xun Zhou

Big Data Cogn. Comput. 2018, 2(1), 5; https://doi.org/10.3390/bdcc2010005 - 24 Feb 2018

Cited by 137 | Viewed by 16994

Abstract

In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, particulate matter ($\mathrm{PM}_{2.5}$) and sulfur dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although some existing works apply machine learning to air quality prediction, most prior studies are restricted to several years of data and simply train standard regression models (linear or nonlinear) to predict the hourly air pollution concentration. In this work, we propose refined models to predict the hourly air pollution concentration on the basis of meteorological data of previous days by formulating the prediction over 24 h as a multi-task learning (MTL) problem. This enables us to select a good model with different regularization techniques. We propose a useful regularization by enforcing the prediction models of consecutive hours to be close to each other and compare it with several typical regularizations for MTL, including standard Frobenius norm regularization, nuclear norm regularization, and $\ell_{2,1}$-norm regularization. Our experiments have shown that the proposed parameter-reducing formulations and consecutive-hour-related regularizations achieve better performance than existing standard regression models and existing regularizations.
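The regularizers named in this abstract can be made concrete with a small sketch. It assumes a weight matrix `W` with one row per feature and one column per hourly task, a row-wise convention for the $\ell_{2,1}$ norm, and a squared-difference form for the consecutive-hour penalty. The abstract does not spell out these details, so all three are assumptions, and the function names are hypothetical. The nuclear norm is omitted because it requires a singular value decomposition.

```python
import math


def frobenius_sq(W):
    """Squared Frobenius norm: sum of all squared entries."""
    return sum(w * w for row in W for w in row)


def l21_norm(W):
    """l2,1 norm (assumed row-wise): sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def consecutive_hour_penalty(W):
    """Assumed form: squared differences between adjacent hourly columns."""
    return sum(
        (row[t + 1] - row[t]) ** 2
        for row in W
        for t in range(len(row) - 1)
    )


# Toy matrix: 2 features (rows) and 3 hourly models (columns).
W = [[1.0, 2.0, 4.0],
     [0.0, 0.0, 0.0]]
```

Under these assumptions, the row-wise $\ell_{2,1}$ norm favors matrices in which whole feature rows are zero, while the consecutive-hour penalty is small when the models for neighboring hours have similar weights.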

(This article belongs to the Special Issue Learning with Big Data: Scalable Algorithms and Novel Applications)

13 pages, 854 KiB

Open Access Article

Reimaging Research Methodology as Data Science

by Ben Kei Daniel

Big Data Cogn. Comput. 2018, 2(1), 4; https://doi.org/10.3390/bdcc2010004 - 12 Feb 2018

Cited by 14 | Viewed by 9199

Abstract

The growing volume of data generated by machines, humans, software applications, sensors and networks, together with the associated complexity of the research environment, requires immediate pedagogical innovations in academic programs on research methodology. This article draws insights from a large-scale research project examining current conceptions and practices of academics (n = 144) involved in the teaching of research methods in research-intensive universities in 17 countries. The data was obtained through an online questionnaire. The main findings reveal that a large number of academics involved in the teaching of research methods courses tend to teach the same classes for many years, in the same way, despite the changing nature of data and the complexity of the environment in which research is conducted. Furthermore, those involved in the teaching of research methods courses are predominantly volunteer academics, who tend to view the subject only as an “add-on” to their other teaching duties. It was also noted that universities mainly approach the teaching of research methods courses as a “service” to students and departments, not as part of the core curriculum. To deal with the growing changes in data structures and the technology-driven research environment, the study recommends that institutions reimage research methodology programs so that students can develop appropriate competences to deal with the challenges of working with complex and large amounts of data and associated analytics.

(This article belongs to the Special Issue Big Data Analytic: From Accuracy to Interpretability)

15 pages, 1111 KiB

Open Access Article

Big Data Processing and Analytics Platform Architecture for Process Industry Factories

by Martin Sarnovsky, Peter Bednar and Miroslav Smatana

Big Data Cogn. Comput. 2018, 2(1), 3; https://doi.org/10.3390/bdcc2010003 - 26 Jan 2018

Cited by 27 | Viewed by 9572

Abstract

This paper describes the architecture of a cross-sectorial Big Data platform for the process industry domain. The main objective was to design a scalable analytical platform that will support the collection, storage and processing of data from multiple industry domains. Such a platform should be able to connect to the existing environment in the plant and use the data gathered to build predictive functions to optimize the production processes. The analytical platform will contain a development environment with which to build these functions, and a simulation environment to evaluate the models. The platform will be shared among multiple sites from different industry sectors. Cross-sectorial sharing will enable the transfer of knowledge across different domains. During the development, we adopted a user-centered approach to gather requirements from different stakeholders, which were used to design architectural models from different viewpoints, from contextual to deployment. The deployed architecture was tested in two process industry domains, one in aluminium production and the other in the plastic molding industry.

(This article belongs to the Special Issue Big Data Analytic: From Accuracy to Interpretability)

18 pages, 9322 KiB

Open Access Article

The Internet and the Anti-Vaccine Movement: Tracking the 2017 EU Measles Outbreak

by Amaryllis Mavragani and Gabriela Ochoa

Big Data Cogn. Comput. 2018, 2(1), 2; https://doi.org/10.3390/bdcc2010002 - 16 Jan 2018

Cited by 35 | Viewed by 16244

Abstract

In the Internet Era of information overload, how does the individual filter and process available knowledge? In addressing this question, this paper examines the behavioral changes in the online interest in terms related to Measles and the Anti-Vaccine Movement from 2004 to 2017, in order to identify any relationships between the decrease in immunization percentages, the Anti-Vaccine Movement, and the increase in reported Measles cases. The results show that statistically significant positive correlations exist between monthly Measles cases and Google queries in the respective translated terms in most EU28 countries from January 2011 to August 2017. Furthermore, a strong negative correlation (p < 0.01) exists between the online interest in the term ‘Anti Vaccine’ and the Worldwide immunization percentages from 2004 to 2016. The latter could be supportive of previous work suggesting that conspiracist ideation is related to the rejection of scientific propositions. As Measles requires the highest immunization percentage out of the vaccine preventable diseases, the 2017 EU outbreak could be the first of several outbreaks or epidemics of other diseases in the near future should the immunization percentages continue to decrease. Big Data Analytics in general and the analysis of Google queries in particular have been shown to be valuable in addressing health related topics up to this point. Therefore, analyzing the variations and patterns of available online information could assist health officials with the assessment of reported cases, as well as with taking the required preventive actions.
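The correlations reported in this abstract relate two monthly series, such as reported cases and query volumes. The sketch below shows how such a coefficient could be computed, assuming a Pearson correlation, which the abstract does not name explicitly. The `pearson_r` helper is written here for illustration, the numbers are made up rather than taken from the study, and no significance test (p-value) is included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Made-up monthly values, for illustration only.
monthly_cases = [10, 20, 30, 40]
monthly_queries = [15, 25, 35, 45]
r = pearson_r(monthly_cases, monthly_queries)
```

A value of `r` near +1 indicates that the two series rise and fall together, and a value near -1 indicates that one falls as the other rises, as in the negative correlation the abstract reports for ‘Anti Vaccine’ interest and immunization percentages.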

(This article belongs to the Special Issue Health Assessment in the Big Data Era)

1 page, 139 KiB

Open Access Editorial

Acknowledgement to Reviewers of BDCC in 2017

by BDCC Editorial Office

Big Data Cogn. Comput. 2018, 2(1), 1; https://doi.org/10.3390/bdcc2010001 - 12 Jan 2018

Viewed by 2826

Abstract

Peer review is an essential part of the publication process, ensuring that BDCC maintains high quality standards for its published papers [...]


Big Data Cogn. Comput., Volume 2, Issue 1 (March 2018) – 8 articles
