Machine Learning and Entropy: Discover Unknown Unknowns in Complex Data Sets

A special issue of Entropy (ISSN 1099-4300).

Deadline for manuscript submissions: closed (30 January 2016) | Viewed by 97560

Special Issue Editor


Guest Editor
1. Human-Centered AI Lab, Institute of Forest Engineering, Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences, 1190 Vienna, Austria
2. xAI Lab, Alberta Machine Intelligence Institute, University of Alberta, Edmonton, AB T5J 3B1, Canada
Interests: artificial intelligence (AI); machine learning (ML); explainable AI (xAI); causability; decision support systems; medical AI; health informatics

Special Issue Information

Dear Colleagues,

In the real world, we are confronted not only with complex and high-dimensional data sets, but usually also with noisy, incomplete, and uncertain data, where the application of traditional methods of knowledge discovery and data mining always entails the danger of modeling artifacts. Originally, information entropy was introduced by Shannon (1949) as a measure of uncertainty in data. Since then, many different types of entropy methods have emerged, with a large number of different purposes and possible application areas. In this Special Issue, we are seeking papers discussing advances in the application of learning algorithms and entropy for use in knowledge discovery and data mining, to discover unknowns in complex data sets, e.g., for biomarker discovery in biomedical data sets.

Prof. Dr. Andreas Holzinger
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Machine Learning
  • Knowledge Discovery
  • Entropy-based Data Mining

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)


Research

1057 KiB  
Article
Distribution Entropy Boosted VLAD for Image Retrieval
by Qiuzhan Zhou, Cheng Wang, Pingping Liu, Qingliang Li, Yeran Wang and Shuozhang Chen
Entropy 2016, 18(8), 311; https://doi.org/10.3390/e18080311 - 24 Aug 2016
Cited by 11 | Viewed by 5345
Abstract
Several recent works have shown that aggregating local descriptors to generate a global image representation results in great efficiency for retrieval and classification tasks. The most popular method following this approach is VLAD (Vector of Locally Aggregated Descriptors). We present a novel image representation called Distribution Entropy Boosted VLAD (EVLAD), which extends the original vector of locally aggregated descriptors. The original VLAD adopts only residuals to depict the distribution information of every visual word and neglects other statistical clues, so its discriminative power is limited. To address this issue, this paper proposes the use of the distribution entropy of each cluster as supplementary information to enhance the search accuracy. To fuse the two feature sources organically, two fusion methods applied after a new normalization stage meeting the power law are also investigated, which generate vectors of the same size as, and of double the size of, the original VLAD. We validate our approach in image retrieval and image classification experiments. Experimental results demonstrate the effectiveness of our algorithm.
Figures

Figure 1. The demonstration of VLAD shortcoming. The point sets are quantized into two clusters. C1 and C2 are the centers of the clusters. C1 and C2 have identical residuals but possess different distribution entropies; it can be observed that the points of C2 are distributed differently to those of C1.
Figure 2. Examples of images retrieved from (a) Holiday, (b) UKB and (c) Oxford datasets. For each query (left), results obtained by the original VLAD (the first row) and distribution entropy boosted VLAD (the second row) are demonstrated. The green border indicates that the retrieval result meets the ground truth.

4576 KiB  
Article
Voice Activity Detection Using Fuzzy Entropy and Support Vector Machine
by R. Johny Elton, P. Vasuki and J. Mohanalin
Entropy 2016, 18(8), 298; https://doi.org/10.3390/e18080298 - 12 Aug 2016
Cited by 21 | Viewed by 7557
Abstract
This paper proposes support vector machine (SVM) based voice activity detection using FuzzyEn to improve detection performance under noisy conditions. The proposed voice activity detection (VAD) uses fuzzy entropy (FuzzyEn) as a feature extracted from noise-reduced speech signals to train an SVM model for speech/non-speech classification. The proposed VAD method was tested in various experiments by adding real background noises at different signal-to-noise ratios (SNR), ranging from −10 dB to 10 dB, to actual speech signals collected from the TIMIT database. The analysis shows that the FuzzyEn feature gives better results in discriminating noise and corrupted noisy speech. The efficacy of the SVM classifier was validated using 10-fold cross validation. Furthermore, the results obtained by the proposed method were compared with those of previous standardized VAD algorithms as well as recently developed methods. The performance comparison suggests that the proposed method is more efficient in detecting speech under various noisy environments, with an accuracy of 93.29%, and that the FuzzyEn feature detects speech efficiently even at low SNR levels.
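
The exponential membership function associated with FuzzyEn, exp(−d^n/r), can be illustrated with a minimal sketch. The function names `fuzzy_membership` and `hard_threshold` and the default values n = 2 and r = 0.2 are assumptions chosen for illustration, not the authors' implementation or settings. The sketch only contrasts the graded similarity of the fuzzy function with a binary threshold; it does not compute the full entropy feature.

```python
import math


def fuzzy_membership(d, n=2, r=0.2):
    """Graded similarity in (0, 1] for a distance d >= 0: exp(-(d**n) / r)."""
    if d < 0 or r <= 0:
        raise ValueError("d must be non-negative and r must be positive")
    return math.exp(-(d ** n) / r)


def hard_threshold(d, r=0.2):
    """Binary similarity for comparison: 1.0 if d <= r, otherwise 0.0."""
    return 1.0 if d <= r else 0.0


if __name__ == "__main__":
    # Print distance, fuzzy similarity, and binary similarity side by side.
    for d in (0.0, 0.1, 0.2, 0.4):
        print(d, round(fuzzy_membership(d), 4), hard_threshold(d))
```

The fuzzy function decays smoothly with distance, whereas the binary rule jumps from 1 to 0 at the threshold, so small changes in r shift the fuzzy similarity only gradually.
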
Figures

Figure 1. Block diagram for the proposed fuzzy entropy and support vector machine based voice activity detection (VAD).
Figure 2. Speech signals (a–c) with additive noise [0, 5 dB], with amplitude along the vertical axis and time (s) along the horizontal axis; the corresponding histograms (d–f), with amplitude along the horizontal axis and frequency along the vertical axis.
Figure 3. Exponential function (exp(−d^n/r)) for different parameter selections. (a) Exponential membership function fixed with n = 2, and r varied from 0.1 to 0.3; (b) exponential membership function fixed with r = 0.2 and n varied from 1 to 5.
Figure 4. Average computing time of the SVM classifier for the various noises such as airport, babble, car, and train.
Figure 5. Average of (a) sensitivity, (b) specificity and (c) F-Measure for the proposed FuzzyEn based VAD for the various noises such as airport, babble, car, and train.
Figure 6. Standard deviation of (a) sensitivity, (b) specificity and (c) F-Measure for the proposed FuzzyEn based VAD for the various noises such as airport, babble, car, and train.
Figure 7. Accuracy comparisons for G.729, Sohn [36], Ramirez [13], and the proposed FE-SVM based VAD for SNR values from −10 dB to 10 dB: (a) airport; (b) babble; (c) car; (d) train.
Figure 8. HR_s comparisons for three VAD methods with our proposed method for five different SNR levels {−10 dB, −5 dB, 0 dB, 5 dB, 10 dB}: (a) airport noise; (b) babble noise; (c) car noise; (d) train noise.
Figure 9. HR_ns comparisons for three VAD methods with our proposed method for five different SNR levels {−10 dB, −5 dB, 0 dB, 5 dB, 10 dB}: (a) airport noise; (b) babble noise; (c) car noise; (d) train noise.

2969 KiB  
Article
How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?
by Piotr Szymański, Tomasz Kajdanowicz and Kristian Kersting
Entropy 2016, 18(8), 282; https://doi.org/10.3390/e18080282 - 30 Jul 2016
Cited by 42 | Viewed by 12453
Abstract
We propose using five data-driven community detection approaches from social networks to partition the label space in the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd. We evaluate modularity-maximizing using fast greedy and leading eigenvector approximations, infomap, walktrap and label propagation algorithms. For this purpose, we propose to construct a label co-occurrence graph (both weighted and unweighted versions) based on training data and perform community detection to partition the label set. Then, each partition constitutes a label space for separate multi-label classification sub-problems. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Based on the binary relevance and label powerset classification methods, we compare community detection methods to label space divisions against random baselines on 12 benchmark datasets over five evaluation measures. We discover that data-driven approaches are more efficient and more likely to outperform RAkELd than binary relevance or label powerset is, in every evaluated measure. For all measures, apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods in the case of RAkELd’s best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
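
The construction of a label co-occurrence graph from training data can be sketched as follows. This is a minimal illustration under assumptions: the function names and the toy label sets are hypothetical, the edge weight is taken to be the number of training samples in which two labels appear together, and the community detection step that would partition this graph is not shown.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence_graph(label_sets):
    """Return {(label_a, label_b): weight}, counting samples that carry both labels."""
    edges = Counter()
    for labels in label_sets:
        # Sort so each undirected edge has one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def unweighted(edges):
    """Unweighted version of the graph: keep the edge set, drop the counts."""
    return set(edges)


if __name__ == "__main__":
    # Hypothetical training label sets, one per sample.
    train = [{"beach", "sea"}, {"sea", "sunset"}, {"beach", "sea", "sunset"}, {"urban"}]
    graph = label_cooccurrence_graph(train)
    print(graph)
    print(unweighted(graph))
```

Labels that never co-occur with another label (here "urban") produce no edges, so a real pipeline would need to decide how to assign such isolated labels to a partition.
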
Figures

Figure 1. Statistical evaluation of the method’s performance in terms of micro-averaged F1 score. Gray, baseline; white, statistically identical to the baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.
Figure 2. Histogram of the methods’ likelihood of performing better than RAkELd in the micro-averaged F1 score aggregated over datasets.
Figure 3. Statistical evaluation of the method’s performance in terms of macro-averaged F1 score. Gray, baseline; white, statistically identical to baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.
Figure 4. Histogram of the methods’ likelihood of performing better than RAkELd in the macro-averaged F1 score aggregated over datasets.
Figure 5. Statistical evaluation of the method’s performance in terms of Jaccard similarity score. Gray, baseline; white, statistically identical to baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.
Figure 6. Histogram of the methods’ likelihood of performing better than RAkELd in subset accuracy aggregated over datasets.
Figure 7. Statistical evaluation of the method’s performance in terms of micro-averaged F1 score. Gray, baseline; white, statistically identical to baseline; otherwise, the p-value of the hypothesis that a method performs better than the baseline.
Figure 8. Histogram of the methods’ likelihood of performing better than RAkELd in Jaccard similarity aggregated over datasets.
Figure 9. Histogram of the methods’ likelihood of performing better than RAkELd in Hamming loss similarity aggregated over datasets.
Figure 10. Efficiency of fast greedy modularity maximization data-driven approach against RAkELd.
Figure 11. Efficiency of the infomap greedy data-driven approach against RAkELd.
Figure 12. Efficiency of the label propagation data-driven approach against RAkELd.
Figure 13. Efficiency of the leading eigenvector modularity maximization data-driven approach against RAkELd.
Figure 14. Efficiency of the walktrap data-driven approach against RAkELd.

604 KiB  
Article
A PUT-Based Approach to Automatically Extracting Quantities and Generating Final Answers for Numerical Attributes
by Yaqing Liu, Lidong Wang, Rong Chen, Yingjie Song and Yalin Cai
Entropy 2016, 18(6), 235; https://doi.org/10.3390/e18060235 - 22 Jun 2016
Cited by 5 | Viewed by 4694
Abstract
Automatically extracting quantities and generating final answers for numerical attributes is very useful on many occasions, including question answering, image processing, human-computer interaction, etc. A common approach is to learn linguistic templates or wrappers and employ some algorithm or model to generate a final answer. However, building linguistic templates or wrappers is a tough task for builders. In addition, linguistic templates or wrappers are domain-dependent. To free the builder from building linguistic templates or wrappers, we propose a new approach to final answer generation based on the Predicates-Units Table (PUT), a mini domain-independent knowledge base. It is worth pointing out that quantities are not well represented in the following cases: quantities lack units; quantities are perhaps wrong for a given question; and, even if all of them are represented well, their units are perhaps inconsistent. These cases have a strong impact on final answer solving. A total of 1926 real queries are employed to test the proposed method, and the experimental results show that the average correctness ratio of our approach is 87.1%.
Figures

Figure 1. The framework for final answer solving.
Figure 2. The direct-viewing chart of correctness ratio of queries.

1897 KiB  
Article
Stimuli-Magnitude-Adaptive Sample Selection for Data-Driven Haptic Modeling
by Arsen Abdulali, Waseem Hassan and Seokhee Jeon
Entropy 2016, 18(6), 222; https://doi.org/10.3390/e18060222 - 7 Jun 2016
Cited by 9 | Viewed by 5719
Abstract
Data-driven haptic modeling is an emerging technique where contact dynamics are simulated and interpolated based on a generic input-output matching model identified by data sensed from interaction with target physical objects. In data-driven modeling, selecting representative samples from a large set of data in a way that they can efficiently and accurately describe the whole dataset has been a long-standing problem. This paper presents a new algorithm for sample selection where the variances of the output are observed for selecting representative input-output samples in order to ensure the quality of output prediction. The main idea is that representative pairs of input-output are chosen so that the ratio of the standard deviation to the mean of the corresponding output group does not exceed an application-dependent threshold. This output- and standard deviation-based sample selection is very effective in applications where the variance or relative error of the output should be kept within a certain threshold. This threshold is used for partitioning the input space using Binary Space Partitioning-tree (BSP-tree) and k-means algorithms. We apply the new approach to a data-driven haptic modeling scenario where the relative error of the output prediction result should be less than a perceptual threshold. For evaluation, the proposed algorithm is compared to two state-of-the-art sample selection algorithms for regression tasks. Four kinds of haptic-related behavior–force datasets are tested. The results showed that the proposed algorithm outperformed the others in terms of output-approximation quality and computational complexity.
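
The selection criterion described above, that the ratio of the standard deviation to the mean of an output group must not exceed a threshold, can be sketched in a few lines. This is a simplified illustration under stated assumptions: inputs are one-dimensional, groups are split at the median position rather than with the BSP-tree and k-means partitioning the abstract names, and the names `coefficient_of_variation`, `select_samples` and `tau` are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(outputs):
    """Ratio of the (population) standard deviation to the absolute mean."""
    m = mean(outputs)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(outputs) / abs(m)


def select_samples(pairs, tau):
    """Split (input, output) pairs until each group's output ratio is <= tau.

    Returns one representative (mean input, mean output) per final group.
    Assumption: a plain median split stands in for BSP-tree / k-means.
    """
    pairs = sorted(pairs)
    outputs = [y for _, y in pairs]
    if len(pairs) == 1 or coefficient_of_variation(outputs) <= tau:
        return [(mean(x for x, _ in pairs), mean(outputs))]
    mid = len(pairs) // 2
    return select_samples(pairs[:mid], tau) + select_samples(pairs[mid:], tau)


if __name__ == "__main__":
    # Hypothetical data: two plateaus in the output.
    data = [(0, 1.0), (1, 1.0), (2, 1.0), (3, 10.0), (4, 10.0), (5, 10.0)]
    print(select_samples(data, tau=0.1))
    print(select_samples(data, tau=1.0))
```

A small threshold forces the two plateaus into separate groups, each with its own representative, while a loose threshold lets a single representative stand for the whole set.
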
Figures

Figure 1. Example of an input-output relationship where a single group of inputs forms multiple distinct groups in output space.
Figure 2. The data collection setup and the samples for dataset extraction. (a) data collection setup: A is the end effector for deformable tools. B is the end effector for deformable object with a rigid tool; (b) the four samples used for establishing the datasets. The left two are the deformable tools; a spoon and a fork. Both are made of elastic material. The right two are the deformable mock-ups made of silicone. Mock-up-1 has a harder inclusion inside.
Figure 3. Input–output relationship and thresholds for low output magnitude description. (a) average root-mean-square error (RMSE) vs. tau; (b) average mean-absolute-percentage error (MAPE) vs. tau; (c) number of selected samples vs. tau.
Figure 4. Mean plus standard deviation of the relative force magnitude error. (a) spoon; (b) fork; (c) Mock-up-1; (d) Mock-up-2.
Figure 5. Mean plus standard deviation of the absolute force magnitude error. (a) spoon; (b) fork; (c) Mock-up-1; (d) Mock-up-2.

700 KiB  
Article
A Conjecture Regarding the Extremal Values of Graph Entropy Based on Degree Powers
by Kinkar Chandra Das and Matthias Dehmer
Entropy 2016, 18(5), 183; https://doi.org/10.3390/e18050183 - 13 May 2016
Cited by 7 | Viewed by 4514
Abstract
Many graph invariants have been used for the construction of entropy-based measures to characterize the structure of complex networks. The starting point has always been to assign a probability distribution to a network when using Shannon’s entropy. In particular, Cao et al. (2014 and 2015) defined special graph entropy measures which are based on degree powers. In this paper, we obtain some lower and upper bounds for these measures and characterize extremal graphs. Moreover, we resolve one part of a conjecture stated by Cao et al.
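
A degree-power entropy of this kind can be illustrated with a short sketch. It assumes the measure takes the form of Shannon's entropy applied to the distribution p_i = d_i^k / Σ_j d_j^k over the vertex degrees d_i; the function name, the use of the natural logarithm, and the example graphs are assumptions for illustration and are not taken from the paper.

```python
import math


def degree_power_entropy(degrees, k=1.0):
    """Shannon entropy of p_i = d_i**k / sum_j d_j**k over the vertex degrees."""
    # Isolated vertices (degree 0) get probability 0 and contribute nothing.
    weights = [d ** k for d in degrees if d > 0]
    total = sum(weights)
    return -sum((w / total) * math.log(w / total) for w in weights)


if __name__ == "__main__":
    cycle = [2, 2, 2, 2, 2]  # degrees of a cycle on five vertices
    star = [4, 1, 1, 1, 1]   # degrees of a star on five vertices
    print(degree_power_entropy(cycle, k=1))
    print(degree_power_entropy(star, k=1))
    print(degree_power_entropy(star, k=2))
```

For a regular graph every vertex receives the same probability, so the value equals log n regardless of k, whereas the star concentrates probability on its center and scores lower.
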
4680 KiB  
Article
Estimation of Tsunami Bore Forces on a Coastal Bridge Using an Extreme Learning Machine
by Iman Mazinani, Zubaidah Binti Ismail, Shahaboddin Shamshirband, Ahmad Mustafa Hashim, Marjan Mansourvar and Erfan Zalnezhad
Entropy 2016, 18(5), 167; https://doi.org/10.3390/e18050167 - 28 Apr 2016
Cited by 19 | Viewed by 6572
Abstract
This paper proposes a procedure to estimate tsunami wave forces on coastal bridges through a novel method based on Extreme Learning Machine (ELM) and laboratory experiments. This research included three water depths, ten wave heights, and four bridge models with a variety of girders, providing a total of 120 cases. The research was designed and adapted to estimate tsunami bore forces including horizontal force, vertical uplift and overturning moment on a coastal bridge. The experiments were carried out on 1:40 scaled concrete bridge models in a wave flume with dimensions of 24 m × 1.5 m × 2 m. Two six-axis load cells and four pressure sensors were installed on the base plate to measure forces. In the numerical procedure, estimation and prediction results of the ELM model were compared with Genetic Programming (GP) and Artificial Neural Network (ANN) models. The experimental results showed that an improvement in predictive accuracy and generalization capability could be achieved by the ELM approach in comparison with GP and ANN. Moreover, the results indicated that the ELM models developed could be used with confidence for further work on formulating a novel model predictive strategy for tsunami bore forces on a coastal bridge. The experimental results indicated that the new algorithm could produce good generalization performance in most cases and could learn thousands of times faster than conventional popular learning algorithms. Therefore, it can be concluded that ELM is emerging as an alternative approach to estimating the tsunami bore forces on a coastal bridge.
Figures

Figure 1. (a) Damaged piers of the JR Rail Viaduct at the Tsuya River (photo: S. Dashti); (b) Damages to the deck and piers of Utatsu Bridge (Photo: Kenji Kosa).
Figure 2. Wave flume and model installation in the flume.
Figure 3. Bridge model dimension and the positive direction of drag force, uplift and overturning moment. (a) Model A3; (b) Model A4; (c) Model A5; (d) Model A6.
Figure 4. Experimental set up in the flume (a) side view (b) plan view.
Figure 5. The structure of SLFN.
Figure 6. Scatter plots of actual and predicted values of tsunami bore force FX on a coastal bridge using (a) ELM, (b) GP and (c) ANN method.
Figure 7. Scatter plots of actual and predicted values of tsunami bore force FZ on a coastal bridge using (a) ELM, (b) GP and (c) ANN method.
Figure 8. Scatter plots of actual and predicted values of tsunami bore moment MX on a coastal bridge using (a) ELM, (b) GP and (c) ANN method.

396 KiB  
Article
Finding Influential Users in Social Media Using Association Rule Learning
by Fredrik Erlandsson, Piotr Bródka, Anton Borg and Henric Johnson
Entropy 2016, 18(5), 164; https://doi.org/10.3390/e18050164 - 27 Apr 2016
Cited by 74 | Viewed by 9261
Abstract
Influential users play an important role in online social networks since users tend to have an impact on one another. Therefore, the proposed work analyzes users and their behavior in order to identify influential users and predict user participation. Normally, the success of a social media site is dependent on the activity level of the participating users. For both online social networking sites and individual users, it is of interest to find out if a topic will be interesting or not. In this article, we propose association learning to detect relationships between users. In order to verify the findings, several experiments were executed based on social network analysis, in which the most influential users identified from association rule learning were compared to the results from Degree Centrality and Page Rank Centrality. The results clearly indicate that it is possible to identify the most influential users using association rule learning. In addition, the results also indicate a lower execution time compared to state-of-the-art methods.
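
The standard quality measures of an association rule (support, confidence, lift and conviction) can be computed as in the following sketch. It assumes, for illustration only, that each transaction is the set of users who interacted with one post; the function name, the user identifiers and the toy data are hypothetical and are not the authors' pipeline.

```python
import math


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    supp_a = sum(antecedent <= t for t in sets) / n
    supp_c = sum(consequent <= t for t in sets) / n
    supp_rule = sum((antecedent | consequent) <= t for t in sets) / n
    if supp_a == 0 or supp_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_rule / supp_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails.
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    return {"support": supp_rule, "confidence": confidence,
            "lift": lift, "conviction": conviction}


if __name__ == "__main__":
    # Hypothetical posts, each listing the users who took part.
    posts = [{"u1", "u2"}, {"u1", "u2", "u3"}, {"u1"}, {"u3"}]
    print(rule_metrics(posts, {"u1"}, {"u2"}))
```

A lift above 1 indicates that the two users appear together more often than independence would predict, which is the kind of relationship a rule-based influence analysis looks for.
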
Figures

Figure 1. Combined plot of number of occurrence of each item-set (Frequency) with respect to number of users in the rule (Length). The upper and right axis illustrates histograms of the respective distributions.
Figure 2. Distribution of values in learned association rules. (a) support distribution; (b) confidence distribution; (c) lift distribution; (d) conviction distribution.
Figure 3. Distribution of posts created by top users over 108 sampled pages.
Figure 4. Execution time for different social network analysis methods.

6825 KiB  
Article
An Informed Framework for Training Classifiers from Social Media
by Dong Seon Cheng and Sami Abduljalil Abdulhak
Entropy 2016, 18(4), 130; https://doi.org/10.3390/e18040130 - 9 Apr 2016
Cited by 1 | Viewed by 4689
Abstract
Extracting information from social media has become a major focus of companies and researchers in recent years. Aside from the study of the social aspects, it has also been found feasible to exploit the collaborative strength of crowds to help solve classical machine learning problems like object recognition. In this work, we focus on the generally underappreciated problem of building effective datasets for training classifiers by automatically assembling data from social media. We detail some of the challenges of this approach and outline a framework that uses expanded search queries to retrieve more qualified data. In particular, we concentrate on collaboratively tagged media on the social platform Flickr, and on the problem of image classification to evaluate our approach. Finally, we describe a novel entropy-based method to incorporate an information-theoretic principle to guide our framework. Experimental validation against well-known public datasets shows the viability of this approach and marks an improvement over the state of the art in terms of simplicity and performance.
Figures

Figure 1. Montage of a few images returned by a simple search for “sofa” on Flickr. Most of these are not representative of the sofa class, either because they are marginally or loosely related (e.g., people sitting on sofas, or sofas present in the background).
Figure 2. Graph of the gains for the entropy-based selection strategy against the baseline.
Figure 3. Charts of the relative importance of the selected tags based on the entropy value h_i for the classes motorbike, cow, person, and sheep.

683 KiB  
Article
The Effect of Threshold Values and Weighting Factors on the Association between Entropy Measures and Mortality after Myocardial Infarction in the Cardiac Arrhythmia Suppression Trial (CAST)
by Christopher Mayer, Martin Bachler, Andreas Holzinger, Phyllis K. Stein and Siegfried Wassertheurer
Entropy 2016, 18(4), 129; https://doi.org/10.3390/e18040129 - 8 Apr 2016
Cited by 19 | Viewed by 6054
Abstract
Heart rate variability (HRV) is a non-invasive measurement, based on the intervals between normal heart beats, that characterizes cardiac autonomic function. Decreased HRV is associated with increased risk of cardiovascular events. Characterizing HRV using only moment statistics fails to capture abnormalities in regulatory function that are important aspects of disease risk. Thus, entropy measures are a promising approach to quantify HRV for risk stratification. The purpose of this study was to investigate this potential for approximate, corrected approximate, sample, fuzzy, and fuzzy measure entropy and its dependency on the parameter selection. Recently published parameter sets and further parameter combinations were investigated. Heart rate data were obtained from the "Cardiac Arrhythmia Suppression Trial (CAST) RR Interval Sub-Study Database" (PhysioNet). Corresponding outcomes and clinical data were provided by one of the investigators. The use of previously-reported parameter sets on the pre-treatment data did not significantly add to the identification of patients at risk for cardiovascular death on follow-up. After arrhythmia suppression treatment, several parameter sets predicted outcomes for all patients and patients without coronary artery bypass grafting (CABG). The strongest results were seen using the threshold parameter as a multiple of the data’s standard deviation (r = 0.2·σ). Approximate and sample entropy provided significant hazard ratios for patients without CABG and without diabetes for an entropy maximizing threshold approximation. Additional parameter combinations did not improve the results for pre-treatment data. The results of this study illustrate the influence of parameter selection on entropy measures’ potential for cardiovascular risk stratification and support the potential use of entropy measures in future studies. Full article
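To make the role of the threshold parameter concrete, the following is a minimal sketch of sample entropy with the tolerance set to r = 0.2·σ and embedding dimension m = 2, the values named in the abstract and figure captions. It is not the authors' implementation: the function name, the use of the population standard deviation, the Chebyshev distance, and the "less than or equal to r" matching rule are assumptions made for illustration.

```python
import math
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) with r = r_factor * standard deviation.

    Assumptions (not taken from the paper): population standard deviation,
    Chebyshev distance, and a match when the distance is <= r.
    """
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        # Use the same n - m template start points for both template lengths
        # and count each unordered pair once, which excludes self-matches.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                distance = max(abs(series[i + k] - series[j + k]) for k in range(length))
                if distance <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching template pairs of length m
    a = count_matches(m + 1)  # matching template pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined in the strict sense; reported as infinity here
    return -math.log(a / b)


# A perfectly periodic series is fully predictable, so its sample entropy is zero.
periodic = [1.0, 2.0] * 10
print(sample_entropy(periodic))
```

Changing `r_factor` changes how many template pairs count as matches, which is one way to see why the choice of threshold can alter the resulting entropy value.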
Figure 1. Kaplan–Meier survival curves according to risk groups based on ApEn for post-treatment data (see parameter set No. 1 in Table 2); for all patients (A), for all patients w/o CABG (B) and w/o CABG and DM (C).
Figure 2. Significance of predictive values of entropy measures for different choices of r (multiples of r_Chon); parameters: m = 2, N = 1200, n = n_L = 2, n_F = 1; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F). p = 0.05 marks the threshold of statistical significance.
Figure 3. Significance of predictive values of entropy measures for different choices of r (multiples of σ); parameters: m = 2, N = 1200, n = n_L = 1, n_F = 3; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F). p = 0.05 marks the threshold of statistical significance.
Figure 4. Significance of predictive values of FuzzyMEn for different choices of r_L and r_F (multiples of r_Chon); parameters: m = 2, N = 1200, n = n_L = 2, n_F = 1; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F).
Figure 5. Significance of predictive values of FuzzyMEn for different choices of r_L and r_F (multiples of σ); parameters: m = 2, N = 1200, n = n_L = 1, n_F = 3; HRV data at baseline (A,B,C) and after treatment (D,E,F); for all patients (A,D), for all patients w/o CABG (B,E) and w/o CABG and DM (C,F).
2212 KiB  
Article
Feature Selection of Power Quality Disturbance Signals with an Entropy-Importance-Based Random Forest
by Nantian Huang, Guobo Lu, Guowei Cai, Dianguo Xu, Jiafeng Xu, Fuqing Li and Liying Zhang
Entropy 2016, 18(2), 44; https://doi.org/10.3390/e18020044 - 28 Jan 2016
Cited by 39 | Viewed by 6924
Abstract
Power quality signal feature selection is an effective method to improve the accuracy and efficiency of power quality (PQ) disturbance classification. In this paper, an entropy-importance (EnI)-based random forest (RF) model for PQ feature selection and disturbance classification is proposed. Firstly, 35 kinds of signal features extracted from S-transform (ST) with random noise are used as the original input feature vector of RF classifier to recognize 15 kinds of PQ signals with six kinds of complex disturbance. During the RF training process, the classification ability of different features is quantified by EnI. Secondly, without considering the features with zero EnI, the optimal perturbation feature subset is obtained by applying the sequential forward search (SFS) method which considers the classification accuracy and feature dimension. Then, the reconstructed RF classifier is applied to identify disturbances. According to the simulation results, the classification accuracy is higher than that of other classifiers, and the feature selection effect of the new approach is better than SFS and sequential backward search (SBS) without EnI. With the same feature subset, the new method can maintain a classification accuracy above 99.7% under the condition of 30 dB or above, and the accuracy under 20 dB is 96.8%. Full article
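The abstract does not define how EnI is computed, so the following is only a sketch of a standard entropy-based quantity that tree learners commonly use: the information gain (reduction in Shannon entropy) of a single threshold split on one feature. The function names, the toy feature values, and the "sag"/"swell" labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends every sample one way separates nothing
    n = len(labels)
    weighted = (len(left) / n) * shannon_entropy(left) + (len(right) / n) * shannon_entropy(right)
    return shannon_entropy(labels) - weighted


# Hypothetical toy data: one feature separates the classes, the other does not.
labels = ["sag", "sag", "swell", "swell"]
print(information_gain([1.0, 2.0, 3.0, 4.0], labels, threshold=2.5))  # informative feature
print(information_gain([1.0, 2.0, 1.0, 2.0], labels, threshold=1.5))  # uninformative feature
```

A feature whose splits never reduce entropy would receive a score of zero under this kind of measure, which is consistent in spirit with the abstract's step of discarding features with zero EnI, although the paper's exact definition may differ.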
Figure 1. Flow diagram of the RF-based classification.
Figure 2. Flow diagram of the new feature selection method.
Figure 3. (a) EnI value of features; (b) GiI value of features.
Figure 4. (a) Scatter plot of F5, F22 and F25; (b) Scatter plot of F4 and F5; (c) Scatter plot of F4 and F22.
Figure 5. (a) Classification accuracy of different feature subsets obtained from the EnI method; (b) Classification accuracy of different feature subsets obtained from the GiI method.
Figure 6. (a) Training error of different feature subsets obtained from the EnI method; (b) Training error of different feature subsets obtained from the GiI method.
Figure 7. Normalized time for different numbers of selected features.
Figure 8. (a) Classification error for different RF scales with the EnI method; (b) Classification error for different RF scales with the GiI method.
2211 KiB  
Article
Using Multidimensional ADTPE and SVM for Optical Modulation Real-Time Recognition
by Junyu Wei, Zhiping Huang, Shaojing Su and Zhen Zuo
Entropy 2016, 18(1), 30; https://doi.org/10.3390/e18010030 - 16 Jan 2016
Cited by 6 | Viewed by 5653
Abstract
Based on the feature extraction of multidimensional asynchronous delay-tap plot entropy (ADTPE) and multiclass classification of support vector machine (SVM), we propose a method for recognition of multiple optical modulation formats and various data rates. We first present the algorithm of multidimensional ADTPE, which is extracted from asynchronous delay sampling pairs of the modulated optical signal. Then, a multiclass SVM is utilized for fast and accurate classification of several widely-used optical modulation formats. In addition, a simple real-time recognition scheme is designed to reduce the computation time. Compared to the existing method based on asynchronous delay-tap plot (ADTP), the theoretical analysis and simulation results show that our recognition method can effectively enhance the tolerance of transmission impairments, obtaining relatively high accuracy. Finally, it is further demonstrated that the proposed method can be integrated in an optical transport network (OTN) with flexible expansion. By simply adding the corresponding sub-SVM module in the digital signal processor (DSP), arbitrary new modulation formats can be recognized with high recognition accuracy in a short response time. Full article
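The figure captions for this article mention six modulation formats and a multiclass SVM built from fifteen sub-SVMs under the one-versus-one scheme. The sketch below only illustrates that bookkeeping: it enumerates the class pairs and combines pairwise decisions by majority vote. It trains no SVM; the format labels are shortened from the captions, and the function names and tie-breaking rule are assumptions.

```python
from collections import Counter
from itertools import combinations

# Shortened labels for the six formats listed in the Figure 2 caption.
FORMATS = ["RZ", "NRZ-DPSK", "DUO", "RZ-DQPSK", "PM-RZ-QPSK", "PM-NRZ-16QAM"]


def one_vs_one_pairs(classes):
    """Every unordered pair of classes; each pair would get its own binary sub-classifier."""
    return list(combinations(classes, 2))


def majority_vote(pairwise_winners):
    """Pick the class that wins the most pairwise decisions (first seen wins ties)."""
    return Counter(pairwise_winners).most_common(1)[0][0]


pairs = one_vs_one_pairs(FORMATS)
print(len(pairs))  # 6 * 5 / 2 = 15 pairwise classifiers

# Under this scheme, a seventh (hypothetical) format needs six additional pairs.
print(len(one_vs_one_pairs(FORMATS + ["NEW-FORMAT"])) - len(pairs))
```

Under a one-versus-one design, adding a class leaves the existing pairwise classifiers untouched and only requires training the new pairs, which may be what the abstract means by adding the corresponding sub-SVM module.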
Figure 1. Asynchronous fixed delay time sampling for asynchronous delay-tap plot.
Figure 2. The left and middle columns show eye diagrams and asynchronous delay-tap plots (ADTPs) when the optical SNR (OSNR) = 20 dB without chromatic dispersion (CD) for 10-Gb return-to-zero (RZ), 40-Gb NRZ-differential phase-shift keying (DPSK), 40-Gb duo-binary optical (DUO), 40-Gb RZ-differential quadrature phase-shift keying (DQPSK), 100-Gb polarization-multiplexed (PM)-RZ-QPSK and 200-Gb PM-NRZ-16 quadrature amplitude modulation (16QAM), while the right column shows ADTPs when OSNR = 20 dB and CD = 500 ps/nm.
Figure 3. Variations of four types of asynchronous delay-tap plot entropy (ADTPE) for six different modulation formats along with positive CD varying from 0 to 4000 ps/nm under different OSNR levels. The ADTPEs in the left, middle and right column correspond to 10-dB, 20-dB and 30-dB OSNRs, respectively.
Figure 4. A viable procedure of recognition directly using multidimensional ADTPE.
Figure 5. The structure of multiclass SVM comprised of fifteen sub-SVMs based on the one-versus-one algorithm.
Figure 6. The structure of the real-time modulation format recognition system. PMD, polarization mode dispersion; SMF, single mode fiber; EDFA, erbium-doped fiber amplifier; VOA, variable optical attenuator; OBPF, optical band-pass filter; FDM, fixed dispersion module; PIN, positive intrinsic-negative diode; Async., asynchronous.
Figure 7. The scatter points of four types of ADTPEs for multiple modulation formats.
Figure 8. Recognition accuracy as a function of the proportion of the overall eigenvector for the SVM training.
1374 KiB  
Article
Fruit Classification by Wavelet-Entropy and Feedforward Neural Network Trained by Fitness-Scaled Chaotic ABC and Biogeography-Based Optimization
by Shuihua Wang, Yudong Zhang, Genlin Ji, Jiquan Yang, Jianguo Wu and Ling Wei
Entropy 2015, 17(8), 5711-5728; https://doi.org/10.3390/e17085711 - 7 Aug 2015
Cited by 135 | Viewed by 15496
Abstract
Fruit classification is quite difficult because of the various categories and similar shapes and features of fruit. In this work, we proposed two novel machine-learning-based classification methods. The developed system consists of wavelet entropy (WE), principal component analysis (PCA), and a feedforward neural network (FNN) trained by fitness-scaled chaotic artificial bee colony (FSCABC) and biogeography-based optimization (BBO), respectively. The K-fold stratified cross validation (SCV) was utilized for statistical analysis. The classification performance for 1653 fruit images from 18 categories showed that the proposed “WE + PCA + FSCABC-FNN” and “WE + PCA + BBO-FNN” methods achieve the same accuracy of 89.5%, higher than state-of-the-art approaches: “(CH + MP + US) + PCA + GA-FNN” of 84.8%, “(CH + MP + US) + PCA + PSO-FNN” of 87.9%, “(CH + MP + US) + PCA + ABC-FNN” of 85.4%, “(CH + MP + US) + PCA + kSVM” of 88.2%, and “(CH + MP + US) + PCA + FSCABC-FNN” of 89.1%. Besides, our methods used only 12 features, fewer than the number of features used by other methods. Therefore, the proposed methods are effective for fruit classification. Full article
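The abstract names K-fold stratified cross validation without giving K or the splitting procedure. As a rough sketch of the idea only, the function below deals the sample indices of each class round-robin into K folds so that every fold keeps approximately the overall class proportions. The function name, the absence of shuffling, and the toy labels are assumptions.

```python
from collections import defaultdict


def stratified_k_fold(labels, k):
    """Return k lists of sample indices with roughly equal class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so that per-class counts
        # differ by at most one between any two folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Hypothetical toy labels: 10 samples of one class, 5 of another, split into 5 folds.
labels = ["apple"] * 10 + ["pear"] * 5
for fold in stratified_k_fold(labels, k=5):
    print(sorted(labels[i] for i in fold))
```

In use, each fold would serve once as the test set while the remaining folds form the training set; a practical version would typically shuffle the indices within each class first.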
Figure 1. Illustration of “Fruit” dataset (one sample for each category). (a) Yellow Bananas; (b) Granny Smith Apples; (c) Rome Apples; (d) Tangerines; (e) Green Plantains; (f) Hass Avocados; (g) Watermelons; (h) Cantaloupes; (i) Gold Pineapples; (j) Passion Fruits; (k) Bosc Pears; (l) Anjou Pears; (m) Green Grapes; (n) Red Grapes; (o) Black Grapes; (p) Blackberries; (q) Strawberries; and (r) Blueberries.
Figure 2. Flowchart of feature extraction.
Figure 3. Structural architecture of one-hidden-layer FNN.
Figure 4. Model of immigration λ and emigration μ probabilities.
Figure 5. Diagram of the proposed method.
Figure 6. Feature selection by principal component analysis (threshold is 95%).