Data, Volume 5, Issue 2 (June 2020) – 29 articles

Cover Story: An increasing number of chemicals, such as pharmaceuticals, pesticides, and synthetic hormones, are in daily use worldwide. In the environment, chemicals can adversely affect biological populations and communities and, in turn, related ecosystem functions. Standartox is a database and tool that collects ecotoxicological test information to support the evaluation of environmental effects and risks of chemicals. Standartox cleans and harmonizes these data and subsequently provides access to functions that allow the data to be filtered and aggregated according to the user’s requirements. Large amounts of toxicity data on chemicals are currently scattered among various resources and are cumbersome to process. Standartox steadily incorporates new ecotoxicity data and aims at facilitating data access.
25 pages, 15956 KiB  
Data Descriptor
A Probabilistic Bag-to-Class Approach to Multiple-Instance Learning
by Kajsa Møllersen, Jon Yngve Hardeberg and Fred Godtliebsen
Data 2020, 5(2), 56; https://doi.org/10.3390/data5020056 - 26 Jun 2020
Cited by 3 | Viewed by 4174
Abstract
Multi-instance (MI) learning is a branch of machine learning, where each object (bag) consists of multiple feature vectors (instances), for example, an image consisting of multiple patches and their corresponding feature vectors. In MI classification, each bag in the training set has a class label, but the instances are unlabeled. The instances are most commonly regarded as a set of points in a multi-dimensional space. Alternatively, instances are viewed as realizations of random vectors with corresponding probability distribution, where the bag is the distribution, not the realizations. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback–Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasizing the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets, and propose a dissimilarity measure that fulfils them. Its performance is demonstrated on synthetic and real data. The probability distribution space is valid for MI learning, both for the theoretical analysis and applications.
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
Figure 1. Breast tissue images [2]. The image segments are not labeled.
Figure 2. Sea and desert images from Wikimedia Commons.
Figure 3. Parametric generative model. Bags are realizations of random parameter vectors, sampled according to the respective class distributions. Instances are realizations of feature vectors, sampled according to the respective bag distributions. Only the instance sets are observed.
Figure 4. The PDF of a bag with uniform distribution and the PDFs of the two classes.
Figure 5. (a) One positive bag in the training set gives small variance for the class PDF. (b) Ten positive bags in the training set, and the variance has increased.
Figure 6. An example of ROC curves for $cKL$, $rKL$ and $rBH$ classifiers. The performance increases when the number of positive bags in the training set increases from 1 (dashed line) to 10 (solid line). The sensitivity-specificity pairs for the bag-to-bag KL and BH classifiers are displayed for 100 positive and negative bags in the training set for comparison.
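The abstract above treats each bag as a probability distribution and compares distributions with divergences. As a minimal sketch of that idea, and not the dissimilarity measure the authors propose, the following Python example fits a univariate Gaussian to a bag and to the pooled instances of each class, then assigns the bag to the class with the smallest Kullback–Leibler divergence. The function names, the Gaussian assumption, the pooling of class instances, and the toy data are all hypothetical choices made for illustration.

```python
import math


def fit_gaussian(instances):
    """Estimate mean and (population) standard deviation of 1-D instances."""
    n = len(instances)
    mean = sum(instances) / n
    var = sum((x - mean) ** 2 for x in instances) / n
    return mean, math.sqrt(var)


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)


def classify_bag(bag, class_instances):
    """Assign a bag to the class whose fitted Gaussian is closest in KL divergence.

    class_instances maps a label to the pooled instances of that class
    (pooling is an assumption of this sketch).
    """
    mu_b, sd_b = fit_gaussian(bag)
    scores = {}
    for label, instances in class_instances.items():
        mu_c, sd_c = fit_gaussian(instances)
        scores[label] = kl_gaussian(mu_b, sd_b, mu_c, sd_c)
    return min(scores, key=scores.get), scores


# Toy data: one unlabeled bag and two classes (values are invented).
bag = [0.9, 1.1, 1.0, 1.2, 0.8]
training = {
    "pos": [0.7, 1.0, 1.3, 0.9, 1.1, 1.4, 0.6],
    "neg": [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7],
}
label, scores = classify_bag(bag, training)
print(label, scores)
```

Because the KL divergence is asymmetric, swapping the roles of bag and class would generally give different scores; the direction used here is one possible choice.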
11 pages, 3296 KiB  
Data Descriptor
A Database for the Radio Frequency Fingerprinting of Bluetooth Devices
by Emre Uzundurukan, Yaser Dalveren and Ali Kara
Data 2020, 5(2), 55; https://doi.org/10.3390/data5020055 - 21 Jun 2020
Cited by 39 | Viewed by 8257
Abstract
Radio frequency fingerprinting (RFF) is a promising physical layer protection technique which can be used to defend wireless networks from malicious attacks. It is based on the use of the distinctive features of the physical waveforms (signals) transmitted from wireless devices in order to classify authorized users. The most important requirement to develop an RFF method is the existence of a precise, robust, and extensive database of the emitted signals. In this context, this paper introduces a database consisting of Bluetooth (BT) signals collected at different sampling rates from 27 different smartphones (six manufacturers with several models for each). Firstly, the data acquisition system to create the database is described in detail. Then, the two well-known methods based on transient BT signals are experimentally tested by using the provided data to check their solidity. The results show that the created database may be useful for many researchers working on the development of the RFF of BT devices.
(This article belongs to the Special Issue Data from Smartphones and Wearables)
Figure 1. Data acquisition system with direct sampling.
Figure 2. Data acquisition system with radio frequency (RF) front end.
Figure 3. Undesired frequency components (spur signals).
Figure 4. The recordings of BT signals: (a) Huawei GR5; (b) Samsung Note 3; (c) iPhone 7; (d) LG G4.
Figure 5. The detected transient signals: (a) Huawei GR5; (b) Samsung Note 3; (c) iPhone 7; (d) LG G4.
Figure 6. Comparison of the normalized energies of the data collected from Huawei GR5, Samsung Note 3, iPhone 7 and LG G4.
15 pages, 6388 KiB  
Data Descriptor
Emissions from Swine Manure Treated with Current Products for Mitigation of Odors and Reduction of NH3, H2S, VOC, and GHG Emissions
by Baitong Chen, Jacek A. Koziel, Chumki Banik, Hantian Ma, Myeongseong Lee, Jisoo Wi, Zhanibek Meiirkhanuly, Daniel S. Andersen, Andrzej Białowiec and David B. Parker
Data 2020, 5(2), 54; https://doi.org/10.3390/data5020054 - 18 Jun 2020
Cited by 15 | Viewed by 5203
Abstract
Odor and gaseous emissions from the swine industry are of concern for the wellbeing of humans and livestock. Additives applied to the swine manure surface are popular, marketed products intended to solve this problem, and they are relatively inexpensive and easy for farmers to use. There are no scientific data evaluating the effectiveness of many of these products. We evaluated 12 manure additive products that are currently being marketed on their effectiveness in mitigating odor and gaseous emissions from swine manure. We used a pilot-scale system simulating the storage of swine manure with a controlled ventilation of headspace and periodic addition of manure. This dataset contains measured concentrations and estimated emissions of target gases in manure headspace above treated and untreated swine manure. These include ammonia (NH3), hydrogen sulfide (H2S), greenhouse gases (CO2, CH4, and N2O), volatile organic compounds (VOC), and odor. The experiment to test each manure additive product lasted for two months; the measurements of NH3 and H2S were completed twice a week; others were conducted weekly. The manure for each test was collected from three different farms in central Iowa to provide the necessary variety in stored swine manure properties. This dataset is useful for further analyses of gaseous emissions from swine manure under simulated storage conditions and for performance comparison of marketed products for the mitigation of gaseous emissions. Ultimately, swine farmers, the regulatory community, and the public need to have scientific data informing decisions about the usefulness of manure additives.
(This article belongs to the Special Issue Big Data for Sustainable Development)
Figure 1. The schematic of a deep pit swine farm structure used in Iowa.
Figure 2. Schematic of manure storage simulating the deep pit swine barn. A total of 15 storage simulators were used, facilitating tests of four manure additives in n = 3 replicates over eight-week trials.
Figure A1. Manure simulator that simulated the deep pit storage under a slatted swine barn floor.
Figure A2. Olfactometer used to measure odor concentration.
Figure A3. Dräeger X-5600 used to measure the concentrations of H2S and NH3.
Figure A4. OMS-300 used to measure the concentrations of H2S and NH3.
Figure A5. Gas chromatograph (GC) equipped with a flame ionization detector (FID) and electron capture detector (ECD) used to measure the concentrations of CO2, CH4, and N2O.
Figure A6. GC-MS used to measure the relative abundance of targeted VOCs.
15 pages, 2172 KiB  
Article
Charge Recombination Kinetics of Bacterial Photosynthetic Reaction Centres Reconstituted in Liposomes: Deterministic Versus Stochastic Approach
by Emiliano Altamura, Paola Albanese, Pasquale Stano, Massimo Trotta, Francesco Milano and Fabio Mavelli
Data 2020, 5(2), 53; https://doi.org/10.3390/data5020053 - 12 Jun 2020
Cited by 3 | Viewed by 2719
Abstract
In this theoretical work, we analyse the kinetics of the charge recombination reaction after light excitation of the Reaction Centres extracted from the photosynthetic bacterium Rhodobacter sphaeroides and reconstituted in small unilamellar phospholipid vesicles. Due to the compartmentalized nature of liposomes, vesicles may exhibit a random distribution of both ubiquinone molecules and the Reaction Centre protein complexes that can produce significant differences in the local concentrations from the average expected values. Moreover, since the amount of reacting species is very low in compartmentalized lipid systems, the stochastic approach is more suitable to unveil deviations of the average time behaviour of vesicles from the deterministic time evolution.
Figure 1. Set of elementary reactions for the dark relaxation of photosynthetic Reaction Centres embedded in liposome membranes. $k_{AD}$ and $k'_{AD}$ are the kinetic constants of the electron transfer from $\mathrm{Q_A^-}$ to $\mathrm{D^+}$ when the $\mathrm{Q_B}$ pocket is empty or occupied, respectively. $k_{BD}$ is the kinetic constant of the direct charge recombination from $\mathrm{Q_B^-}$ to $\mathrm{D^+}$. $k^*_{in}/k^*_{out}$ and $k_{in}/k_{out}$ represent the kinetic constants of the ubiquinone association/dissociation to/from the $\mathrm{Q_B}$ site in the charge separated and neutral state, respectively; $k_{AB}$ is the kinetic constant for the electron transfer reaction from $\mathrm{Q_A^-}$ to $\mathrm{Q_B}$, while $k_{BA}$ is the kinetic constant for the backward reaction.
Figure 2. Random distribution of the reacting molecules $\mathrm{D^+Q_A^-}$, $\mathrm{D^+Q_A^-Q_B}$ and Q at the beginning of a simulation run by setting [RC] = $1.0 \times 10^{-3}$ M and $c_Q^T/c_{RC}^T = 0.1$. Comparisons between the simulated distributions (bars) and the theoretical probabilities, Gaussian $N(n \mid N_x, \sqrt{N_x})\,dn$ (blue curves) for $\mathrm{D^+Q_A^-}$ and Poisson $P(n \mid N_x)$ (red points and lines) for $\mathrm{D^+Q_A^-Q_B}$ and Q, are reported.
Figure 3. Experimental charge recombination traces at 865 nm for POPC at 25 °C: normalized absorbance $\chi(\mathrm{D^+}) = \Delta A/\Delta A_0$ vs time (black lines) along with the optimized solutions (red lines) of the ODE set reported in Equation (2). Q/RC ratios are: 0, 0.5, 0.8 and 20 going from the fastest decay to the slowest. Initial concentrations have been set as described in the Numerical Integration Section. The best-fit kinetic constants are reported in Table 1. The optimized kinetic parameters reduce the average RMSD of the four curves by about 50%.
Figure 4. Time evolution of the photo excited species: comparisons between the single runs of stochastic simulations (black and gray curves) and the numerical solution of the ODE set (red dashed curves). Plots refer to the same RC concentration, $c_{RC}^T = 1.0 \times 10^{-3}$ M, but to six different values of the ratio $c_Q^T/c_{RC}^T$. In each plot, simulations were performed for a vesicle with the same membrane composition, but with an increasing membrane volume, starting from the membrane volume of a 40 nm radius liposome, $V_{40} = 7.3 \times 10^{-20}\ \mathrm{dm^3}$ (black curve), and enlarging the membrane volume by 10 to $10^3$ times. Legends show the RMSD values for the individual stochastic traces calculated with Equation (4).
Figure 5. Time evolution of the photo excited species: comparisons between the averages of stochastic simulations (black line with gray confidence bands) and the deterministic curves (red lines) for different $c_Q^T/c_{RC}^T$. The averages were calculated over the stochastic simulation outcomes of a population of 500 monodisperse vesicles having a 40 nm radius and a membrane volume $V_{40} = 7.3 \times 10^{-20}\ \mathrm{dm^3}$, the same RC concentration, $c_{RC}^T = 1.0 \times 10^{-3}$ M, but six different values of the ratio $c_Q^T/c_{RC}^T$. The RC proteins and Q molecules were distributed uniformly over the vesicle populations.
Figure 6. Time evolution of the photo excited species: comparison between the averages of stochastic simulations (black lines with grey error bands) and the deterministic curves (red lines) for a population of 500 monodisperse vesicles of 200 nm (A), 40 nm (B), and 20 nm (C) radius, respectively, with the same membrane composition: $c_{RC}^T = 1.0 \times 10^{-3}$ M and $c_Q^T/c_{RC}^T = 1$. The RC proteins and Q molecules were distributed randomly over the vesicle populations according to a Gaussian density/Poisson probability.
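The abstract argues that molecule counts per vesicle are low enough for random fluctuations to matter. A rough way to see this is to convert the concentration and membrane volume quoted in the captions into an expected copy number and look at the Poisson probabilities around it. The sketch below assumes that the expected count is simply concentration times membrane volume times Avogadro's number; that conversion, the function names, and the choice of a Q/RC ratio of 0.1 are assumptions for illustration and are not taken from the authors' code.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def expected_count(conc_molar, volume_dm3):
    """Expected number of molecules at a molar concentration in a volume (dm^3 = L)."""
    return conc_molar * volume_dm3 * AVOGADRO


def poisson_pmf(n, mean):
    """Probability of observing exactly n molecules when the expected count is mean."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


c_rc = 1.0e-3   # RC concentration in M, as quoted in the captions
v40 = 7.3e-20   # membrane volume of a 40 nm radius liposome in dm^3, as quoted

n_rc = expected_count(c_rc, v40)   # roughly 44 reaction centres per vesicle
n_q = 0.1 * n_rc                   # roughly 4.4 ubiquinones at a Q/RC ratio of 0.1
p_no_q = poisson_pmf(0, n_q)       # chance that a vesicle holds no ubiquinone at all

print(f"expected RC per vesicle: {n_rc:.2f}")
print(f"expected Q per vesicle:  {n_q:.2f}")
print(f"P(no Q in a vesicle):    {p_no_q:.4f}")
```

With only a handful of ubiquinone molecules expected per vesicle, the relative spread of a Poisson count (one over the square root of the mean) is close to 50%, which is the regime where a stochastic treatment can depart from the deterministic one.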
11 pages, 2510 KiB  
Data Descriptor
Large-Scale Dataset for Radio Frequency-Based Device-Free Crowd Estimation
by Abdil Kaya, Stijn Denis, Ben Bellekens, Maarten Weyn and Rafael Berkvens
Data 2020, 5(2), 52; https://doi.org/10.3390/data5020052 - 9 Jun 2020
Cited by 8 | Viewed by 4531
Abstract
Organisers of events attracting many people have the important task of ensuring the safety of the crowd on their venue premises. Measuring the size of the crowd is a critical first step, but often challenging because of occlusions, noise and the dynamics of the crowd. We have been working on a passive Radio Frequency (RF) sensing technique for crowd size estimation, and we now present three datasets of measurements collected at the Tomorrowland music festival in environments containing thousands of people. All datasets have reference data, either based on payment transactions or an access control system, and we provide an example analysis script. We hope that future analyses can lead to an added value for crowd safety experts.
Figure 1. Each line in the dataset files corresponds to a message received by the Controller, such as the one sent by Node 33 in the example above. There are N nodes and a controller in the network. A zero entry in the rssi_values vector means that the listening node (Node 33 in the example) did not receive a message from the node ID with the corresponding vector index in the past cycle.
Figure 2. We designed and deployed two types of nodes throughout the years: (a) The first iteration in a sturdy but open encapsulation, always featuring both the 433 MHz and 868 MHz networks. (b) The second iteration with a waterproof encapsulation, featuring either both networks or just the 868 MHz network. Both types are powered by a 6600 mAh battery and have an independently working microcontroller for each network.
Figure 3. Communication cycle example with N nodes and a controller in the network. Controllers send a message that starts the cycle, after which the controller itself waits for $T_w \cdot (N + 1)$ before broadcasting another start cycle message. Network devices schedule the transmission of their vector with RSSI values at an interval of $T_w$. After transmitting their payload, the vector is reset to zeroes, to be populated by the time the node can transmit again. The duration of transmissions as depicted in this figure depends on the payload size and the communication protocol.
Figure 4. Network node and controller positions at (a) Freedom Stage 2017, (b) Freedom Stage 2018 and (c) Main Comfort 2018 environments. The 433 MHz and 868 MHz nodes share the same position, although there are fourteen positions in the Main Comfort environment that only have an 868 MHz node. These positions are indicated with a triangle and these nodes have IDs of 40 and above.
Figure 5. 433 MHz RSS attenuation graphs as generated by our example script. (a) Saturday and (b) Sunday of the Freedom Stage 2017 environment, and (c) Saturday and (d) Sunday of the Freedom Stage 2018 environment are overlaid with the cashless transactions per minute. (e) Saturday and (f) Sunday of the Main Comfort 2018 environment are overlaid with the scan system-based crowd counts. Green vertical bands indicate the interval of data used for the calibration. Grey vertical lines indicate the beginning and end of a DJ set at the festival. The rolling standard deviation of the mean RSS attenuation is indicated as a light blue band around the mean RSS attenuation graph (±1σ).
Figure 6. 868 MHz RSS attenuation graphs as generated by our example script. (a) Saturday and (b) Sunday of the Freedom Stage 2017 environment, and (c) Saturday and (d) Sunday of the Freedom Stage 2018 environment are overlaid with the cashless transactions per minute. (e) Saturday and (f) Sunday of the Main Comfort 2018 environment are overlaid with the scan system-based crowd counts. Green vertical bands indicate the interval of data used for the calibration. Grey vertical lines indicate the beginning and end of a DJ set at the festival. The rolling standard deviation of the mean RSS attenuation is indicated as a light blue band around the mean RSS attenuation graph (±1σ).
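The captions above describe two conventions of the dataset: a zero in the rssi_values vector marks a node that was not heard in the past cycle, and the controller waits $T_w \cdot (N + 1)$ between cycle starts. The following Python sketch shows how a reader might honor both when processing a single message. The dictionary layout, the field name node_id, the sample values, and the choice of $T_w$ are hypothetical; the real file format should be checked against the dataset and its example analysis script.

```python
def cycle_duration(t_w, n_nodes):
    """Time between two start-cycle messages: T_w * (N + 1)."""
    return t_w * (n_nodes + 1)


def received_links(message):
    """Map vector index -> RSSI for the entries that were actually received.

    A zero entry means nothing was heard from that index in the past cycle,
    so it is skipped rather than treated as a measurement.
    """
    return {i: v for i, v in enumerate(message["rssi_values"]) if v != 0}


def mean_rssi(message):
    """Average RSSI over received links, or None if no link was heard."""
    links = received_links(message)
    if not links:
        return None
    return sum(links.values()) / len(links)


# Hypothetical message from a listening node; the values are invented.
message = {"node_id": 33, "rssi_values": [-71, 0, -64, 0, -80]}

print(received_links(message))   # indices 0, 2 and 4 were heard
print(mean_rssi(message))
print(cycle_duration(0.05, 40))  # assumed T_w of 0.05 s and N = 40 nodes
```

Skipping zeros before averaging matters: treating a missing link as an RSSI of zero would bias the mean upward and hide exactly the attenuation the crowd estimate relies on.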
19 pages, 1955 KiB  
Review
An Interdisciplinary Review of Camera Image Collection and Analysis Techniques, with Considerations for Environmental Conservation Social Science
by Coleman L. Little, Elizabeth E. Perry, Jessica P. Fefer, Matthew T. J. Brownlee and Ryan L. Sharp
Data 2020, 5(2), 51; https://doi.org/10.3390/data5020051 - 6 Jun 2020
Cited by 8 | Viewed by 4255
Abstract
Camera-based data collection and image analysis are integral methods in many research disciplines. However, few studies are specifically dedicated to trends in these methods or opportunities for interdisciplinary learning. In this systematic literature review, we analyze published sources (n = 391) to synthesize camera use patterns and image collection and analysis techniques across research disciplines. We frame this inquiry with interdisciplinary learning theory to identify cross-disciplinary approaches and guiding principles. Within this, we explicitly focus on trends within and applicability to environmental conservation social science (ECSS). We suggest six guiding principles for standardized, collaborative approaches to camera usage and image analysis in research. Our analysis suggests that ECSS may offer inspiration for novel combinations of data collection, standardization tactics, and detailed presentations of findings and limitations. ECSS can correspondingly incorporate more image analysis tactics from other disciplines, especially in regard to automated image coding of pertinent attributes.
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
Figure 1. Steps followed to refine the corpus of sources included in this systematic literature review, from initial query to final database. Following this process, citation metadata and six attributes were thematically coded for each of the 391 included sources: research discipline, country and continent of study, camera type, camera placement, data collection method, and data analysis method.
Figure 2. Publication distribution over time (5 year increments from 1995 to 2019) for each research discipline. The research discipline key is presented in the same order as sources, from top to bottom, most to least (i.e., from Environmental Conservation Social Sciences having the highest percentage to Biology/Microbiology having the least).
Figure 3. Publication distribution over time (5 year increments from 1995 to 2019) for each camera placement technique. The placement technique key is presented in the same order as sources, from top to bottom most to least (i.e., from outdoor fixed having the highest number to Watercraft having the least).
Figure 4. Environmental conservation social science publication distribution over time (5 year increments from 1995 to 2019) for each camera placement technique. The placement technique key is presented in the same order as sources, from top to bottom most to least (i.e., from outdoor fixed having the highest number to Computer having the least).
9 pages, 2065 KiB  
Article
Data Wrangling in Database Systems: Purging of Dirty Data
by Otmane Azeroual
Data 2020, 5(2), 50; https://doi.org/10.3390/data5020050 - 5 Jun 2020
Cited by 24 | Viewed by 7180
Abstract
Researchers need to be able to integrate ever-increasing amounts of data into their institutional databases, regardless of the source, format, or size of the data. It is then necessary to use the increasing diversity of data to derive greater value from data for their organization. The processing of electronic data plays a central role in modern society. Data constitute a fundamental part of operational processes in companies and scientific organizations. In addition, they form the basis for decisions. Bad data quality can negatively affect decisions and have a negative impact on results. The quality of the data is crucial. This includes the new theme of data wrangling, sometimes referred to as data munging or data crunching, to find the dirty data and to transform and clean them. The aim of data wrangling is to prepare a lot of raw data in their original state so that they can be used for further analysis steps. Only then can knowledge be obtained that may bring added value. This paper shows how the data wrangling process works and how it can be used in database systems to clean up data from heterogeneous data sources during their acquisition and integration.
(This article belongs to the Special Issue Challenges in Business Intelligence)
Figure 1. Data wrangling process.
Figure 2. Imported dataset in Trifacta Wrangler.
Figure 3. The important function “Recipe”.
13 pages, 1247 KiB  
Data Descriptor
Responses of Germination to Light and to Far-Red Radiation—Can they be Predicted from Diaspores Size?
by Luís Silva Dias, Elsa Ganhão and Alexandra Soveral Dias
Data 2020, 5(2), 49; https://doi.org/10.3390/data5020049 - 21 May 2020
Cited by 1 | Viewed by 2624
Abstract
This paper presents an update of a dataset of seed volumes previously released online and combines it with published data of the photoblastic response of germination of fruits or seeds (light or dark conditions), and of the effects of enhanced far-red radiation on germination. Some evidence was found to support that germination in larger diaspores might be indifferent to light or dark conditions. Similarly, germination in smaller diaspores might be inhibited by far-red radiation. However, the length, width, thickness, volume, shape, type of diaspore, or relative amplitude of volume is essentially useless to predict photoblastic responses or the effects of far-red radiation on germination of diaspores. Full article
Figure 1
<p>Distribution of positively (triangles), indifferent (circles), and negatively (squares) photoblastic diaspores separately for fruits (open symbols) and seeds (closed symbols) in relation to: (<b>a</b>) size (expressed as equivalent diameter); (<b>b</b>) departure from sphericity (expressed as population variance of linear dimensions normed so that length is unit).</p>
Figure 2
<p>Distribution of diaspores with germination indifferent to (circles), and inhibited (diamonds) by far-red radiation, separately for fruits (open symbols) and seeds (closed symbols), in relation to: (<b>a</b>) size (expressed as equivalent diameter); (<b>b</b>) departure from sphericity (expressed as population variance of linear dimensions normed so that length is unit). Diaspores with germination completely inhibited by far-red radiation are represented by asterisks.</p>
Figure A1
<p>Mass distribution of positively (triangles), indifferent (circles), and negatively (squares) photoblastic diaspores. Diaspores with germination completely inhibited in dark are represented by asterisks.</p>
8 pages, 446 KiB  
Data Descriptor
Low-Temperature Pyrolysis of Municipal Solid Waste Components and Refuse-Derived Fuel—Process Efficiency and Fuel Properties of Carbonized Solid Fuel
by Kacper Świechowski, Ewa Syguła, Jacek A. Koziel, Paweł Stępień, Szymon Kugler, Piotr Manczarski and Andrzej Białowiec
Data 2020, 5(2), 48; https://doi.org/10.3390/data5020048 - 21 May 2020
Cited by 18 | Viewed by 4558
Abstract
New technologies to valorize refuse-derived fuels (RDFs) will be required in the near future due to emerging trends of (1) the cement industry’s demands for high-quality alternative fuels and (2) the decreasing calorific value of the fuels derived from municipal solid waste (MSW) and currently used in cement/incineration plants. Low-temperature pyrolysis can increase the calorific value of processed material, leading to the production of value-added carbonized solid fuel (CSF). This dataset summarizes the key properties of MSW-derived CSF. Pyrolysis experiments were completed using eight types of organic waste and their two RDF mixtures. Organic waste represented common morphological groups of MSW, i.e., cartons, fabrics, kitchen waste, paper, plastic, rubber, PAP/AL/PE composite packaging (multi-material packaging also known as Tetra Pak cartons), and wood. The pyrolysis was conducted at temperatures ranging from 300 to 500 °C (20 °C intervals), with a retention (process) time of 20 to 60 min (20 min intervals). The mass yield, energy densification ratio, and energy yield were determined to characterize the pyrolysis process efficiency. The raw materials and produced CSF were tested with proximate analyses (moisture content, organic matter content, ash content, and combustible part content) and with ultimate analyses (elemental composition C, H, N, S) and high heating value (HHV). Additionally, differential scanning calorimetry (DSC) and thermogravimetric analyses (TGA) of the pyrolysis process were performed. The dataset documents the changes in fuel properties of RDF resulting from low-temperature pyrolysis as a function of the pyrolysis conditions and feedstock type. The greatest HHV improvements were observed for fabrics (up to 65%), PAP/AL/PE composite packaging (up to 56%), and wood (up to 46%). Full article
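The three process-efficiency indicators named in the abstract can be sketched as follows. This is a minimal illustration that assumes the definitions commonly used in the torrefaction and pyrolysis literature (mass yield as the percentage of raw mass retained, energy densification ratio as the ratio of heating values, energy yield as their product); the abstract itself does not state the formulas, and the function names and sample numbers below are hypothetical.

```python
def mass_yield(mass_raw_g: float, mass_csf_g: float) -> float:
    """Mass yield in percent: share of the raw mass that remains as CSF (assumed definition)."""
    return mass_csf_g / mass_raw_g * 100.0


def energy_densification_ratio(hhv_raw: float, hhv_csf: float) -> float:
    """Ratio of the CSF heating value to the raw-material heating value (assumed definition)."""
    return hhv_csf / hhv_raw


def energy_yield(mass_raw_g: float, mass_csf_g: float,
                 hhv_raw: float, hhv_csf: float) -> float:
    """Energy yield in percent: mass yield scaled by the energy densification ratio."""
    return mass_yield(mass_raw_g, mass_csf_g) * energy_densification_ratio(hhv_raw, hhv_csf)


# Made-up numbers for illustration only; they are not taken from the dataset.
sample = {"mass_raw_g": 10.0, "mass_csf_g": 6.0, "hhv_raw": 20.0, "hhv_csf": 30.0}
my = mass_yield(sample["mass_raw_g"], sample["mass_csf_g"])
edr = energy_densification_ratio(sample["hhv_raw"], sample["hhv_csf"])
ey = energy_yield(**sample)
print(f"mass yield = {my:.1f} %, EDr = {edr:.2f}, energy yield = {ey:.1f} %")
```

Under these assumed definitions, a sample that loses 40% of its mass while its heating value rises by half retains 90% of its original energy, which is the trade-off the indicators are meant to expose.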
Figure 1
<p>An example of temperature patterns during the pyrolysis of municipal solid waste (MSW) components.</p>
5 pages, 1946 KiB  
Data Descriptor
Data from Experimental Analysis of the Performance and Load Cycling of a Polymer Electrolyte Membrane Fuel Cell
by Andrea Ramírez-Cruzado, Blanca Ramírez-Peña, Rosario Vélez-García, Alfredo Iranzo and José Guerra
Data 2020, 5(2), 47; https://doi.org/10.3390/data5020047 - 20 May 2020
Cited by 4 | Viewed by 3516
Abstract
Fuel cells are electrochemical devices that convert the chemical energy stored in fuels (hydrogen for polymer electrolyte membrane (PEM) fuel cells) directly into electricity with high efficiency. Fuel cells are already commercially used in different applications, and significant research efforts are being carried out to further improve their performance and durability and to reduce costs. Experimental testing of fuel cells is a fundamental research activity used to assess all the issues indicated above. The current work presents original data corresponding to the experimental analysis of the performance of a 50 cm² PEM fuel cell, including experimental results from a dedicated load cycling test. The experimental data were acquired using a dedicated test bench following the harmonized testing protocols defined by the Joint Research Centre (JRC) of the European Commission for automotive applications. With the presented dataset, we aim to provide a transparent collection of experimental data from PEM fuel cell testing that can contribute to enhanced reusability for further research. Full article
Figure 1
<p>Simplified P&amp;ID of the test bench showing the instrumentation referred to in <a href="#data-05-00047-t002" class="html-table">Table 2</a>.</p>
16 pages, 1920 KiB  
Data Descriptor
Standartox: Standardizing Toxicity Data
by Andreas Scharmüller, Verena C. Schreiner and Ralf B. Schäfer
Data 2020, 5(2), 46; https://doi.org/10.3390/data5020046 - 16 May 2020
Cited by 18 | Viewed by 5711
Abstract
An increasing number of chemicals such as pharmaceuticals, pesticides and synthetic hormones are in daily use all over the world. In the environment, chemicals can adversely affect populations and communities and, in turn, related ecosystem functions. To evaluate the risks from chemicals for ecosystems, data on their toxicity, which are typically produced in standardized ecotoxicological laboratory tests, are required. The results from ecotoxicological tests are compiled in (meta-)databases such as the United States Environmental Protection Agency (EPA) ECOTOXicology Knowledgebase (ECOTOX). However, for many chemicals, multiple ecotoxicity data are available for the same test organism. These can vary strongly, thereby causing uncertainty in related analyses. Given that most current databases lack aggregation steps or are confined to specific chemicals, we developed Standartox, a tool and database that continuously incorporates the ever-growing number of test results in an automated process workflow that ultimately leads to a single aggregated data point for a specific chemical-organism test combination, representing the toxicity of a chemical. Standartox can be accessed through a web application and an R package. Full article
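The aggregation idea described in the abstract, collapsing several test results into one value per chemical-organism combination, can be sketched as below. The sketch uses a geometric mean, which the Figure 2 caption names as the Standartox estimate; everything else (the function name, the tuple layout, the sample values) is a hypothetical illustration and not the API of the Standartox web application or R package. The real workflow also harmonizes units and applies filters, which are omitted here.

```python
from collections import defaultdict
from math import exp, log


def aggregate_geometric_mean(records):
    """Collapse repeated results into one geometric mean per (chemical, organism).

    `records` is an iterable of (chemical, organism, concentration) tuples;
    this layout is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for chemical, organism, concentration in records:
        if concentration > 0:  # the logarithm is undefined for non-positive values
            groups[(chemical, organism)].append(concentration)
    return {
        key: exp(sum(log(value) for value in values) / len(values))
        for key, values in groups.items()
    }


# Made-up values for illustration only; these are not real toxicity data.
records = [
    ("chemical_a", "species_x", 1.0),
    ("chemical_a", "species_x", 100.0),
    ("chemical_a", "species_y", 4.0),
]
aggregated = aggregate_geometric_mean(records)
for (chemical, organism), value in sorted(aggregated.items()):
    print(f"{chemical} / {organism}: {value:.2f}")
```

A geometric mean is a natural choice when results for the same combination span orders of magnitude, because it averages on the log scale and is less dominated by a single large value than an arithmetic mean would be.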
Figure 1
<p>Share of 10 most frequent entries for the parameters (<b>A</b>) effect group, (<b>B</b>) chemical role, (<b>C</b>) chemical class, (<b>D</b>) taxonomic order, (<b>E</b>) organism habitat and (<b>F</b>) organism distribution in Standartox. Multiple classifications are possible (e.g. a chemical can be a fungicide and a pesticide).</p>
Figure 2
<p>Violin plots of test results (XX<sub>50</sub>) in Standartox illustrating (<b>A</b>) differential variability and data distribution between species (i.e., <span class="html-italic">Xenopus laevis</span>, Amphibian; <span class="html-italic">Raphidocelis subcapitata</span>, Algae; <span class="html-italic">Oncorhynchus mykiss</span>, Fish; <span class="html-italic">Lemna minor</span>, Macrophyte) for the chemical atrazine in 96 h tests, (<b>B</b>) how the variability in toxicity tests with zinc sulfate and <span class="html-italic">Daphnia magna</span> varies with test duration and (<b>C</b>) high variability that is not explained by the available test characteristics in the case of cupric sulfate tested on <span class="html-italic">Pimephales promelas</span> for 96 h. Red dots depict Standartox geometric mean estimates and red error bars show the associated standard deviation. Black dots depict the raw data. To facilitate readability, data points are randomly scattered along a hypothetical y-axis and are greyed out if within the violins.</p>
Figure 3
<p>Comparison between Standartox, (<b>A</b>) the Pesticides Properties DataBase (PPDB) and (<b>B</b>) ChemProp values. The black lines indicate identity and red lines mark a divergence of a factor of 10. Compared species are color coded.</p>
Figure 4
<p>Organigram of Standartox. The U.S. Environmental Protection Agency (EPA) ECOTOXicology Knowledgebase (ECOTOX) is downloaded quarterly and processed (i.e., query additional information with Chemical Abstracts Service (CAS) numbers and taxa names and conversion of concentration and duration units). Subsequently, a Standartox data set is compiled together with filter and aggregation methods. Thus, users can access the Standartox data set and filter and aggregate through a web application and an R package.</p>
8 pages, 2280 KiB  
Data Descriptor
An Open Access Data Set Highlighting Aggregation of Dyes on Metal Oxides
by Vishwesh Venkatraman and Lethesh Kallidanthiyil Chellappan
Data 2020, 5(2), 45; https://doi.org/10.3390/data5020045 - 13 May 2020
Cited by 6 | Viewed by 3093
Abstract
The adsorption of a dye to a metal oxide surface such as TiO₂, NiO and ZnO leads to deprotonation and often undesirable aggregation of dye molecules, which in turn impacts the photophysical properties of the dye. While controlled aggregation is useful for some applications, it can result in lower performance for dye-sensitized solar cells. To understand this phenomenon better, we have conducted an extensive search of the literature and identified over 4000 records of absorption spectra in solution and after adsorption onto metal oxide. The total data set comprises over 3500 unique compounds, with observed absorption maxima in solution and after adsorption on the semiconductor electrode. This data may serve to provide further insight into the structure-property relationships governing dye-aggregation behaviour. Full article
(This article belongs to the Special Issue Machine Learning and Materials Informatics)
Figure 1
<p>Impact of solvent polarity on aggregation. For each pure solvent, the stacked barplot shows the number of blue/red-shifted or unchanged categories (as defined by Equation (<a href="#FD2-data-05-00045" class="html-disp-formula">2</a>)). N-Methyl-2-Pyrrolidone and acetic acid are not shown as only single entries were available for these solvents.</p>
Figure 2
<p>Frequently occurring fragments in the dyes.</p>
Figure 3
<p>Pie charts showing the distribution of the absorption shifts (irrespective of the solvent) based on the (<b>A</b>) class of the dyes and (<b>B</b>) the anchoring groups used. (<b>A</b>) The “misc” category includes various dyes containing fluorene, phenoxazine, truxene, <span class="html-italic">N</span>,<span class="html-italic">N</span>-dialkylaniline, julolidine and other donor classes. (<b>B</b>) The “misc” category includes dyes anchored to the metal oxide via thiazolidinedione, aldehyde, hydantoin, hydroxybenzonitrile, alkoxysilane and other groups.</p>
Figure 4
<p>Browser-based search utility. (<b>a</b>) User interface for searching the aggregation data set. Users can select different options on the form for narrowing the search. (<b>b</b>) Results are displayed as a table. A maximum of 100 records can be displayed. The results can be retrieved as a semicolon delimited CSV file.</p>
26 pages, 5969 KiB  
Article
An Optimum Tea Fermentation Detection Model Based on Deep Convolutional Neural Networks
by Gibson Kimutai, Alexander Ngenzi, Rutabayiro Ngoga Said, Ambrose Kiprop and Anna Förster
Data 2020, 5(2), 44; https://doi.org/10.3390/data5020044 - 30 Apr 2020
Cited by 16 | Viewed by 6815
Abstract
Tea is one of the most popular beverages in the world, and its processing involves a number of steps, including fermentation. Tea fermentation is the most important step in determining the quality of tea. Currently, optimum fermentation of tea is detected by tasters using either of the following methods: monitoring the change in color of the tea as fermentation progresses, or tasting and smelling the tea as fermentation progresses. These manual methods are not accurate and consequently compromise the quality of tea. This study proposes a deep learning model dubbed TeaNet based on Convolution Neural Networks (CNN). The input data to TeaNet are images from the tea Fermentation and Labelme datasets. We compared the performance of TeaNet with other standard machine learning techniques: Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Naive Bayes (NB). TeaNet outperformed the other machine learning techniques in the classification tasks. However, we will confirm the stability of TeaNet in the classification tasks in our future studies when we deploy it in a tea factory in Kenya. The research also released a tea fermentation dataset that is available for use by the community. Full article
(This article belongs to the Special Issue Machine Learning in Image Analysis and Pattern Recognition)
Figure 1
<p>Processing steps of black tea.</p>
Figure 2
<p>Tea fermentation process.</p>
Figure 3
<p>Implementation of machine learning techniques.</p>
Figure 4
<p>Examples of classes of the tea fermentation dataset.</p>
Figure 5
<p>Examples of categories of LabelMe dataset.</p>
Figure 6
<p>Generation of color features of an image using color histogram.</p>
Figure 7
<p>Conversion of image to grayscale histogram using Local Binary Patterns (LBP).</p>
Figure 8
<p>Example of classification by a decision tree.</p>
Figure 9
<p>Example of a random forest operation.</p>
Figure 10
<p>K-Nearest Neighbor (KNN) proximity algorithm map.</p>
Figure 11
<p>A typical Convolution Neural Network (CNN) architecture.</p>
Figure 12
<p>An example of a classification task using Support Vector Machine (SVM).</p>
Figure 13
<p>Example of classification using Naive Bayes.</p>
Figure 14
<p>Example of classification using Linear Discriminant Analysis (LDA).</p>
Figure 15
<p>The architecture of the TeaNet that we propose for optimum detection of tea fermentation.</p>
Figure 16
<p>Accuracy and loss of TeaNet during training and validation.</p>
Figure 17
<p>Precision of classification for each of the classifiers for the two datasets.</p>
Figure 18
<p>Recall of classification for each of the classifiers for the two datasets.</p>
Figure 19
<p>F1 scores of classification for each of the classifiers for the two datasets.</p>
Figure 20
<p>Accuracy of classification for each of the classifiers for the two datasets.</p>
Figure 21
<p>Logarithmic Loss of classification for each of the classifiers for the two datasets.</p>
13 pages, 1391 KiB  
Article
Guidelines for a Standardized Filesystem Layout for Scientific Data
by Florian Spreckelsen, Baltasar Rüchardt, Jan Lebert, Stefan Luther, Ulrich Parlitz and Alexander Schlemmer
Data 2020, 5(2), 43; https://doi.org/10.3390/data5020043 - 24 Apr 2020
Cited by 3 | Viewed by 5297
Abstract
Storing scientific data on the filesystem in a meaningful and transparent way is no trivial task. In particular, when the data have to be accessed after their originator has left the lab, the importance of a standardized filesystem layout cannot be overestimated. It is desirable to have a structure that allows for the unique categorization of all kinds of data from experimental results to publications. The data have to be accessible to a broad variety of workflows, e.g., via graphical user interface as well as via command line, in order for the structure to find widespread acceptance. Furthermore, the inclusion of already existing data has to be as simple as possible. We propose a three-level layout to organize and store scientific data that incorporates the full chain of scientific data management from data acquisition to analysis to publications. Metadata are saved in a standardized way and connect original data to analyses and publications as well as to their originators. A simple software tool to check a file structure for compliance with the proposed structure is presented. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)
Figure 1
<p>The example project <tt>2020_SpeedOfLight</tt> contains data from three different experiments each of which is identified by date and the experimental method used to determine the speed of light.</p>
18 pages, 15131 KiB  
Data Descriptor
Changes in the Building Stock of Da Nang between 2015 and 2017
by Andreas Braun, Gebhard Warth, Felix Bachofer, Tram Thi Quynh Bui, Hao Tran and Volker Hochschild
Data 2020, 5(2), 42; https://doi.org/10.3390/data5020042 - 23 Apr 2020
Cited by 3 | Viewed by 4109
Abstract
This descriptor introduces a novel dataset, which contains the number and types of buildings in the city of Da Nang in Central Vietnam. The buildings were classified into nine distinct types and initially extracted from a satellite image of the year 2015. Secondly, changes were identified based on a visual interpretation of an image of the year 2017, so that new buildings, demolished buildings and building upgrades can be quantitatively analyzed. The data was aggregated by administrative wards and a hexagonal grid with a diameter of 250 m to protect personal rights and to avoid the misuse of a single building’s information. The dataset shows an increase of 19,391 buildings between October 2015 and August 2017, with a variety of interesting spatial patterns. The center of the city is mostly dominated by building changes and upgrades, while most of the new buildings were constructed within a distance of five to six kilometers from the city center. Full article
Figure 1
<p>Systematic workflow for the creation of the presented dataset.</p>
Figure 2
<p>Study area, extent of the analysis, and field reference data collection.</p>
Figure 3
<p>Radiometric differences resulting from angle and acquisition date/time of the two images used in this study.</p>
Figure 4
<p>Example for Hòa Minh ward. Top: satellite image from 2017; bottom: result of the visual identification of changes, including the boundaries of administrative wards and hexagons.</p>
Figure 5
<p>Absolute building changes between 2015 and 2017: new buildings (c2, left) and demolitions (c5, right).</p>
Figure 6
<p>Changes per square kilometer between 2015 and 2017: new buildings (c2, left) and demolitions (c5, right).</p>
Figure 7
<p>Relative increase in local (pc_t5, left) and modern (pc_t6, right) apartments between 2015 and 2017 per ward.</p>
Figure 8
<p>Absolute change in bungalow-type (left) and villa-type (right) buildings between 2015 and 2017.</p>
Figure 9
<p>Number of new (c2, left) and changed (c3, right) buildings between 2015 and 2017.</p>
Figure 10
<p>Demolitions per square kilometer in the presented dataset (left) compared to findings of Warth et al. (2019).</p>
Figure 11
<p>Changed and unchanged buildings for the districts in Da Nang.</p>
Figure 12
<p>Share of new and demolished buildings with respect to the distance to the historic city center.</p>
Figure A1
<p>Share of each ward covered by the area of interest (AOI, extent of the satellite images) of this study.</p>
Figure A2
<p>Building density in 2017 aggregated by hexagons.</p>
Figure A3
<p>Absolute number of new buildings compared to areas expected to be below the tideline in 2050 by Kulp &amp; Strauss (2019) [<a href="#B19-data-05-00042" class="html-bibr">19</a>].</p>
24 pages, 1156 KiB  
Article
A Multi-Factor Analysis of Forecasting Methods: A Study on the M4 Competition
by Pantelis Agathangelou, Demetris Trihinas and Ioannis Katakis
Data 2020, 5(2), 41; https://doi.org/10.3390/data5020041 - 22 Apr 2020
Cited by 2 | Viewed by 4079
Abstract
As forecasting becomes more and more appreciated in situations and activities of everyday life that involve prediction and risk assessment, more methods and solutions make their appearance in this exciting arena of uncertainty. However, less is known about what makes a promising or a poor forecast. In this article, we provide a multi-factor analysis on the forecasting methods that participated and stood out in the M4 competition, by focusing on Error (predictive performance), Correlation (among different methods), and Complexity (computational performance). The main goal of this study is to recognize the key elements of the contemporary forecasting methods, reveal what made them excel in the M4 competition, and eventually provide insights towards better understanding the forecasting task. Full article
Figure 1
<p>“Easiest”-four (lowest accumulated error for all methods) series for all categories. Numbers over the figures refer to the respective series. Negative values indicate underestimation whereas positive values overestimation.</p>
Figure 2
<p>“Most difficult”-four (highest accumulated error for all methods) forecasting series for all categories. Numbers over the figures refer to the respective series. Negative values indicate underestimation, whereas positive values overestimation.</p>
Figure 3
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t004" class="html-table">Table 4</a> (the most difficult time series) for the majority of the methods. The Figure contains colored graphics.</p>
Figure 4
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t005" class="html-table">Table 5</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 5
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t006" class="html-table">Table 6</a> (the most difficult time series) for the majority of the methods. The Figure contains colored graphics.</p>
Figure 6
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t007" class="html-table">Table 7</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 7
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t008" class="html-table">Table 8</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 7 Cont.
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t008" class="html-table">Table 8</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 8
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t009" class="html-table">Table 9</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 8 Cont.
<p>Train (white background) and Test (grey background) data of the series of <a href="#data-05-00041-t009" class="html-table">Table 9</a> (the most difficult time series) for the majority of the methods. The figure contains colored graphics.</p>
Figure 9
<p>Length distributions for the “Easiest”-twenty, for each method (Left Column), and “Most difficult”-Twenty for each method, (Middle Column), for the top-10 Methods. (Right Column) Depicts the overall Lengths Distribution per category.</p>
Figure 10
<p>The average correlation matrix for all methods.</p>
Figure 11
<p>Average step error percentage for the top-10 Methods and all categories. The <span class="html-italic">x</span>-axis represents the forecasting Horizon, and Negative Error indicates that the method forecast a value lower than the actual.</p>
Figure 11 Cont.
<p>Average step error percentage for the top-10 Methods and all categories. The <span class="html-italic">x</span>-axis represents the forecasting Horizon, and Negative Error indicates that the method forecast a value lower than the actual.</p>
7 pages, 1338 KiB  
Data Descriptor
The Fluctuation of Process Gasses Especially of Carbon Monoxide during Aerobic Biostabilization of an Organic Fraction of Municipal Solid Waste under Different Technological Regimes
by Sylwia Stegenta-Dąbrowska, Jakub Rogosz, Przemysław Bukowski, Marcin Dębowski, Peter F. Randerson, Jerzy Bieniek and Andrzej Białowiec
Data 2020, 5(2), 40; https://doi.org/10.3390/data5020040 - 19 Apr 2020
Cited by 2 | Viewed by 2494
Abstract
Carbon monoxide (CO) is an air pollutant commonly formed during natural and anthropogenic processes involving incomplete combustion. Much less is known about biological CO production during the decomposition of the organic fraction (OF), especially that originating from municipal solid waste (MSW), e.g., during the aerobic biostabilization (AB) process. In this dataset, we summarized the temperature and the content of process gases (including rarely reported carbon monoxide, CO) generated inside a full-scale reactor for the AB of an organic fraction of municipal solid waste (OFMSW). The objective of the study was to present the data on the fluctuation of CO content as well as that of O₂, CO₂, and CH₄ in process gas within the waste pile, during the AB of the OFMSW. The OFMSW was aerobically biostabilized in six reactors, in which the technological regimes of AB were dependent on process duration (42–69 days), waste mass (391.02–702.38 Mg), the intensity of waste aeration (4.4–10.7 m³·Mg⁻¹·h⁻¹), reactor design (membrane-covered reactor or membrane-covered reactor with sidewalls) and thermal conditions in the reactor (20.2–77.0 °C). The variations in the degree of waste aeration (O₂ content), temperature, and fluctuation of CO, CO₂, and CH₄ content during the weekly measurement intervals were summarized. Despite a high O₂ content in all reactors and stable thermal conditions, the presence of CO in process gas was observed, which suggests that ensuring optimum conditions for the process is not sufficient for CO emissions to be mitigated. In the analyzed experiment, CO concentration was highly variable over the duration of the process, ranging from a few to over 1,500 ppm. The highest concentration of CO was observed between the second and fifth weeks of the test. The reactor B2 was the source of the highest CO production and the highest average temperature. This study suggests that the highest CO production occurs at the highest temperature, which is why the authors believe that CO production has thermochemical foundations. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figure 1
<p>The biostabilization of OFMSW (<b>a</b>) base of the reactors with aeration channels, (<b>b</b>) OFMSW placed in the reactor before covering, (<b>c</b>) waste covered with a semi-permeable membrane, (<b>d</b>) reactors during aeration.</p>
Figure 2
<p>The scheme of experiment setup.</p>
Figure 3
<p>The scheme of gas and temperature sampling points in the reactors. Sampling cross-sections (<b>A</b>) Position of gas and temperature sampling points (shallow: blue and deep: grey) (<b>B</b>).</p>
Full article ">
18 pages, 3398 KiB  
Article
An On-Demand Service for Managing and Analyzing Arctic Sea Ice High Spatial Resolution Imagery
by Dexuan Sha, Xin Miao, Mengchao Xu, Chaowei Yang, Hongjie Xie, Alberto M. Mestas-Nuñez, Yun Li, Qian Liu and Jingchao Yang
Data 2020, 5(2), 39; https://doi.org/10.3390/data5020039 - 17 Apr 2020
Cited by 3 | Viewed by 4223
Abstract
Sea ice acts as both an indicator and an amplifier of climate change. High spatial resolution (HSR) imagery is an important data source in Arctic sea ice research for extracting sea ice physical parameters and for calibrating/validating climate models. HSR images are difficult to process and manage due to their large data volume, heterogeneous data sources, and complex spatiotemporal distributions. In this paper, an Arctic Cyberinfrastructure (ArcCI) module is developed that allows reliable and efficient on-demand batch image processing on the web. For this module, available associated datasets are collected and presented through an open data portal. The ArcCI module offers an architecture based on cloud computing and big data components for HSR sea ice images, including functionalities for (1) data acquisition through File Transfer Protocol (FTP) transfer, front-end uploading, and physical transfer; (2) data storage based on the Hadoop distributed file system and a mature operational relational database; (3) distributed image processing, including object-based image classification and parameter extraction of sea ice features; and (4) 3D visualization of the dynamic spatiotemporal distribution of extracted parameters with flexible statistical charts. Arctic researchers can search for and find Arctic sea ice HSR images and relevant metadata in the open data portal, obtain extracted ice parameters, and conduct visual analytics interactively. Users with a large number of images can leverage the service to process their images in a high-performance manner on the cloud, and to manage and analyze the results in one place. The ArcCI module will assist domain scientists in investigating polar sea ice, and can be easily transferred to other HSR image processing research projects. Full article
Figure 1. Examples of Global Fiducials Library (GFL) sea ice and melt-pond evolution: images of Buoy 42597 taken on June 6 (a), June 24 (b), and July 1 (c) of 2010, and images of Buoy 586420 taken on August 30 (d) and September 1 (e) of 2010, with the geographic positions of the two buoys shown in (f).
Figure 2. Concept model of ArcCI architecture.
Figure 3. Unified Modeling Language (UML) diagram of the database scheme.
Figure 4. Extract-Transform-Load (ETL) workflow for ArcCI.
Figure 5. Jupyter notebook ecosystem for image analysis.
Figure 6. Object-based Image Analysis (OBIA) workflow for sea ice classification.
Figure 7. Screenshot of ArcHSR imagery open portal.
Figure 8. Users’ views on functionalities.
Figure 9. 3D visualization module of extracted sea ice properties.
Figure 10. Visual classification results for four-class schema.
12 pages, 2789 KiB  
Data Descriptor
Bioinformatics Analysis Identifying Key Biomarkers in Bladder Cancer
by Chuan Zhang, Mandy Berndt-Paetz and Jochen Neuhaus
Data 2020, 5(2), 38; https://doi.org/10.3390/data5020038 - 16 Apr 2020
Cited by 5 | Viewed by 3913
Abstract
Our goal was to find new diagnostic and prognostic biomarkers in bladder cancer (BCa), and to predict molecular mechanisms and processes involved in BCa development and progression. Data collection is an unavoidable and time-consuming step in such work, which also required identifying complementary results and extensive literature retrieval. Here, we provide detailed information on the datasets used, the study design, and the data mining. We analyzed differentially expressed genes (DEGs) in the different datasets and retrieved the most important hub genes. We report the metadata of the population, such as gender, race, and tumor stage, as well as the expression levels of the hub genes. We include comprehensive information about the gene ontology (GO) enrichment analyses and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. We also retrieved information about the up- and down-regulation of genes. All in all, the presented datasets can be used to evaluate potential biomarkers and to predict the performance of different preclinical biomarkers in BCa. Full article
(This article belongs to the Special Issue Benchmarking Datasets in Bioinformatics)
Figure 1. Study design and workflow of the analysis. Taken from Zhang et al. 2019 [11]. For detailed information on the analytical methods and the software packages please refer to the Materials and Methods section in the cited paper.
Figure 2. Interaction network analysis. (A) The interactions and protein–protein networks of the top 30 hub genes. (B) Protein–protein network and interaction among the 14 hub genes from STRING-db.org, accessed on 11 November 2019. Nodes with different colors represent different query proteins. A different color on the edge means a different interaction (see legend in figure). Adapted from Figure 4 of Zhang et al. 2020 [11].
Figure 3. Oncomine meta-analysis of hub genes in bladder cancer (BCa) vs. non-cancerous tissue. Left hand: median rank (median rank of the gene across each of the analyses); p-value (p-value for the median-ranked analysis); color of the boxes indicates the percentile of the z-transformed expression level of the gene in the particular study; right hand: p = (p-value reported in each of the studies); NA (not measured in the study); (1) Blaveri et al., Clin Cancer Res, 2005 [35], invasive cancer samples n = 51, normal bladder samples n = 3; (2) Dyrskjot et al., Cancer Res, 2004 [36], invasive cancer samples n = 13, normal bladder samples n = 14; (3) Lee et al., J Clin Oncol, 2010 [37], invasive cancer samples n = 62, normal bladder samples n = 10; (4) Modlich et al., Clin Cancer Res, 2004 [38], invasive cancer samples n = 20, normal bladder samples n = 4; and (5) Sanchez-Carbayo et al., J Clin Oncol, 2006 [39], invasive cancer samples n = 72, normal bladder samples n = 52.
Figure 4. The expression body map of hub genes. The median expressions of hub genes in tumors are marked in red and those in normal tissues in green. The map is based on the GEPIA database (http://gepia.cancer-pku.cn), transcript per million (TPM).
19 pages, 6997 KiB  
Article
U-Net Segmented Adjacent Angle Detection (USAAD) for Automatic Analysis of Corneal Nerve Structures
by Philip Mehrgardt, Seid Miad Zandavi, Simon K. Poon, Juno Kim, Maria Markoulli and Matloob Khushi
Data 2020, 5(2), 37; https://doi.org/10.3390/data5020037 - 14 Apr 2020
Cited by 14 | Viewed by 5414
Abstract
Measurement of corneal nerve tortuosity is associated with dry eye disease, diabetic retinopathy, and a range of other conditions. However, clinicians measure tortuosity on very different grading scales that are inherently subjective. Using in vivo confocal microscopy, 253 images of corneal nerves were captured and manually labelled by two researchers with tortuosity measurements ranging on a scale from 0.1 to 1.0. Tortuosity was estimated computationally by extracting a binarised nerve structure utilising a previously published method. A novel U-Net segmented adjacent angle detection (USAAD) method was developed by training a U-Net with a series of back feeding processed images and nerve structure vectorizations. Angles between all vectors and segments were measured and used for training and predicting tortuosity measured by human labelling. Despite the disagreement among clinicians on tortuosity labelling measures, the optimised grading measurement was significantly correlated with our USAAD angle measurements. We identified the nerve interval lengths that optimised the correlation of tortuosity estimates with human grading. We also show the merit of our proposed method with respect to other baseline methods that provide a single estimate of tortuosity. The real benefit of USAAD in future will be to provide comprehensive structural information about variations in nerve orientation for potential use as a clinical measure of the presence of disease and its progression. Full article
(This article belongs to the Special Issue Data-Driven Healthcare Tasks: Tools, Frameworks, and Techniques)
Figure 1. Comparison of straight and curved corneal nerves. (A) Low tortuosity nerve. (B) High tortuosity nerves. (C) Examples of (1) washout, (2) buckling and (3) dendritic cells.
Figure 2. Corneal nerve position in the human eye: Side view, with the cornea marked in red and with an arrow (a). Front view (b).
Figure 3. Grading histogram of the 253 IVCM images.
Figure 4. Cfibre tortuosity grading process.
Figure 5. U-Net segmented adjacent angle detection flowchart.
Figure 6. Manually traced nerves.
Figure 7. Example of images backfed to training.
Figure 8. Scheme of postprocessing sequence.
Figure 9. The flowchart of segmentation and postprocessing.
Figure 10. Nerve vectorization and splitting, segment end points marked in white and intersections in red.
Figure 11. Nerve angle detection process.
Figure 12. Subjective grading and standard deviation.
Figure 13. USAAD mean angle correlation with human grading.
Figure 14. USAAD maximum correlated nerve length.
Figure 15. Subjective, Cfibre and USAAD grading confusion matrices.
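The USAAD abstract above describes measuring angles between adjacent nerve segments and relating them to tortuosity grades. The following is a minimal sketch of that general idea only, assuming a nerve has already been vectorized into an ordered list of (x, y) points. The function names `adjacent_angles` and `mean_adjacent_angle`, and the choice of a plain mean, are illustrative assumptions and not the authors' implementation.

```python
import math


def adjacent_angles(points):
    """Return the unsigned turning angle (degrees) at each interior vertex.

    `points` is an ordered list of (x, y) tuples along one traced nerve
    (a hypothetical input format chosen for this sketch).
    """
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0          # incoming segment vector
        bx, by = x2 - x1, y2 - y1          # outgoing segment vector
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        # atan2(cross, dot) is the signed angle between the two segments.
        angles.append(abs(math.degrees(math.atan2(cross, dot))))
    return angles


def mean_adjacent_angle(points):
    """Average turning angle; 0.0 for polylines with fewer than 3 points."""
    angles = adjacent_angles(points)
    return sum(angles) / len(angles) if angles else 0.0
```

Under this sketch a perfectly straight trace yields a mean angle of zero, and the value grows as the trace bends more sharply between consecutive segments.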
13 pages, 7943 KiB  
Data Descriptor
METER.AC: Live Open Access Atmospheric Monitoring Data for Bulgaria with High Spatiotemporal Resolution
by Atanas Terziyski, Stoyan Tenev, Vedrin Jeliazkov, Nina Jeliazkova and Nikolay Kochev
Data 2020, 5(2), 36; https://doi.org/10.3390/data5020036 - 8 Apr 2020
Cited by 9 | Viewed by 5261
Abstract
Detailed atmospheric monitoring data are notoriously difficult to obtain for some geographic regions, while they are of paramount importance in scientific research, forecasting, emergency response, policy making, etc. We describe a continuously updated dataset, METER.AC, consisting of raw measurements of atmospheric pressure, temperature, relative humidity, particulate matter, and background radiation in about 100 locations in Bulgaria, as well as some derived values such as sea-level atmospheric pressure, dew/frost point, and hourly trends. The measurements are performed by low-power maintenance-free nodes with common hardware and software, which are specifically designed and optimized for this purpose. The time resolution of the measurements is 5 min. The short-term aim is to deploy at least one node per 100 km², while uniformly covering altitudes between 0 and 3000 m asl with a special emphasis on remote mountainous areas. A full history of all raw measurements (non-aggregated in time and space) is publicly available, starting from September 2018. We describe the basic technical characteristics of our in-house developed equipment, data organization, and communication protocols as well as present some use case examples. The METER.AC network relies on the paradigm of the Internet of Things (IoT), by collecting data from various gauges. A guiding principle in this work is the provision of findable, accessible, interoperable, and reusable (FAIR) data. The dataset is in the public domain, and it provides resources and tools enabling citizen science development in the context of sustainable development. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figure 1. Network outlook: (a) Each device consists of an input (1) and output (2) valve, cable fitting (3) and assembly part (4); (b) current coverage map of Bulgaria and Greece.
Figure 2. Data logger (a) for the meteostation (b) at the village of Sarnegor, Bulgaria.
Figure 3. Indoor radon concentration gauge box. On the left hand side is an RD200M sensor. The electronics and additional sensors are placed in the perforated box (right side).
Figure 4. A high temporal resolution gamma background monitoring device with 4 SI22-G tubes.
Figure 5. METER.AC data workflow based on a multilayer infrastructure scheme.
Figure 6. Screenshot of the METER.AC site, taken in March 2020.
Figure 7. Fine particulate matter concentration history for the area of Plovdiv, Bulgaria.
Figure 8. METER.AC screenshots with gauges for current parameter values for a particular node (left) and sky snapshot summary (right).
Figure 9. Several map layers of nebe.to present METER.AC data.
Figure 10. METER.AC access statistics for a period of two months.
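The METER.AC abstract above mentions derived values such as the dew point computed from raw temperature and relative humidity. The listing does not say which formula the dataset uses, so the sketch below is an assumption: it applies the widely used Magnus approximation over liquid water, with one common coefficient pair (17.62 and 243.12 °C). The function name `dew_point_c` is hypothetical.

```python
import math

# Magnus coefficients over liquid water (one commonly used pair;
# assumed here, not taken from the METER.AC documentation).
A = 17.62
B = 243.12  # degrees Celsius


def dew_point_c(temp_c, rel_humidity_pct):
    """Approximate dew point in degrees Celsius from air temperature
    (degrees Celsius) and relative humidity (percent)."""
    if not 0 < rel_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rel_humidity_pct / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)
```

At 100% relative humidity the formula returns the air temperature itself, which is a quick sanity check when applying it to a 5-minute measurement series.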
13 pages, 2714 KiB  
Article
Data Quality as a Critical Success Factor for User Acceptance of Research Information Systems
by Otmane Azeroual, Gunter Saake, Mohammad Abuosba and Joachim Schöpfel
Data 2020, 5(2), 35; https://doi.org/10.3390/data5020035 - 6 Apr 2020
Cited by 14 | Viewed by 4772
Abstract
In this paper, the influence of data quality on user acceptance of research information systems (RIS) is investigated. To date, little research has addressed this topic. So far, the focus has been on the importance of data quality in RIS, the investigation of its dimensions, and techniques for measuring and improving data quality in RIS (such as data profiling, data cleansing, data wrangling, and text data mining). With this work, we try to answer the question of how data quality affects the success of RIS user acceptance. An acceptance of RIS users is achieved when the research institutions decide to replace the RIS with a new one. The result is a statement about the extent to which data quality influences the success of users’ acceptance of RIS. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figure 1. Technology acceptance model in the context of research information systems (RIS) [6].
Figure 2. Asked questions about RIS and its use at German universities and research institutions.
Figure 3. Results of the acceptance of 51 RIS users.
Figure 4. Correlation analysis “Scatterplot”.
Figure 5. Structural equation model (SEM) for the dependency between data quality and user acceptance of RIS.
Figure 6. Coefficient of determination (R²) by regression analysis.
5 pages, 374 KiB  
Data Descriptor
Player Heart Rate Responses and Pony External Load Measures during 16-Goal Polo
by Russ Best
Data 2020, 5(2), 34; https://doi.org/10.3390/data5020034 - 2 Apr 2020
Cited by 5 | Viewed by 3262
Abstract
This dataset provides information pertaining to the spatiotemporal stresses experienced by Polo ponies in play and the cardiovascular responses to these demands by Polo players, during 16-goal Polo. Data were collected by player-worn GPS units and paired heart rate monitors, across a New Zealand Polo season. The dataset comprises observations from 160 chukkas of Open Polo, and is presented as per chukka per game (curated) and in per effort per player (raw) formats. Data for distance, speed, and high intensity metrics are presented and are further categorised into five equine-based speed zones, in accordance with previous literature. The purpose of this dataset is to provide a detailed quantification of the load experienced by Polo players and their ponies at the highest domestic performance level in New Zealand, as well as advancing the scope of previous Polo literature that has employed GPS or heart rate monitoring technologies. This dataset may be of interest to equine scientists and trainers, veterinary practitioners, and sports scientists. An exemplar template is provided to facilitate the adoption of this data collection approach by other practitioners. Full article
(This article belongs to the Special Issue Data from Smartphones and Wearables)
Figure 1. (a) belt mounted GPS unit in close-up; (b) belt mounted GPS unit on a Polo player, during match-play.
24 pages, 4752 KiB  
Article
Multiple Regression Analysis and Frequent Itemset Mining of Electronic Medical Records: A Visual Analytics Approach Using VISA_M3R3
by Sheikh S. Abdullah, Neda Rostamzadeh, Kamran Sedig, Amit X. Garg and Eric McArthur
Data 2020, 5(2), 33; https://doi.org/10.3390/data5020033 - 29 Mar 2020
Cited by 11 | Viewed by 5304
Abstract
Medication-induced acute kidney injury (AKI) is a well-known problem in clinical medicine. This paper reports the first development of a visual analytics (VA) system that examines how different medications associate with AKI. In this paper, we introduce and describe VISA_M3R3, a VA system designed to assist healthcare researchers in identifying medications and medication combinations that associate with a higher risk of AKI using electronic medical records (EMRs). By integrating multiple regression models, frequent itemset mining, data visualization, and human-data interaction mechanisms, VISA_M3R3 allows users to explore complex relationships between medications and AKI in such a way that would be difficult or sometimes even impossible without the help of a VA system. Through an analysis of 595 medications using VISA_M3R3, we have identified 55 AKI-inducing medications, 24,212 frequent medication groups, and 78 medication groups that are associated with AKI. The purpose of this paper is to demonstrate the usefulness of VISA_M3R3 in the investigation of medication-induced AKI in particular and other clinical problems in general. Furthermore, this research highlights what needs to be considered in the future when designing VA systems that are intended to support gaining novel and deep insights into massive existing EMRs. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)
Figure 1. Workflow diagram of VISA_M3R3. Different colors are used to show the separation of the three main modules.
Figure 2. The Visualization module of VISA_M3R3 is composed of five views: (A) single-medication view, (B) multiple-medications view, (C) covariates view, (D) medication-hierarchy view, and (E) frequent-itemsets view.
Figure 3. Scatter plot of single-medication view.
Figure 4. Scatterplot of multiple-medications view.
Figure 5. Chord diagram showing the results of the frequent itemset mining analysis in the frequent-itemsets view.
Figure 6. Six sliders representing different covariates in the covariates view.
Figure 7. The medication-hierarchy view shows the list of medications and their classes and subclasses.
Figure 8. Overview of interactions in the single-medication view.
Figure 9. Overview of interactions in the multiple-medications view.
Figure 10. Overview of interactions in the covariates view.
Figure 11. Overview of interactions in the frequent-itemsets view.
Figure 12. Overview of interactions in the medication-hierarchy view and selection controls.
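The VISA_M3R3 abstract above refers to frequent itemset mining over medication records. As a rough illustration of what "frequent medication groups" means, the sketch below counts how many records contain each small combination of items and keeps those meeting a minimum support count. It is a brute-force enumeration for toy data, not Apriori or the method used in the paper, and the function name `frequent_itemsets`, the `max_size` cap, and the placeholder drug names are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset tuple: support count} for every itemset of up to
    `max_size` items that appears in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))        # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Toy records with placeholder medication names (hypothetical data).
records = [
    {"drug_a", "drug_b"},
    {"drug_a", "drug_b", "drug_c"},
    {"drug_a", "drug_c"},
    {"drug_b"},
]
result = frequent_itemsets(records, min_support=2)
```

Here the pair `("drug_b", "drug_c")` occurs in only one record and is therefore dropped, while pairs that co-occur in two records are kept. Real record sets would need a pruning algorithm, since enumerating all combinations grows quickly with the number of items per record.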
12 pages, 2184 KiB  
Data Descriptor
Data-Sets for Indoor Photovoltaic Behavior in Low Lighting Conditions
by Mojtaba Masoudinejad
Data 2020, 5(2), 32; https://doi.org/10.3390/data5020032 - 28 Mar 2020
Cited by 5 | Viewed by 3373
Abstract
Analysis of voltage–current behavior of photovoltaic modules is a critical part of their modeling. Parameter identification of these models demands data from them, measured in realistic environments. In spite of advancement in modeling methodologies under solar lighting, few analyses have been focused on indoor photovoltaics. Lack of accurate and reproducible data as a major challenge in this field is addressed here. A high accuracy measurement setup for evaluation and analysis of indoor photovoltaic modules is explained. By use of this system, different modules are measured under diverse environmental conditions. These measurements are structured in data-sets that can be used for either analysis of physical environment effects and modeling or development of specific parameter identification methods in low light intensity conditions. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figure 1. The overall measurement platform with the integration sphere on top.
Figure 2. Developed board for measurement of data, including all sensors, light sampling integration sphere and a PV module.
Figure 3. A general form of voltage–current curve of a PV module and its MPP.
Figure 4. Schematic structure of the light measurement system.
Figure 5. Distribution of light intensity and temperature of measurements.
Figure 6. Example of radiometry spectrum from warm white LED light in experiment “Solems_W_2993_2401”.
Figure 7. Example of the photometry spectrum from warm white LED in experiment “Solems_W_2993_2401”.
Figure 8. Example of a voltage–current curve from Solems module under warm white LED light in experiment “Solems_W_2993_2401”.
17 pages, 1355 KiB  
Data Descriptor
Monthly Entomological Inoculation Rate Data for Studying the Seasonality of Malaria Transmission in Africa
by Edmund I. Yamba, Adrian M. Tompkins, Andreas H. Fink, Volker Ermert, Mbouna D. Amelie, Leonard K. Amekudzi and Olivier J. T. Briët
Data 2020, 5(2), 31; https://doi.org/10.3390/data5020031 - 27 Mar 2020
Cited by 4 | Viewed by 5354
Abstract
A comprehensive literature review was conducted to create a new database of 197 field surveys of monthly malaria Entomological Inoculation Rates (EIR), a metric of malaria transmission intensity. All field studies provide data at a monthly temporal resolution and have a duration of at least one year in order to study the seasonality of the disease. For inclusion, data collection methodologies adhered to a specific standard and the location and timing of the measurements were documented. Auxiliary information on the population and hydrological setting were also included. The database includes measurements that cover West and Central Africa and the period from 1945 to 2011, and hence facilitates analysis of interannual transmission variability over broad regions. Full article
Figure 1. The geographical locations of extracted EIR_m in Sub-Saharan Africa. The colored circles show the maximum EIR_m value recorded for the location. Many of the locations are very close to each other and hence cannot be explicitly resolved on the map. The blue lines show the boundaries of the sub-regions (Sahel, Guinea, WA and EA).
Figure 2. Comparison of monthly EIR and rainfall (ARC2) and temperature (ERA Interim adjusted to location height for each field survey location). Black line: average EIR; Blue line: average rainfall (RR). Average temperature: minimum (Tn, brown) and maximum (Tx, red).
Figure 3. The sympatric association and geographical distribution of dominant malaria vectors in Sub-Saharan Africa.
10 pages, 1342 KiB  
Article
Influence of Information Quality via Implemented German RCD Standard in Research Information Systems
by Otmane Azeroual, Joachim Schöpfel and Dragan Ivanovic
Data 2020, 5(2), 30; https://doi.org/10.3390/data5020030 - 27 Mar 2020
Cited by 2 | Viewed by 3514
Abstract
With the steady increase in the number of data sources to be stored and processed by higher education and research institutions, it has become necessary to develop Research Information Systems, which will store this research information in the long term and make it accessible for further use, such as reporting and evaluation processes, institutional decision making and the presentation of research performance. In order to retain control while integrating research information from heterogeneous internal and external data sources and disparate interfaces into RIS and to maximize the benefits of the research information, ensuring data quality in RIS is critical. To facilitate a common understanding of the research information collected and to harmonize data collection processes, various standardization initiatives have emerged in recent decades. These standards support the use of research information in RIS and enable compatibility and interoperability between different information systems. This paper examines the process of securing data quality in RIS and the impact of research information standards on data quality in RIS. We focus on the recently developed German Research Core Dataset standard as a case of application. Full article
(This article belongs to the Special Issue Data Quality and Data Access for Research)
Figure 1. Iterative workflow for ensuring research information quality.
Figure 2. Practicing RCD Standard in RIS.
Figure 3. Solution to the problems of schema and data conflicts.
14 pages, 2267 KiB  
Article
Research Data Sharing in Spain: Exploring Determinants, Practices, and Perceptions
by Rafael Aleixandre-Benavent, Antonio Vidal-Infer, Adolfo Alonso-Arroyo, Fernanda Peset and Antonia Ferrer Sapena
Data 2020, 5(2), 29; https://doi.org/10.3390/data5020029 - 27 Mar 2020
Cited by 16 | Viewed by 4923
Abstract
This work provides an overview of a Spanish survey on research data, which was carried out within the framework of the project Datasea at the beginning of 2015. It aligns with the Sustainable Development Goals (goal 9) in supporting research. The purpose of the study was to identify the habits and current experiences of Spanish researchers in the health sciences in relation to the management and sharing of raw research data. Method: An electronic questionnaire composed of 40 questions divided into three blocks was designed. The three sections contained questions on the following aspects: (A) personal information; (B) creation and reuse of data; and (C) preservation of data. The questionnaire was sent by email to a list of universities in Spain to be distributed among their researchers and professors. A total of 1063 researchers completed the questionnaire. More than half of the respondents (54.9%) lacked a data management plan; nearly a quarter had storage systems for the research group; 81.5% used personal computers to store data; “Contact with colleagues” was the most frequent means used to locate and access other researchers’ data; and nearly 60% of researchers stated their data were available to the research group and collaborating colleagues. The main fears about sharing were legal questions (47.9%), misuse or interpretation of data (42.7%), and loss of authorship (28.7%). The results allow us to understand the state of data sharing among Spanish researchers and can serve as a basis to identify the needs of researchers to share data, optimize existing infrastructure, and promote data sharing among those who do not practice it yet. Full article
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figures:
Figure 1. Existence of data management policies.
Figure 2. Reasons for developing the data policy.
Figure 3. Devices usually used for data storage.
Figure 4. Sources used to locate and access other researchers’ data.
Figure 5. Fear of sharing research data.
Figure 6. Key threats to data sharing.
Figure 7. Reasons for preservation of data.
7 pages, 536 KiB  
Article
The Emergency Medicine Facing the Challenge of Open Science
by Andrea Sixto-Costoya, Rafael Aleixandre-Benavent, Rut Lucas-Domínguez and Antonio Vidal-Infer
Data 2020, 5(2), 28; https://doi.org/10.3390/data5020028 - 25 Mar 2020
Cited by 9 | Viewed by 3379
Abstract
(1) Background: The availability of research datasets can strengthen and facilitate research processes. This is especially relevant in the emergency medicine field because of the importance of providing immediate care in critical situations, as the ongoing coronavirus (COVID-19) pandemic is showing the scientific community. This work aims to show which emergency medicine journals indexed in Journal Citation Reports (JCR) currently meet data sharing criteria. (2) Methods: This study analyzes the editorial policies regarding data deposit of the journals in the emergency medicine category of the JCR and evaluates the supplementary material of the articles published in these journals that have been deposited in the PubMed Central repository. (3) Results: Of the 24 journals in the emergency medicine category of the JCR, 19 are also present in PubMed Central (PMC), yielding a total of 5983 articles. Of these, only 9.4% contain supplemental material. Although second-quartile journals of the JCR emergency medicine category have more articles in PMC, the main journals involved in the deposit of supplemental material belong to the first quartile, where the most used format is PDF, followed by text documents. (4) Conclusion: This study reveals that data sharing remains an incipient practice in the emergency medicine field, as there are still barriers among researchers to participating in data sharing. It is therefore necessary to promote dynamics that improve this practice both qualitatively (the quality and format of datasets) and quantitatively (the quantity of datasets in absolute terms) in research.
(This article belongs to the Special Issue Data Reuse for Sustainable Development Goals)
Figures:
Figure 1. Types of journals according to Open Access (OA) criteria, separated by quartile of the Journal Citation Reports (JCR) emergency medicine category.
Figure 2. File type of the SM of the emergency medicine journals indexed in the PMC repository (N=13) according to their JCR quartile.