Data, Volume 9, Issue 2 (February 2024) – 21 articles

Cover Story: Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. Since bright-field microscopy is the most common approach for daily monitoring, we have compiled a dataset of 3072 OOC bright-field microscopy images with various cell lines, each evaluated by an expert in cell biology. We demonstrate how OOC technology can be automated using automatic bright-field microscopy and how this dataset can be used to train machine learning models. Data generated from an OOC setup can provide more reliable tissue models, automated decision-making processes within the OOC framework, and more efficient research in the future.
19 pages, 958 KiB  
Article
Multimodal Hinglish Tweet Dataset for Deep Pragmatic Analysis
by Pratibha, Amandeep Kaur, Meenu Khurana and Robertas Damaševičius
Data 2024, 9(2), 38; https://doi.org/10.3390/data9020038 - 15 Feb 2024
Cited by 7 | Viewed by 2880
Abstract
Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/‘X’, with its vast user base and real-time nature, provides a valuable source to assess the raw emotions and opinions of people regarding war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets specifically related to wars, conflicts, and associated taxonomy. The dataset addresses a gap in contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments expressed by individuals regarding wars, conflicts, and peace efforts. The dataset holds significant value for deep pragmatic analysis, as it enables future researchers to identify the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace effects, and delve into the associated psychology in this context. To ensure the dataset’s quality and relevance, a meticulous selection process was employed, resulting in the inclusion of 500 carefully chosen, explainable search filters. The dataset currently has 10,040 tweets that have been validated through human expert review to make sure they are correct and accurate. Full article
(This article belongs to the Special Issue Sentiment Analysis in Social Media Data)
Figure 1. Research Workflow.
Figure 2. Topic Inferring [46,47].
Figure 3. Percentage of Tokens by Cluster.
Figure 4. Analysis of Inter-rater agreement.
14 pages, 47258 KiB  
Data Descriptor
Digital Elevation Models and Orthomosaics of the Dutch Noordwest Natuurkern Foredune Restoration Project
by Gerben Ruessink, Dick Groenendijk and Bas Arens
Data 2024, 9(2), 37; https://doi.org/10.3390/data9020037 - 15 Feb 2024
Cited by 1 | Viewed by 2035
Abstract
Coastal dunes worldwide are increasingly under pressure from the adverse effects of human activities. Therefore, more and more restoration measures are being taken to create conditions that help disturbed coastal dune ecosystems regenerate or recover naturally. However, many projects lack the (open-access) monitoring observations needed to signal whether further actions are needed, and hence lack the opportunity to “learn by doing”. This submission presents an open-access data set of 37 high-resolution digital elevation models and 24 orthomosaics collected before and after the excavation of five artificial foredune trough blowouts (“notches”) in winter 2012/2013 in the Dutch Zuid-Kennemerland National Park, one of the largest coastal dune restoration projects in northwest Europe. These high-resolution data provide a valuable resource for improving understanding of the biogeomorphic processes that determine the evolution of restored dune systems as well as developing guidelines to better design future restoration efforts with foredune notching. Full article
Figure 1. (a) Location of the study site along the west coast of the Netherlands. The white rectangle indicates the approximate region in which the foredune and landward dunes were reactivated as part of the Noordwest Natuurkern project, visible as the bare sand areas landward of the beach. The area containing the five foredune notches, which is the focus of this paper, is outlined by the red rectangle and is shown in more detail in (b). The five notches are labelled N1 to N5 from south to north. Three reactivated parabolic dunes (P1–P3) are also indicated. (c) Photo of part of the notched foredune, looking east–northeast. Panels (a,b) contain the pansharpened Formosat-2 RGB satellite image (2 m resolution) of 27 May 2013, shortly after the excavation of the notches was completed. The photo in (c) was taken with a drone in summer 2021. The white open triangle in (a) is the approximate view angle of the photo in (c). The wind rose in (a) contains four wind speed classes: dark blue, ≤5 m/s; green, >5–10 m/s; orange, >10–15 m/s; red, >15 m/s.
Figure 2. (a) Digital elevation model of the 2013-05-01 survey, shown as a colour relief map and (b) DEM of Difference for 2013-05-01–2023-02-08. Positive elevation change represents deposition. The background is the colour relief map of (a) with an opacity of 60%. Elevation change ±1 m is completely transparent. (c) Time series of deposition volume $V_\mathrm{d}$ (red dots), erosion volume $V_\mathrm{e}$ (blue dots), and net volume change $V_\mathrm{n}$ (magenta dots) for the region bounded by the polygon (white line) in (b), relative to the 2013-05-01 survey. The dashed lines are the best-fit linear lines. Their statistics are given in the text. The tick marks on the horizontal axis in (c) are, as in all subsequent figures with a time axis, at the start of the year.
Figure 3. Orthomosaics of the (a) 2017-03-15 and (b) 2023-09-25 surveys. The red dots are the locations of the ground control point (GCP) targets used in the Structure-from-Motion workflow, described in Section 3.1. Panels (a,b) contain 37 and 10 GCPs, respectively.
Figure 4. Time series of meteorological data for the NWNKern region 2013–2023: (a) wind speed at IJmuiden WMO 06225; grey lines are 10-min average values, dark line is 90-day moving average, and red dots are storm events for which at least 6 consecutive 10-min wind speeds exceed 20.7 m/s (i.e., event duration is at least 1 h). The dots are plotted at the peak of the storm and scale with peak wind speed. (b) Rainfall at Wijk aan Zee WMO 06257; grey lines are daily amounts, and dark line is 90-day moving average. The shown time period is the same as in Figure 2c.
Figure 5. (a) DEM of Difference for 2023-02-08 (ALS)–2023-02-13 (UAVSfM) and (b) orthomosaic of the 2023-09-25 survey. As explained in the text, the positive elevation differences in (a) can here be interpreted as the height of the canopy.
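The storm criterion quoted in the Figure 4 caption (at least 6 consecutive 10-min wind speeds exceeding 20.7 m/s) can be illustrated with a short sketch. This is not the authors' code: the function name `find_storm_events`, its parameters, and the sample values are hypothetical, and the input is assumed to be a gap-free list of 10-min average wind speeds in m/s.

```python
def find_storm_events(speeds, threshold=20.7, min_run=6):
    """Return (start, end_exclusive, peak_speed) for each qualifying run.

    A run qualifies when at least `min_run` consecutive samples are
    strictly above `threshold` (6 x 10 min = 1 h with the defaults).
    """
    events = []
    run_start = None
    for i, speed in enumerate(speeds):
        if speed > threshold:
            if run_start is None:
                run_start = i  # a new run of exceedances begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                events.append((run_start, i, max(speeds[run_start:i])))
            run_start = None
    # Handle a run that lasts until the end of the series.
    if run_start is not None and len(speeds) - run_start >= min_run:
        events.append((run_start, len(speeds), max(speeds[run_start:])))
    return events


# Hypothetical series: one 6-sample run (a storm) and one 5-sample run (too short).
sample_speeds = (
    [10.0] * 3
    + [21.0, 22.5, 24.0, 23.0, 21.5, 20.8]
    + [15.0]
    + [21.0] * 5
    + [12.0]
)
storm_events = find_storm_events(sample_speeds)
```

Only the first run is reported, because the second stays above the threshold for five samples (50 min) rather than six.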
16 pages, 7418 KiB  
Data Descriptor
AriAplBud: An Aerial Multi-Growth Stage Apple Flower Bud Dataset for Agricultural Object Detection Benchmarking
by Wenan Yuan
Data 2024, 9(2), 36; https://doi.org/10.3390/data9020036 - 11 Feb 2024
Cited by 1 | Viewed by 2384
Abstract
As one of the most important topics in contemporary computer vision research, object detection has received wide attention from the precision agriculture community for diverse applications. While state-of-the-art object detection frameworks are usually evaluated against large-scale public datasets containing mostly non-agricultural objects, a specialized dataset that reflects unique properties of plants would aid researchers in investigating the utility of newly developed object detectors within agricultural contexts. This article presents AriAplBud: a close-up apple flower bud image dataset created using an unmanned aerial vehicle (UAV)-based red–green–blue (RGB) camera. AriAplBud contains 3600 images of apple flower buds at six growth stages, with 110,467 manual bounding box annotations as positive samples and 2520 additional empty orchard images containing no apple flower bud as negative samples. AriAplBud can be directly deployed for developing object detection models that accept Darknet annotation format without additional preprocessing steps, serving as a potential benchmark for future agricultural object detection research. A demonstration of developing YOLOv8-based apple flower bud detectors is also presented in this article. Full article
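As an illustration of the Darknet annotation format mentioned in the abstract: each label line conventionally holds a class index followed by the box centre and size, all normalized to the image dimensions. The sketch below converts one such line to pixel corner coordinates. The label line, class index, and image size are hypothetical values chosen for the example and are not taken from AriAplBud.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def darknet_line_to_pixels(line: str, img_w: int, img_h: int) -> PixelBox:
    """Convert 'class x_center y_center width height' (normalized) to pixels."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized centre and size by the image dimensions.
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    return PixelBox(int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


# Hypothetical label line and image size, for illustration only.
box = darknet_line_to_pixels("2 0.5 0.5 0.25 0.5", img_w=1000, img_h=800)
```

Which class index corresponds to which of the six growth stages is defined by the dataset itself and is not assumed here.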
Figure 1. Randomly selected AriAplBud sample images from different data collection dates.
Figure 2. AriAplBud data collection apple orchard at the Russell E. Larson Agricultural Research Center.
Figure 3. Apple flower bud annotations of the image “041920_0041.jpg” from AriAplBud.
Figure 4. Apple flower bud detection result comparison between YOLOv8n and YOLOv8x using sample images at different growth stages from the AriAplBud test dataset.
Figure 5. Annotation number comparison between all apple flower bud growth stages in AriAplBud.
Figure 6. Bar chart showing annotation per image distribution of AriAplBud positive samples.
Figure 7. Two sample images from AriAplBud, “050220_0450.jpg” and “050720_0085.jpg”, containing 1 and 112 annotations, respectively.
Figure 8. Box plot of AriAplBud annotation size in terms of normalized bounding box area.
Figure 9. Two sample images from AriAplBud, “050220_0435.jpg” and “050720_0128.jpg”, containing the smallest and largest bounding box annotations, respectively.
Figure 10. Regions of three sample images from AriAplBud, “052120_0222.jpg”, “042320_0068.jpg”, and “051320_0017.jpg”, where an unintentional annotation inside a petal fall annotation, an unannotated tight cluster, and a falsely annotated bloom as petal fall can be observed.
Figure 11. Regions of three sample images from AriAplBud, “050220_0359.jpg”, “050220_0440.jpg”, and “050220_0406.jpg”, with different pink annotation styles.
Figure 12. Regions of two sample images from AriAplBud, “051620_0045.jpg” and “051620_0043.jpg”, with different petal fall annotation styles.
Figure 13. Two sample images from AriAplBud, “042320_0076.jpg” and “042320_0077.jpg”, with very similar visual appearances.
Figure 14. Two distinct sample images from AriAplBud, “041920_0040.jpg” and “092520_0220.jpg”, captured with a five-month interval.
Figure 15. Two sample images from AriAplBud, “041920_0051.jpg” and “042320_0008.jpg”, containing differently sized tight clusters due to UAV flight altitude variations.
Figure 16. Two blurred sample images from AriAplBud, “042320_0025.jpg” and “042320_0067.jpg”.
Figure 17. Regions of three sample images from AriAplBud, “041920_0055.jpg”, “041920_0048.jpg”, and “041920_0019.jpg”, containing a wooden post, metal wires, and interlocking chain ties.
9 pages, 481 KiB  
Data Descriptor
COVID-19 Lockdown Effects on Sleep, Immune Fitness, Mood, Quality of Life, and Academic Functioning: Survey Data from Turkish University Students
by Pauline A. Hendriksen, Sema Tan, Evi C. van Oostrom, Agnese Merlo, Hilal Bardakçi, Nilay Aksoy, Johan Garssen, Gillian Bruce and Joris C. Verster
Data 2024, 9(2), 35; https://doi.org/10.3390/data9020035 - 10 Feb 2024
Cited by 1 | Viewed by 2828
Abstract
Previous studies from the Netherlands, Germany, and Argentina revealed that the 2019 coronavirus disease (COVID-19) pandemic and associated lockdown periods had a significant negative impact on the wellbeing and quality of life of students. The negative impact of lockdown periods on health correlates such as immune fitness, alcohol consumption, and mood was reflected in their academic functioning. As both the duration and intensity of lockdown measures differed between countries, it is important to replicate these findings in different countries and cultures. Therefore, the purpose of the current study was to examine the impact of the COVID-19 pandemic on immune fitness, mood, academic functioning, sleep, smoking, alcohol consumption, healthy diet, and quality of life among Turkish students. Turkish students in the age range of 18 to 30 years old were invited to complete an online survey. Data were collected from n = 307 participants and included retrospective assessments for six time periods: (1) BP (before the COVID-19 pandemic, 1 January 2020–10 March 2020), (2) NL1 (the first no lockdown period, 11 March 2020–28 April 2021), (3) the lockdown period (29 April 2021–17 May 2021), (4) NL2 (the second no lockdown period, 18 May 2021–31 December 2021), (5) NL3 (the third no lockdown period, 1 January 2022–December 2022), and (6) the past month. In this data descriptor article, the content of the survey and the dataset are described. Full article
Figure 1. Schematic overview of the COVID-19 pandemic in Türkiye. Abbreviations: COVID-19 = 2019 coronavirus disease, BP = before the COVID-19 pandemic, NL1 = first no lockdown period, L = lockdown period, NL2 = second no lockdown period, NL3 = third no lockdown period.
14 pages, 3842 KiB  
Data Descriptor
Draft Genome Sequencing of the Bacillus thuringiensis var. Thuringiensis Highly Insecticidal Strain 800/15
by Anton E. Shikov, Iuliia A. Savina, Maria N. Romanenko, Anton A. Nizhnikov and Kirill S. Antonets
Data 2024, 9(2), 34; https://doi.org/10.3390/data9020034 - 10 Feb 2024
Cited by 1 | Viewed by 2301
Abstract
The Bacillus thuringiensis serovar thuringiensis strain 800/15 has been actively used as an agent in biopreparations with high insecticidal activity against the larvae of the Colorado potato beetle Leptinotarsa decemlineata and gypsy moth Lymantria dispar. In the current study, we present the first draft genome of the 800/15 strain coupled with a comparative genomic analysis of its closest reference strains. The raw sequence data were obtained by Illumina technology on the HiSeq X platform and de novo assembled with the SPAdes v3.15.4 software. The genome reached 6,524,663 bp in size and carried 6771 coding sequences, 3 of which represented loci encoding insecticidal toxins, namely, Spp1Aa1, Cry1Ab9, and Cry1Ba8, active against the orders Lepidoptera, Blattodea, Hemiptera, Diptera, and Coleoptera. We also revealed the biosynthetic gene clusters responsible for the synthesis of secondary metabolites, including fengycin, bacillibactin, and petrobactin, with predicted antibacterial, fungicidal, and growth-promoting properties. Further comparative genomics suggested the strain is not enriched with genes linked with biological activities, implying that agriculturally important properties rely more on the composition of loci than on their abundance. The obtained genomic sequence of the strain with the experimental metadata could facilitate the computational prediction of bacterial isolates’ potency from genomic data. Full article
Figure 1. The morphology of the 800/15 strain’s colonies after one day of cultivation on LB nutrient medium (a) and transition from vegetative to sporulating culture after two days of cultivation on CCY [8] nutrient medium stained with Coomassie brilliant blue (100× objective) (b). The black solid arrow shows the spore, the black dotted arrow shows bipyramidal crystals of toxins, and the orange solid arrow shows the vegetative cell.
Figure 2. Comparisons between the studied strains in terms of genomic similarity and composition of insecticidal toxins and virulence determinants. (a) The bootstrapped hierarchical clustering tree of the genomes is based on the pair-wise ANI (average nucleotide identity) estimates. The numbers near branches correspond to support values, and the black arrow points to the strain 800/15. Adjacent strips are colorized according to experimentally verified insecticidal activities and isolation sources, respectively. (b) Shared and non-common insecticidal toxins within the fourth and third ranks (c) of the current structure-based nomenclature, predicted host species (d), and known virulence determinants from VFDB (Virulence Factors Database) [25] (e) within the strain 800/15 and closest reference genomes. Non-shared values imply that the respective homologs or host species are absent in the genomes of reference strains or strain 800/15.
Figure 3. The overall number and phylogeny-wise distribution of genomic determinants and theoretically predicted pesticidal activities of the analyzed strains. (a) The total amount of insecticidal toxins, theoretically predicted sets of affected species (b), and homologs of well-studied virulence determinants (c) from VFDB (Virulence Factors Database) [25]. The arrow points to strain 800/15. (d) The heatmap displaying the distribution of insecticidal toxins and virulence factors (e), homologs, as well as the number of putative host species (f). For the distribution of homologs, the intensity of the color corresponds to the identity with the closest hit from the respective database. (g) The sum of insecticidal toxins, virulence determinants, and predicted host species found in the studied strains.
Figure 4. The total number of predicted CRISPR (clustered regularly interspaced short palindromic repeats) sequences (a), and mobile genetic elements, namely, genomic islands (b), insertion sequences (c), and prophages (d) presented in the B. thuringiensis strain 800/15 genome and the closest reference genome assemblies. Black arrows point to the bars related to the studied strain.
Figure 5. Functional annotation of the strain 800/15 and the composition of biosynthetic gene clusters in the studied genomic dataset. (a) Over-representation tests of functional terms in the analyzed strain within the Gene Ontology (GO) system in Biological Process (BP), (b) Cellular Component (CC), and Molecular Function (MF) (c) categories. The size of the circles denotes the enrichment ratio, i.e., the ratio of terms belonging to the strain 800/15 to the total number of the respective terms in the universe (the sum of the terms for all proteins in the dataset). The color indicates a p-value obtained from Fisher’s exact test corrected with the “weight0” algorithm. (d) The total number of biosynthetic gene clusters (BGCs) identified using DeepBGC v0.1.30 [20] software concerning the chemical class of potentially synthesized metabolites. (e) The distribution of BGCs displaying the percentage of BGCs with predicted antibiotic activity.
12 pages, 988 KiB  
Data Descriptor
Conflicting Marks Archive Dataset: A Dataset of Conflicting Marks from the Brazilian Intellectual Property Office
by Igor Bezerra Reis, Rafael Ângelo Santos Leite, Mateus Miranda Torres, Alcides Gonçalves da Silva Neto, Francisco José da Silva e Silva and Ariel Soares Teles
Data 2024, 9(2), 33; https://doi.org/10.3390/data9020033 - 9 Feb 2024
Viewed by 1911
Abstract
A registered trademark represents one of a company’s most valuable intellectual assets, acting as a safeguard against possible reputational damage and financial losses resulting from infringements of this intellectual property. To be registered, a mark must be unique and distinctive in relation to other trademarks which are already registered. In this paper, we describe the CMAD, an acronym for Conflicting Marks Archive Dataset. This dataset has been meticulously organized into pairs of marks (Number of pairs = 18,355) involved in copyright infringement across word, figurative and mixed marks. Organizations sought to register these marks with the National Institute of Industrial Property (INPI) in Brazil, and had their applications denied after analysis by intellectual property specialists. The robustness of this dataset is ensured by the intrinsic similarity of the conflicting marks, since the decisions were made by INPI specialists. This characteristic provides a reliable basis for the development and testing of tools designed to analyze similarity between marks, thus contributing to the evolution of practices and computer-based solutions in the field of intellectual property. Full article
Figure 1. Example of nominative similarity.
Figure 2. Example of ideological similarity.
Figure 3. Example of visual similarity.
Figure 4. Methodology steps.
Figure 5. Example of rejection: (a) originally written in PT-BR; (b) translated by the authors to English language.
Figure 6. Example of an XML file containing a mark application.
Figure 7. Example of collected data from a rejected mark.
Figure 8. Sample distribution of presentations for: (a) rejected mark applications, and (b) trademarks.
Figure 9. Three samples of conflicting marks in the CMAD: (a) three pairs of mark images, and (b) respective entries in the CSV file.
5 pages, 187 KiB  
Editorial
Data in Astrophysics and Geophysics: Novel Research and Applications
by Vladimir A. Srećković, Milan S. Dimitrijević and Zoran R. Mijić
Data 2024, 9(2), 32; https://doi.org/10.3390/data9020032 - 8 Feb 2024
Viewed by 2040
Abstract
Rapid development of communication technologies and constant technological improvements as a result of scientific discoveries require the establishment of specific databases [...] Full article
(This article belongs to the Section Spatial Data Science and Digital Earth)
14 pages, 7532 KiB  
Article
The Yinshan Mountains Record over 10,000 Landslides
by Jingjing Sun, Chong Xu, Liye Feng, Lei Li, Xuewei Zhang and Wentao Yang
Data 2024, 9(2), 31; https://doi.org/10.3390/data9020031 - 8 Feb 2024
Cited by 4 | Viewed by 2120
Abstract
China boasts a vast expanse of mountainous terrain, characterized by intricate geological conditions and structural features, resulting in frequent geological disasters. Among these, landslides, as prototypical geological hazards, pose significant threats to both lives and property. Consequently, conducting a comprehensive landslide inventory in mountainous regions is imperative for current research. This study concentrates on the Yinshan Mountains, an ancient fault-block mountain range spanning east–west in the central Inner Mongolia Autonomous Region, extending from the Langshan Mountains in the west to the Damaqun Mountains in the east, with the Xiao–Yin Mountains District (in the narrow sense) in between. Employing multi-temporal high-resolution remote sensing images from Google Earth, this study conducted visual interpretation, identifying 10,968 landslides in the Yinshan area, encompassing a total area of 308.94 km². The largest landslide occupies 2.95 km², while the smallest covers 84.47 m². Specifically, the Langshan area comprises 331 landslides with a total area of 11.96 km², the Xiao–Yin Mountains (in the narrow sense) include 3393 landslides covering 64.13 km², and the Manhan Mountains, Damaqun Mountains, and adjacent areas account for 7244 landslides over a total area of 232.85 km². This research not only contributes to global landslide cataloging initiatives but also serves as a robust foundation for future geohazard prevention and management efforts. Full article
(This article belongs to the Section Spatial Data Science and Digital Earth)
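The regional figures quoted in the abstract above can be cross-checked against the reported totals with a few lines of arithmetic. The sketch below uses only the numbers given in the abstract; the variable names and region labels are illustrative, and `Decimal` is used so that the two-decimal areas add up exactly.

```python
from decimal import Decimal

# (landslide count, area in km^2) per sub-region, as quoted in the abstract.
regions = {
    "Langshan": (331, Decimal("11.96")),
    "Xiao-Yin (narrow sense)": (3393, Decimal("64.13")),
    "Manhan, Damaqun and adjacent areas": (7244, Decimal("232.85")),
}

total_count = sum(count for count, _ in regions.values())
total_area_km2 = sum(area for _, area in regions.values())

# Totals reported in the abstract for the whole Yinshan area.
reported_count = 10968
reported_area_km2 = Decimal("308.94")

counts_match = total_count == reported_count
areas_match = total_area_km2 == reported_area_km2
```

Both checks pass: the three sub-regions sum to 10,968 landslides and 308.94 km².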
Figure 1. Overview of the study area (vector range source [14,18]).
Figure 2. Example of landslide delineation.
Figure 3. Distribution and density analysis of landslides in the Yinshan Mountains. (a) Landslide distribution. (b) Density distribution of landslide points.
Figure 4. Detailed partition display. (a) Distribution of landslides in Langshan Mountains District. (b) Distribution of landslides in Xiao–Yin Mountains District in a narrow sense. (c) Distribution of landslides in the Manhan Mountains, Damaqun Mountains, and surrounding mountainous areas.
Figure 5. Typical landslide locations.
Figure 6. Typical single landslides.
Figure 7. Typical mass landslides.
9 pages, 1164 KiB  
Data Descriptor
Expanded Brain CT Dataset for the Development of AI Systems for Intracranial Hemorrhage Detection and Classification
by Anna N. Khoruzhaya, Tatiana M. Bobrovskaya, Dmitriy V. Kozlov, Dmitriy Kuligovskiy, Vladimir P. Novik, Kirill M. Arzamasov and Elena I. Kremneva
Data 2024, 9(2), 30; https://doi.org/10.3390/data9020030 - 6 Feb 2024
Viewed by 5150
Abstract
Intracranial hemorrhage (ICH) is a dangerous life-threatening condition leading to disability. Timely and high-quality diagnosis plays a huge role in the course and outcome of this disease. The gold standard in determining ICH is computed tomography. This method requires a prompt involvement of highly qualified personnel, which is not always possible, for example, in case of a staff shortage or increased workload. In such a situation, every minute counts, and time can be lost. The solution to this problem seems to be a set of diagnostic decisions, including the use of artificial intelligence, which will help to identify patients with ICH in a timely manner and provide prompt and quality medical care. However, the main obstacle to the development of artificial intelligence is a lack of high-quality datasets for training and testing. In this paper, we present a dataset including 800 brain CT scans consisting of multiple series of DICOM images with and without signs of ICH, enriched with clinical and technical parameters, as well as the methodology of its generation utilizing natural language processing tools. The dataset is publicly available, which contributes to increased competition in the development of artificial intelligence systems and their advancement and quality improvement. Full article
Figure 1. Examples of brain CT scans with labels. (a) Brain CT scan with signs of epidural hemorrhage in the right hemisphere. (b) Brain CT scan with signs of subarachnoid hemorrhage in both hemispheres. (c) Brain CT scan with signs of subdural hemorrhage in the right hemisphere. (d) Brain CT scan with signs of intracerebral hemorrhage in the right hemisphere. (e) Brain CT scan with signs of multiple hemorrhages (in this case subdural, subarachnoid, intracerebral) in both hemispheres. (f) Brain CT scan (bone kernel) with signs of frontal bone fracture. (g) Brain CT scan with signs of combined pathology (in this case, the tumor was complicated by hemorrhage). (h) Brain CT scan with signs of blood breakthrough into the liquor spaces, indicated by intraventricular hemorrhage.
Figure 2. Data collection flowchart. CT: computed tomography, ICH: intracranial hemorrhage, NLP: natural language processing.
Figure 3. Stages of the defacing algorithm: (a) obtaining an axial slice, (b) drawing a line to the facial part of the slice from the centroid, obtaining a point for constructing an artefact (ellipse), (c) constructing an ellipse of random size and filling it with specified values, (d) repeating operation (b,c) for each slice (a), obtaining artefacts across the entire facial part of the skull, (e) 3D reconstruction of the CT study after running the algorithm for 60% of the slices in axial view.
17 pages, 663 KiB  
Article
A Comprehensive Data Pipeline for Comparing the Effects of Momentum on Sports Leagues
by Jordan Truman Paul Noel, Vinicius Prado da Fonseca and Amilcar Soares
Data 2024, 9(2), 29; https://doi.org/10.3390/data9020029 - 1 Feb 2024
Cited by 5 | Viewed by 4964
Abstract
Momentum has been a consistently studied aspect of sports science for decades. Within the established literature, conclusions have at times been inconsistent. However, if momentum is indeed an actual phenomenon, it would affect all aspects of sports, from player evaluation to pre-game prediction and betting. Therefore, using momentum-based features that quantify a team’s linear trend of play, we develop a data pipeline that uses a small sample of recent games to assess teams’ quality of play and measure the predictive power of momentum-based features versus the predictive power of more traditional frequency-based features across several leagues using several machine learning techniques. More precisely, we use our pipeline to determine the differences in the predictive power of momentum-based features and standard statistical features for the National Hockey League (NHL), National Basketball Association (NBA), and five major first-division European football leagues. Our findings show little evidence that momentum has superior predictive power in the NBA. Still, we found some instances in which momentum-based features produced better pre-game predictors for the NHL, and we observe a similar trend in European football/soccer. Our results indicate that momentum-based features combined with frequency-based features could improve pre-game prediction models and that, in the future, momentum should be studied more from a feature/performance indicator point of view and less from the view of the dependence of sequential outcomes, thus attempting to distance momentum from the binary view of winning and losing. Full article
Figure 1
A diagram of our complete data pipeline.
Figure 2
Example of how momentum-based features are calculated with the use of the data in Table 2.
Figure 3
A comparison of the accuracy achieved among different sports and feature sets using logistic regression.
Figure 4
A comparison of the accuracy achieved among different sports and feature sets when using random forest.
Figure 5
A comparison of the accuracy achieved among different sports and feature sets when using linear discriminant analysis.
10 pages, 2458 KiB  
Data Descriptor
Organ-On-A-Chip (OOC) Image Dataset for Machine Learning and Tissue Model Evaluation
by Valērija Movčana, Arnis Strods, Karīna Narbute, Fēlikss Rūmnieks, Roberts Rimša, Gatis Mozoļevskis, Maksims Ivanovs, Roberts Kadiķis, Kārlis Gustavs Zviedris, Laura Leja, Anastasija Zujeva, Tamāra Laimiņa and Arturs Abols
Data 2024, 9(2), 28; https://doi.org/10.3390/data9020028 - 1 Feb 2024
Cited by 1 | Viewed by 2879
Abstract
Organ-on-a-chip (OOC) technology has emerged as a groundbreaking approach for emulating the physiological environment, revolutionizing biomedical research, drug development, and personalized medicine. OOC platforms offer more physiologically relevant microenvironments, enabling real-time monitoring of tissue, to develop functional tissue models. Imaging methods are the most common approach for daily monitoring of tissue development. Image-based machine learning serves as a valuable tool for enhancing and monitoring OOC models in real-time. This involves the classification of images generated through microscopy contributing to the refinement of model performance. This paper presents an image dataset, containing cell images generated from OOC setup with different cell types. There are 3072 images generated by an automated brightfield microscopy setup. For some images, parameters such as cell type, seeding density, time after seeding and flow rate are provided. These parameters along with predefined criteria can contribute to the evaluation of image quality and identification of potential artifacts. This dataset can be used as a basis for training machine learning classifiers for automated data analysis generated from an OOC setup providing more reliable tissue models, automated decision-making processes within the OOC framework and efficient research in the future. Full article
Figure 1
Representative images of A549 and HPMEC cell lines from a lung cancer-on-a-chip model, classified with ‘good’/‘bad’ labels. Scale bar: 100 μm.
Figure 2
Image of the microfluidic chip. Chip dimensions are 4.9 cm × 3 cm, with a total height of 6 mm. Blue dye marks the bottom channel and red dye the top channel.
Figure 3
Summary of DNN architecture.
Figure 4
Summary of training MobileNetV3 on the dataset: (A) model loss during training, (B) model accuracy during training, and (C) confusion matrix for the performance of the model on the test data.
24 pages, 1171 KiB  
Article
Understanding Data Breach from a Global Perspective: Incident Visualization and Data Protection Law Review
by Gabriel Arquelau Pimenta Rodrigues, André Luiz Marques Serrano, Amanda Nunes Lopes Espiñeira Lemos, Edna Dias Canedo, Fábio Lúcio Lopes de Mendonça, Robson de Oliveira Albuquerque, Ana Lucila Sandoval Orozco and Luis Javier García Villalba
Data 2024, 9(2), 27; https://doi.org/10.3390/data9020027 - 31 Jan 2024
Cited by 7 | Viewed by 11953
Abstract
Data breaches result in data loss, including personal, health, and financial information that is crucial, sensitive, and private. A breach is a security incident in which personal and sensitive data are exposed to unauthorized individuals, with the potential to raise several privacy concerns. As an example, a breach at the French newspaper Le Figaro exposed approximately 7.4 billion records that included full names, passwords, and e-mail and physical addresses. To reduce the likelihood and impact of such breaches, it is fundamental to strengthen the security efforts against this type of incident and, for that, it is first necessary to identify patterns of its occurrence, primarily related to the number of data records leaked, the affected geographical region, and its regulatory aspects. To advance the discussion in this regard, we study a dataset comprising 428 worldwide data breaches between 2018 and 2019, providing a visualization of the related statistics, such as the most affected countries, the predominant economic sector targeted in different countries, and the median number of records leaked per incident in different countries, regions, and sectors. We then discuss the data protection regulation in effect in each country included in the dataset, correlating key elements of the legislation with the statistical findings. As a result, we have identified an extensive disclosure of medical records in India and government data in Brazil during this period. The analysis and visualization surface patterns that researchers have seldom examined and suggest that the real dangers of data leaks are greater than commonly assumed. Finally, this paper contributes to the discussion regarding data protection laws and compliance regarding data breaches, supporting, for example, the decision process of data storage location in the cloud. Full article
Figure 1
The ten most frequently breached countries.
Figure 2
Sum of breached records per country.
Figure 3
Boxplot of breached records per country.
Figure 4
Proportion of sum of records breached per country in regions.
Figure 5
Boxplot of breached records per region.
Figure 6
Boxplot of breached records per sector.
Figure 7
Sum of records breached per region per year per sector.
Figure 8
Most explored sector in the top 10 breached countries.
Figure 9
Distribution of sum of records breached by sectors per region.
Figure 10
Distribution of breach sizes per sector.
Figure 11
Classification level of regulation and enforcement for countries per region [77].
Figure 12
Timeline of enactment of data protection laws.
Figure 13
Distribution of countries that have a DPA and require registration.
Figure 14
Distribution of countries that require a DPO.
Figure 15
Distribution of countries that require breach notification and specify a time limit.
18 pages, 3696 KiB  
Data Descriptor
Dataset for Electronics and Plasmonics in Graphene, Silicene, and Germanene Nanostrips
by Talia Tene, Nataly Bonilla García, Miguel Ángel Sáez Paguay, John Vera, Marco Guevara, Cristian Vacacela Gomez and Stefano Bellucci
Data 2024, 9(2), 26; https://doi.org/10.3390/data9020026 - 30 Jan 2024
Viewed by 2243
Abstract
The quest for novel materials with extraordinary electronic and plasmonic properties is an ongoing pursuit in the field of materials science. The dataset provides the results of a computational study that used ab initio and semi-analytical computations to model freestanding nanosystems. We delve into the world of ribbon-like materials, specifically graphene nanoribbons, silicene nanoribbons, and germanene nanoribbons, comparing their electronic and plasmonic characteristics. Our research reveals a myriad of insights, from the tunability of band structures and the influence of an atomic number on electronic properties to the adaptability of nanoribbons for optoelectronic applications. Further, we uncover the promise of these materials for biosensing, demonstrating their plasmon frequency tunability based on charge density and Fermi velocity modification. Our findings not only expand the understanding of these quasi-1D materials but also open new avenues for the development of cutting-edge devices and technologies. This data presentation holds immense potential for future advancements in electronics, optics, and molecular sensing. Full article
Figure 1
Electronic band structure computed using the LDA (black points) and GW (red points) methods: (A) graphene, (B) silicene, and (C) germanene. Green and cyan lines represent the linear fit.
Figure 2
Estimated parameters of (A) Fermi velocity, (B) electron effective mass, and (C) bandgap as a function of the atomic number: C$_{6}$, Si$_{14}$, and Ge$_{32}$. The ribbon width was fixed at 155 nm.
Figure 3
Bandgap variation as a function of the ribbon width from 5 nm to 5 μm for GNRs (red points), SiNRs (blue points), and GeNRs (green points).
Figure 4
Electronic band structure and density of states (DOS) of (A) GNR, (B) SiNR, and (C) GeNR. The ribbon width was fixed at 155 nm. The cyan line represents the smoothed DOS.
Figure 5
Plasmon frequency as a function of the angle variation ($\pm 90^{\circ}$) considering different plasmon momenta: $q = 100$ cm$^{-1}$, $q = 1000$ cm$^{-1}$, and $q = 10{,}000$ cm$^{-1}$. The ribbon width was fixed at 155 nm. (A) GNR, (B) SiNR, and (C) GeNR.
Figure 6
Plasmon frequency as a function of the angle variation ($\pm 90^{\circ}$) considering different charge densities: $N_{2D} = 1.0$ cm$^{-2}$, $N_{2D} = 0.5$ cm$^{-2}$, and $N_{2D} = 0.25$ cm$^{-2}$. The ribbon width was fixed at 155 nm and $q = 100$ cm$^{-1}$. (A) GNR, (B) SiNR, and (C) GeNR.
Figure 7
Plasmon frequency as a function of the angle variation ($\pm 90^{\circ}$) considering different effective masses, which were varied by increasing the Fermi velocity from 25% to 75% (see Table 2). The ribbon width was fixed at 155 nm and $q = 100$ cm$^{-1}$. (A) GNR, (B) SiNR, and (C) GeNR.
Scheme 1
Scheme of generated data for ab initio and semi-analytical computations.
19 pages, 3460 KiB  
Article
Curating, Collecting, and Cataloguing Global COVID-19 Datasets for the Aim of Predicting Personalized Risk
by Sepehr Golriz Khatami, Astghik Sargsyan, Maria Francesca Russo, Daniel Domingo-Fernández, Andrea Zaliani, Abish Kaladharan, Priya Sethumadhavan, Sarah Mubeen, Yojana Gadiya, Reagon Karki, Stephan Gebel, Ram Kumar Ruppa Surulinathan, Vanessa Lage-Rupprecht, Saulius Archipovas, Geltrude Mingrone, Marc Jacobs, Carsten Claussen, Martin Hofmann-Apitius and Alpha Tom Kodamullil
Data 2024, 9(2), 25; https://doi.org/10.3390/data9020025 - 29 Jan 2024
Viewed by 2253
Abstract
Although hundreds of datasets have been published since the beginning of the coronavirus pandemic, there is a lack of centralized resources where these datasets are listed and harmonized to facilitate their applicability and uptake by predictive modeling approaches. Firstly, such a centralized resource provides information about data owners to researchers who are searching for datasets to develop their predictive models. Secondly, the harmonization of the datasets makes it possible to take advantage of several similar datasets simultaneously. This, in turn, not only eases the imperative external validation of data-driven models but can also be used for virtual cohort generation, which helps to overcome data sharing impediments. Here, we present the COVID-19 data catalogue, a repository that provides a landscape view of COVID-19 studies and datasets as a putative source to enable researchers to develop personalized COVID-19 predictive risk models. The COVID-19 data catalogue currently contains over 400 studies and their relevant information collected from a wide range of global sources such as global initiatives, clinical trial repositories, publications, and data repositories. Further, the curated content stored in this data catalogue is complemented by a web application, providing visualizations of these studies, including their references, relevant information such as measured variables, and the geographical locations where these studies were performed. This resource is one of the first to capture, organize, and store studies, datasets, and metadata related to COVID-19 in a comprehensive repository. We believe that our work will facilitate future research and development of personalized predictive risk models for COVID-19. Full article
Figure 1
Overview of the steps required for the preparation of the data catalogue.
Figure 2
Number of studies from different sources included in the data catalogue.
Figure 3
Distribution of different variable types from over 50 rank one and two prioritized studies.
Figure 4
Distribution of different clinical variables from over 50 rank one and two prioritized studies.
Figure 5
Distribution of different laboratory variables from over 50 rank one and two prioritized studies.
Figure 6
Longitudinal view of patient hospitalizations.
Figure 7
Association between fatality rate and BMI.
Figure 8
Percentage of patients having at most 50% missing values during hospitalization time.
Figure 9
Number of missing values for static variables.
Figure 10
Geographic distribution of studies.
15 pages, 412 KiB  
Article
Mapping Hierarchical File Structures to Semantic Data Models for Efficient Data Integration into Research Data Management Systems
by Henrik tom Wörden, Florian Spreckelsen, Stefan Luther, Ulrich Parlitz and Alexander Schlemmer
Data 2024, 9(2), 24; https://doi.org/10.3390/data9020024 - 26 Jan 2024
Viewed by 2575
Abstract
Although other methods exist to store and manage data in modern information technology, the standard solution is file systems. Therefore, keeping well-organized file structures and file system layouts can be key to a sustainable research data management infrastructure. However, file structures alone lack several important capabilities for FAIR data management: the two most significant being insufficient visualization of data and inadequate possibilities for searching and obtaining an overview. Research data management systems (RDMSs) can fill this gap, but many do not support the simultaneous use of the file system and RDMS. This simultaneous use can have many benefits, but keeping data in RDMS in synchrony with the file structure is challenging. Here, we present concepts that allow for keeping file structures and semantic data models (in RDMS) synchronous. Furthermore, we propose a specification in yaml format that allows for a structured and extensible declaration and implementation of a mapping between the file system and data models used in semantic research data management. Implementing these concepts will facilitate the re-use of specifications for multiple use cases. Furthermore, the specification can serve as a machine-readable and, at the same time, human-readable documentation of specific file system structures. We demonstrate our work using the Open Source RDMS LinkAhead (previously named “CaosDB”). Full article
(This article belongs to the Section Information Systems and Data Management)
Figure 1
Mapping between file structure and data model. Blue lines indicate which pieces of information from the file structure and file contents are mapped to the respective properties in the data model.
Figure 2
Overview of the complete data integration procedure. See Section 2.1.
Figure 3
Illustration of the data synchronization procedure using a crawler: Data acquisition (possibly including computer simulations or data produced during data analysis) leads to a variety of files in different formats on the file system. Crawler plugins (“CFoods”) are designed in a way that they understand the local structures on the file system. The file tree is traversed, possibly opening files and extracting (meta) data in order to transform them into a semantic data model that is then synchronized with the RDMS, as described in Section 2.3. The figure was previously published in [26].
Figure 4
Terminology used in the context of defining identity for Records: For each RecordType that is used in the synchronization procedure, its identity needs to be defined in what we call the Registered Identifiable. The scanner will fill the values of the properties of the Registered Identifiable and, in that process, create an Identifiable. This Identifiable can be used by the crawler to run a query on LinkAhead. If a Record matching the properties of the Identifiable already exists in LinkAhead, it will be retrieved and used, as described in Section 2.4, to update the Record. Otherwise, a new Record will be inserted.
10 pages, 228 KiB  
Data Descriptor
Comprehensive Dataset on Pre-SARS-CoV-2 Infection Sports-Related Physical Activity Levels, Disease Severity, and Treatment Outcomes: Insights and Implications for COVID-19 Management
by Dimitrios I. Bourdas, Panteleimon Bakirtzoglou, Antonios K. Travlos, Vasileios Andrianopoulos and Emmanouil Zacharakis
Data 2024, 9(2), 23; https://doi.org/10.3390/data9020023 - 26 Jan 2024
Cited by 1 | Viewed by 2309
Abstract
This dataset was compiled to explore associations between pre-SARS-CoV-2 infection exercise and sports-related physical activity (PA) levels and disease severity, along with treatments administered following the most recent SARS-CoV-2 infection. A comprehensive analysis investigated the relationships between PA categories (“Inactive”, “Low PA”, “Moderate PA”, “High PA”), disease severity (“Sporadic”, “Episodic”, “Recurrent”, “Frequent”, “Persistent”), and treatments post-SARS-CoV-2 infection (“No treatment”, “Home remedies”, “Prescribed medication”, “Hospital admission”, “Intensive care unit admission”) within a sample population (n = 5829) from the Hellenic territory. Utilizing the Active-Q questionnaire, data were collected from February to March 2023, capturing PA habits, participant characteristics, medical history, vaccination status, and illness experiences. The analysis found that preinfection PA levels and disease severity were statistically independent (χ² = 9.097, df = 12, p = 0.695). Additionally, a statistical dependency emerged between PA levels and illness treatment categories (χ² = 39.362, df = 12, p < 0.001), particularly linking the “Inactive” PA category with “Home remedies” treatment. These results highlight the potential influence of preinfection PA on disease severity and treatment choices following SARS-CoV-2 infection. The dataset offers valuable insights into the interplay between PA, disease outcomes, and treatment decisions, aiding future research in shaping targeted interventions and public health strategies related to COVID-19 management. Full article
9 pages, 445 KiB  
Data Descriptor
Genomic Epidemiology Dataset for the Important Nosocomial Pathogenic Bacterium Acinetobacter baumannii
by Andrey Shelenkov, Yulia Mikhaylova and Vasiliy Akimkin
Data 2024, 9(2), 22; https://doi.org/10.3390/data9020022 - 26 Jan 2024
Viewed by 2026
Abstract
The infections caused by various bacterial pathogens both in clinical and community settings represent a significant threat to public healthcare worldwide. The growing resistance to antimicrobial drugs acquired by bacterial species causing healthcare-associated infections has already become a life-threatening danger noticed by the World Health Organization. Several groups or lineages of bacterial isolates, usually called ‘the clones of high risk’, often drive the spread of resistance within particular species. Thus, it is vitally important to reveal and track the spread of such clones and the mechanisms by which they acquire antibiotic resistance and enhance their survival skills. Currently, the analysis of whole-genome sequences for bacterial isolates of interest is increasingly used for these purposes, including epidemiological surveillance and the development of spread prevention measures. However, the availability and uniformity of the data derived from genomic sequences often represent a bottleneck for such investigations. With this dataset, we present the results of a genomic epidemiology analysis of 17,546 genomes of a dangerous bacterial pathogen, Acinetobacter baumannii. Important typing information, including multilocus sequence typing (MLST)-based sequence types (STs), intrinsic blaOXA-51-like gene variants, capsular (KL) and oligosaccharide (OCL) types, CRISPR-Cas systems, and cgMLST profiles are presented, as well as the assignment of particular isolates to nine known international clones of high risk. The presence of antimicrobial resistance genes within the genomes is also reported. These data will be useful for researchers in the field of A. baumannii genomic epidemiology, resistance analysis, and prevention measure development. Full article
Figure 1
Representation of A. baumannii isolates belonging to the international clones of high risk in GenBank. ‘NOIC’ represents the isolates not belonging to IC1-IC9.
12 pages, 394 KiB  
Article
MHAiR: A Dataset of Audio-Image Representations for Multimodal Human Actions
by Muhammad Bilal Shaikh, Douglas Chai, Syed Mohammed Shamsul Islam and Naveed Akhtar
Data 2024, 9(2), 21; https://doi.org/10.3390/data9020021 - 25 Jan 2024
Viewed by 2160
Abstract
The audio-image representations for multimodal human actions (MHAiR) dataset contains six different image representations of audio signals that capture the temporal dynamics of the actions in a compact and informative way. The dataset was extracted from the audio recordings of an existing video dataset, i.e., UCF101. Each data sample is approximately 10 s long, and the overall dataset was split into 4893 training samples and 1944 testing samples. The resulting feature sequences were then converted into images, which can be used for human action recognition and other related tasks. These images can be used as a benchmark dataset for evaluating the performance of machine learning models for human action recognition and related tasks. These audio-image representations could be suitable for a wide range of applications, such as surveillance, healthcare monitoring, and robotics. The dataset can also be used for transfer learning, where pre-trained models can be fine-tuned on a specific task using specific audio images. Thus, this dataset can facilitate the development of new techniques and approaches for improving the accuracy of human action-related tasks and also serve as a standard benchmark for testing the performance of different machine learning models and algorithms. Full article
Figure 1
Six different audio-image representations of the same action. Each image represents different characteristics of the same audio signal (adopted from [19]). (a) Waveplot. (b) Spectral Centroids. (c) Spectral Rolloff. (d) MFCCs. (e) MFCCs Feature Scaling. (f) Chromagram.
Figure 2
High-level schematic representation of our approach.
17 pages, 3022 KiB  
Article
An Optimized Hybrid Approach for Feature Selection Based on Chi-Square and Particle Swarm Optimization Algorithms
by Amani Abdo, Rasha Mostafa and Laila Abdel-Hamid
Data 2024, 9(2), 20; https://doi.org/10.3390/data9020020 - 25 Jan 2024
Cited by 2 | Viewed by 2904
Abstract
Feature selection is a significant issue in the machine learning process. Most datasets include features that are not needed for the problem being studied. These irrelevant features reduce both the efficiency and accuracy of the algorithm. It is possible to think about feature selection as an optimization problem. Swarm intelligence algorithms are promising techniques for solving this problem. This research paper presents a hybrid approach for tackling the problem of feature selection. A filter method (chi-square) and two wrapper swarm intelligence algorithms (grey wolf optimization (GWO) and particle swarm optimization (PSO)) are used in two different techniques to improve feature selection accuracy and system execution time. The performance of the two phases of the proposed approach is assessed using two distinct datasets. The results show that PSOGWO yields a maximum accuracy boost of 95.3%, while chi2-PSOGWO yields a maximum accuracy improvement of 95.961% for feature selection. The experimental results show that the proposed approach performs better than the compared approaches. Full article
(This article belongs to the Section Information Systems and Data Management)
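The filter stage named in the abstract above (chi-square) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `chi_square_score` and `select_top_k`, the toy data, and the choice of keeping the top `k` features are all assumptions made for illustration, and the wrapper stage (PSO and GWO) is not shown. The sketch assumes categorical features and computes the Pearson chi-square statistic between each feature and the class label, then keeps the highest-scoring features.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            # Expected count under independence of feature and label.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and return the names of the k best (hypothetical helper)."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (illustrative only, not taken from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
    "weak": [0, 0, 0, 1, 0, 1, 1, 1],
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected, scores)
```

In a hybrid scheme of the kind the abstract describes, a reduced feature set like `selected` would then be handed to the wrapper search, so the swarm algorithm explores a smaller space. How the paper sets the cut-off is not stated in this listing.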
Figure 1
<p>Architecture of phase 1 (PSOGWO).</p>
Figure 2
<p>Flow chart for chi2-PSOGWO.</p>
Figure 3
<p>Architecture of phase 2 (chi-square-PSOGWO).</p>
Figure 4
<p>Comparing accuracy of feature selection.</p>
Figure 5
<p>Comparing accuracy of chi2-PSOGWO.</p>
Figure 6
<p>Comparing accuracy of PSO, PSOGWO, and chi2-PSOGWO.</p>
Figure 7
<p>Comparing execution time of PSOGWO and chi2-PSOGWO.</p>
Figure 8
<p>Evaluation of accuracy and execution time.</p>
6 pages, 197 KiB  
Data Descriptor
Draft Genome Sequence of the Commercial Strain Rhizobium ruizarguesonis bv. viciae RCAM1022
by Olga A. Kulaeva, Evgeny A. Zorin, Anton S. Sulima, Gulnar A. Akhtemova and Vladimir A. Zhukov
Data 2024, 9(2), 19; https://doi.org/10.3390/data9020019 - 23 Jan 2024
Viewed by 2043
Abstract
Legume plants enter a symbiosis with soil nitrogen-fixing bacteria (rhizobia), thereby gaining access to assimilable atmospheric nitrogen. Since this symbiosis is important for agriculture, biofertilizers with effective strains of rhizobia are created for crop legumes to increase their yield and minimize the amounts of mineral fertilizers required. In this work, we sequenced and characterized the genome of Rhizobium ruizarguesonis bv. viciae strain RCAM1022, a component of the ‘Rhizotorfin’ biofertilizer produced in Russia and used for pea (Pisum sativum L.). Full article
18 pages, 2704 KiB  
Article
Can Data and Machine Learning Change the Future of Basic Income Models? A Bayesian Belief Networks Approach
by Hamed Khalili
Data 2024, 9(2), 18; https://doi.org/10.3390/data9020018 - 23 Jan 2024
Viewed by 2744
Abstract
Appeals to governments to implement basic income are a contemporary phenomenon. The theoretical background of the basic income notion prescribes only transferring equal amounts to individuals irrespective of their specific attributes. However, the most recent basic income initiatives around the world are attached to certain rules with regard to the attributes of the households. This approach faces significant challenges in appropriately recognizing vulnerable groups. A possible alternative for setting rules with regard to the welfare attributes of the households is to employ artificial intelligence algorithms that can process unprecedented amounts of data. Can integrating machine learning change the future of basic income by predicting households vulnerable to future poverty? In this paper, we utilize multidimensional and longitudinal welfare data on one and a half million individuals and a Bayesian belief network approach to examine the feasibility of predicting households’ vulnerability to future poverty based on the existing households’ welfare attributes. Full article
Graphical abstract
Figure 1
<p>The Bayesian belief network’s directed acyclic graph incorporating 30 welfare variables.</p>
Figure 2
<p>The test set’s ROC and PR metrics.</p>
Figure 3
<p>Government’s play room to recognize higher True positive rates.</p>
Figure 4
<p>Welfare attributes’ importance to provide evidence regarding negative and positives.</p>
Figure 5
<p>The Bayesian belief network’s directed acyclic graph incorporating non-banking welfare variables.</p>
Figure 6
<p>The test set’s ROC and PR metrics by incorporating non-banking welfare variables.</p>
Figure 7
<p>Government’s play room to recognize higher True-positive rates by incorporating non-banking welfare variables.</p>