Search Results (46)

Search Parameters:
Keywords = HDBSCAN

16 pages, 715 KiB  
Article
Sentence Embeddings and Semantic Entity Extraction for Identification of Topics of Short Fact-Checked Claims
by Krzysztof Węcel, Marcin Sawiński, Włodzimierz Lewoniewski, Milena Stróżyna, Ewelina Księżniak and Witold Abramowicz
Information 2024, 15(10), 659; https://doi.org/10.3390/info15100659 - 21 Oct 2024
Viewed by 1174
Abstract
The objective of this research was to design a method to assign topics to claims debunked by fact-checking agencies. During the fact-checking process, access to more structured knowledge is necessary; therefore, we aim to describe topics with semantic vocabulary. Classification of topics should go beyond simple connotations like instance-class and rather reflect broader phenomena that are recognized by fact checkers. The assignment of semantic entities is also crucial for the automatic verification of facts using the underlying knowledge graphs. Our method is based on sentence embeddings, various clustering methods (HDBSCAN, UMAP, K-means), semantic entity matching, and term-importance assessment based on TF-IDF. We represent our topics in semantic space using Wikidata Q-ids, DBpedia, Wikipedia topics, YAGO, and other relevant ontologies. Such an approach based on semantic entities also supports hierarchical navigation within topics. For evaluation, we compare topic modeling results with claims already tagged by fact checkers. The work presented in this paper is useful for researchers and practitioners interested in semantic topic modeling of fake news narratives.
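The TF-IDF step named in the abstract can be illustrated with a minimal sketch. It assumes that cluster labels have already been produced (for example by HDBSCAN, which marks unclustered points with -1) and that each cluster is treated as one pseudo-document; that pooling choice, the function name `top_terms_per_cluster`, and the toy claims are all illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def top_terms_per_cluster(claims, labels, k=3):
    """Rank terms per cluster by TF-IDF, treating each cluster as one document.

    claims: list of short texts (hypothetical input).
    labels: one cluster id per claim; -1 marks noise and is skipped.
    """
    # Term counts per cluster, ignoring noise points.
    counts = defaultdict(Counter)
    for text, label in zip(claims, labels):
        if label == -1:
            continue
        counts[label].update(text.lower().split())

    n_clusters = len(counts)

    # Document frequency: number of clusters in which each term occurs.
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())

    ranked = {}
    for label, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            term: (freq / total) * math.log(n_clusters / df[term])
            for term, freq in term_counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked[label] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


# Invented toy claims, for illustration only.
claims = [
    "vaccine causes illness",
    "vaccine trial faked",
    "election ballots destroyed",
    "election machines hacked",
    "random unrelated claim",
]
labels = [0, 0, 1, 1, -1]
print(top_terms_per_cluster(claims, labels, k=1))
```

Terms that occur in every cluster get an inverse document frequency of zero under this formula, so only cluster-specific vocabulary rises to the top.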
Figure 1. Hierarchy of classes.
Figure 2. Workflow of the system.
Figure 3. Distribution of words among topics for PolitiFact subset modeled with LDA with 20 topics and DMM with 80 topics.
Figure 4. Distribution of terms among topics for custom methods.
Figure 5. Distribution of ontology terms among topics for custom methods.
Figure 6. Accuracy of various topic modeling methods, for PolitiFact with 80 topics.
Figure 7. Accuracy of custom methods, for various tag assignment approaches.
Figure 8. Coherence of 20 topics for various topic modeling methods.
Figure 9. Coherence of topics produced by our methods.
Figure A1. Full set of term frequencies for two datasets, with two clustering methods and various generalization schemes, part 1. The same terms are encoded with the same color.
Figure A2. Full set of term frequencies for two datasets, with two clustering methods and various generalization schemes, part 2. The same terms are encoded with the same color.
18 pages, 4797 KiB  
Article
coiTAD: Detection of Topologically Associating Domains Based on Clustering of Circular Influence Features from Hi-C Data
by Drew Houchens, H. M. A. Mohit Chowdhury and Oluwatosin Oluwadare
Genes 2024, 15(10), 1293; https://doi.org/10.3390/genes15101293 - 30 Sep 2024
Viewed by 1343
Abstract
Background/Objectives: Topologically associating domains (TADs) are key structural units of the genome, playing a crucial role in gene regulation. TAD boundaries are enriched with specific biological markers and have been linked to genetic diseases, making consistent TAD detection essential. However, accurately identifying TADs remains challenging due to the lack of a definitive validation method. This study aims to develop a novel algorithm, termed coiTAD, which introduces an innovative approach for preprocessing Hi-C data to improve TAD prediction. This method employs a proposed “circle of influence” (COI) approach derived from Hi-C contact matrices. Methods: The coiTAD algorithm is based on the creation of novel features derived from the circle of influence in input contact matrices, which are subsequently clustered using the HDBSCAN clustering algorithm. The TADs are extracted from the clustered features based on intra-cluster interactions, thereby providing a more accurate method for identifying TADs. Results: Rigorous tests were conducted using both simulated and real Hi-C datasets. The algorithm’s validation included analysis of boundary proteins such as H3K4me1, RNAPII, and CTCF. coiTAD consistently matched other TAD prediction methods. Conclusions: The coiTAD algorithm represents a novel approach for detecting TADs. At its core, the circle-of-influence methodology introduces an innovative strategy for preparing Hi-C data, enabling the assessment of interaction strengths between genomic regions. This approach facilitates a nuanced analysis that effectively captures structural variations within chromatin. Ultimately, the coiTAD algorithm enhances our understanding of chromatin organization and offers a robust tool for genomic research. The source code for coiTAD is publicly available, and the URL can be found in the Data Availability Statement section.
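Since the feature vectors here are clustered with HDBSCAN, it may help to see the quantity that algorithm builds its hierarchy on: the mutual reachability distance, max(core(a), core(b), d(a, b)). The sketch below is a generic illustration on toy points, not coiTAD code. It assumes Euclidean distance and takes the core distance as the distance to the k-th nearest other point; implementations differ on whether the point itself is counted.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest *other* point."""
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def mutual_reachability(points, k):
    """Pairwise mutual reachability matrix: max(core_i, core_j, d_ij)."""
    cores = core_distances(points, k)
    n = len(points)
    return [
        [
            0.0 if i == j
            else max(cores[i], cores[j], math.dist(points[i], points[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three nearby points and one isolated point (toy data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
for row in mutual_reachability(points, k=2):
    print(row)
```

The isolated point ends up far from everything in this transformed space because its own core distance is large, which is what lets sparse points fall out as noise.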
(This article belongs to the Collection Feature Papers in Bioinformatics)
Figure 1. coiTAD’s pipeline. This figure details coiTAD’s entire pipeline in a graphic table. It details the process of creating features, employing HDBSCAN, identifying TADs from given clusters, evaluating the quality of those TADs, and finally receiving the best radius result.
Figure 2. coiTAD’s feature generation. The contact matrix is taken in by coiTAD. For each radius, there are features along every point of the diagonal that contribute to the final feature vector. These contact points are stored in the order top left, top center, top right, right, left, bottom left, bottom center, and bottom right, for each point on the diagonal. See semi-circle section for details on that order.
Figure 3. PCA/non-PCA results on simulated data (a,b). Comparison of full-circle (FC) and semi-circle (SC) non-PCA and PCA results across low- and high-noise simulated matrices. Numbers indicate the PCA retention level for a specified result. (a) Low-noise result (4-noise); (b) high-noise result (12-noise).
Figure 4. Semi-circle vs. full-circle results on simulated data (a–e). Comparison of semi-circle (SC) feature against full-circle (FC) feature on CASPIAN simulated matrices. (a) 4-noise results; (b) 8-noise results; (c) 12-noise results; (d) 16-noise results; (e) 20-noise results.
Figure 5. Comparison of TAD callers on simulated 4-noise and 12-noise datasets. TAD callers were analyzed based on numbers of TADs identified, size distribution of called TADs, and the average measure of concordance across all callers. (a) Number of TADs identified; (b) size distribution of TADs; (c) average measure of concordance across callers.
Figure 6. Comparison of callers on hESC chromosome 19. Comparison of multiple TAD callers on raw hESC chr19 Hi-C dataset from Dixon et al. (a) Comparison of TAD size across callers; (b) one-versus-all comparison of shared boundaries with coiTAD; (c) numbers of identified TADs across callers; (d) PCA comparison plot on TAD callers’ results.
Figure 7. Comparison of callers on hESC chromosome 19 at 10 kb. Comparison of multiple TAD callers on raw hESC chr19 Hi-C dataset from Dixon et al. (a) Comparison of TAD size across callers; (b) one-versus-all comparison of shared boundaries with coiTAD; (c) numbers of identified TADs across callers.
Figure 8. Comparison of callers on hESC chromosome 1. Comparison of multiple TAD callers on raw hESC chr1 Hi-C dataset from Dixon et al. (a) Comparison of TAD size across callers; (b) one-versus-all comparison of shared boundaries with coiTAD; (c) numbers of identified TADs across callers.
Figure 9. Enrichment analysis across callers on hESC chromosome 19. Enrichment analysis of active histone modification marks and CTCF binding sites at domain boundaries on hESC chromosome 19. Callers assessed included coiTAD, TopDom, ClusterTAD, HiCSeg, and Spectral. (a) H3K4me1 peaks across callers; (b) H3K27ac peaks across callers; (c) CTCF peaks across callers; (d) RNAPIII peaks across callers; (e) H3K4me3 peaks across callers.
Figure 10. Enrichment analysis across TAD callers on GM12878 chromosome 19 at 10 kb resolution. Enrichment analysis of active histone modification marks and CTCF binding sites at domain boundaries. (a) CTCF; (b) H3K4me1; (c) H3K4me3; (d) H3K27ac; and (e) RNAPII peak analysis across TAD callers (coiTAD, ClusterTAD, HiCSeg, Spectral, and TopDom).
Figure 11. Computational performance benchmarking of coiTAD with other TAD callers. We analyzed running time and peak memory consumption of five TAD callers including coiTAD, and coiTAD showed a result comparable to those of the other TAD callers.
31 pages, 74393 KiB  
Article
Hyperspectral Sensor Management for UAS: Performance Analysis of Context-Based System Architectures for Camouflage and UXO Anomaly Detection Workflows
by Linda Eckel and Peter Stütz
Drones 2024, 8(10), 529; https://doi.org/10.3390/drones8100529 - 27 Sep 2024
Cited by 1 | Viewed by 969
Abstract
Tactical aerial reconnaissance missions using small unmanned aerial systems (UASs) have become a common scenario in the military. In particular, the detection of visually obscured objects such as camouflage materials and unexploded ordnance (UXO) is of great interest. Hyperspectral sensors, which provide detailed spectral information beyond the visible spectrum, are highly suitable for this type of reconnaissance mission. However, the additional spectral information places higher demands on system architectures to achieve efficient and robust data processing and object detection. To overcome these challenges, the concept of data reduction by band selection is introduced. In this paper, a specialized and robust concept of context-based hyperspectral sensor management with an implemented methodology of band selection for small and challenging UXO and camouflaged material detection is presented and evaluated with two hyperspectral datasets. For this purpose, several anomaly detectors such as LRX, NCC, HDBSCAN, and bandpass filters are introduced as part of the detection workflows and tested together with the sensor management in different system architectures. The results demonstrate how sensor management can significantly improve the detection performance for UXO compared to using all sensor bands or statistically selected bands. Furthermore, the implemented detection workflows and architectures yield strong performance results and improve the anomaly detection accuracy significantly compared to common approaches of processing hyperspectral images with a single, specialized anomaly detector.
(This article belongs to the Collection Drones for Security and Defense Applications)
Figure 1. Targets of a common reconnaissance scenario with camouflage materials and various types of UXO: (a) Improvised green tarp. (b) Military camouflage net, 2nd generation with far-infrared (FIR) and radar characteristics. (c) Military desert net. (d) Military urban net. (e) Mine type 72. (f) Directional mine. (g) Grenade. (h) Dud grenade.
Figure 2. Airborne VIS image of the two test sites: (a) Test site 1 with meadow, deciduous forest, gravel, sand, and roads. (b) Test site 2 with coniferous forest and areas of swamp, moss, and sand.
Figure 3. Randomly selected samples of datasets 1 and 2 with their corresponding ground truths: (a) Sample of dataset 1 with an improvised camouflage material. (b) Sample of dataset 1 with a mine, directional mine, and rocket of the target group UXO. (c) Sample of dataset 2 with a military camouflage material. (d) Sample of dataset 2 with two military camouflage materials and a mine.
Figure 4. Workflow of the Sensor Performance Prediction with its data reduction by selecting context bands, followed by the subsequent clustering-based extraction of the Sensor Context for the final band prediction as part of the Sensor Model.
Figure 5. Definition of the background and test pixel for an LRX.
Figure 6. Workflow of the preprocessing and anomaly detection process.
Figure 7. Counts of algorithm selections as the best-performing one for the workflow of UXO on dataset 1.
Figure 8. Counts of algorithm selections as the best-performing one for the workflow of camouflage materials on dataset 1, coloured by detector group: LRX detector settings in blue, C-HDBSCAN in red, C-NCC in orange, and the bandpass filter in green.
25 pages, 35656 KiB  
Article
Development and Application of an Advanced Automatic Identification System (AIS)-Based Ship Trajectory Extraction Framework for Maritime Traffic Analysis
by I-Lun Huang, Man-Chun Lee, Li Chang and Juan-Chen Huang
J. Mar. Sci. Eng. 2024, 12(9), 1672; https://doi.org/10.3390/jmse12091672 - 18 Sep 2024
Cited by 1 | Viewed by 1241
Abstract
This study addresses the challenges of maritime traffic management in the western waters of Taiwan, a region characterized by substantial commercial shipping activity and ongoing environmental development. Using 2023 Automatic Identification System (AIS) data, this study develops a robust feature extraction framework involving data cleaning, anomaly trajectory point detection, trajectory compression, and advanced processing techniques. Dynamic Time Warping (DTW) and the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithms are applied to cluster the trajectory data, revealing 16 distinct maritime traffic patterns, key navigation routes, and intersections. The findings provide fresh perspectives on analyzing maritime traffic, identifying high-risk areas, and informing safety and spatial planning. In practical applications, the results help navigators optimize route planning, improve resource allocation for maritime authorities, and inform the development of infrastructure and navigational aids. Furthermore, these outcomes are essential for detecting abnormal ship behavior, and they highlight the potential of route extraction in maritime surveillance.
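Dynamic Time Warping, which the abstract pairs with HDBSCAN, compares trajectories of different lengths by allowing points to be matched many-to-one. The following is a textbook dynamic-programming sketch, not the authors' implementation. It assumes planar (x, y) coordinates with Euclidean point distance and no warping-window constraint; real AIS positions would usually be projected first.

```python
import math


def dtw_distance(traj_a, traj_b):
    """Classic O(n*m) DTW cost between two sequences of (x, y) points."""
    n, m = len(traj_a), len(traj_b)
    # cost[i][j] = best cumulative cost aligning the first i and j points.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in traj_a only
                cost[i][j - 1],      # advance in traj_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Toy tracks: b repeats a waypoint of a; c is a shifted one unit north.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
c = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(dtw_distance(a, b), dtw_distance(a, c))
```

A matrix of such pairwise costs can be passed to a clusterer that accepts precomputed distances; whether the study wired DTW and HDBSCAN together in exactly this way is not stated in the abstract.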
(This article belongs to the Section Ocean Engineering)
Figure 1. Methodological framework of research.
Figure 2. Principle diagram of DP algorithm: (a) Original trajectory. (b) Baseline construction and distance calculation. (c) Trajectory segmentation at farthest points. (d) Segmentation progression. (e) Incomplete segment handling. (f) Final simplified trajectory.
Figure 3. Container ship traffic density in the western waters of Taiwan, 2023.
Figure 4. Parameter analysis for anomaly trajectory point detection and trajectory compression: (a) Relationship between parameter α and the number of anomaly data points. (b) Distribution of ship length and width. (c) Relationship between ε and average reduced distance. (d) Relationship between ε and average reduced point.
Figure 5. Impact of min_samples on the number of clusters and non-clustered trajectories.
Figure 6. Impact of min_samples on Silhouette Coefficient (SC) and Davies–Bouldin Index (DBI).
Figure 7. Clustering analysis results for min_samples = 4 (26 clusters and 1 non-clustered trajectory data).
Figure 8. Clustering analysis results for min_samples = 17 (16 clusters and 1 non-clustered trajectory data).
Figure 9. Clustering analysis results for min_samples = 50 (10 clusters and 1 non-clustered trajectory data).
Figure 10. Route extraction of container ships in the western waters of Taiwan in 2023.
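The DP (Douglas–Peucker) compression outlined in the Figure 2 caption of this entry (baseline construction, split at the farthest point, recursion) can be sketched as follows. This is the standard recursive formulation under the assumption of planar coordinates. The threshold name `epsilon` mirrors the ε in the Figure 4 caption, while the function names and toy track are illustrative only.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the baseline through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate baseline: fall back to point-to-point distance.
        return math.dist(p, a)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, epsilon):
    """Drop points lying within epsilon of the current baseline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the baseline.
    index, d_max = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > d_max:
            index, d_max = i, d
    if d_max > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [first, last]


wobble = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.1), (4.0, 0.0)]
turn = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0)]
print(douglas_peucker(wobble, 0.5))
print(douglas_peucker(turn, 0.5))
```

A larger `epsilon` removes more points at the cost of shape fidelity, which is the trade-off the ε panels in Figure 4 appear to examine.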
10 pages, 1672 KiB  
Article
Enhanced Performance of the Optimized Dye CF583R in Direct Stochastic Optical Reconstruction Microscopy of Active Zones in Drosophila Melanogaster
by Marvin Noß, Dmitrij Ljaschenko and Achmed Mrestani
Cells 2024, 13(17), 1445; https://doi.org/10.3390/cells13171445 - 28 Aug 2024
Viewed by 884
Abstract
Super-resolution single-molecule localization microscopy (SMLM) of presynaptic active zones (AZs) and postsynaptic densities contributed to the observation of protein nanoclusters that are involved in defining functional characteristics and in plasticity of synaptic connections. Among SMLM techniques, direct stochastic optical reconstruction microscopy (dSTORM) depends on organic fluorophores that exert high brightness and reliable photoswitching. While multicolor imaging is highly desirable, the requirements necessary for high-quality dSTORM make it challenging to identify combinations of equally performing, spectrally separated dyes. Red-excited carbocyanine dyes, e.g., Alexa Fluor 647 (AF647) or Cy5, are currently regarded as “gold standard” fluorophores for dSTORM imaging. However, a recent study introduced a set of chemically modified rhodamine dyes, including CF583R, that promise to display similar performance in dSTORM. In this study, we defined CF583R’s performance compared to AF647 and CF568 based on a nanoscopic analysis of Bruchpilot (Brp), a nanotopologically well-characterized scaffold protein at Drosophila melanogaster AZs. We demonstrate equal suitability of AF647, CF568 and CF583R for basal AZ morphometry, while in Brp subcluster analysis CF583R outperforms CF568 and is on par with AF647. Thus, the AF647/CF583R combination will be useful in future dSTORM-based analyses of AZs and other subcellularly located marker molecules and their role in physiological and pathophysiological contexts.
(This article belongs to the Special Issue Diving Deep into Synaptic Transmission)
Figure 1. Application of Alexa Fluor 647, CF568 and CF583R for direct stochastic optical reconstruction microscopy of Bruchpilot. (a) Binned images (10 nm pixels) of direct stochastic optical reconstruction microscopy (dSTORM) of Bruchpilot (Brp) stained with the primary monoclonal mouse antibody Brp^Nc82 and secondary F(ab’)2 fragments coupled to Alexa Fluor 647 (AF647, left) or secondary IgGs, either coupled to CF568 (middle) or CF583R (right), at distal type Ib boutons of the 3rd instar larval Drosophila melanogaster neuromuscular junction (NMJ) formed at abdominal muscles 6 and 7. Symbols (+, *, #) mark the enlarged individual active zones (AZs) in (b). (b) Enlarged AZs from (a). (c) Localization precision determined by NeNA algorithm (nearest neighbor-based analysis, see Materials and Methods) for AF647 (grey, n = 11 images from 3 animals), CF568 (green, n = 13 images from 4 animals) and CF583R (orange, n = 14 images from 4 animals) shown as boxplots, where horizontal lines represent medians, boxes quartiles and whiskers minimum and maximum values, combined with swarm plots displaying individual data points. (d) Number of dSTORM localizations (locs.) per 100 frames for the three different dyes (averaged for the same images as in (c)). Scale bars in (a) 1 µm, in (b) 100 nm.
Figure 2. AF647, CF568 and CF583R yield sufficient localization to obtain basal AZ morphometric parameters. (a) Contour plots of the median number of AZs per dSTORM image depending on the HDBSCAN (hierarchical density-based spatial clustering of applications with noise) parameters “minimum samples” and “minimum cluster size” for AF647 (left, n = 11 images from 3 animals), CF568 (middle, n = 13 images from 4 animals) and CF583R (right, n = 14 images from 4 animals). Crosshairs indicate the parameter combination used in (b–e) (25 and 100, respectively, for all three conditions). (b) Line and scatter plots of median AZ area (black) as well as 25th and 75th percentiles (up- and downward magenta triangles, respectively, same images as in (a)) plotted against alpha values used for determination of alpha shape areas. Blue line and scatter plots indicate the percent increases of AZ areas with increasing alpha (zeros omitted due to logarithmic scale). The relative increase dropped below 5% at alpha values of 1225, 900 and 1600 nm² for AF647, CF568 and CF583R, respectively (the selected parameters used for quantification in (e)). (c) Scatter plots of dSTORM localizations of type Ib boutons shown in Figure 1a. Different colors indicate the identity of AZs detected by HDBSCAN, grey dots are unclustered localizations. (d) Numbers of localizations per AZ for AF647 (grey, 557 AZs from 11 NMJs and 3 animals), CF568 (green, 514 AZs from 13 NMJs and 4 animals) and CF583R (orange, 607 AZs from 14 NMJs and 4 animals) shown as boxplots, where horizontal lines represent medians, boxes quartiles and whiskers 10th and 90th percentiles. (e) Area of individual AZs using the three different dyes shown as histograms and boxplots (same AZs as in (d)). Scale bar in (c) 1 µm.
Figure 3. CF583R and AF647 perform equally in AZ nanocluster analyses. (a) Same AZs as in Figure 1b shown as scatter plots of dSTORM localizations. Brp subclusters (SCs) per AZ were detected by a second-level HDBSCAN on individual AZs with parameters individually adjusted for each imaging condition to yield comparable SC radii with respect to H function maxima in (b). Parameters for minimum cluster size and minimum samples were 22 and 5, 44 and 11 as well as 15 and 3 for AF647, CF568 and CF583R, respectively. SC identity is indicated by different colors, black dots are unclustered localizations and colored lines show alpha shapes used for area determination in (c). (b) Averaged H functions (as derivatives of Ripley’s K function) for AF647 (grey), CF568 (green) and CF583R (orange) shown as straight lines. Colored dashed lines indicate H function maxima (AF647: 25 nm, CF568: 34 nm, CF583R: 26 nm, n = 535, 514 and 594 averaged H functions, respectively), while the dashed black line shows the prediction for a random Poisson distribution. (c) Number of SCs per AZ (left) and SC area (right) for the three different dyes shown as boxplots (n = 557 AZs from 11 NMJs and 3 animals, 502 AZs from 13 NMJs and 4 animals and 607 AZs from 14 NMJs and 4 animals for AF647, CF568 and CF583R, respectively). Scale bar in (a) 100 nm.
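The H function mentioned in the Figure 3 caption is commonly defined from Ripley's K as H(r) = L(r) - r with L(r) = sqrt(K(r) / pi), and the radius at which H peaks is read as a characteristic cluster scale. The sketch below assumes the naive estimator K(r) = A / (n (n - 1)) times the number of ordered pairs within r, with no edge correction; the estimator, the function names, and the toy coordinates are assumptions for illustration and may differ from the analysis in the article.

```python
import math


def ripley_h(points, radius, area):
    """H(r) = sqrt(K(r) / pi) - r using a naive, uncorrected K estimate."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, lying within the radius.
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= radius
    )
    k = area * pairs / (n * (n - 1))
    return math.sqrt(k / math.pi) - radius


def h_maximum(points, radii, area):
    """Radius from the candidate list at which H is largest."""
    return max(radii, key=lambda r: ripley_h(points, r, area))


# Two tight toy clusters of four localizations each, in a 100 x 100 field.
locs = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
    (100.0, 100.0), (101.0, 100.0), (100.0, 101.0), (101.0, 101.0),
]
print(h_maximum(locs, radii=[1, 2, 5, 20], area=10000.0))
```

For a completely random (Poisson) pattern H stays near zero at all radii, which is the reference line described in the caption.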
26 pages, 16140 KiB  
Article
Biomass Burning in Northeast China over Two Decades: Temporal Trends and Geographic Patterns
by Heng Huang, Yinbao Jin, Wei Sun, Yang Gao, Peilun Sun and Wei Ding
Remote Sens. 2024, 16(11), 1911; https://doi.org/10.3390/rs16111911 - 26 May 2024
Cited by 1 | Viewed by 1151
Abstract
Despite the significant impacts of biomass burning (BB) on global climate change and regional air pollution, there is a relative lack of research on the temporal trends and geographic patterns of BB in Northeast China (NEC). This study investigates the spatial–temporal distribution of BB and its impact on the atmospheric environment in the NEC region during 2004 to 2023 based on remote sensing satellite data and reanalyzed data, using Siegel’s Repeated Median Estimator and the Mann–Kendall test for trend analysis, HDBSCAN to identify significant BB change regions, and Moran’s Index to examine the spatial autocorrelation of BB. The obtained results indicate a fluctuating yet overall increasing BB trend, characterized by annual increases of 759 for fire point counts (FPC) and 12,000 MW for fire radiative power (FRP). BB predominantly occurs in the Songnen Plain (SNP), Sanjiang Plain (SJP), Liaohe Plain (LHP), and the transitional area between SNP and the adjacent Greater Khingan Mountains (GKM) and Lesser Khingan Mountains (LKM). Cropland and urban areas exhibit the highest growth in BB trends, each surpassing 60% (p < 0.05), with the most significant growth cluster spanning 68,634.9 km². Seasonal analysis shows that BB peaks in spring and autumn, with spring experiencing the highest severity. The most critical periods for BB are March–April and October–November, during which FPC and FRP contribute to over 80% of the annual total. This trend correlates with spring planting and autumn harvesting, where cropland FPC constitutes 71% of all land-cover types involved in BB. Comparative analysis of the aerosol extinction coefficient (AEC) between areas with increasing and decreasing BB indicates higher AEC in BB increasing regions, especially in spring, with the vertical transport of BB reaching up to 1.5 km. County-level spatial autocorrelation analysis indicates high–high clustering in the SNP and SJP, with a notable resurgence of autocorrelation in the SNP, suggesting the need for coordinated provincial prevention and control efforts. Finally, our analysis of the impact of BB on atmospheric pollutants shows that there is a correlation between FRP and pollutants, with correlations for PM2.5, PM10, and CO of 0.4, 0.4, and 0.5, respectively. In addition, the impacts of BB vary by region and season, with the most significant impacts occurring in the spring, especially in the SNP, which requires more attention. In summary, considering the escalating BB trend in NEC and its significant effect on air quality, this study highlights the urgent necessity for improved monitoring and strategic interventions.
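The two trend tools named in this abstract can be sketched with the standard library. The Mann–Kendall part below uses the usual S statistic with the no-ties variance n(n - 1)(2n + 5)/18 and a normal approximation; tie correction is omitted, which is an assumption. Siegel's repeated median slope is the median, over points, of each point's median pairwise slope. The input series are invented toy numbers, not values from the study.

```python
import math
from statistics import median


def mann_kendall(series):
    """Return (S, Z, two-sided p) for a monotonic-trend test, ignoring ties."""
    n = len(series)
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p


def siegel_slope(x, y):
    """Siegel's repeated median estimate of the slope of y against x."""
    n = len(x)
    per_point = []
    for i in range(n):
        slopes = [
            (y[j] - y[i]) / (x[j] - x[i])
            for j in range(n)
            if j != i and x[j] != x[i]
        ]
        per_point.append(median(slopes))
    return median(per_point)


# Toy series: a steady rise, and a line y = 2x + 1 with one gross outlier.
rising = list(range(1, 11))
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 100, 9, 11, 13]
print(mann_kendall(rising))
print(siegel_slope(xs, ys))
```

The repeated median is used for trend magnitude because a single extreme year barely moves it, whereas an ordinary least-squares slope would be pulled strongly toward the outlier.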
(This article belongs to the Section Atmospheric Remote Sensing)
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>(<b>a</b>) The annual average distribution of fire pixel counts (FPC) in China from 2004 to 2022, with the red-bordered area representing Northeast China (NEC). (<b>b</b>) Digital elevation model (DEM) of the NEC region, which is divided into three plains (Songnen plain (SNP), Sanjiang plain (SJP), and Liaohe plain (LHP)) and three mountain ranges (Greater Khingan Mountains (GKM), Lesser Khingan Mountains (LKM), and Changbai Mountains (CBM)). (<b>c</b>) The distribution of land types (Forest, Cropland, Urban, and Others) in NEC, where the purple border line is Liaoning (LN), the red is Jilin (JL), and the blue is Heilongjiang (HLJ).</p>
Full article ">Figure 2
<p>The first columns (<b>a</b>–<b>e</b>) show the trend of fire point counts (FPC) time-series, the spatial distribution of annual means, and the Mann–Kendall (MK) trend of biomass burning (BB) in northeast China (NEC) over a 20-year period, respectively. The second column is the same as the first column, but indicates the fire radiative power (FRP). Additionally, in (<b>d</b>), the blue dots represent air quality monitoring stations, with blue text indicating their codes. In the legend of (<b>e</b>) and (<b>f</b>), SI, NSI, UNC, SD, and NSD represent a significant increase, non-significant increase, unchanged, significant decrease, and non-significant decrease, respectively.</p>
Full article ">Figure 3
<p>The proportion of Mann–Kendall (MK) trends for various land-cover types, with (<b>a</b>) representing fire point counts (FPC) and (<b>b</b>) representing fire radiative power (FRP). (<b>c</b>) depicts the clustering analysis results for significant biomass burning increasing (BBI) and biomass burning decreasing (BBD) regions for cropland and urban land covers. The red dots represent significant BBI points, while the blue dots denote significant BBD points. The red dashed lines and square points indicate the clustering boundaries and centers of significant BBI, respectively, whereas the blue dashed lines and square points represent those of significant BBD. Regions ①, ②, and ③ are typical significant BBI areas, while ④ and ⑤ represent significant BBD areas.</p>
Figure 4
<p>Spatial distribution of mean fire pixel counts (FPC) and fire radiative power (FRP) in different seasons in northeast China (NEC), where (<b>a</b>–<b>d</b>) are FPC and (<b>e</b>–<b>h</b>) are FRP, corresponding to spring, summer, autumn, and winter, respectively.</p>
Figure 5
<p>(<b>a</b>,<b>b</b>) Monthly statistics of biomass burning (BB) for fire pixel counts (FPC) and fire radiative power (FRP). (<b>c</b>,<b>d</b>) Seasonal statistics of BB for FPC and FRP, where the yellow line indicates rainfall and the blue line indicates snowfall.</p>
Figure 6
<p>The aerosol extinction coefficient (AEC) profile of biomass burning (BB) for (<b>a</b>) spring, (<b>b</b>) summer, (<b>c</b>) autumn, and (<b>d</b>) winter in regions ① and ④ in <a href="#remotesensing-16-01911-f003" class="html-fig">Figure 3</a>c from 2013 to 2022, with the red and blue lines representing increasing and decreasing areas. The upper-right panel shows a localized zoom.</p>
Figure 7
<p>Interannual variability of aerosol extinction coefficient (AEC) profile in significant biomass burning increasing (BBI) and biomass burning decreasing (BBD) areas (<a href="#remotesensing-16-01911-f003" class="html-fig">Figure 3</a>c, ① and ④) from 2013 to 2023 during spring, with the red and blue lines representing BBI and BBD areas, where the upper-right panel is a localized zoom.</p>
Figure 8
<p>The local indicators of spatial association (LISA) cluster maps of the average (<b>a</b>) fire pixel counts (FPC) and (<b>b</b>) fire radiative power (FRP) from 2004 to 2023. HH, LH, LL, and ns represent high–high, low–high, low–low, and not significant, respectively.</p>
Figure 9
<p>The local indicators of spatial association (LISA) maps of the (<b>a</b>) fire pixel counts (FPC, <b>a1</b>–<b>a20</b>) and (<b>b</b>) fire radiative power (FRP, <b>b1</b>–<b>b20</b>) from 2004 to 2023. HH, HL, LH, LL, and ns represent high–high, high–low, low–high, low–low, and not significant, respectively.</p>
Figure 10
<p>The spatial distribution map of the correlation coefficient between biomass burning (BB) and major atmospheric pollutants: (<b>a</b>) aerosol optical depth (AOD), (<b>b</b>) ozone (O<sub>3</sub>), (<b>c</b>) carbon monoxide (CO), (<b>d</b>) sulfur dioxide (SO<sub>2</sub>), (<b>e</b>) carbon dioxide (CO<sub>2</sub>), (<b>f</b>) methane (CH<sub>4</sub>).</p>
11 pages, 4075 KiB  
Proceeding Paper
Mapping Activity-Based Segregation of Names in Dublin Using Google Point of Interest Data
by Punit Gupta, Hamidreza Rabiei-Dastjerdi and Gavin McArdle
Eng. Proc. 2024, 63(1), 18; https://doi.org/10.3390/engproc2024063018 - 29 Feb 2024
Cited by 1 | Viewed by 737
Abstract
Today's cities, with their varied cultures and heritage, are shaped by many factors, including immigration from different countries, religious heritage, and tourism. Examining segregation across geographical regions is one way to find patterns in cities that are influenced by gender, religion, age, income, and other attributes. In this study, an HDBSCAN-based activity segregation model using Google POI (Point of Interest) data is proposed to study the multi-density patterns of reviewers with possibly Indian names, and their activities, in the Dublin metropolitan area. The username is used to infer the likely gender and nationality of each reviewer with the NamSor app (a machine learning model for predicting gender and nationality), with a reported accuracy of 92%. The results show that the proposed HDBSCAN model identifies 16 distinct segregations, compared with only nine clusters from the traditional DBSCAN model. Full article
Figure 1
<p>Proposed flow diagram.</p>
Figure 2
<p>(<b>a</b>) POIs in Dublin; (<b>b</b>) Filtered POIs in Dublin based on the origin of username.</p>
Figure 3
<p>Nationality review count after removing empty and Irish comments.</p>
Figure 4
<p>Clustering using the HDBSCAN algorithm with outliers.</p>
Figure 5
<p>(<b>a</b>) Segregation in Dublin using HDBSCAN; (<b>b</b>) Segregation in Dublin using DBSCAN.</p>
Figure 6
<p>Cluster count using HDBSCAN.</p>
Figure 7
<p>(<b>a</b>) Clusters of reviews attributed to male names (according to NamSor); (<b>b</b>) Clusters of reviews attributed to female names (according to NamSor).</p>
24 pages, 13048 KiB  
Article
Analysis of Urban Residents’ Travelling Characteristics and Hotspots Based on Taxi Trajectory Data
by Jiusheng Du, Chengyang Meng and Xingwang Liu
Appl. Sci. 2024, 14(3), 1279; https://doi.org/10.3390/app14031279 - 3 Feb 2024
Cited by 2 | Viewed by 1352
Abstract
This study utilizes taxi trajectory data to uncover urban residents’ travel patterns, offering critical insights into the spatial and temporal dynamics of urban mobility. A fusion clustering algorithm is introduced, enhancing the clustering accuracy of trajectory data. This approach integrates the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) algorithm, modified to incorporate time factors, with kernel density analysis. The fusion algorithm demonstrates a higher noise point detection rate (15.85%) compared with the DBSCAN algorithm alone (7.31%), thus significantly reducing noise impact in kernel density analysis. Spatial correlation analysis between hotspot areas and paths uncovers distinct travel behaviors: During morning and afternoon peak hours on weekdays, travel times (19–40 min) exceed those on weekends (16–35 min). Morning peak hours see higher taxi utilization in residential and transportation hubs, with schools and commercial and government areas as primary destinations. Conversely, afternoon peaks show a trend towards dining and entertainment zones from the abovementioned places. In the evening rush, residents enjoy a vibrant nightlife, and there are numerous locations for picking up and dropping off people. A chi-square test on weekday travel data yields a p-value of 0.023, indicating a significant correlation between the distribution of travel hotspots and paths. Full article
(This article belongs to the Special Issue Advances in Internet of Things and Computer Vision)
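The noise detection rates quoted in the abstract are the share of points that a density-based clusterer leaves unassigned. As a rough illustration only, the following pure-Python DBSCAN sketch labels points and reports that share. It is not the paper's time-aware HDBSCAN fusion algorithm; the helper names, the `eps` and `min_pts` values, and the toy coordinates are all assumptions made for this example.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is a core point: keep expanding
    return labels

def noise_rate(labels):
    """Fraction of points labelled as noise."""
    return sum(1 for label in labels if label == NOISE) / len(labels)

# Toy pick-up points: two dense groups plus two isolated points (made-up data).
pickups = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
           (2.5, 9.0), (9.0, 2.5)]
labels = dbscan(pickups, eps=0.3, min_pts=3)
print(f"noise rate: {noise_rate(labels):.0%}")
```

Points flagged as noise would be dropped before any kernel density analysis, which is why a higher detection rate can reduce the influence of stray points on the density surface.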
Figure 1
<p>Technology roadmap.</p>
Figure 2
<p>Schematic diagram of noise points.</p>
Figure 3
<p>Taxi operation diagram.</p>
Figure 4
<p>OD point visualization: (<b>a</b>) visualization of picking-up points; (<b>b</b>) visualization of drop-off points.</p>
Figure 5
<p>Schematic diagram of trajectory matching.</p>
Figure 6
<p>Minimum tree diagram.</p>
Figure 7
<p>Hierarchical clustering tree connection diagram.</p>
Figure 8
<p>Comparison of different aggregation results: (<b>a</b>) DBSCAN clustering; (<b>b</b>) HDBSCAN clustering.</p>
Figure 9
<p>Comparison of before and after clustering and density analysis: (<b>a</b>) kernel density analysis before clustering; (<b>b</b>) kernel density analysis after clustering.</p>
Figure 10
<p>Statistical chart of passenger pick-up and drop-off on weekdays: (<b>a</b>) statistics of passenger pick-up and drop-off on 7 August; (<b>b</b>) statistics of passenger pick-up and drop-off on 8 August.</p>
Figure 11
<p>Statistics of passenger pick-up and drop-off on weekends: (<b>a</b>) statistics of passenger pick-up and drop-off on 9 August; (<b>b</b>) statistics of passenger pick-up and drop-off on 10 August.</p>
Figure 12
<p>Statistics of residents’ travel time on taxis: (<b>a</b>) statistics of residents’ travel time on weekdays; (<b>b</b>) statistics of residents’ travel time on weekends.</p>
Figure 13
<p>Heat map of pick-up and drop-off points from 7:00 to 9:00: (<b>a</b>) heat map of pick-up points; (<b>b</b>) heat map of drop-off points.</p>
Figure 14
<p>Heat map of pick-up and drop-off points from 11:00 to 13:00: (<b>a</b>) heat map of pick-up points; (<b>b</b>) heat map of drop-off points.</p>
Figure 15
<p>Heat map of pick-up and drop-off points during 19:00–21:00: (<b>a</b>) heat map of pick-up points; (<b>b</b>) heat map of drop-off points.</p>
Figure 16
<p>Hotspot path map of pick-up and drop-off points during 7:00–9:00: (<b>a</b>) hotspot path for pick-up points; (<b>b</b>) hotspot path for drop-off points.</p>
Figure 17
<p>Hotspot path map of pick-up and drop-off points during 11:00–13:00: (<b>a</b>) hotspot path for pick-up points; (<b>b</b>) hotspot path for drop-off points.</p>
Figure 18
<p>Hotspot path map of pick-up and drop-off points during 19:00–21:00: (<b>a</b>) hotspot path for pick-up points; (<b>b</b>) hotspot path for drop-off points.</p>
23 pages, 933 KiB  
Article
Clustering on the Chicago Array of Things: Spotting Anomalies in the Internet of Things Records
by Kyle DeMedeiros, Chan Young Koh and Abdeltawab Hendawi
Future Internet 2024, 16(1), 28; https://doi.org/10.3390/fi16010028 - 16 Jan 2024
Viewed by 2192
Abstract
The Chicago Array of Things (AoT) is a robust dataset collected from over 100 nodes over four years. Each node contains over a dozen sensors. The array contains a series of Internet of Things (IoT) devices with multiple heterogeneous sensors connected to a processing and storage backbone to collect data from across Chicago, IL, USA. The data collected include meteorological data such as temperature, humidity, and heat, as well as chemical data like CO₂ concentration, PM₂.₅, and light intensity. The AoT sensor network is one of the largest open IoT systems whose data are available to researchers. Anomaly detection (AD) in IoT and sensor networks is an important tool to ensure that the ever-growing IoT ecosystem is protected from faulty data and sensors, as well as from attacks. Interestingly, in-depth analyses of the Chicago AoT for anomaly detection are rare. Here, we study the viability of using the Chicago AoT dataset for anomaly detection by means of clustering techniques. We utilized K-Means, DBSCAN, and Hierarchical DBSCAN (HDBSCAN) to determine the viability of labeling an unlabeled dataset at the sensor level. The results show that the clustering algorithm best suited for this task varies based on the density of the anomalous readings and the variability of the data points being clustered; however, at the sensor level, the K-Means algorithm, though simple, is better suited for the task of determining specific, at-a-glance anomalies than the more complex DBSCAN and HDBSCAN algorithms, though it comes with drawbacks. Full article
(This article belongs to the Special Issue State-of-the-Art Future Internet Technology in USA 2022–2023)
Figure 1
<p>An example of the AoT data.</p>
Figure 2
<p>The data subset focused on for this work.</p>
Figure 3
<p>Typical Errors. Colors indicate different sensors, with green indicating NOAA data.</p>
Figure 4
<p>Temperature sensor results for node f02f.</p>
Figure 5
<p>Temperature sensor results for node d620.</p>
Figure 6
<p>Temperature sensor results for node ba3b.</p>
Figure 7
<p>Temperature sensor results for node 3f54.</p>
21 pages, 5177 KiB  
Article
Applying Density-Based Clustering for the Analysis of Emission Events in Real Driving Emissions Calibration
by Sascha Krysmon, Stefan Pischinger, Johannes Claßen, Georgi Trendafilov, Marc Düzgün, Frank Dorscheidt, Martin Nijs and Michael Görgen
Future Transp. 2024, 4(1), 46-66; https://doi.org/10.3390/futuretransp4010004 - 10 Jan 2024
Cited by 2 | Viewed by 1403
Abstract
Further reducing greenhouse gas and pollutant emissions from road vehicles is a major task for the automotive industry. Stricter regulations regarding emissions and fleet fuel consumption require the continuous development of new powertrains and methods. In particular, the combination of hybrid powertrains on the technical side and the focus on real driving emissions (RDE) on the legislative side pose significant challenges to the vehicle calibration process. Against this background, new test methods and environments are being investigated to counteract the high number of interactions between hybrid drive systems and quasi-infinite test conditions due to RDE. Complementary to new test environments, innovative methods for data analysis are needed that allow the exploitation of the complete potential of measurement data. The application of such a method in the field of emission calibration is presented in this paper. For this purpose, a clustering method (HDBSCAN) is applied to critical sequences from emission tests. Within this presentation, the clustering process is based on a single signal only. This paper shows how signals of various characteristics can be processed with dynamic time warping and generically structured with the clustering method used. Here, 959 single events are automatically categorized into 24 clusters. This provides a new basis for system evaluation, enabling the automatic identification, categorization, and prioritization of calibration weaknesses. Using twelve signals of different characteristics, the generic usability of the clustering method is demonstrated. Full article
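The abstract describes comparing event signals with dynamic time warping (DTW) before clustering them. A minimal sketch of that idea, assuming the textbook DTW recurrence with an absolute-difference cost and no warping-window limit, is given below. The function names and the toy signals are hypothetical, and the paper's own variants (limited DTW, complexity estimate correction) are not reproduced here.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b[j-1] is reused
                                    cost[i][j - 1],      # b advances, a[i-1] is reused
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def distance_matrix(signals):
    """Symmetric pairwise DTW matrix, usable as precomputed input to a clusterer."""
    k = len(signals)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d[i][j] = d[j][i] = dtw_distance(signals[i], signals[j])
    return d

# Made-up event profiles: the second is the first with a delayed start.
events = [
    [0, 1, 2, 3, 2, 1, 0],
    [0, 0, 1, 2, 3, 2, 1, 0],
    [3, 3, 3, 0, 0, 0, 3],
]
for row in distance_matrix(events):
    print(row)
```

Because DTW absorbs the delayed start, the first two profiles end up at distance zero, so a density-based clusterer fed with this matrix would treat them as the same kind of event.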
Show Figures

Figure 1

Figure 1
<p>Concept of the RDE validation methodology as context to the clustering process.</p>
Figure 2
<p>Comparison using original, best match synchronized, and time-warped signals.</p>
Figure 3
<p>Schematic overview of the cluster application.</p>
Figure 4
<p>Impact of limited and unlimited DTW based on the downstream lambda sensor voltage.</p>
Figure 5
<p>Application of complexity estimate correction to DTW comparisons.</p>
Figure 6
<p>Extract of HDBSCAN-created clusters for engine speed.</p>
Figure 7
<p>Procedure of outlier re-clustering.</p>
Figure 8
<p>Silhouette Score of cluster results for Loop 2 and Loop 3.</p>
Figure 9
<p>Overview of the final cluster definition for the engine speed signal of NO<sub>X</sub> events.</p>
Figure 10
<p>Extract of cluster 13’s event profiles.</p>
Figure 11
<p>Boxplot of distance-specific NO<sub>X</sub> emission intensity distribution.</p>
25 pages, 11691 KiB  
Article
Enhancement Methods of Hydropower Unit Monitoring Data Quality Based on the Hierarchical Density-Based Spatial Clustering of Applications with a Noise–Wasserstein Slim Generative Adversarial Imputation Network with a Gradient Penalty
by Fangqing Zhang, Jiang Guo, Fang Yuan, Yuanfeng Qiu, Pei Wang, Fangjuan Cheng and Yifeng Gu
Sensors 2024, 24(1), 118; https://doi.org/10.3390/s24010118 - 25 Dec 2023
Cited by 2 | Viewed by 1432
Abstract
To address low-quality data problems such as anomalies and missing values in the condition monitoring data of hydropower units, this paper proposes a monitoring data quality enhancement method based on HDBSCAN-WSGAIN-GP, which improves the quality and usability of the data by combining the advantages of density clustering and a generative adversarial network. First, the monitoring data are grouped by density level using the HDBSCAN clustering method in combination with the working conditions, and the anomalies in the dataset are adaptively detected, identified, and cleaned. Then, exploiting the strength of the WSGAIN-GP model in data filling, the missing values in the cleaned data are generated automatically through unsupervised learning of the features and distribution of the real monitoring data. The method is validated on an online monitoring dataset from units in actual operation. Comparison experiments show that the clustering silhouette coefficient (SCI) of the HDBSCAN-based anomaly detection model reaches 0.4935, higher than that of the other models compared, indicating that the proposed model is better at distinguishing valid samples from anomalous ones. The probability density distribution of the data filled by the WSGAIN-GP model is similar to that of the measured data, and the KL divergence, JS divergence, and Hellinger distance between the filled and original distributions are close to 0. Compared with filling methods such as SGAIN, GAIN, and KNN across different missing rates, WSGAIN-GP has the lowest RMSE, which indicates that the proposed filling model has good accuracy and generalization. The results provide a high-quality data basis for subsequent trend prediction and state warning. Full article
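The abstract judges the filled data by three distribution-similarity measures (KL, JS, and Hellinger), all of which approach 0 when two distributions match. The sketch below shows the standard discrete definitions applied to normalized histograms. The bin edges, the sample values, and the helper names are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)

def histogram(values, edges):
    """Normalized histogram over half-open bins [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

# Made-up normalized readings: measured values vs. values after imputation.
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
measured = [0.1, 0.2, 0.25, 0.4, 0.45, 0.6, 0.7, 0.9]
filled = [0.15, 0.2, 0.3, 0.35, 0.5, 0.65, 0.7, 0.8]
p = histogram(measured, edges)
q = histogram(filled, edges)

print("KL:", kl_divergence(p, q))
print("JS:", js_divergence(p, q))
print("Hellinger:", hellinger_distance(p, q))
```

KL divergence is asymmetric and undefined when the second histogram has an empty bin that the first does not, which is one reason symmetric, bounded measures such as JS and Hellinger are often reported alongside it.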
Figure 1
<p>HDBSCAN algorithm flow chart.</p>
Figure 2
<p>Structure of the WSGAIN-GP network.</p>
Figure 3
<p>Quality Enhancement Process of Hydropower Unit Monitoring Data Based on HDBSCAN-WSGAIN-GP.</p>
Figure 4
<p>Time series of online monitoring data for hydropower units.</p>
Figure 5
<p>Unit Measurement Dataset from Fengtan Hydropower Station Unit 2.</p>
Figure 6
<p>Three-dimensional dataset of upper guide swing and operating condition parameters from Fengtan Hydropower Station Unit 2.</p>
Figure 7
<p>Clustering results of different anomaly detection methods for the upper guide X-axis swing.</p>
Figure 8
<p>Clustering results of different anomaly detection methods for the upper guide Y-axis swing.</p>
Figure 9
<p>Results of WSGAIN-GP filled head, active and upward-guided swing.</p>
Figure 10
<p>Relative frequency distributions after filling in the enhancement with measured data.</p>
Figure 11
<p>A complete short-term state sequence dataset for hydropower units.</p>
Figure 12
<p>Filling results of each method for the upper guide X-axis swing at different missing rates.</p>
Figure 13
<p>Filling results of each method for the head at different missing rates.</p>
Figure 14
<p>Filling results of each method for active power at different missing rates.</p>
Figure 15
<p>Comparison of filling errors (RMSE) for different methods.</p>
20 pages, 2450 KiB  
Article
Evaluation of Density-Based Spatial Clustering for Identifying Genomic Loci Associated with Ischemic Stroke in Genome-Wide Data
by Gennady V. Khvorykh, Nikita A. Sapozhnikov, Svetlana A. Limborska and Andrey V. Khrunin
Int. J. Mol. Sci. 2023, 24(20), 15355; https://doi.org/10.3390/ijms242015355 - 19 Oct 2023
Cited by 2 | Viewed by 1281
Abstract
The genetic architecture of ischemic stroke (IS), which is one of the leading causes of death worldwide, is complex and underexplored. The traditional approach for associative gene mapping is genome-wide association studies (GWASs), testing individual single-nucleotide polymorphisms (SNPs) across the genomes of case and control groups. The purpose of this research is to develop an alternative approach in which groups of SNPs are examined rather than individual ones. We proposed, validated and applied to real data a new workflow consisting of three key stages: grouping SNPs in clusters, inferring the haplotypes in the clusters and testing haplotypes for the association with phenotype. To group SNPs, we applied the clustering algorithms DBSCAN and HDBSCAN to linkage disequilibrium (LD) matrices, representing pairwise r² values between all genotyped SNPs. These clustering algorithms have never before been applied to genotype data as part of the workflow of associative studies. In total, 883,908 SNPs and insertion/deletion polymorphisms from people of European ancestry (4929 cases and 652 controls) were processed. The subsequent testing for frequencies of haplotypes restored in the clusters of SNPs revealed dozens of genes associated with IS and suggested the complex role that protocadherin molecules play in IS. The developed workflow was validated with the use of a simulated dataset of similar ancestry and the same sample sizes. The results of classic GWASs are also provided and discussed. The considered clustering algorithms can be applied to genotypic data to identify the genomic loci associated with different qualitative traits, using the workflow presented in this research. Full article
(This article belongs to the Section Molecular Genetics and Genomics)
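The grouping stage in the abstract clusters SNPs on a matrix of pairwise LD values. One way such a matrix could be prepared is sketched below, assuming genotypes coded as 0/1/2 allele counts and LD estimated as the squared Pearson correlation between those counts (a common estimator, though the abstract does not say which one the authors used). The conversion to a distance as 1 − r², the function names, and the toy genotypes are likewise assumptions.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated here as no LD
    return cov * cov / (v1 * v2)

def ld_distance_matrix(snps):
    """Pairwise 1 - r^2, so SNPs in strong LD are 'close' for a density-based clusterer."""
    k = len(snps)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = 1.0 - r_squared(snps[i], snps[j])
    return dist

# Made-up genotypes for six individuals at four SNPs.
snps = [
    [0, 1, 2, 0, 1, 2],  # SNP A
    [0, 1, 2, 0, 1, 2],  # SNP B: identical to A
    [2, 1, 0, 2, 1, 0],  # SNP C: A with alleles swapped
    [0, 2, 1, 1, 0, 2],  # SNP D: weakly correlated with A
]
dist = ld_distance_matrix(snps)
for row in dist:
    print(row)
```

A matrix like `dist` is the kind of precomputed input that DBSCAN-style algorithms accept in place of raw coordinates; at genome scale it would be built per chromosome or per window rather than for all SNP pairs at once.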
Show Figures

Figure 1

Figure 1
<p>UpSet plot of significant SNPs obtained for different models under the traditional GWAS.</p>
Figure 2
<p>The changes in the quality of clusterization by arguments of clusterization functions for chromosome 1.</p>
Figure 3
<p>The number of clusters per chromosome obtained by DBSCAN and HDBSCAN.</p>
Figure 4
<p>The histograms of cluster sizes (the <span class="html-italic">y</span>-axis was limited to the value of 75).</p>
Figure 5
<p>The composition of clusters formed and the heatmap of LD at the genomic region chr17:31860912-31881235 containing significantly associated LD-blocks of four SNPs (red color) identified with the HDBSCAN algorithm. The tracks DBSCAN and HDBSCAN show the cluster memberships of SNPs. The SNPs from the same cluster within the track are marked by similar colors. The heatmap of LD represents r<sup>2</sup> values (deep green denotes 1.0 and the white color denotes 0).</p>
Figure 6
<p>The distribution of sizes of the LD-blocks significantly associated with IS.</p>
Figure 7
<p>Venn diagram illustrating the overlap of the three approaches with each other. The number of risk polymorphisms obtained by classic GWAS and cluster-based approaches (DBSCAN and HDBSCAN) are depicted.</p>
23 pages, 3600 KiB  
Article
A Topic Modeling Approach to Discover the Global and Local Subjects in Membrane Distillation Separation Process
by Ersin Aytaç and Mohamed Khayet
Separations 2023, 10(9), 482; https://doi.org/10.3390/separations10090482 - 2 Sep 2023
Cited by 7 | Viewed by 2358
Abstract
Membrane distillation (MD) is proposed as an environmentally friendly technology of emerging interest able to aid in the resolution of the worldwide water issue and brine processing by producing distilled water and treating high-saline solutions up to their saturation with a view toward reaching zero liquid discharge (ZLD) at relatively low temperature requirements and a low operating hydrostatic pressure. Topic modeling (TM), which is a Machine Learning (ML) method combined with Natural Language Processing (NLP), is a customizable approach that is ideal for researching massive datasets with unknown themes. In this study, we used BERTopic, a new cutting-edge Python library for topic modeling, to explore the global and local themes in the MD separation literature. By using the BERTopic model, the words describing the collected dataset were detected together with over- and underexplored research topics to guide MD researchers in planning their future works. The results indicated that two global themes are widely discussed and are relevant to MD scientists abroad. In brief, these topics are permeate flux, heat-energy recovery, surface modification, and polyvinylidene fluoride hydrophobic membranes. BERTopic discovered 62 local concepts. The most researched local topics were solar applications, membrane scaling, and electrospun membranes, while the least investigated were boron removal, dairy effluent applications, and nickel wastewater treatment. In addition, the topics were illustrated in a 2D plane to better understand the obtained results. Full article
(This article belongs to the Collection Synthetic Membrane Separation Science and Technology)
Show Figures

Figure 1

Figure 1
<p>Basic process sequence of the BERTopic algorithm used.</p>
Figure 2
<p>Yearly MD article publications.</p>
Figure 3
<p>Violin plots of significant MD values: (<b>a</b>) publication year; (<b>b</b>) times cited; (<b>c</b>) page count; (<b>d</b>) reference count.</p>
Figure 4
<p>Representation of the collection in 2D space (<span class="html-italic">min_cluster_size</span> = 3684).</p>
Figure 5
<p>Similarity matrix of the resulting 63 topics.</p>
Figure 6
<p>Distribution of the resulting local MD topics in a 2D plane.</p>
25 pages, 6723 KiB  
Article
Ship Trajectory Prediction: An Integrated Approach Using ConvLSTM-Based Sequence-to-Sequence Model
by Wenxiong Wu, Pengfei Chen, Linying Chen and Junmin Mou
J. Mar. Sci. Eng. 2023, 11(8), 1484; https://doi.org/10.3390/jmse11081484 - 25 Jul 2023
Cited by 14 | Viewed by 2546
Abstract
Maritime transportation is one of the major contributors to the development of the global economy. To ensure its safety and reduce the occurrence of maritime accidents, intelligent maritime monitoring and ship behavior identification have been drawing much attention from industry and academia, and the accurate prediction of ship trajectories is one of the key problems. This paper proposes a trajectory prediction model integrating the Convolutional LSTM (ConvLSTM) and Sequence to Sequence (Seq2Seq) models to facilitate simultaneous extraction of temporal and spatial features of ship trajectories, thereby enhancing the accuracy of prediction. Firstly, the trajectories are preprocessed using kinematic-based anomaly removal and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to improve the data quality for the training process of trajectory prediction. Secondly, the ConvLSTM-based Seq2Seq model is designed to extract temporal and spatial features of the ship trajectory and improve the performance of long-time prediction. Finally, by using real AIS data, the proposed model is compared with the Seq2Seq and Bidirectional LSTM based on attention mechanism (Bi-Attention-LSTM) models to verify its effectiveness. The experimental results demonstrate that the proposed model achieves excellent performance in predicting turning trajectories, good predictive accuracy on straight-line motions, and higher prediction accuracy than the other two benchmark models. Overall, the proposed model represents a promising contribution to improving ship trajectory prediction accuracy and may enhance the safety and quality of ship navigation in complex and volatile marine environments. Full article
(This article belongs to the Special Issue New Insights into Safety of Ships and Offshore Structures)
Show Figures

Figure 1

Figure 1
<p>Overall model design flow chart.</p>
Figure 2
<p>Structure of ConvLSTM [<a href="#B31-jmse-11-01484" class="html-bibr">31</a>].</p>
Figure 3
<p>ConvLSTM cell structure with ReLU activation function.</p>
Figure 4
<p>Trajectory prediction model structure.</p>
Figure 5
<p>Trajectory Sequence.</p>
Figure 6
<p>The raw AIS trajectory data in the research area.</p>
Figure 7
<p>Abnormal data removal.</p>
Figure 8
<p>Cubic Spline Interpolation: (<b>a</b>) Interpolation of ship trajectory in a curvilinear section. (<b>b</b>) Interpolation of ship trajectory in a rectilinear section.</p>
Figure 9
<p>Trajectory preprocessing results.</p>
Figure 10
<p>Prediction Results of Dataset 2 (20 min): (<b>a</b>) Predicted trajectory with 5 input and output points; (<b>b</b>) Predicted trajectory with 10 input and output points; (<b>c</b>) Predicted trajectory with 15 input and output points; (<b>d</b>) Predicted trajectory with 20 input and output points; (<b>e</b>) Predicted trajectory with 25 input and output points.</p>
Figure 11
<p>Prediction Results of Dataset 3 (20 min): (<b>a</b>) Predicted trajectory with 5 input and output points; (<b>b</b>) Predicted trajectory with 10 input and output points; (<b>c</b>) Predicted trajectory with 15 input and output points; (<b>d</b>) Predicted trajectory with 20 input and output points; (<b>e</b>) Predicted trajectory with 25 input and output points.</p>
Figure 12
<p>Prediction Results of Dataset 1 (60 min): (<b>a</b>) Predicted trajectory with 5 input and output points; (<b>b</b>) Predicted trajectory with 10 input and output points; (<b>c</b>) Predicted trajectory with 15 input and output points; (<b>d</b>) Predicted trajectory with 20 input and output points; (<b>e</b>) Predicted trajectory with 25 input and output points.</p>
Full article ">Figure 13
<p>Comparison of three models in Dataset 2 (20 min).</p>
Full article ">Figure 14
<p>Comparison of three models in Dataset 3 (20 min).</p>
Full article ">Figure 15
<p>Comparison of three models in Dataset 1 (60 min).</p>
Full article ">
25 pages, 7955 KiB  
Article
Unstable Approach Detection and Analysis Based on Energy Management and a Deep Neural Network
by Tzu-Ying Chiu and Ying-Chih Lai
Aerospace 2023, 10(6), 565; https://doi.org/10.3390/aerospace10060565 - 16 Jun 2023
Cited by 4 | Viewed by 1817
Abstract
Managing risk in aviation is key to improving flight safety. Compared with the other phases of flight, the approach and landing phases are more critical and dangerous. This study aims to detect and analyze unstable approaches in Taiwan using historical flight data. In addition to weather factors such as low visibility and crosswinds, human factors account for a large part of the risk. Among the accidents studied in the stochastic report of the Flight Safety Foundation, nearly 70% occurred during the approach and landing phases and were caused by improper control of aircraft energy. Because flight data recorder (FDR) information is treated as confidential by airlines, this study calculates the aircraft's energy-related metrics and investigates the influence of non-weather-related factors on unstable approaches using a publicly available source, automatic dependent surveillance-broadcast (ADS-B) flight data. To evaluate the influence of weather- and non-weather-related factors, the flights are grouped by weather label, and the outliers of each group are detected and removed from the analysis with hierarchical density-based spatial clustering of applications with noise (HDBSCAN), which identifies abnormal flights that are spatial anomalies. A deep learning method was adopted to detect and predict unstable arrival flights landing at Taipei Songshan Airport. The prediction accuracy for the normalized total energy and trajectory deviation of all flights is 85.15% and 82.11%, respectively. The results show that the models perform similarly well under different weather conditions and when weather is not considered. The input features were analyzed after the model was obtained, and the flights detected as abnormal are discussed.
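To make the idea of an "energy-related metric" concrete, the following minimal Python sketch computes the textbook specific total energy (potential plus kinetic energy per unit mass) and the equivalent energy height from an altitude and a speed. It is an illustration only, not the metric or normalization used in the article, which the abstract does not specify. The function names, the use of ADS-B ground speed in place of true airspeed, and the feet/knots input units are all assumptions.

```python
# Illustrative sketch only: textbook specific energy, not the article's exact metric.

G = 9.80665            # standard gravity, m/s^2
FT_TO_M = 0.3048       # feet to metres
KT_TO_MPS = 1852.0 / 3600.0  # knots to metres per second


def specific_total_energy(altitude_m: float, speed_mps: float) -> float:
    """Potential plus kinetic energy per unit mass, in J/kg."""
    return G * altitude_m + 0.5 * speed_mps ** 2


def energy_height(altitude_m: float, speed_mps: float) -> float:
    """Specific total energy expressed as an equivalent height, in metres."""
    return specific_total_energy(altitude_m, speed_mps) / G


def energy_height_from_adsb(altitude_ft: float, ground_speed_kt: float) -> float:
    """Hypothetical helper: assumes altitude in feet and ground speed in knots.

    Ground speed is only a stand-in for airspeed here (an assumption).
    """
    return energy_height(altitude_ft * FT_TO_M, ground_speed_kt * KT_TO_MPS)


if __name__ == "__main__":
    # Example: an aircraft at 1000 ft with a ground speed of 140 kt.
    print(round(energy_height_from_adsb(1000.0, 140.0), 1), "m")
```

Expressing energy as a height keeps the potential and kinetic contributions in the same unit, which makes it easier to compare a flight against an upper and lower bound along the approach path.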
(This article belongs to the Special Issue Machine Learning for Aeronautics)
Figure 1. Research flowchart.
Figure 2. METAR and overview of the weather algorithm.
Figure 3. Energy boundary.
Figure 4. Energy boundary of flight AE1264.
Figure 5. Normalized energy boundary.
Figure 6. Normalized energy boundary of flight AE1264.
Figure 7. Excess energy window selection.
Figure 8. Excess area (red area) of normalized energy.
Figure 9. Approach zone.
Figure 10. Results of outlier detection with HDBSCAN.
Figure 11. Data interpolation: (a) before interpolation; (b) after interpolation.
Figure 12. Deep neural network model architecture of cluster A.
Figure 13. Training and testing loss of cluster A.
Figure 14. Normalized total energy accuracy of cluster A.
Figure 15. Trajectory deviation accuracy of cluster A.
Figure 16. Boxplot of normalized total energy for all flights.
Figure 17. Boxplot of trajectory deviation for all flights.
Figure 18. Normalized energy of flight AE1276.
Figure 19. Normalized energy of flight AE366.
Figure 20. Normalized energy of flight AE374.