Search Results (5,120)

Search Parameters:
Keywords = semantic models

16 pages, 1362 KiB  
Article
Joint Translation Method for English–Chinese Place Names Based on Prompt Learning and Knowledge Graph Enhancement
by Hanyou Liu and Xi Mao
ISPRS Int. J. Geo-Inf. 2025, 14(3), 128; https://doi.org/10.3390/ijgi14030128 - 13 Mar 2025
Abstract
In producing English–Chinese bilingual maps, it is usually necessary to translate English place names into Chinese. However, pipeline-based methods split the place name translation task into multiple sub-tasks, which carries the risk of error propagation and results in lower efficiency and poorer accuracy. Meanwhile, there is relatively little research on joint place name translation. In this regard, the study proposes an English–Chinese joint place name translation method based on prompt learning and knowledge graph enhancement, aiming to improve translation accuracy. The proposed method has two parts. The first part is the construction of prompt templates for place name translation. The study first analyzes the characteristics of transliterating specific names and semantically translating generic names, and constructs prompt templates for the joint translation of ordinary place names. Then, building on these prompts and taking into account how the derived parts of derived place names are translated, it constructs a prompt template for the joint translation of derived place names. Finally, leveraging the strong in-context learning ability of large language models (LLMs), it performs joint English–Chinese place name translation. The second part is the construction of the ontology of a place name translation knowledge graph. To retrieve relevant knowledge about the input place names, the study designs an ontology tailored to the English–Chinese place name translation task, combining translation needs with the semantic relationships between place names. This enriches the contextual information of the input place names and improves LLM performance on the task. Experiments show that, compared with traditional pipeline-based methods, the proposed method improves performance by 21.26% on ordinary place names and by an average of 27.70% on derived place names. In bilingual map production, the method effectively improves the efficiency and accuracy of toponym translation, while also providing a reference for place name translation in other languages. Full article
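As a rough illustration of the prompt-learning side of such an approach, the sketch below builds a translation prompt that combines task instructions with knowledge retrieved for the input place name. The template wording, the toy knowledge base, and the `retrieve_knowledge` helper are illustrative assumptions, not the paper's actual templates or ontology.

```python
# Minimal sketch of prompt construction for English–Chinese place name translation.
# The template text and the knowledge-retrieval helper are illustrative assumptions,
# not the templates or ontology used in the paper.

def retrieve_knowledge(place_name: str) -> list:
    """Hypothetical knowledge-graph lookup returning facts about the place name."""
    toy_kg = {
        "Salt Lake City": [
            "feature type: city",
            "generic part: 'City'",
            "specific part: 'Salt Lake'",
        ],
    }
    return toy_kg.get(place_name, [])

def build_prompt(place_name: str) -> str:
    """Combine task instructions, retrieved knowledge, and the input name."""
    facts = retrieve_knowledge(place_name)
    knowledge_block = "\n".join(f"- {f}" for f in facts) or "- (no knowledge retrieved)"
    return (
        "Task: translate the English place name into Chinese.\n"
        "Rule: transliterate the specific part and semantically translate the generic part.\n"
        f"Knowledge about the place name:\n{knowledge_block}\n"
        f"English place name: {place_name}\n"
        "Chinese translation:"
    )

if __name__ == "__main__":
    # The resulting prompt would be passed to an LLM for in-context translation.
    print(build_prompt("Salt Lake City"))
```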
28 pages, 25234 KiB  
Article
Integrating Street View Images, Deep Learning, and sDNA for Evaluating University Campus Outdoor Public Spaces: A Focus on Restorative Benefits and Accessibility
by Tingjin Wu, Deqing Lin, Yi Chen and Jinxiu Wu
Land 2025, 14(3), 610; https://doi.org/10.3390/land14030610 - 13 Mar 2025
Abstract
The mental health of university students has received much attention due to the various pressures of studies, life, and employment. Several studies have confirmed that campus public spaces contain multiple restorative potentials. Yet, campus public space is still not ready to meet students’ new need for restorative perceptions. Renewal practices for campus public spaces that integrate multiple issues are becoming more important, and further clarification of the measurement methods and optimization pathways is also needed. This study applied the semantic segmentation technique of a deep learning model to extract feature indicators of outdoor public space from street view image (SVI) data. Subjective evaluations of small-scale SVIs were obtained using the perceived restorative scale-11 (PRS-11) questionnaire. On this basis, restorative benefit evaluation models were established, including explanatory and predictive models. The explanatory model used Pearson’s correlation and multiple linear regression analysis to identify the key indicators affecting restorative benefits, and the predictive model used the XGBoost 1.7.3 algorithm to predict restorative benefit scores at the campus scale. The accessibility results from sDNA were then overlaid to form a comprehensive assessment matrix of restorative benefit and accessibility dimensions to identify “areas with optimization potential”. In this way, three types of spatial dimensions (LRB-HA, HRB-LA, and LRB-LA) and sequential orders of temporal dimensions (short-term, medium-term, and long-term) were combined to propose optimization pathways for campus public space with the dual control of restorative benefits and accessibility. This study provides methodological guidelines and empirical data for campus regeneration and promotes outdoor public space efficiency. In addition, it can offer positive references for neighborhood-scale urban design and sustainable development. Full article
(This article belongs to the Section Land, Biodiversity, and Human Wellbeing)
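A minimal sketch of the predictive part of such a pipeline (segmentation-derived scene features regressed onto restorative benefit scores with XGBoost) might look like the following; the feature set, the synthetic data, and the hyperparameters are placeholders rather than the study's configuration.

```python
# Sketch: regress restorative-benefit scores on street-view scene features.
# Feature names, data, and hyperparameters are synthetic placeholders.
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 500
# e.g., pixel ratios from semantic segmentation: greenery, sky, building, road
X = rng.uniform(0.0, 1.0, size=(n, 4))
# toy target: more greenery/sky -> higher perceived restorativeness
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(0, 0.2, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R^2 on held-out SVIs:", round(r2_score(y_te, model.predict(X_te)), 3))
```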
Figures:
Figure 1. The analytical framework of the study.
Figure 2. Research workflow that integrates street view image data, deep learning algorithms, and sDNA methods.
Figure 3. Overview of the study area: (a) location of Jiangsu province in China, (b) location of Nanjing in Jiangsu province, (c) SEU Jiulonghu campus in Jiangning district, (d) university campus distribution map in Nanjing, and (e) satellite map of Jiulonghu campus.
Figure 4. Example of the Baidu SVI collection process.
Figure 5. Schematic of image semantic segmentation using PSPNet.
Figure 6. Correlation analysis between objective spatial elements (a); morphological indicators (b); and subjective restorative perception scores.
Figure 7. Map of predicted restorative benefits for campus public spaces and schematic of representative spaces.
Figure 8. Comparison of restorative benefit scores between real-world environment and model prediction.
Figure 9. Results of accessibility analysis: 400 m (a), 800 m (b), and 1200 m (c); overlay analysis of accessibility and restoration benefits: 400 m (d), 800 m (e), and 1200 m (f).
Figure 10. Optimization pathways of campus public space with the dual control of restorative benefits and accessibility.
Figure 11. The sequential order of campus public spaces with dual control of restorative benefits and accessibility.
19 pages, 5625 KiB  
Article
UAV Imagery Real-Time Semantic Segmentation with Global–Local Information Attention
by Zikang Zhang and Gongquan Li
Sensors 2025, 25(6), 1786; https://doi.org/10.3390/s25061786 - 13 Mar 2025
Abstract
In real-time semantic segmentation of drone imagery, current lightweight algorithms fail to integrate global and local information in the image, leading to missed detections and misclassifications. This paper proposes a real-time semantic segmentation method for drone imagery that integrates multi-scale global context information. The method uses a UNet structure, with the encoder employing a ResNet18 network to extract features. The decoder incorporates a global–local attention module, where the global branch compresses and extracts global information in both the vertical and horizontal directions, and the local branch extracts local information through convolution, thereby enhancing the fusion of global and local information in the image. In the segmentation head, a shallow-feature fusion module integrates the multi-scale features extracted by the encoder, thereby strengthening the spatial information in the shallow features. The model was tested on the UAVid and UDD6 datasets, achieving accuracies of 68% mIoU (mean Intersection over Union) and 67% mIoU, respectively, 10% and 21.2% higher than the baseline UNet model. The real-time performance of the model reached 72.4 frames/s, 54.4 frames/s faster than the baseline UNet. The experimental results demonstrate that the proposed model balances accuracy and real-time performance well. Full article
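A generic reconstruction of a global–local attention block of the kind described (strip-style pooling along the vertical and horizontal directions for the global branch, convolution for the local branch) is sketched below in PyTorch; it is an illustrative stand-in, not the paper's GLAM implementation.

```python
# Sketch of a global–local attention block: the global branch pools the feature
# map along the vertical and horizontal directions, the local branch uses a
# convolution. A generic reconstruction, not the paper's exact GLAM module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise local branch
            nn.Conv2d(channels, channels, 1),
        )
        self.h_proj = nn.Conv2d(channels, channels, 1)  # after pooling over width
        self.w_proj = nn.Conv2d(channels, channels, 1)  # after pooling over height
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        # global context along vertical and horizontal directions (strip pooling)
        gh = self.h_proj(F.adaptive_avg_pool2d(x, (h, 1)))  # (b, c, h, 1)
        gw = self.w_proj(F.adaptive_avg_pool2d(x, (1, w)))  # (b, c, 1, w)
        global_ctx = gh.expand(-1, -1, h, w) + gw.expand(-1, -1, h, w)
        attn = torch.sigmoid(self.fuse(global_ctx))
        return x + attn * local  # fuse global and local cues

x = torch.randn(2, 64, 32, 32)
print(GlobalLocalAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```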
Figures:
Figure 1. Two drone images and their semantic annotations from the UAVid and the UDD6 dataset, respectively. (a) Oblique projection, (b) Orthophoto projection.
Figure 2. Overall structure diagram of the proposed model (*: upsampling).
Figure 3. Global–local attention module network structure (GLAM).
Figure 4. Shallow layer feature fusion module network structure (SLFM).
Figure 5. UDD ablation study (image: DJI_0538.JPG).
Figure 6. UAVid ablation study (image: Seg5\000400.png).
Figure 7. Comparison of visual results from different models on the UAVid dataset (image: seq1\00800.png).
Figure 8. Comparison of visual results from different models on the UDD6 dataset (image: DJI_0667.JPG).
Figure 9. LoveDA ablation study ((a) image: urban\4178.png, (b) image: rural\2522.png).
Figure 10. Comparison of visual results from different models on the LoveDA dataset ((a) image: urban\4178.png, (b) image: rural\2522.png).
17 pages, 32249 KiB  
Article
HPRT-DETR: A High-Precision Real-Time Object Detection Algorithm for Intelligent Driving Vehicles
by Xiaona Song, Bin Fan, Haichao Liu, Lijun Wang and Jinxing Niu
Sensors 2025, 25(6), 1778; https://doi.org/10.3390/s25061778 - 13 Mar 2025
Viewed by 76
Abstract
Object detection is essential for the perception systems of intelligent driving vehicles. RT-DETR has emerged as a prominent model. However, its direct application to intelligent driving vehicles still faces issues with the misdetection of occluded or small targets. To address these challenges, we propose a High-Precision Real-Time object detection algorithm (HPRT-DETR). We designed a Basic-iRMB-CGA (BIC) Block for the backbone network that efficiently extracts features and reduces the model’s parameters. We also propose a Deformable Attention-based Intra-scale Feature Interaction (DAIFI) module by combining the Deformable Attention mechanism with the Intra-Scale Feature Interaction module. This enables the model to capture rich semantic features and enhance object detection accuracy under occlusion. The Local Feature Extraction Fusion (LFEF) block was created by integrating the local feature extraction module with the CNN-based Cross-scale Feature Fusion (CCFF) module. This integration expands the model’s receptive field and enhances feature extraction without adding learnable parameters or complex computations, effectively minimizing missed detections of small targets. Experiments on the KITTI dataset show that, compared to RT-DETR, HPRT-DETR improves mAP50 and FPS by 1.98% and 15.25%, respectively. Additionally, its generalization ability is assessed on the SODA 10M dataset, where HPRT-DETR outperforms RT-DETR on most evaluation metrics, confirming the model’s effectiveness. Full article
(This article belongs to the Section Sensing and Imaging)
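The GIoU loss mentioned above has a standard closed form; a stand-alone sketch for axis-aligned boxes in (x1, y1, x2, y2) format is given below and is not taken from the paper's code.

```python
# Generalized IoU (GIoU) loss for axis-aligned boxes in (x1, y1, x2, y2) format.
# Standard formulation; a stand-alone sketch, not the paper's implementation.
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # intersection
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # smallest enclosing box
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    enclose = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-7)

    giou = iou - (enclose - union) / enclose
    return (1.0 - giou).mean()

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(giou_loss(pred, target))  # penalizes both low overlap and a loose enclosing box
```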
Figures:
Figure 1. HPRT-DETR model structure diagram.
Figure 2. Basic Block structure diagram.
Figure 3. Inverted Residual Mobile Block (iRMB) and Basic-iRMB-CGA (BIC) Block structure diagram. (a) The iRMB structure diagram; (b) the BIC Block structure diagram.
Figure 4. Deformable Attention-based Intra-scale Feature Interaction (DAIFI) module structure diagram.
Figure 5. Deformable Attention mechanism structure diagram.
Figure 6. Local Feature Extraction Fusion (LFEF) Block and Shift-convolution structure diagram. (a) The LFEF Block structure diagram; (b) the Shift-convolution structure diagram.
Figure 7. Comparison of detection results from different models.
Figure 8. Comparison of model heatmaps before and after the introduction of the DAIFI module.
Figure 9. The visualization results of the RT-DETR and HPRT-DETR models on SODA 10M.
22 pages, 1390 KiB  
Article
Emotion-Aware Embedding Fusion in Large Language Models (Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation
by Abdur Rasool, Muhammad Irfan Shahzad, Hafsa Aslam, Vincent Chan and Muhammad Ali Arshad
AI 2025, 6(3), 56; https://doi.org/10.3390/ai6030056 - 13 Mar 2025
Viewed by 66
Abstract
Empathetic and coherent responses are critical in automated chatbot-facilitated psychotherapy. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce Emotion-Aware Embedding Fusion, a novel framework integrating hierarchical fusion and attention mechanisms to prioritize semantic and emotional features in therapy transcripts. Our approach combines multiple emotion lexicons, including NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4. Therapy session transcripts, comprising over 2000 samples, are segmented into hierarchical levels (word, sentence, and session) using neural networks, while hierarchical fusion combines these features with pooling techniques to refine emotional representations. Attention mechanisms, including multi-head self-attention and cross-attention, further prioritize emotional and contextual features, enabling the temporal modeling of emotional shifts across sessions. The processed embeddings, computed using BERT, GPT-3, and RoBERTa, are stored in the Facebook AI similarity search vector database, which enables efficient similarity search and clustering across dense vector spaces. Upon user queries, relevant segments are retrieved and provided as context to LLMs, enhancing their ability to generate empathetic and contextually relevant responses. The proposed framework is evaluated across multiple practical use cases to demonstrate real-world applicability, including AI-driven therapy chatbots. The system can be integrated into existing mental health platforms to generate personalized responses based on retrieved therapy session data. The experimental results show that our framework enhances empathy, coherence, informativeness, and fluency, surpassing baseline models while improving LLMs’ emotional intelligence and contextual adaptability for psychotherapy. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
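A minimal sketch of the retrieval step (segment embeddings indexed with FAISS and queried to supply context to an LLM) is shown below; the embedding dimension, the toy segments, and the random vectors are placeholders, not the framework's actual components.

```python
# Sketch: store transcript-segment embeddings in FAISS and retrieve context for a query.
# Dimensions, segments, and the random "embeddings" are placeholders for real encoder output.
import numpy as np
import faiss

dim = 768  # e.g., BERT-sized embeddings
segments = [
    "I felt overwhelmed at work.",
    "We discussed coping strategies.",
    "Sleep improved this week.",
]
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(segments), dim)).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 search over dense vectors
index.add(embeddings)

query = rng.standard_normal((1, dim)).astype("float32")
distances, ids = index.search(query, 2)  # top-2 most similar transcript segments
context = [segments[i] for i in ids[0]]
print(context)  # passed to the LLM as retrieval-augmented context
```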
Figures:
Figure 1. Overview of the proposed framework for emotion-aware response generation in LLMs. The Psychotherapy Transcripts Dataset is processed through text extraction and splitting, generating word-level, sentence-level, and session-level embeddings. These embeddings are enriched with external emotional knowledge from lexicons and fused hierarchically using cross-attention and temporal modeling. The fused embeddings undergo neural feature extraction and are stored in FAISS for efficient retrieval. Finally, the retrieved embeddings enhance LLM-generated responses, evaluated based on four quality metrics.
Figure 2. Attention weights for contextual and emotionally significant words across LLMs. DeepSeek-R1 assigns high attention to “work” (0.94) and “angry” (0.85), showing strong balance between situational and emotional context, while ChatGPT 4 emphasizes “work” (0.98) and “injustice” (0.90), favoring contextual understanding. Flan-T5 and Llama 2 distribute attention more evenly, highlighting trade-offs between structured response generation and emotional specificity.
Figure 3. Comparative analysis of temporal emotion shifts in therapy sessions across LLMs. DeepSeek-R1 excels in emotional adaptability, smoothly transitioning from frustration to calmness. ChatGPT 4 maintains balance, while Flan-T5 struggles with neutrality misclassification. Lexicon integration boosts empathy scores by 20–40%, enhancing emotional awareness but exposing coherence issues in lower-performing models.
Figure 4. Performance analysis of LLM and lexicon combinations showing (a) highest improvements (ChatGPT 4 achieves the highest total performance gain, outperforming DeepSeek-R1 by 31.7% in VADER vs. WordNet) and (b) lowest decreases in effectiveness (Llama 2 maintains the most stable coherence, while DeepSeek-R1 and ChatGPT 4 show trade-offs in informativeness and fluency).
Figure 5. Comparisons of generated responses from different LLMs with and without affect-enriched embeddings using the NRC lexicon.
17 pages, 1381 KiB  
Article
Robust Adversarial Example Detection Algorithm Based on High-Level Feature Differences
by Hua Mu, Chenggang Li, Anjie Peng, Yangyang Wang and Zhenyu Liang
Sensors 2025, 25(6), 1770; https://doi.org/10.3390/s25061770 - 12 Mar 2025
Viewed by 97
Abstract
The threat posed by adversarial examples (AEs) to deep learning applications has garnered significant attention from the academic community. In response, various defense strategies have been proposed, including adversarial example detection. A range of detection algorithms has been developed to differentiate between benign samples and adversarial examples. However, the detection accuracy of these algorithms is significantly influenced by the characteristics of the adversarial attacks, such as attack type and intensity. Furthermore, the impact of image preprocessing on detection robustness—a common step before adversarial example generation—has been largely overlooked in prior research. To address these challenges, this paper introduces a novel adversarial example detection algorithm based on high-level feature differences (HFDs), which is specifically designed to improve robustness against both attacks and preprocessing operations. For each test image, a counterpart image with the same predicted label is randomly selected from the training dataset. The high-level features of both images are extracted using an encoder and compared through a similarity measurement model. If the feature similarity is low, the test image is classified as an adversarial example. The proposed method was evaluated for detection accuracy against four comparison methods, showing significant improvements over FS, DF, and MD, with a performance comparable to ESRM. Therefore, the subsequent robustness experiments focused exclusively on ESRM. Our results demonstrate that the proposed method exhibits superior robustness against preprocessing operations, such as downsampling and common corruptions, applied by attackers before generating adversarial examples. It is also applicable to various target models. By exploiting semantic conflicts in high-level features between clean and adversarial examples with the same predicted label, the method achieves high detection accuracy across diverse attack types while maintaining resilience to preprocessing, providing a valuable new perspective in the design of adversarial example detection algorithms. Full article
(This article belongs to the Special Issue Advances in Security for Emerging Intelligent Systems)
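The core detection idea can be sketched as follows: extract high-level features of the test image and of a same-label reference image, and flag the test image when the features disagree. The encoder choice, the plain cosine similarity (in place of the paper's learned similarity measurement model), and the threshold are simplifying assumptions.

```python
# Sketch of the detection idea: compare high-level features of a test image and a
# same-predicted-label reference image; low similarity suggests an adversarial example.
# Encoder, similarity measure, and threshold are simplified stand-ins.
import torch
import torch.nn.functional as F
import torchvision.models as models

encoder = models.resnet50(weights=None)
encoder.fc = torch.nn.Identity()  # use penultimate features as "high-level features"
encoder.eval()

@torch.no_grad()
def is_adversarial(test_img: torch.Tensor, reference_img: torch.Tensor,
                   threshold: float = 0.5) -> bool:
    """Both inputs are (1, 3, H, W); reference shares the test image's predicted label."""
    f_test = encoder(test_img)
    f_ref = encoder(reference_img)
    similarity = F.cosine_similarity(f_test, f_ref).item()
    return similarity < threshold  # dissimilar high-level features -> flag as adversarial

x = torch.randn(1, 3, 224, 224)
print(is_adversarial(x, torch.randn(1, 3, 224, 224)))
```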
Figures:
Figure 1. Two types of high-level features of image pairs: (left) a pair of clean images belonging to the same class ‘parrot’; (right) an adversarial example and a clean image sharing the same predicted class label ‘zebra’.
Figure 2. Training of the similarity measurement model. The top panel shows the training sample of a similarity pair, and the bottom panel shows a difference pair.
Figure 3. Detection accuracy against attacks with different magnitudes of disturbance on ResNet-50.
16 pages, 1051 KiB  
Article
Kafka’s Literary Style: A Mixed-Method Approach
by Carsten Strathausen, Wenyi Shang and Andrei Kazakov
Humanities 2025, 14(3), 61; https://doi.org/10.3390/h14030061 - 12 Mar 2025
Viewed by 26
Abstract
In this essay, we examine how the polyvalence of meaning in Kafka’s texts is engineered both semantically (on the narrative level) and syntactically (on the linguistic level), and we ask whether a computational approach can shed new light on the long-standing debate about the major characteristics of Kafka’s literary style. A mixed-method approach means that we seek out points of connection that interlink traditional humanist (i.e., interpretative) and computational (i.e., quantitative) methods of investigation. Following the introduction, the second section of our article provides a critical overview of the existing scholarship from both a humanist and a computational perspective. We argue that the main methodological difference between traditional humanist and AI-enhanced computational studies of Kafka’s literary style lies not in the use of statistics but in the new interpretative possibilities enabled by AI methods to explore stylistic features beyond the scope of human comprehension. In the third and fourth sections of our article, we will introduce our own stylometric approach to Kafka, detail our methods, and interpret our findings. Rather than focusing on training an AI model capable of accurately attributing authorship to Kafka, we examine whether AI could help us detect significant stylistic differences between the writing Kafka himself published during his lifetime (Kafka Core) and his posthumous writings edited and published by Max Brod. Full article
(This article belongs to the Special Issue Franz Kafka in the Age of Artificial Intelligence)
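For readers unfamiliar with stylometric classification, a generic pipeline (character n-gram frequencies fed to a linear classifier with cross-validation) is sketched below; the toy passages and labels are placeholders, and the feature set is not the authors' actual configuration.

```python
# Generic stylometric classification sketch: character n-gram features + logistic regression.
# Texts and labels are toy placeholders, not the Kafka Core / Brod corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = [
    "Jemand musste Josef K. verleumdet haben ...",   # placeholder passages
    "Als Gregor Samsa eines Morgens erwachte ...",
    "Es war spaet abends als K. ankam ...",
    "Der Landvermesser wartete im Wirtshaus ...",
] * 10                                               # repeated only to make CV runnable
labels = [0, 0, 1, 1] * 10                           # 0 = corpus A, 1 = corpus B (toy)

clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # character n-grams: common stylometric features
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(clf, texts, labels, cv=5)
print("mean CV accuracy:", scores.mean())
```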
Figures:
Figure 1. Classification accuracy for Brod’s edition of Kafka.
Figure 2. Detailed variation in linguistic traits across the three corpora.
Figure A1. Workflow for the classification experiment.
27 pages, 1340 KiB  
Article
Asymmetric Training and Symmetric Fusion for Image Denoising in Edge Computing
by Yupeng Zhang and Xiaofeng Liao
Symmetry 2025, 17(3), 424; https://doi.org/10.3390/sym17030424 - 12 Mar 2025
Viewed by 170
Abstract
Effectively handling mixed noise types and varying intensities is crucial for accurate information extraction and analysis, particularly in resource-limited edge computing scenarios. Conventional image denoising approaches struggle with unseen noise distributions, limiting their effectiveness in real-world applications such as object detection, classification, and change detection. To address these challenges, we introduce a novel image denoising framework that integrates asymmetric learning with symmetric fusion. It leverages a pretrained model trained only on clean images to provide semantic priors, while a supervised module learns direct noise-to-clean mappings using paired noisy–clean data. The asymmetry in our approach stems from its dual training objectives: a pretrained encoder extracts semantic priors from noise-free data, while a supervised module learns noise-to-clean mappings. The symmetry is achieved through a structured fusion of pretrained priors and supervised features, enhancing generalization across diverse noise distributions, including those in edge computing environments. Extensive evaluations across multiple noise types and intensities, including real-world remote sensing data, demonstrate the superior robustness of our approach. Our method achieves state-of-the-art performance in both in-distribution and out-of-distribution noise scenarios, significantly enhancing image quality for downstream tasks such as environmental monitoring and disaster response. Future work may explore extending this framework to specialized applications like hyperspectral imaging and nighttime analysis while further refining the interplay between symmetry and asymmetry in deep-learning-based image restoration. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Embedded Systems)
Figures:
Figure 1. Visual Comparison of Denoising Results in Both In-Distribution and Out-of-Distribution Noise Scenarios. The first column presents the input noisy/clean images, the second column shows results from the baseline Masked Denoising (MD), and the third column displays results from our method. In the in-distribution case (top row), our method preserves fine texture details, such as the fabric and facial structure, better than MD. The red box highlights the hat region, where our method effectively eliminates the unnatural green artifacts presented in MD’s output, ensuring accurate color preservation. In the out-of-distribution case (bottom row), our method demonstrates superior generalization, restoring clearer textures and details in the suit and background while maintaining realistic color consistency. The red box highlights the tree region, where our approach successfully recovers fine-grained textures, reducing blurring and preserving structural details compared to MD.
Figure 2. Toward Efficient Denoising: Overcoming the Constraints of Self2Self. This figure illustrates the limitations of the Self2Self algorithm and proposes a pathway toward an efficient denoising approach. While Self2Self enables single-image denoising without requiring a dataset by relying on per-image training with an untrained neural network, this method incurs significant computational costs and restricts model complexity to lightweight architectures like UNet. In contrast, an optimized approach eliminates the need for retraining on each image, leveraging more powerful models to achieve superior denoising performance with reduced computational overhead.
Figure 3. Denoising results using MAE [37]. The images in this figure highlight the denoising performance of the MAE pretrained model on different types and intensities of noise. The model leverages its feature extraction capabilities to address Gaussian and salt-and-pepper noise scenarios. The results indicate that even without specific denoising training, the MAE-based approach can significantly enhance image quality, demonstrating its potential for general-purpose denoising tasks.
Figure 4. Clean Prior Block Based on a MAE Pretrained Model. The Clean Prior Block utilizes a noisy input image and a random mask to generate clean image priors. The noisy image is processed through the encoder-decoder structure of the frozen MAE pretrained model. By performing multiple reconstructions, the Prior Synthesis integrates them into a single clean image prior.
Figure 5. Symmetric Channel Attention for Noisy and Clean Feature Fusion. The figure illustrates our symmetric attention mechanism in the denoising framework. The top pink block processes a noisy image feature from a supervised learning task, while the bottom green block handles a clean prior feature derived from a pretrained model in a non-denoising task. Operating in parallel, these two blocks form a symmetric attention mechanism that effectively integrates the noisy and clean feature. The integrated output is then passed to the next node in the backbone network.
Figure 6. Overall Framework of the Proposed Denoising Model with Attention and Prior Integration. The model processes the noisy image through an Initial Feature Extraction Module and Prior Synthesis to generate the Clean Prior. The Clean Prior is combined with the backbone network features through the Channel Attention Res Block (CARB) and the Channel Attention Prior Block (CAPB), enhancing the integration of features and improving the denoising process. The final denoised image is produced with refined features from both the prior and the backbone layers.
Figure 7. Visual comparison of denoising performance on in-distribution Gaussian noise (σ = 35). The noisy and clean images are shown on the left, with a zoomed-in red box highlighting the region of interest, especially the numbers “76” and the letters “IDAHO”. This region is chosen because it contains fine-grained text and sharp edges, making it a challenging area for denoising methods. Effective denoising should remove noise while preserving the integrity of text contours and fine details. The performance of traditional, supervised learning, self-supervised learning and generalization-enhanced methods is compared based on PSNR/SSIM scores. Our method achieves an optimal balance between noise removal and detail preservation, retaining sharper edges and clearer text compared to MD, Restormer, and BM3D.
Figure 8. Visual Comparison on Out-of-Distribution (Degradation Level) Noise. The first column shows the noisy and clean reference images, while the following columns present denoised results from different methods. The red box highlights a zoomed-in region for better visualization. This region is chosen because it contains both smooth textures (sky) and structured edges (architectural lines), making it a crucial area for assessing denoising performance. Effective denoising should remove noise while maintaining color consistency in the sky and preserving geometric edges in the architectural structure. Notably, traditional and some self-supervised and supervised learning methods struggle to recover fine details, particularly in the sky region, where color distortions and noise residues remain, and in the architectural structure, where geometric lines are blurred or lost. Our method demonstrates superior generalization by effectively preserving the structural edges, restoring the sky’s smooth texture, and maintaining accurate color consistency across the entire image.
Figure 9. Visual Comparison on Out-of-Distribution (Degradation Type) Noise. The first column shows the noisy and clean reference images, while the following columns present denoised results from different methods. The red box highlights a zoomed-in region for better visualization. This region is chosen because it contains both fine textures (train details) and smooth regions (sky), making it a crucial area for assessing denoising performance. Effective denoising should remove noise artifacts while preserving textural sharpness on the train and maintaining smooth color transition in the sky. Notably, traditional and some supervised learning methods struggle to restore fine details, particularly in the sky region, where noise artifacts remain visible, and in the train windows, where textural details and sharpness are lost. Furthermore, some methods introduce noticeable color distortions, failing to preserve the overall contrast and intensity balance of the scene. Our method demonstrates superior generalization by effectively removing noise while maintaining the structural integrity of the train, restoring the sky’s smoothness, and preserving accurate color fidelity.
Figure 10. Visual Comparison on Remote Sensing Dataset. (a–e) represent different image samples from the dataset. Each row corresponds to a different denoising method: Noisy input, Clean reference, N2C, MD, and Ours. The numerical values indicate PSNR/SSIM scores for each sample, demonstrating the effectiveness of different denoising approaches.
14 pages, 48905 KiB  
Article
RSM-Optimizer: Branch Optimization for Dual- or Multi-Branch Semantic Segmentation Networks
by Xiaohong Zhang, Wenwen Zong and Yaning Jiang
Electronics 2025, 14(6), 1109; https://doi.org/10.3390/electronics14061109 - 11 Mar 2025
Viewed by 86
Abstract
Semantic segmentation is a crucial task in the field of computer vision, with important applications in areas such as autonomous driving, medical image analysis, and remote sensing image analysis. Dual-branch and multi-branch semantic segmentation networks that leverage deep learning technologies can enhance both segmentation accuracy and speed. These networks typically contain a detail branch and a semantic (context) branch. However, the feature maps in the detail branch are limited to a single type of receptive field, which limits models’ abilities to perceive objects at different scales. During the feature map fusion process, low-resolution feature maps from the semantic branch are upsampled by a large factor to match the feature maps in the detail branch. Unfortunately, these upsampling operations inevitably introduce noise. To address these issues, we propose several improvements to optimize the detail and semantic branches. We first design a receptive field-driven feature enhancement module to enrich the receptive fields of feature maps in the detail branch. Then, we propose a stepwise upsampling and fusion module to reduce the noise introduced during the upsampling process of feature fusion. Finally, we introduce a pyramid mixed pooling module (PMPM) to improve models’ abilities to perceive objects of different shapes. Considering the diversity of objects in terms of scale, shape, and category in urban street scene data, we carried out experiments on the Cityscapes and CamVid datasets. The experimental results on both datasets validate the effectiveness and efficiency of the proposed improvements. Full article
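One plausible reading of a pyramid mixed pooling module (average and max pooling mixed at several pyramid scales, then fused back at the input resolution) is sketched below; it is a reconstruction of the general idea, not the paper's exact PMPM.

```python
# Sketch of a pyramid mixed pooling module: average and max pooling at several
# pyramid scales, projected and fused back to the input resolution.
# A plausible reconstruction of the idea, not the paper's exact PMPM.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidMixedPooling(nn.Module):
    def __init__(self, channels: int, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        self.proj = nn.ModuleList([nn.Conv2d(2 * channels, channels, 1) for _ in bins])
        self.out = nn.Conv2d(channels * (len(bins) + 1), channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [x]
        for bin_size, proj in zip(self.bins, self.proj):
            avg = F.adaptive_avg_pool2d(x, bin_size)
            mx = F.adaptive_max_pool2d(x, bin_size)
            mixed = proj(torch.cat([avg, mx], dim=1))  # mix avg- and max-pooled context
            feats.append(F.interpolate(mixed, size=(h, w), mode="bilinear", align_corners=False))
        return self.out(torch.cat(feats, dim=1))

x = torch.randn(2, 128, 64, 64)
print(PyramidMixedPooling(128)(x).shape)  # torch.Size([2, 128, 64, 64])
```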
Figures:
Figure 1. PIDNet with our improvements.
Figure 2. The structure of RF-FEM.
Figure 3. The structure of SUFM (top) and WFB (bottom).
Figure 4. The structure of MPM.
Figure 5. Visual segmentation results of the Cityscapes dataset.
22 pages, 3652 KiB  
Article
Named Entity Recognition in Online Medical Consultation Using Deep Learning
by Ze Hu, Wenjun Li and Hongyu Yang
Appl. Sci. 2025, 15(6), 3033; https://doi.org/10.3390/app15063033 - 11 Mar 2025
Viewed by 157
Abstract
Named entity recognition in online medical consultation aims to address the challenge of identifying various types of medical entities within complex and unstructured social text in the context of online medical consultations. This can provide important data support for constructing more powerful online medical consultation knowledge graphs and improving virtual intelligent health assistants. A dataset of 26 medical entity types for named entity recognition for online medical consultations is first constructed. Then, a novel approach for deep named entity recognition in the medical field based on the fusion context mechanism is proposed. This approach captures enhanced local and global contextual semantic representations of online medical consultation text while simultaneously modeling high- and low-order feature interactions between local and global contexts, thereby effectively improving the sequence labeling performance. The experimental results show that the proposed approach can effectively identify 26 medical entity types with an average F1 score of 85.47%, outperforming the state-of-the-art (SOTA) method. The practical significance of this study lies in improving the efficiency and performance of domain-specific knowledge extraction in online medical consultation, supporting the development of virtual intelligent health assistants based on large language models and enabling real-time intelligent medical decision-making, thereby helping patients and their caregivers access common medical information more promptly. Full article
Figures:
Figure 1. Framework for a medical named entity recognition model utilizing a fusion context mechanism. W_F denotes the weight matrix of the fully connected fusion layer; ⊗ and ⊕, respectively, denote vector concatenation and addition.
Figure 2. Architecture diagram of the pretrained model component.
Figure 3. Architecture diagram of stacked CNN with local context mechanism component.
Figure 4. Architecture diagram of BiLSTM with global context mechanism component.
Figure 5. Architecture diagram of deep factorization machine with fusion context mechanism component.
Figure 6. An example of the MCRF component workflow.
Figure 7. The effect of the number of epochs on the F1 score during a single experimental cycle. M13 represents our proposed approach.
20 pages, 10686 KiB  
Article
Parametric GIS and HBIM for Archaeological Site Management and Historic Reconstruction Through 3D Survey Integration
by Marco Limongiello, Daniela Musmeci, Lorenzo Radaelli, Antonio Chiumiento, Andrea di Filippo and Ilaria Limongiello
Remote Sens. 2025, 17(6), 984; https://doi.org/10.3390/rs17060984 - 11 Mar 2025
Viewed by 216
Abstract
This study presents a practical methodology for integrating the multiscale spatial information of archaeological sites by combining Geographic Information Systems (GISs) with Historic Building Information Modelling (HBIM). The methodology categorises and integrates data based on its type and geometric scale, leveraging advanced 3D surveying techniques alongside semantic and parametric modelling tools. A multiscale system is proposed to manage heterogeneous geospatial data efficiently, enabling the development of enriched geometric models with detailed semantic and parametric attributes. The effectiveness of this approach is demonstrated through a case study of the Archaeological Area of Ancient “Abellinum”, showcasing seamless integration between HGISs and HBIM across multiple levels of detail. This work highlights the potential for enhanced management and the interpretation of archaeological heritage using innovative digital methodologies, highlighting the importance of representation in documenting historical transformations. Full article
Figures:
Figure 1. Flow chart proposed for the complete knowledge of an archaeological building.
Figure 2. Atripalda. The archaeological remains on the Civita plateau. A: Cava Guanci; B: cryptoporticus and the cardo; C: the public baths; D: the decumanus; E: the domus of Vipsanius Primigenius; in the blue frame, the excavation areas of the “Abellinum” project.
Figure 3. A: two partially investigated buildings along the decumanus, west to the domus of Vipsanius Primigenius. B: rooms of another building, in the city block east to the domus.
Figure 4. Process of georeferencing and integration of point cloud from range-based and image-based sensors.
Figure 5. Macro/regional level: The upper and middle valley of the Sabato river, with an indication of the main archaeological sites. The location of the ancient city of Abellinum is marked with a yellow pentagon.
Figure 6. Local/medium/site level: Main archaeological features were identified through excavations, surveys, and geophysical surveys, botanical sampling, and the reconstruction of the hypothesised urban scheme. The map base consists of a DTM from LIDAR data (1 m) overlaid with analytical hillshading (315° azimuth, 35° elevation). The map shows the location of the site along the Sabato River.
Figure 7. Micro/inter-site level: Map of Saggio 1 (August 2023), where the different stratigraphic units (US) are indicated. The map shows the location of the excavation trench within the Civita plateau.
Figure 8. Examples of HBIM from the point cloud of the columns of the domus and some walls through the subtraction process.
Figure 9. Virtual reconstruction and assignment of historical phases through parameterization in HBIM environment.
Figure 10. (a) Axonometric view of the Imperial phase modelled in a BIM environment, (b) the modelling of the interiors and the longitudinal section of the domus, and (c) the rendering of the atrium of the domus in the Imperial phase (I–III sec A.D.).
18 pages, 7130 KiB  
Article
Improving Cerebrovascular Imaging with Deep Learning: Semantic Segmentation for Time-of-Flight Magnetic Resonance Angiography Maximum Intensity Projection Image Enhancement
by Tomonari Yamada, Takaaki Yoshimura, Shota Ichikawa and Hiroyuki Sugimori
Appl. Sci. 2025, 15(6), 3034; https://doi.org/10.3390/app15063034 - 11 Mar 2025
Viewed by 183
Abstract
Magnetic Resonance Angiography (MRA) is widely used for cerebrovascular assessment, with Time-of-Flight (TOF) MRA being a common non-contrast imaging technique. However, maximum intensity projection (MIP) images generated from TOF-MRA often include non-essential vascular structures such as external carotid branches, requiring manual editing for accurate visualization of intracranial arteries. This study proposes a deep learning-based semantic segmentation approach to automate the removal of these structures, enhancing MIP image clarity while reducing manual workload. Using DeepLab v3+, a convolutional neural network model optimized for segmentation accuracy, the method achieved an average Dice Similarity Coefficient (DSC) of 0.9615 and an Intersection over Union (IoU) of 0.9261 across five-fold cross-validation. The developed system processed MRA datasets at an average speed of 16.61 frames per second, demonstrating real-time feasibility. A dedicated software tool was implemented to apply the segmentation model directly to DICOM images, enabling fully automated MIP image generation. While the model effectively removed most external carotid structures, further refinement is needed to improve venous structure suppression. These results indicate that deep learning can provide an efficient and reliable approach for automated cerebrovascular image processing, with potential applications in clinical workflows and neurovascular disease diagnosis. Full article
(This article belongs to the Special Issue MR-Based Neuroimaging)
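The two reported metrics have standard definitions; a minimal implementation for binary segmentation masks is sketched below.

```python
# Dice similarity coefficient and IoU for binary segmentation masks,
# the two metrics reported in the abstract (standard definitions).
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7):
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = (2 * inter + eps) / (pred.sum() + truth.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 1, 0], [0, 0, 0]])
print(dice_and_iou(pred, truth))  # Dice = 2*2/(3+2) = 0.8, IoU = 2/3
```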
Figures:
Figure 1. MIP images from different views and arteries that hinder the evaluation of the internal carotid system. (a) Sagittal MIP image, (b) coronal MIP image, and (c) axial MIP image. Yellow arrow: superficial temporal artery, red arrow: middle meningeal artery, and light blue arrow: occipital artery.
Figure 2. Example of manually annotated training data at the basal ganglia level. (a) Original TOF-MRA axial image (red circles indicate the superficial temporal artery). (b) Manually defined region of interest (ROI) in blue, excluding the external carotid artery and non-intracranial structures. (c) Corresponding indexPNG mask for semantic segmentation training.
Figure 3. Example of manually annotated training data at the orbital level. (a) Original TOF-MRA axial image (red line indicates the line connecting both optic nerves’ endpoints). (b) Manually defined region of interest (ROI) in blue, excluding the external carotid artery and non-intracranial structures. (c) Corresponding indexPNG mask for semantic segmentation training.
Figure 4. Example of manually annotated training data at the temporal–occipital junction level. (a) Original TOF-MRA axial image (inside red line indicates internal carotid arteries and vertebrobasilar arteries). (b) Manually defined region of interest (ROI) in blue, excluding the external carotid artery and non-intracranial structures. (c) Corresponding indexPNG mask for semantic segmentation training.
Figure 5. Example of manually annotated training data at the skull base level. (a) Original TOF-MRA axial image (red circles indicate the middle meningeal artery). (b) Manually defined region of interest (ROI) in blue, excluding the external carotid artery and non-intracranial structures. (c) Corresponding indexPNG mask for semantic segmentation training.
Figure 6. Overview of the developed software for automated segmentation and MIP image generation.
Figure 7. Comparison of MIP images from different views. (a) Sagittal MIP image, (b) coronal MIP image, and (c) axial MIP image, and corresponding segmented MIP images (a’–c’).
22 pages, 6129 KiB  
Article
A Novel Machine Vision-Based Collision Risk Warning Method for Unsignalized Intersections on Arterial Roads
by Zhongbin Luo, Yanqiu Bi, Qing Ye, Yong Li and Shaofei Wang
Electronics 2025, 14(6), 1098; https://doi.org/10.3390/electronics14061098 - 11 Mar 2025
Viewed by 71
Abstract
To address the critical need for collision risk warning at unsignalized intersections, this study proposes an advanced predictive system combining YOLOv8 for object detection, Deep SORT for tracking, and Bi-LSTM networks for trajectory prediction. To adapt YOLOv8 for complex intersection scenarios, several architectural enhancements were incorporated. The RepLayer module replaced the original C2f module in the backbone, integrating large-kernel depthwise separable convolution to better capture contextual information in cluttered environments. The GIoU loss function was introduced to improve bounding box regression accuracy, mitigating the issues related to missed or incorrect detections due to occlusion and overlapping objects. Furthermore, a Global Attention Mechanism (GAM) was implemented in the neck network to better learn both location and semantic information, while the ReContext gradient composition feature pyramid replaced the traditional FPN, enabling more effective multi-scale object detection. Additionally, the CSPNet structure in the neck was substituted with Res-CSP, enhancing feature fusion flexibility and improving detection performance in complex traffic conditions. For tracking, the Deep SORT algorithm was optimized with enhanced appearance feature extraction, reducing the identity switches caused by occlusions and ensuring the stable tracking of vehicles, pedestrians, and non-motorized vehicles. The Bi-LSTM model was employed for trajectory prediction, capturing long-range dependencies to provide accurate forecasting of future positions. The collision risk was quantified using the predictive collision risk area (PCRA) method, categorizing risks into three levels (danger, warning, and caution) based on the predicted overlaps in trajectories. In the experimental setup, the dataset used for training the model consisted of 30,000 images annotated with bounding boxes around vehicles, pedestrians, and non-motorized vehicles. Data augmentation techniques such as Mosaic, Random_perspective, Mixup, HSV adjustments, Flipud, and Fliplr were applied to enrich the dataset and improve model robustness. In real-world testing, the system was deployed as part of the G310 highway safety project, where it achieved a mean Average Precision (mAP) of over 90% for object detection. Over a one-month period, 120 warning events involving vehicles, pedestrians, and non-motorized vehicles were recorded. Manual verification of the warnings indicated a prediction accuracy of 97%, demonstrating the system’s reliability in identifying potential collisions and issuing timely warnings. This approach represents a significant advancement in enhancing safety at unsignalized intersections in urban traffic environments. Full article
(This article belongs to the Special Issue Computer Vision and Image Processing in Machine Learning)
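A much-simplified sketch of the predicted-collision-risk-area idea is given below: predicted trajectories are expanded into occupancy rectangles at each future step, and the worst overlap is mapped to a risk level. The rectangle size, the thresholds, and the three-level mapping are illustrative assumptions, not the paper's calibrated values.

```python
# Simplified sketch of a predicted-collision-risk-area check: build occupancy
# rectangles along two predicted trajectories and map the worst overlap ratio to
# a risk level. Geometry and thresholds are illustrative assumptions.
import numpy as np

def rect_overlap_ratio(a, b):
    """a, b = (x1, y1, x2, y2); returns intersection area / smaller-box area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller > 0 else 0.0

def risk_level(traj_a, traj_b, half_size=1.0):
    """traj_*: (T, 2) arrays of predicted centre positions (e.g., from a Bi-LSTM)."""
    ratios = []
    for (xa, ya), (xb, yb) in zip(traj_a, traj_b):
        ra = (xa - half_size, ya - half_size, xa + half_size, ya + half_size)
        rb = (xb - half_size, yb - half_size, xb + half_size, yb + half_size)
        ratios.append(rect_overlap_ratio(ra, rb))
    worst = max(ratios)
    if worst > 0.5:
        return "danger"
    if worst > 0.2:
        return "warning"
    return "caution" if worst > 0.0 else "no conflict"

a = np.array([[0, 0], [1, 0], [2, 0], [3, 0]], dtype=float)  # vehicle heading east
b = np.array([[3, 3], [3, 2], [3, 1], [3, 0]], dtype=float)  # pedestrian heading south
print(risk_level(a, b))  # trajectories converge on (3, 0) -> "danger"
```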
Figures:
Figure 1. Unsignalized intersection scenario.
Figure 2. The overall structure of the proposed system.
Figure 3. Overall framework diagram of RGGE-YOLOv8.
Figure 4. Overall framework diagram of RGGE-YOLOv8 structure of Deep SORT.
Figure 5. Workflow diagram of Deep SORT algorithm.
Figure 6. Pixel coordinate conversion diagram.
Figure 7. Bi-LSTM network architecture.
Figure 8. LSTM modules.
Figure 9. LSTM memory cell processing workflow.
Figure 10. Bi-LSTM model schematic diagram.
Figure 11. PCRA levels.
Figure 12. Trajectory tracking performance of different models.
Figure 13. The impact of different prediction trajectory lengths on prediction accuracy.
Figure 14. Proactive safety warning system for unsignalized intersections.
21 pages, 4729 KiB  
Article
Enhancing Hierarchical Classification in Tree-Based Models Using Level-Wise Entropy Adjustment
by Olga Narushynska, Anastasiya Doroshenko, Vasyl Teslyuk, Volodymyr Antoniv and Maksym Arzubov
Big Data Cogn. Comput. 2025, 9(3), 65; https://doi.org/10.3390/bdcc9030065 - 11 Mar 2025
Viewed by 129
Abstract
Hierarchical classification, which organizes items into structured categories and subcategories, has emerged as a powerful solution for handling large and complex datasets. However, traditional flat classification approaches often overlook the hierarchical dependencies between classes, leading to suboptimal predictions and limited interpretability. This paper addresses these challenges by proposing a novel integration of tree-based models with hierarchical-aware split criteria through adjusted entropy calculations. The proposed method calculates entropy at multiple hierarchical levels, ensuring that the model respects the taxonomic structure during training. This approach aligns statistical optimization with class semantic relationships, enabling more accurate and coherent predictions. Experiments conducted on real-world datasets structured according to the GS1 Global Product Classification (GPC) system demonstrate the effectiveness of our method. The proposed model was applied using tree-based ensemble methods combined with the newly developed hierarchy-aware metric Penalized Information Gain (PIG). PIG was implemented with level-wise entropy adjustments, assigning greater weight to higher hierarchical levels to maintain the taxonomic structure. The model was trained and evaluated on two real-world datasets based on the GS1 Global Product Classification (GPC) system. The final dataset included approximately 30,000 product descriptions spanning four hierarchical levels. An 80-20 train–test split was used, with model hyperparameters optimized through 5-fold cross-validation and Bayesian search. The experimental results showed a 12.7% improvement in classification accuracy at the lowest hierarchy level compared to traditional flat classification methods, with significant gains in datasets featuring highly imbalanced class distributions and deep hierarchies. The proposed approach also increased the F1 score by 12.6%. Despite these promising results, challenges remain in scaling the model for very large datasets and handling classes with limited training samples. Future research will focus on integrating neural networks with hierarchy-aware metrics, enhancing data augmentation to address class imbalance, and developing real-time classification systems for practical use in industries such as retail, logistics, and healthcare. Full article
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
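To make the level-wise entropy adjustment behind PIG more concrete, the following Python sketch computes a hierarchy-aware information gain for a candidate split, weighting each taxonomy level separately so that splits which separate top-level segments score higher than splits that only purify the deepest level. The function names (`level_entropy`, `penalized_information_gain`), the additive weighting scheme, and the label encoding are illustrative assumptions; the paper's exact PIG formulation may differ.

```python
import numpy as np

def level_entropy(labels):
    """Shannon entropy of the class labels at one hierarchy level."""
    _, counts = np.unique(labels, return_counts=True)
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def penalized_information_gain(y_levels, left_idx, right_idx, level_weights):
    """
    Hierarchy-aware information gain for a candidate split.

    y_levels: list of 1-D label arrays, one per hierarchy level
              (e.g., [segment, family, class, brick] labels for the same samples).
    left_idx, right_idx: boolean masks selecting the two child nodes.
    level_weights: one weight per level; larger weights on higher levels
                   penalize splits that mix top-level categories.
    """
    gain = 0.0
    n = len(y_levels[0])
    for w, y in zip(level_weights, y_levels):
        parent = level_entropy(y)
        left, right = y[left_idx], y[right_idx]
        children = (len(left) / n) * level_entropy(left) + (len(right) / n) * level_entropy(right)
        gain += w * (parent - children)   # weighted, level-wise information gain
    return gain
```

In a tree-based ensemble, a score of this kind would replace the standard single-level information gain when ranking candidate splits, which is how the taxonomic structure is respected during training.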
Show Figures

Figure 1: Hierarchical taxonomy of product classification with example assignments.
Figure 2: Class (bricks level) proportion for train and test datasets.
Figure 3: Performance evaluation of hierarchical classification: F1 score, precision, and recall across taxonomy levels.
Figure 4: ROC curve comparison for hierarchical classification models across multiple classes for brick level.
Figure 5: Multimodel normalized confusion matrices across hierarchical segment level.
Figure 6: Multimodel normalized confusion matrices across hierarchical family level.
27 pages, 14625 KiB  
Article
Generative Architectural Design from Textual Prompts: Enhancing High-Rise Building Concepts for Assisting Architects
by Feng Yang and Wenliang Qian
Appl. Sci. 2025, 15(6), 3000; https://doi.org/10.3390/app15063000 - 10 Mar 2025
Viewed by 253
Abstract
In the early stages of architectural design, architects convert initial ideas into concrete design schemes, a process that relies heavily on their creativity and consumes considerable time. Generative design methods based on artificial intelligence are therefore promising for such tasks. However, effectively communicating design concepts to machines is challenging. To address this challenge, this paper proposes a novel cross-modal approach that develops architectural design concepts from textual descriptions to assist architects, comprising a design concept extraction module and an architectural appearance generation module. The design concept extraction module adopts a contrastive learning framework to yield a text encoder capable of semantic extraction. The architectural appearance generation module then employs a novel deep sparse and text fusion generative adversarial network to convert the extracted design concept semantics into conceptual sketches, exploiting the unique sparsity of sketches. Additionally, it uses the pre-trained latent stable diffusion model to generate realistic and diverse high-rise building renderings, simulating the re-creation process of architects. The generated designs are evaluated qualitatively and quantitatively and further compared with existing real-life buildings to demonstrate the effectiveness of the proposed method. Furthermore, we demonstrate the feasibility of applying the proposed methodology in the early stages of architectural design by modeling a generated design. Full article
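As a rough illustration of the contrastive learning step in the design concept extraction module, the PyTorch sketch below implements a symmetric text–image matching loss of the InfoNCE/CLIP kind. The loss form, temperature value, and encoder interfaces are assumptions for illustration; the paper does not specify its exact contrastive formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_matching_loss(text_emb, image_emb, temperature=0.07):
    """
    Symmetric contrastive loss over a batch of matched (text, building image) pairs.
    text_emb, image_emb: tensors of shape (batch, dim) from the text and image encoders.
    """
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature        # (batch, batch) similarity matrix
    targets = torch.arange(text_emb.size(0), device=logits.device)
    loss_t2i = F.cross_entropy(logits, targets)            # match each text to its image
    loss_i2t = F.cross_entropy(logits.t(), targets)        # and each image to its text
    return 0.5 * (loss_t2i + loss_i2t)
```

Trained with an objective of this kind, the text encoder learns embeddings aligned with the visual features of matching building images, which is what the downstream generation module consumes.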
Show Figures

Figure 1: Overall architecture of the proposed method. DCE and AAG refer to the design concept extraction and architectural appearance generation modules, respectively. The AAG module contains a deep sparse and text fusion generative adversarial network (DSTF-GAN) and a pre-trained rendering (PR) module.
Figure 2">
Figure 2: Framework of the contrastive text–image matching model.
Figure 3: Overall architecture of the proposed AAG module.
Figure 4: Proposed DSTF-GAN architecture for text-to-conceptual-sketch generation. z is a 200-dimensional latent code sampled from a normal distribution, and T is a 512-dimensional text semantic vector of the textual description. FC, Conv, DeConv, BatchNorm, InstanceNorm, ReLU, LeakyReLU, and Tanh denote the fully connected layer, convolutional layer, deconvolutional layer, batch normalization layer, instance normalization layer, rectified linear unit activation function, LeakyReLU activation function, and hyperbolic tangent activation function, respectively. Additionally, the STF module, which fuses the sparsity and text features into sketches, refers to the sparse text fusion blocks. The detailed network architecture is shown in Table A1 in Appendix A.
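For orientation, a minimal PyTorch sketch of how the generator inputs described in Figure 4 could be combined is shown below: the 200-dimensional latent code z and the 512-dimensional text vector T are concatenated and projected through an FC layer before the deconvolutional upsampling stack. The layer sizes, the concatenation-based fusion, and the class name `SketchGeneratorStem` are assumptions for illustration; the actual DSTF-GAN uses its own sparse text fusion (STF) blocks detailed in Table A1.

```python
import torch
import torch.nn as nn

class SketchGeneratorStem(nn.Module):
    """Illustrative stem: fuse latent code z and text vector T, then begin deconvolutional upsampling."""
    def __init__(self, z_dim=200, t_dim=512, base_channels=512):
        super().__init__()
        self.base_channels = base_channels
        self.fc = nn.Linear(z_dim + t_dim, base_channels * 4 * 4)  # FC projection to a 4x4 feature map
        self.up = nn.Sequential(
            nn.ConvTranspose2d(base_channels, base_channels // 2, kernel_size=4, stride=2, padding=1),  # DeConv
            nn.BatchNorm2d(base_channels // 2),
            nn.ReLU(inplace=True),
        )

    def forward(self, z, t):
        h = self.fc(torch.cat([z, t], dim=1)).view(-1, self.base_channels, 4, 4)
        return self.up(h)  # (batch, base_channels // 2, 8, 8) feature map for later refinement

# z ~ N(0, I) with 200 dims; T is a 512-dim text embedding (random here only to check shapes)
z = torch.randn(2, 200)
t = torch.randn(2, 512)
features = SketchGeneratorStem()(z, t)
```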
Figure 5">
Figure 5: Architectures of the sparse and text fusion blocks. The detailed network architecture is shown in Table A2 in Appendix A. MLPs, Conv, DeConv, ReLU, and LeakyReLU denote the multi-layer perceptron, convolutional layer, deconvolutional layer, rectified linear unit activation function, and LeakyReLU activation function, respectively.
Figure 6">
Figure 6: Reshaping, background removal, and sketch conversion of high-rise building images using the XDoG operator.
Figure 7: Data augmentation using the pre-trained latent stable diffusion model and ChatGPT.
Figure 8: Matching accuracy of experiments in different divisions of the dataset with different data augmentation techniques. "Original" denotes training the CTIMM without any data augmentation. "DA" denotes training the model with scaling and cropping operations for data augmentation. "SD" and "GPT" denote that the pre-trained latent stable diffusion model and ChatGPT are adopted for data augmentation, respectively.
Figure 9: One batch of matching results of "DA and SD and GPT" on the validation set of Division 1.
Figure 10">
Figure 10: Comparison of semantic consistency between generated high-rise conceptual sketches and textual descriptions. The semantic correspondences highlighted in the text are indicated by colored bounding boxes in the sketches.
Figure 11: Creative comparison of generated high-rise conceptual sketches with real sketches.
Figure 12: Rendering high-rise building appearances from generated conceptual sketches using the pre-trained stable-diffusion-v1-5 model.
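As a rough sketch of how a rendering step like the one in Figure 12 could be run with an off-the-shelf library, the snippet below passes a conceptual sketch and a text prompt through an image-to-image pipeline built on stable-diffusion-v1-5. The use of Hugging Face diffusers, the model identifier "runwayml/stable-diffusion-v1-5", and the strength/guidance settings are assumptions for illustration; the paper's pre-trained rendering module may be configured differently.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Load an img2img pipeline built on Stable Diffusion v1.5 (model ID assumed here).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A generated conceptual sketch serves as the structural guide for the rendering.
sketch = Image.open("generated_sketch.png").convert("RGB").resize((512, 512))

prompt = "a photorealistic high-rise office building, glass curtain wall, daytime"
rendering = pipe(
    prompt=prompt,
    image=sketch,
    strength=0.75,        # how strongly the sketch is repainted
    guidance_scale=7.5,   # how closely the output follows the text prompt
).images[0]
rendering.save("rendered_building.png")
```

The strength parameter controls the trade-off between preserving the sketch's massing and letting the text prompt drive material and lighting detail, which mirrors the sketch-then-render division of labor in the proposed two-stage method.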
Figure 13">
Figure 13: Comparisons of high-rise building appearances generated directly by the pre-trained latent stable diffusion model and by the proposed two-stage method. (a) Comparison of high-rise type building generation results. (b) Comparison of tower type building generation results.
Figure 14: Representative examples comparing generated sketches and renderings with existing buildings. The generated sketches and existing buildings are not part of the training dataset, and the corresponding renderings resemble the main features of some existing buildings.
Figure 15: Examples of high-rise sketches synthesized by different text fusion methods.
Figure 16: Three-dimensional conceptualization of a rendered high-rise building appearance of an office building type.
Figure A1: Work interface for generating high-rise sketches and renderings from input text prompts within a Python framework. The interface integrates the pre-trained text encoder from the DCE module for semantic text extraction, the trained DSTF-GAN for sketch generation, and the pre-trained stable-diffusion-v1-5 model for rendering.