Search Results (521)

Search Parameters:
Keywords = vision-based interaction

22 pages, 57199 KiB  
Article
CM-UNet++: A Multi-Level Information Optimized Network for Urban Water Body Extraction from High-Resolution Remote Sensing Imagery
by Jiangchen Cai, Liufeng Tao and Yang Li
Remote Sens. 2025, 17(6), 980; https://doi.org/10.3390/rs17060980 (registering DOI) - 11 Mar 2025
Viewed by 101
Abstract
Urban water bodies are crucial in urban planning and flood detection, and they are susceptible to changes due to climate change and rapid urbanization. With the development of high-resolution remote sensing technology and the success of semantic segmentation using deep learning in computer vision, it is possible to extract urban water bodies from high-resolution remote sensing images. However, many urban water bodies are small, oddly shaped, silted, or spectrally similar to other objects, making their extraction extremely challenging. In this paper, we propose a neural network named CM-UNet++, which combines a dense-skip module based on UNet++ with a CSMamba module to encode information at different levels with interactions and then extract global and local information at each level. We use a size-weighted auxiliary loss function to balance feature maps of different levels. Additionally, features beyond RGB are incorporated into the input of the neural network to enhance the distinction between water bodies and other objects. We produced a labeled urban water extraction dataset, and experiments on this dataset show that CM-UNet++ attains 0.8781 on the IOU (intersection over union) metric, indicating that the method outperforms other recent semantic segmentation methods and achieves better completeness, connectivity, and boundary accuracy. The proposed dense-skip module and CSMamba module significantly improve the extraction of small and spectrally indistinct water bodies. Furthermore, experiments on a public dataset confirm the method's robustness.
Figures: graphical abstract; examples of urban water bodies (small water bodies on construction sites, tiny ditches, seasonal siltation, building-shaded roads spectrally similar to water); structure of the proposed CM-UNet++ network; NIR, NDWI, and NDVI visualizations; CSMamba block; AUWED and KWSD2 dataset examples; extraction results on both datasets; ablation studies on input features, network structure, and loss functions; comparisons with other semantic segmentation methods; comparison of FLOPs, model parameters, and IOU.
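The CM-UNet++ entry above reports an IOU of 0.8781 and mentions feeding the network features beyond RGB. As context only, here is a minimal Python sketch of the standard NDWI/NDVI band indices and a binary IOU metric; the channel layout and array names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def water_indices(red, green, nir, eps=1e-6):
    """Standard spectral indices often used to separate water from look-alike surfaces."""
    ndwi = (green - nir) / (green + nir + eps)   # water tends toward positive NDWI
    ndvi = (nir - red) / (nir + red + eps)       # vegetation tends toward positive NDVI
    return ndwi, ndvi

def stack_input(rgb, nir):
    """Stack RGB with NIR, NDWI, and NDVI into a 6-channel network input (assumed layout)."""
    red, green = rgb[..., 0], rgb[..., 1]
    ndwi, ndvi = water_indices(red, green, nir)
    return np.dstack([rgb, nir, ndwi, ndvi]).astype(np.float32)

def binary_iou(pred, truth):
    """Intersection over union for binary water masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    return np.logical_and(pred, truth).sum() / union if union else 1.0

# Toy usage with random data standing in for a real image tile.
rng = np.random.default_rng(0)
rgb = rng.random((64, 64, 3)).astype(np.float32)
nir = rng.random((64, 64)).astype(np.float32)
x = stack_input(rgb, nir)                 # shape (64, 64, 6)
print(x.shape, binary_iou(rng.random((64, 64)) > 0.5, rng.random((64, 64)) > 0.5))
```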
47 pages, 2260 KiB  
Review
Hand Gesture Recognition on Edge Devices: Sensor Technologies, Algorithms, and Processing Hardware
by Elfi Fertl, Encarnación Castillo, Georg Stettinger, Manuel P. Cuéllar and Diego P. Morales
Sensors 2025, 25(6), 1687; https://doi.org/10.3390/s25061687 - 8 Mar 2025
Viewed by 209
Abstract
Hand gesture recognition (HGR) is a convenient and natural form of human–computer interaction. It is suitable for various applications. Much research has already focused on wearable-device-based HGR. By contrast, this paper gives an overview focused on device-free HGR; that is, we evaluate HGR systems that do not require the user to wear something like a data glove or hold a device. HGR systems are explored with regard to technology, hardware, and algorithms. The interconnectedness of timing and power requirements with hardware, pre-processing algorithms, classification, and technology, and how these choices permit more or less granularity, accuracy, and numbers of gestures, is clearly demonstrated. The sensor modalities evaluated are Wi-Fi, vision, radar, mobile networks, and ultrasound. The pre-processing technologies explored are stereo vision, multiple-input multiple-output (MIMO), spectrograms, phased arrays, range-Doppler maps, range-angle maps, Doppler-angle maps, and multilateration. Classification approaches with and without ML are studied. Among those with ML, the assessed algorithms range from simple tree structures to transformers. All applications are evaluated taking into account their level of integration. This encompasses determining whether the presented application is suitable for edge integration, its real-time capability, whether continuous learning is implemented, what robustness was achieved, whether ML is applied, and the accuracy level. Our survey aims to provide a thorough understanding of the current state of the art in device-free HGR on edge devices and in general. Finally, on the basis of present-day challenges and opportunities in this field, we outline the further research we suggest for improving HGR. Our goal is to promote the development of efficient and accurate gesture recognition systems.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
Figures: applications of gesture recognition and similar systems; technology-agnostic processing flow; synergy of requirements, technology, and algorithms for HGR systems; actuation with a single frequency, a chirp, and UWB; spectrogram (micro-Doppler) computed by 1D Fourier transform in the fast-time direction; ultrasound transmitter arrays with 4 and 8 senders at λ/2 spacing, 24.5 kHz, with 0° and 45° steering; raw ultrasound data samples in 3D and 2D; range-Doppler/velocity map of the segmented gesture computed by 2D Fourier transform (fast-time, then slow-time); HGR classification algorithms and processes.
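Among the pre-processing techniques this survey covers is the range-Doppler map, described as a 2D Fourier transform over fast and then slow time. Below is a generic sketch of that computation, assuming a (chirps × samples) data cube; it is not tied to any specific sensor or implementation from the survey.

```python
import numpy as np

def range_doppler_map(frames: np.ndarray) -> np.ndarray:
    """Compute a range-Doppler magnitude map from a chirp data cube.

    `frames` is assumed to have shape (n_chirps, n_samples): each row holds the
    fast-time samples of one chirp, and successive rows are slow time.
    """
    # FFT over fast time gives range bins; a window reduces spectral leakage.
    window = np.hanning(frames.shape[1])
    range_fft = np.fft.fft(frames * window, axis=1)
    # FFT over slow time (across chirps) gives Doppler/velocity bins.
    doppler_fft = np.fft.fftshift(np.fft.fft(range_fft, axis=0), axes=0)
    return 20 * np.log10(np.abs(doppler_fft) + 1e-12)   # magnitude in dB

# Synthetic example: a single reflector with a slow phase drift across chirps.
n_chirps, n_samples = 64, 128
t = np.arange(n_samples) / n_samples
cube = np.array([np.cos(2 * np.pi * 12 * t + 0.3 * chirp)   # range tone plus Doppler drift
                 for chirp in range(n_chirps)])
rd_map = range_doppler_map(cube)
print(rd_map.shape)   # (64, 128)
```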
27 pages, 1938 KiB  
Article
Skeleton Reconstruction Using Generative Adversarial Networks for Human Activity Recognition Under Occlusion
by Ioannis Vernikos and Evaggelos Spyrou
Sensors 2025, 25(5), 1567; https://doi.org/10.3390/s25051567 - 4 Mar 2025
Viewed by 153
Abstract
Recognizing human activities from motion data is a complex task in computer vision, involving the recognition of human behaviors from sequences of 3D motion data. These activities encompass successive body part movements, interactions with objects, or group dynamics. Camera-based recognition methods are cost-effective and perform well under controlled conditions but face challenges in real-world scenarios due to factors such as viewpoint changes, illumination variations, and occlusion. The latter is the most significant challenge in real-world recognition; partial occlusion impacts recognition accuracy to varying degrees depending on the activity and the occluded body parts, while complete occlusion can render activity recognition impossible. In this paper, we propose a novel approach for human activity recognition in the presence of partial occlusion, which may be applied in cases wherein up to two body parts are occluded. The proposed approach works under the assumptions that (a) human motion is modeled using a set of 3D skeletal joints, and (b) the same body parts remain occluded throughout the whole activity. Contrary to previous research, in this work, we address this problem using a Generative Adversarial Network (GAN). Specifically, we train a Convolutional Recurrent Neural Network (CRNN) to serve as the generator of the GAN; its aim is to complete the parts of the skeleton that are missing due to occlusion. The input to this CRNN consists of raw 3D skeleton joint positions, upon the removal of joints corresponding to occluded parts, and its output is a reconstructed skeleton. For the discriminator of the GAN, we use a simple long short-term memory (LSTM) network. We evaluate the proposed approach using publicly available datasets in a series of occlusion scenarios. We demonstrate that in all scenarios, the occlusion of certain body parts causes a significant decline in performance, although in some cases, the reconstruction process leads to almost perfect recognition. Nonetheless, in almost every circumstance, the proposed approach outperforms previous works by margins ranging from 2.2% to 37.5%, depending on the dataset used and the occlusion case.
(This article belongs to the Special Issue Robust Motion Recognition Based on Sensor Technology)
Figures: human body poses with the 20 and 25 skeletal joints extracted by Microsoft Kinect v1 and v2, grouped into five body parts (torso, left/right hand, left/right leg); example skeleton sequences (handshaking, hugging) from PKU-MMD shown without occlusion, with occluded joints removed, and after reconstruction; generator and discriminator architectures of the proposed GAN; visual overview of the approach; classifier architectures for the three-camera and one-camera cases; normalized confusion matrices without occlusion; confidence intervals compared with the best accuracies reported in previous works; normalized confusion matrices for the NTU-RGB+D, PKU-MMD, SYSU-3D-HOI, and UT-Kinect-Action-3D datasets under occlusion of the left/right arm and left/right leg.
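The approach above removes the joints of occluded body parts before feeding the skeleton to the GAN's generator. Below is a minimal sketch of that masking step, assuming a (frames × 25 joints × 3) Kinect v2 sequence; the body-part index sets are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical grouping of the 25 Kinect v2 joints into five body parts; the
# exact index sets follow the Kinect SDK numbering and are only illustrative here.
BODY_PARTS = {
    "torso":     [0, 1, 2, 3, 20],
    "left_arm":  [4, 5, 6, 7, 21, 22],
    "right_arm": [8, 9, 10, 11, 23, 24],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def occlude(sequence, occluded_parts):
    """Zero out the 3D coordinates of joints belonging to occluded body parts.

    `sequence` has shape (n_frames, 25, 3); the same parts stay occluded for the
    whole activity, matching the assumption stated in the abstract.
    """
    masked = sequence.copy()
    for part in occluded_parts:
        masked[:, BODY_PARTS[part], :] = 0.0
    return masked

# Toy usage: a random 60-frame skeleton sequence with the left arm occluded.
seq = np.random.default_rng(1).normal(size=(60, 25, 3)).astype(np.float32)
masked = occlude(seq, ["left_arm"])      # generator input; the full `seq` is the target
print(bool(masked[:, 4, :].any()))       # False: left-arm joints are zeroed
```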
28 pages, 7320 KiB  
Article
Technology for Improving the Accuracy of Predicting the Position and Speed of Human Movement Based on Machine Learning Models
by Artem Obukhov, Denis Dedov, Andrey Volkov and Maksim Rybachok
Technologies 2025, 13(3), 101; https://doi.org/10.3390/technologies13030101 - 3 Mar 2025
Viewed by 409
Abstract
The solution to the problem of insufficient accuracy in determining the position and speed of human movement during interaction with a treadmill-based training complex is considered. Control command generation based on the training complex user's actions may be performed with a delay, may not take into account the specificity of movements, or may be inaccurate due to errors in the initial data. The article introduces a technology for improving the accuracy of predicting a person's position and speed on a running platform using machine learning and computer vision methods. The proposed technology includes analysing and processing data from the tracking system, developing machine learning models to improve the quality of the raw data, predicting the position and speed of human movement, and implementing and integrating neural network methods into the running platform control system. Experimental results demonstrate that the decision tree (DT) model provides better accuracy and performance in solving the problem of positioning key points of a human model in complex conditions with overlapping limbs. For speed prediction, the linear regression (LR) model showed the best results when the analysed window length was 10 frames. Prediction of the person's position (based on the 10 previous frames) is performed using the DT model, which is optimal in terms of accuracy and computation time relative to the other options. A comparison of the control methods for the running platform based on machine learning models showed the advantage of the combined method (a linear control function combined with the speed prediction model), which provides an average absolute error of 0.116 m/s. The results of the research confirmed the achievement of the primary objective (increasing the accuracy of human position and speed prediction), making the proposed technology promising for application in human–machine systems.
(This article belongs to the Section Information and Communication Technologies)
Figures: schematic diagram of the research methodology (video acquisition, key point extraction, preprocessing, ML1 positional correction, ML2 speed detection, ML3 position prediction, and the five control methods C1–C5); insertion of artificial noise (grey rectangles) into the video data during training and testing; comparison of LR, XGB, and DT models under artificial interference; velocity determination at low, medium, and high speed for window lengths of 10, 15, and 20 frames; performance of control methods C1–C5 against the reference speed; operation of methods C3, C4, and C5; computer vision tests under real-world interference (non-contrasting clothing, no white background, additional people) and under low-light conditions.
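The study above predicts speed and position from a sliding window of the 10 previous frames using LR and DT models. The small sketch below shows that windowing-plus-regression idea with scikit-learn on synthetic positions; the window length mirrors the abstract, while the data and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def make_windows(positions, window=10):
    """Build (previous-window -> next position) training pairs from a 1D position track."""
    X = np.array([positions[i:i + window] for i in range(len(positions) - window)])
    y = positions[window:]
    return X, y

rng = np.random.default_rng(2)
track = np.cumsum(0.02 + 0.005 * rng.standard_normal(500))   # noisy forward walk, in metres

X, y = make_windows(track, window=10)
split = int(0.8 * len(X))
for name, model in [("LR", LinearRegression()), ("DT", DecisionTreeRegressor(max_depth=8))]:
    model.fit(X[:split], y[:split])
    mae = np.mean(np.abs(model.predict(X[split:]) - y[split:]))
    print(f"{name}: mean absolute error = {mae:.4f} m")
```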
21 pages, 5031 KiB  
Article
A Comparative Study of Vision Language Models for Italian Cultural Heritage
by Chiara Vitaloni, Dasara Shullani and Daniele Baracchi
Heritage 2025, 8(3), 95; https://doi.org/10.3390/heritage8030095 - 2 Mar 2025
Viewed by 317
Abstract
Human communication has long relied on visual media for interaction, and is facilitated by electronic devices that access visual data. Traditionally, this exchange was unidirectional, constrained to text-based queries. However, advancements in human–computer interaction have introduced technologies like reverse image search and large language models (LLMs), enabling both textual and visual queries. These innovations are particularly valuable in Cultural Heritage applications, such as connecting tourists with point-of-interest recognition systems during city visits. This paper investigates the use of various Vision Language Models (VLMs) for Cultural Heritage visual question answering, including Bing's search engine with GPT-4 and open models such as Qwen2-VL and Pixtral. Twenty Italian landmarks were selected for the study, including the Colosseum, Milan Cathedral, and Michelangelo's David. For each landmark, two images were chosen: one from Wikipedia and another from a scientific database or private collection. These images were input into each VLM with textual queries regarding their content. We studied the quality of the responses in terms of their completeness, assessing the impact of various levels of detail in the queries. Additionally, we explored the effect of language (English vs. Italian) on the models' ability to provide accurate answers. Our findings indicate that larger models, such as Qwen2-VL and Bing+ChatGPT-4, which are trained on multilingual datasets, perform better in both English and Italian. Iconic landmarks like the Colosseum and Florence's Duomo are easily recognized, and providing context (e.g., the city) improves identification accuracy. Surprisingly, the Wikimedia dataset did not perform as expected, with varying results across models. Open models like Qwen2-VL, which can run on consumer workstations, showed performance similar to larger models. While the algorithms demonstrated strong results, they also generated occasional hallucinated responses, highlighting the need for ongoing refinement of AI systems for Cultural Heritage applications.
(This article belongs to the Special Issue AI and the Future of Cultural Heritage)
Figures: the considered images from FloreView, Wikimedia, and other sources; accuracy in identifying the city and subject in English and in Italian (including answers to the second question); impact of including the city in the second question on overall performance in both languages; accuracy improvement in subject detection when using English instead of Italian; accuracy in identifying each subject across all analyzed models, in Italian and English.
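The study above scores each model on whether it identifies the city and the subject, separately per language. A hypothetical sketch of how such per-model, per-language accuracies could be tallied with pandas is shown below; the rows are invented placeholders, not the paper's results.

```python
import pandas as pd

# Hypothetical evaluation log: one row per (model, language, landmark) query,
# with binary flags for whether the city and the subject were identified.
rows = [
    {"model": "Qwen2-VL",   "language": "en", "city_ok": 1, "subject_ok": 1},
    {"model": "Qwen2-VL",   "language": "it", "city_ok": 1, "subject_ok": 0},
    {"model": "Pixtral",    "language": "en", "city_ok": 0, "subject_ok": 1},
    {"model": "Bing+GPT-4", "language": "en", "city_ok": 1, "subject_ok": 1},
    {"model": "Bing+GPT-4", "language": "it", "city_ok": 1, "subject_ok": 1},
]
df = pd.DataFrame(rows)

# Mean of the binary flags gives accuracy per model and language.
accuracy = df.groupby(["model", "language"])[["city_ok", "subject_ok"]].mean()
print(accuracy)
```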
19 pages, 771 KiB  
Article
FRU-Adapter: Frame Recalibration Unit Adapter for Dynamic Facial Expression Recognition
by Myungbeom Her, Hamza Ghulam Nabi and Ji-Hyeong Han
Electronics 2025, 14(5), 978; https://doi.org/10.3390/electronics14050978 - 28 Feb 2025
Viewed by 174
Abstract
Dynamic facial expression recognition (DFER) is one of the most important challenges in computer vision, as it plays a crucial role in human–computer interaction. Recently, adapter-based approaches have been introduced into DFER, and they have achieved remarkable success. However, these adapters still suffer from the following problems: overlooking irrelevant frames and interfering with pre-trained information. In this paper, we propose a frame recalibration unit adapter (FRU-Adapter), which combines the strengths of a frame recalibration unit (FRU) and temporal self-attention (T-SA) to address the aforementioned issues. The FRU first recalibrates the frames by emphasizing important frames and suppressing less relevant frames. The recalibrated frames are then fed into T-SA to capture the correlations between meaningful frames. As a result, the FRU-Adapter captures enhanced temporal dependencies by accounting for the irrelevant frames in a clip. Furthermore, we propose a method for attaching the FRU-Adapter to each encoder layer in parallel to reduce the loss of pre-trained information. Notably, the FRU-Adapter uses only 2% of the total training parameters per task while achieving improved accuracy. Extensive experiments on DFER tasks show that the proposed FRU-Adapter not only outperforms the state-of-the-art models but also exhibits parameter efficiency. The source code will be made publicly available.
Figures: overall architecture of the FRU-Adapter (encoder layer, FRU, and T-SA); possible positions of the FRU-Adapter; high-level feature visualization on the first fold using t-SNE compared with a baseline model; visualization of an input image sequence and the FRU weights of the last FRU-Adapter block.
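The abstract describes recalibrating frames before temporal self-attention. The PyTorch sketch below is a speculative illustration of that general mechanism (a squeeze-and-excitation-style frame gate followed by multi-head attention over time); it is a guess at the described behaviour under stated assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class FRUAdapterSketch(nn.Module):
    """Illustrative frame recalibration + temporal self-attention block.

    Emphasises important frames, suppresses less relevant ones, then attends over
    time; dimensions and gating design are assumptions for illustration only.
    """
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        # Frame recalibration: a per-frame importance gate in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, 1), nn.Sigmoid(),
        )
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) clip-level token features.
        weights = self.gate(x)                 # (batch, frames, 1) per-frame importance
        x = x * weights                        # suppress less relevant frames
        attended, _ = self.attn(x, x, x)       # temporal self-attention (T-SA)
        return self.norm(x + attended)         # residual connection

# Toy usage: 2 clips, 16 frames, 768-dim features.
features = torch.randn(2, 16, 768)
block = FRUAdapterSketch(dim=768)
print(block(features).shape)                   # torch.Size([2, 16, 768])
```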
25 pages, 17199 KiB  
Article
DRFormer: A Benchmark Model for RNA Sequence Downstream Tasks
by Jianqi Fu, Haohao Li, Yanlei Kang, Hancan Zhu, Tiren Huang and Zhong Li
Genes 2025, 16(3), 284; https://doi.org/10.3390/genes16030284 - 26 Feb 2025
Viewed by 185
Abstract
Background/Objectives: RNA research is critical for understanding gene regulation, disease mechanisms, and therapeutic development. Constructing effective RNA benchmark models for accurate downstream analysis has become a significant research challenge. The objective of this study is to propose a robust benchmark model, DRFormer, for RNA sequence downstream tasks. Methods: The DRFormer model utilizes RNA sequences to construct novel vision features based on secondary structure and sequence distance. These features are pre-trained using the SWIN model to develop a SWIN-RNA submodel. This submodel is then integrated with an RNA sequence model to construct a multimodal model for downstream analysis. Results: We conducted experiments on various RNA downstream tasks. In the sequence classification task, the MCC reached 94.4%, surpassing the state-of-the-art RNAErnie model by 1.2%. In protein–RNA interaction prediction, DRFormer achieved an MCC of 0.492, outperforming advanced models like BERT-RBP and PrismNet. In RNA secondary structure prediction, the F1 score was 0.690, exceeding the widely used SPOT-RNA model by 1%. Additionally, generalization experiments on DNA tasks yielded satisfactory results. Conclusions: DRFormer is the first RNA sequence downstream analysis model that leverages structural features to construct a vision model and integrates sequence and vision models in a multimodal manner. This approach yields excellent prediction and analysis results, making it a valuable contribution to RNA research.
(This article belongs to the Section RNA)
Figures: overall framework of the model; feature embedding visualization on the nRC dataset; attention distance and entropy analysis; comparison of true secondary structures with structures predicted by DRFormer and SPOT-RNA; motif analysis of paired subsequences in the TS0_112 dataset; RNA sequence classification performance comparison; violin plots of MCC across RBPs for different methods; metric-by-metric comparison of DRFormer with PrismNet, PrismNet_Str, GraphProt, and BERT-RBP; attention analysis of DRFormer on RNA fragments (SND1 and AUH in the K562 lineage) alongside icSHAPE values and mCross motifs; performance chart for TF (human), CPD, and TF (mouse); appendix figures with additional structure, violin-plot, and metric comparisons.
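The results above are reported mainly as MCC and F1 scores. For readers unfamiliar with these metrics, here is a minimal scikit-learn sketch of computing both on toy labels; the labels are placeholders, not the paper's data.

```python
from sklearn.metrics import matthews_corrcoef, f1_score

# Toy binary labels standing in for an RNA classification benchmark; 1 = positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

# MCC balances all four confusion-matrix cells; F1 focuses on the positive class.
print("MCC:", round(matthews_corrcoef(y_true, y_pred), 3))
print("F1 :", round(f1_score(y_true, y_pred), 3))
```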
21 pages, 6566 KiB  
Article
Retina-Targeted 17β-Estradiol by the DHED Prodrug Rescues Visual Function and Actuates Neuroprotective Protein Networks After Optic Nerve Crush in a Rat Model of Surgical Menopause
by Katalin Prokai-Tatrai, Khadiza Zaman, Ammar Kapic, Kelleigh Hogan, Gabriela Sanchez-Rodriguez, Anna E. Silverio, Vien Nguyen, Laszlo Prokai and Andrew J. Feola
Int. J. Mol. Sci. 2025, 26(5), 1846; https://doi.org/10.3390/ijms26051846 - 21 Feb 2025
Viewed by 195
Abstract
The association between 17β-estradiol (E2) deprivation, as seen in menopause, and a risk for developing glaucoma has been shown. Thus, exogenous supplementation of E2 may protect against retinal ganglion cell (RGC) degradation and vision loss. Here, we investigated the utility of topical 10β,17β-dihydroxyestra-1,4-dien-3-one (DHED), a prodrug of E2 that selectively produces the neuroprotective hormone in the retina, on visual function after optic nerve crush (ONC) and ovariectomy (OVX). We used female Brown Norway rats that underwent either Sham or OVX surgeries. After ONC, OVX animals received DHED or vehicle eye drops for 12 weeks. Visual function, via the optomotor reflex, and retinal thickness, via optical coherence tomography, were followed longitudinally. Afterward, we performed mass spectrometry-based label-free retina proteomics to survey retinal protein interaction networks in our selected animal model and to identify E2-responsive proteins after OVX on neurodegeneration. We found that ONC with OVX caused a significant decline in visual functions that was ameliorated by DHED treatment. Discovery-driven retina proteomics identified numerous proteins associated with neurodegenerative processes due to ONC that were remediated by DHED eye drops. Altogether, our three-pronged phenotypic preclinical evaluation of topical DHED in the OVX + ONC model of glaucoma reveals the therapeutic potential of the prodrug to prevent visual deficits after glaucomatous retinal injury.
(This article belongs to the Special Issue Neuroprotective Strategies 2024)
Figures: schematic of the DHED bioprecursor prodrug's site-specific metabolism to E2 in the CNS catalyzed by a short-chain reductase (SDR) with NADP(H) as cofactor; optomotor-reflex assessments of spatial frequency and contrast sensitivity over 12 weeks in Sham and OVX rats after ONC and treatment; impact of ONC on total retinal thickness and the rate of retinal thinning; principal component analysis of retina proteomes across experimental groups; IPA mapping of ONC-impacted proteins in the Sham retina; IPA-based illustration of the neuroprotective effects of retina-targeted E2 via the DHED prodrug in the OVX + DHED group, including predicted inhibition of eye degeneration and regulation of crystallins.
17 pages, 6315 KiB  
Article
RVM+: An AI-Driven Vision Sensor Framework for High-Precision, Real-Time Video Portrait Segmentation with Enhanced Temporal Consistency and Optimized Model Design
by Na Tang, Yuehui Liao, Yu Chen, Guang Yang, Xiaobo Lai and Jing Chen
Sensors 2025, 25(5), 1278; https://doi.org/10.3390/s25051278 - 20 Feb 2025
Viewed by 450
Abstract
Video portrait segmentation is essential for intelligent sensing systems, including human-computer interaction, autonomous navigation, and augmented reality. However, dynamic video environments introduce significant challenges, such as temporal variations, occlusions, and computational constraints. This study introduces RVM+, an enhanced video segmentation framework based on the Robust Video Matting (RVM) architecture. By incorporating Convolutional Gated Recurrent Units (ConvGRU), RVM+ improves temporal consistency and captures intricate temporal dynamics across video frames. Additionally, a novel knowledge distillation strategy reduces computational demands while maintaining high segmentation accuracy, making the framework ideal for real-time applications in resource-constrained environments. Comprehensive evaluations on challenging datasets show that RVM+ outperforms state-of-the-art methods in both segmentation accuracy and temporal consistency. Key performance indicators such as MIoU, SAD, and dtSSD effectively verify the robustness and efficiency of the model. The integration of knowledge distillation ensures a streamlined and effective design with negligible accuracy trade-offs, highlighting its suitability for practical deployment. This study makes significant strides in intelligent sensor technology, providing a high-performance, efficient, and scalable solution for video segmentation. RVM+ offers potential for applications in fields such as augmented reality, robotics, and real-time video analysis, while also advancing the development of AI-enabled vision sensors.
(This article belongs to the Section Intelligent Sensors)
Figures: flowchart of the video portrait segmentation structure; static background sample images; MulData dataset sample images; results of random rotation and background fitting; RVM model structure; structure of the ConvGRU module for extracting temporal features; structure of the RVM+ model; schematic of knowledge distillation; segmentation results on a three-frame test video and in challenging scenarios; detailed comparison of segmentation results across different models; comparative effects of multi-angle portrait segmentation.
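RVM+ builds on ConvGRU units to carry temporal context across frames. A textbook-style ConvGRU cell in PyTorch is sketched below for orientation; it is a generic formulation under assumed channel sizes, not the RVM+ implementation.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell: GRU gating with convolutions instead of matmuls."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)  # update + reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)       # candidate state
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.size(0), self.hid_ch, x.size(2), x.size(3))
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde       # blend previous and candidate states

# Toy usage: run the cell over a 5-frame feature sequence.
cell = ConvGRUCell(in_ch=16, hid_ch=32)
frames = torch.randn(5, 1, 16, 64, 64)         # (time, batch, channels, H, W)
h = None
for t in range(frames.size(0)):
    h = cell(frames[t], h)
print(h.shape)                                 # torch.Size([1, 32, 64, 64])
```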
16 pages, 2664 KiB  
Article
Development of New Generation Portable Camera-Aided Surgical Simulator for Cognitive Training in Laparoscopic Cholecystectomy
by Yucheng Li, Victoria Nelson, Cuong T. Nguyen, Irene Suh, Suvranu De, Ka-Chun Siu and Carl Nelson
Electronics 2025, 14(4), 793; https://doi.org/10.3390/electronics14040793 - 18 Feb 2025
Viewed by 291
Abstract
Laparoscopic cholecystectomy (LC) is the standard procedure for gallbladder removal, but improper identification of anatomical structures can lead to biliary duct injury (BDI). The critical view of safety (CVS) is a standardized technique designed to mitigate this risk. However, existing surgical training systems primarily emphasize haptic feedback and physical skill development, making them expensive and less accessible. This paper presents the next-generation Portable Camera-Aided Surgical Simulator (PortCAS), a cost-effective, portable, vision-based surgical training simulator designed to enhance cognitive skill acquisition in LC. The system consists of an enclosed physical module equipped with a vision system, a single-board computer for real-time instrument tracking, and a virtual simulation interface that runs on a user-provided computer. Unlike traditional simulators, PortCAS prioritizes cognitive training over force-based interactions, eliminating the need for costly haptic components. The system was evaluated through user studies assessing accuracy, usability, and training effectiveness. Results demonstrate that PortCAS provides sufficiently accurate tracking performance for training surgical skills such as CVS, offering a scalable and accessible solution for surgical education.
(This article belongs to the Special Issue Virtual Reality Applications in Enhancing Human Lives)
Show Figures

Figure 1

Figure 1
<p>Workflow diagram of the portable surgical training simulator. The system comprises three components: (1) three smartphones equipped with cameras, an installed app, and an image segmentation program; (2) a Raspberry Pi running a triangulation program to estimate marker positions in space; and (3) a vision computer that renders the VR environment. Smartphones capture marker pixel coordinates and transmit the data to the Raspberry Pi. The Raspberry Pi processes the data to calculate marker positions and sends the results to the computer, which generates the immersive VR simulation.</p>
Full article ">Figure 2
<p>Portable enclosure design and assembly process. (<b>A</b>) Unfolded enclosure: The piece is laser-cut from <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>/</mo> <msup> <mn>8</mn> <mrow> <mo>″</mo> </mrow> </msup> </mrow> </semantics></math> plywood and connected with <math display="inline"><semantics> <msup> <mn>12</mn> <mrow> <mo>″</mo> </mrow> </msup> </semantics></math> hinges using <math display="inline"><semantics> <mrow> <mn>1</mn> <mo>/</mo> <msup> <mn>8</mn> <mrow> <mo>″</mo> </mrow> </msup> </mrow> </semantics></math> rivets for foldability. (<b>B</b>) Folded enclosure: The compact design achieves a folded volume of <math display="inline"><semantics> <mrow> <mn>12</mn> <mo>.</mo> <msup> <mn>25</mn> <mrow> <mo>″</mo> </mrow> </msup> <mo>×</mo> <msup> <mn>12</mn> <mrow> <mo>″</mo> </mrow> </msup> <mo>×</mo> <mn>1</mn> <mo>.</mo> <msup> <mn>25</mn> <mrow> <mo>″</mo> </mrow> </msup> </mrow> </semantics></math> for portability. (<b>C</b>) Installed enclosure: Tabs and slots securely connect the panels to form the working structure. (<b>D</b>) Fully assembled prototype: The enclosure is equipped with three smartphones and two laparoscopic graspers, ready for simulation use.</p>
Full article ">Figure 3
<p>The schematic of the enclosure shows the enclosure design and layout for camera positioning and triangulation analysis. The Remote Center of Motion (RCM) is the fixed position where the laparoscopic gripper passes through and is secured within the enclosure. The red, green, and blue arrows represent the x-, y-, and z-axes, respectively.</p>
Full article ">Figure 4
<p>Color-based segmentation is applied to identify surgical instrument tips. (<b>A</b>) shows color markers, (<b>B</b>) highlights the segmented colors, and (<b>C</b>) shows centroids for triangulation.</p>
Full article ">Figure 5
<p>(<b>A</b>) Local view showing the rays from three cameras and the estimated target position. (<b>B</b>) Triangulation setup illustrating camera rays and target position. The red, green, and blue arrows represent the <span class="html-italic">x</span>-, <span class="html-italic">y</span>-, and <span class="html-italic">z</span>-axes, respectively.</p>
Figure 6">
Figure 6. Test scenarios for assessing camera layouts. All camera configurations are positioned on the surface of a sphere centered on the target position. (A,D,G) correspond to θ = 80°; (B,E,H) to θ = 54.8°; and (C,F,I) to θ = 85°. (A–C) have an azimuthal angle distribution of φ = 150°; (D–F) have φ = 120°; and (G–I) have φ = 60°. The red, green, and blue arrows represent the x-, y-, and z-axes, respectively.
Figure 7. Contour map of condition numbers for various camera layout scenarios, computed based on Equation (9). The map covers a continuous range of θ and φ, illustrating the impact of these parameters on the condition number. The nine discrete scenarios (A–I) from Figure 6 are marked at their corresponding locations on the map for reference.
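The sensitivity analysis behind Figure 7 can be approximated by evaluating the condition number of the triangulation normal matrix as the camera geometry changes. The paper's exact matrix from Equation (9) is not reproduced here; the sketch below uses the least-squares matrix from the triangulation sketch above as a stand-in, and it assumes three cameras at a common polar angle θ with azimuths spread by ±φ, which may differ from the authors' parametrization.

```python
import numpy as np

def camera_direction(theta_deg, phi_deg):
    """Unit vector from a camera on the sphere (polar angle theta, azimuth phi) toward its center."""
    t, p = np.radians(theta_deg), np.radians(phi_deg)
    return -np.array([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)])

def layout_condition_number(theta_deg, phi_spread_deg):
    """Condition number of the triangulation normal matrix for three cameras."""
    A = np.zeros((3, 3))
    for phi in (0.0, phi_spread_deg, -phi_spread_deg):
        d = camera_direction(theta_deg, phi)
        A += np.eye(3) - np.outer(d, d)
    return np.linalg.cond(A)

# Sweep the two layout parameters to build a contour map in the spirit of Figure 7.
thetas = np.linspace(30.0, 89.0, 60)
spreads = np.linspace(30.0, 170.0, 60)
cond_map = np.array([[layout_condition_number(t, s) for s in spreads] for t in thetas])
```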
Figure 8">
Figure 8. (A) Illustration of the 5 × 5 grid of target positions within the enclosure’s workspace, with each square measuring 15 mm × 15 mm. (B) Positioning of the two symmetric cameras. (C) Positioning of the camera on the symmetric plane. (D) Schematic of the enclosure’s interior showing camera placement and target grid.
Figure 9. Accuracy test results comparing estimated target positions (blue points) to ground truth target positions (red grid) across different planes: (A) xy-plane, (B) yz-plane, (C) xz-plane, and (D) 3D view of the workspace. All dimensions are presented in millimeters. The red, green, and blue arrows represent the x-, y-, and z-axes, respectively.
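The error summarized by a plot like Figure 9 is simply the Euclidean distance between each estimated position and its ground-truth grid point. A two-line summary, with placeholder arrays standing in for the measured data:

```python
import numpy as np

estimated = np.random.rand(25, 3) * 60.0      # placeholder (25, 3) estimates in mm
ground_truth = np.random.rand(25, 3) * 60.0   # placeholder (25, 3) grid points in mm

errors = np.linalg.norm(estimated - ground_truth, axis=1)                         # per-target error in mm
print(f"mean = {errors.mean():.2f} mm, RMSE = {np.sqrt((errors ** 2).mean()):.2f} mm")
```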
Figure 10">
Figure 10. Simulation results of the VR environment for laparoscopic training. The system visualizes the gallbladder and surrounding organs with realistic coloration and texture. Interaction with the virtual organs demonstrates key procedural steps, including connective tissue dissection and gallbladder isolation. (A) Setup with two laparoscopic grippers. (B) Both devices grasping either the liver or the gallbladder. (C) The left arm grasping the liver while the right arm dissects fat tissue. (D) The right arm grasping the liver while the left arm dissects fat tissue.
29 pages, 4045 KiB  
Article
Advanced Digital Solutions for Food Traceability: Enhancing Origin, Quality, and Safety Through NIRS, RFID, Blockchain, and IoT
by Matyas Lukacs, Fruzsina Toth, Roland Horvath, Gyula Solymos, Boglárka Alpár, Peter Varga, Istvan Kertesz, Zoltan Gillay, Laszlo Baranyai, Jozsef Felfoldi, Quang D. Nguyen, Zoltan Kovacs and Laszlo Friedrich
J. Sens. Actuator Netw. 2025, 14(1), 21; https://doi.org/10.3390/jsan14010021 - 17 Feb 2025
Viewed by 383
Abstract
The rapid growth of the human population, the increase in consumer needs regarding food authenticity, and the sub-par synchronization between agricultural and food industry production necessitate the development of reliable track and tracing solutions for food commodities. The present research proposes a simple and affordable digital system that could be implemented in most production processes to improve transparency and productivity. The system combines non-destructive, rapid quality assessment methods, such as near infrared spectroscopy (NIRS) and computer/machine vision (CV/MV), with track and tracing functionalities revolving around the Internet of Things (IoT) and radio frequency identification (RFID). Meanwhile, authenticity is provided by a self-developed blockchain-based solution that validates all data and documentation “from farm to fork”. The system is introduced by taking certified Hungarian sweet potato production as a model scenario. Each element of the proposed system is discussed in detail individually and as a part of an integrated system, capable of automatizing most production flows while maintaining complete transparency and compliance with authority requirements. The results include the data and trust model of the system with sequence diagrams simulating the interactions between participants. The study lays the groundwork for future research and industrial applications combining digital tools to improve the productivity and authenticity of the agri-food industry, potentially increasing the level of trust between participants, most importantly for the consumers. Full article
(This article belongs to the Topic Trends and Prospects in Security, Encryption and Encoding)
Figure 1. Simplified sweet potato supply chain with core material flow and management steps, including the proposed digital technologies. Red arrows indicate measured data; blue arrows indicate manually provided data.
Figure 2">
Figure 2. Physical components of an RFID reader.
Figure 3. The connection of the IoT modules to the internet.
Figure 4. Summary of the integrated blockchain-based authentication. Note: the RFID IoT module may be replaced with other IoT modules in the system.
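The validation idea behind Figure 4 (every measurement and document is anchored so that later tampering is detectable) can be illustrated with a minimal hash-chain sketch. This is a generic illustration, not the authors' blockchain implementation; the record fields are hypothetical.

```python
import hashlib
import json

def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous hash so entries form a tamper-evident chain."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_record(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"record": record, "prev_hash": prev, "hash": record_hash(record, prev)})

def verify_chain(chain: list) -> bool:
    """Re-compute every hash; any edited record or broken link makes this return False."""
    prev = "0" * 64
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != record_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True

chain: list = []
append_record(chain, {"lot": "SP-2024-001", "step": "harvest", "brix": 7.2})     # hypothetical reading
append_record(chain, {"lot": "SP-2024-001", "step": "storage", "temp_c": 13.5})  # hypothetical reading
assert verify_chain(chain)
```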
Figure 5">
Figure 5. Summary of the developed track and tracing solution.
Figure 6. The system’s data model.
Figure 7. The trust model of the track and tracing solution with actor responsibilities.
Figure 8. Simplified sequence diagram showing actor interactions with the system. Dashed lines indicate read-only permissions.
Figure 9. The tracing system front-end. (A) Latest measurement value; (B) graphical representation of the logged data; (C) the MySQL database of the measured results; (D) consumer front-end.
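Panel (C) of Figure 9 stores measured results in a MySQL database. The sketch below uses Python's built-in sqlite3 as a self-contained stand-in for that back end, with a hypothetical schema; a production deployment would issue essentially the same SQL against MySQL.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect("traceability.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS measurements (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           lot_id TEXT NOT NULL,
           sensor TEXT NOT NULL,
           value REAL NOT NULL,
           recorded_at TEXT NOT NULL)"""
)

def log_measurement(lot_id: str, sensor: str, value: float) -> None:
    """Insert one sensor reading; the parameterized query avoids SQL injection."""
    conn.execute(
        "INSERT INTO measurements (lot_id, sensor, value, recorded_at) VALUES (?, ?, ?, ?)",
        (lot_id, sensor, value, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

log_measurement("SP-2024-001", "storage_temp_c", 13.4)   # hypothetical IoT reading
latest = conn.execute(
    "SELECT sensor, value, recorded_at FROM measurements ORDER BY id DESC LIMIT 1"
).fetchone()
```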
">
17 pages, 251 KiB  
Article
Why Kant’s Moral–Religious Project Was Bound to Unravel
by Jaeha Woo
Religions 2025, 16(2), 235; https://doi.org/10.3390/rel16020235 - 14 Feb 2025
Viewed by 340
Abstract
After criticizing the three traditional proofs of divine existence in the first Critique, Kant fills this void with an apologetic argument based on his practical philosophy. However, this moral–religious project has long been charged with various inconsistencies, particularly regarding the tension between the demand for moral perfection and human limitation. There is even some indication that he becomes aware of these issues, as he later moves away from the vision of endless moral progress that holds his original project together. However, this revision does not resolve all the tensions, as the question of how imperfect humans can be well-pleasing to God remains. I argue that this predicament is a difficult-to-avoid feature of his project given how it interacts with his religious context of Lutheran Christianity. This is because he incorporates some of its elements (particularly its uncompromising moral standard) virtually intact while radically altering others (such as vicarious atonement and imputation of alien righteousness). However, this procedure undermines the coherence of the tradition he inherits because the elements he fully incorporates are meant to lead to the traditional doctrines he leaves behind. I conclude by reflecting on how theists who are sympathetic to Kant should lead his moral–religious project out of its current precarious predicament. Full article
(This article belongs to the Special Issue Theological Reflections on Moral Theories)
25 pages, 2143 KiB  
Article
Does Environmental Disclosure and Corporate Governance Ensure the Financial Sustainability of Islamic Banks?
by Saqib Muneer, Ajay Singh, Mazhar Hussain Choudhary, Awwad Saad Alshammari and Nasir Ali Butt
Adm. Sci. 2025, 15(2), 54; https://doi.org/10.3390/admsci15020054 - 10 Feb 2025
Viewed by 548
Abstract
The purpose of this study is to investigate the influence of environmental disclosure and corporate governance on the financial performance of Islamic banks in Saudi Arabia. This study highlights that sustainable practices are transparent with financial objectives using the religious framework of Islamic finance. This research is based on Worldwide Vision 2030, which covers sustainable development and promotes environmental, social, and governance (ESG) principles, as well as corporate governance factors, such as board composition and Shariah Supervisory Boards (SSBs). We use a hybrid approach for our findings, with a dataset spanning 2011–2023 for the quantitative analysis and 20 semi-structured analyses conducted for a qualitative approach that aligns with objectives. We found that environmental disclosure boosts profits and stakeholder trust. Corporate governance structures, such as environmental boards and sustainability committees, improve the environmental disclosure of financial performance in Islamic banks. In this positive interaction, specialized governance drives Sharia-compliant sustainability initiatives. SSBs help Islamic banks integrate sustainability and meet religious and ESG environmental standards. Board diversity and dedication in the sustainability committee both play important roles in enhancing environmental disclosure practices; in return, these improved financial performances. The interaction of environmental disclosure and board environmental expertise has a positive impact on the overall performance, which indicates that governance structure supports sustainability-related decision-making, aligning with transparency. This study suggests that Islamic banks standardize ESG frameworks, improve board environmental expertise, and invest in real-time sustainability reporting digital solutions. Saudi Islamic banks can lead regional and global sustainable banking by adopting these strategies to align with global sustainability trends, improve financial performance, and meet ethical finance expectations. Full article
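The moderation effect described in the abstract (environmental disclosure interacting with board environmental expertise) corresponds to the kind of panel specification sketched below. This is an illustrative model, not the study's actual one: the column names, the ROA outcome, and the fixed-effects and clustering choices are all assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per bank-year with columns
# roa, env_disclosure, board_env_expertise, bank, year.
df = pd.read_csv("islamic_banks_panel.csv")

model = smf.ols(
    "roa ~ env_disclosure * board_env_expertise + C(bank) + C(year)",  # interaction plus two-way fixed effects
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["bank"].astype("category").cat.codes})

# The env_disclosure:board_env_expertise coefficient captures whether governance
# expertise strengthens the disclosure-performance link.
print(model.summary())
```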
Figure 1. Distribution of environmental disclosure scores and scatter plot of environmental disclosure vs. financial performance.
Figure 2">
Figure 2. Panel data regression model diagnostics.
Figure 3. Interaction effects plot for moderating variables.
Figure 4. Word cloud of key themes from the qualitative analysis.
23 pages, 1202 KiB  
Article
CSP-DCPE: Category-Specific Prompt with Deep Contextual Prompt Enhancement for Vision–Language Models
by Chunlei Wu, Yixiang Wu, Qinfu Xu and Xuebin Zi
Electronics 2025, 14(4), 673; https://doi.org/10.3390/electronics14040673 - 9 Feb 2025
Viewed by 565
Abstract
Recently, prompt learning has emerged as a viable technique for fine-tuning pre-trained vision–language models (VLMs). The use of prompts allows pre-trained VLMs to be quickly adapted to specific downstream tasks, bypassing the necessity to update the original pre-trained weights. Nevertheless, much of the existing work on prompt learning has focused primarily on the utilization of non-specific prompts, with little attention paid to the category-specific data. In this paper, we present a novel method, the Category-Specific Prompt (CSP), which integrates task-oriented information into our model, thereby augmenting its capacity to comprehend and execute complex tasks. In order to enhance the exploitation of features, thereby optimizing the utilization of the combination of category-specific and non-specific prompts, we introduce a novel deep prompt-learning method, Deep Contextual Prompt Enhancement (DCPE). DCPE outputs features with rich text embedding knowledge that changes in response to input through attention-based interactions, thereby ensuring that our model contains instance-oriented information. Combining the above two methods, our architecture CSP-DCPE contains both task-oriented and instance-oriented information, and achieves state-of-the-art average scores on 11 benchmark image-classification datasets. Full article
(This article belongs to the Section Artificial Intelligence)
Figure 1. Comparison of CSP-DCPE with existing prompt-learning methods. (a) Existing prompt-learning approaches use only non-specific prompts to fine-tune CLIP, ignoring category-related information. (b) CSP-DCPE introduces Category-Specific Prompts with rich category-related information on the textual encoder and enhances the exploitation of features derived from the preceding layer by introducing DCPE, thereby optimizing the utilization of the combination of category-specific and non-specific prompts. DCPE works by adding attention between encoder layers, combining the deep prompt with the previous layer’s output to serve as input for the next layer. This improves the extraction of feature information and enhances overall model performance.
Figure 2">
Figure 2. Overall architecture of CSP-DCPE. Our architecture introduces a Category-Specific Prompt (CSP), which provides category information for image classification and interacts fully with image features obtained from the visual encoder. This allows our model to contain task-oriented information. To extract textual features more effectively from the combination of non-specific and Category-Specific Prompts, we have introduced Deep Contextual Prompt Enhancement (DCPE). DCPE functions by inserting attention modules between encoder layers. It merges the deep prompts with the output from the preceding layer, which then becomes the input for the subsequent layer. This mechanism enhances the extraction of feature information and boosts the overall performance of the model.
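The mechanism described in the Figure 2 caption (attention inserted between encoder layers so that deep prompt tokens interact with the previous layer's output) can be sketched as a small PyTorch module. This is an interpretation of the caption rather than the authors' released code; the embedding size, prompt count, and residual fusion rule are assumptions.

```python
import torch
import torch.nn as nn

class DeepPromptEnhancer(nn.Module):
    """Fuse learnable deep prompt tokens with the previous encoder layer's output via attention."""

    def __init__(self, dim: int = 512, n_prompts: int = 4, n_heads: int = 8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)   # deep prompt tokens
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, prev_layer_out: torch.Tensor) -> torch.Tensor:
        # prev_layer_out: (batch, seq_len, dim) output of the preceding encoder layer
        b = prev_layer_out.shape[0]
        p = self.prompts.unsqueeze(0).expand(b, -1, -1)                   # (batch, n_prompts, dim)
        enhanced, _ = self.attn(query=p, key=prev_layer_out, value=prev_layer_out)
        enhanced = self.norm(enhanced + p)                                # residual + norm on prompt tokens
        # Prepend the enhanced prompts so the next layer sees both prompts and context tokens.
        return torch.cat([enhanced, prev_layer_out], dim=1)

x = torch.randn(2, 77, 512)       # e.g., text-encoder hidden states
out = DeepPromptEnhancer()(x)     # (2, 77 + 4, 512), fed to the next encoder layer
```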
Figure 3">
Figure 3. A thorough comparison of CSP-DCPE with prior methods in FSL shows that CSP-DCPE significantly improves performance on eight of the eleven datasets, resulting in a remarkable boost in overall average performance.
Figure 4. Investigations of the prompt hyper-parameters.
17 pages, 2608 KiB  
Article
Colorimetric and Photobiological Properties of Light Transmitted Through Low-Vision Filters: Simulated Potential Impact on ipRGCs Responses Considering Crystalline Lens Aging
by Ana Sanchez-Cano, Elvira Orduna-Hospital and Justiniano Aporta
Life 2025, 15(2), 261; https://doi.org/10.3390/life15020261 - 8 Feb 2025
Viewed by 506
Abstract
This study aims to investigate the potential impact of commercial low-vision filters on intrinsically photosensitive retinal ganglion cells (ipRGCs), which have significantly advanced our understanding of non-image-forming visual functions. A comprehensive analysis by modeling the potential responses of ipRGCs to commercially available low-vision filters was conducted, focusing on how the spectral properties of these filters could alter ipRGC function. Additionally, the influence of aging on the crystalline lens was considered. Colorimetric changes in the transmitted light by these filters were also analyzed, highlighting variations based on the manufacturer. The study uncovered the diverse responses of ipRGCs to fifty low-vision filters, shedding light on the potential modifications in ipRGC stimulation and visual function. Notably, the consideration of aging in the crystalline lens revealed significant alterations in ipRGC response. Furthermore, the analysis of colorimetric changes demonstrated substantial differences in the light transmitted by these filters, with variations dependent on the manufacturer. This research underscores the nuanced relationship between low-vision filters and ipRGCs, providing insights into their potential impact on visual function. The varying responses observed, coupled with the influence of aging on the crystalline lens, emphasize the complexity of this interaction. Additionally, the distinct colorimetric changes based on filter manufacturer suggest the need for tailored approaches in enhancing visual perception for individuals with visual impairments. Full article
(This article belongs to the Special Issue Feature Paper in Physiology and Pathology: 2nd Edition)
Figure 1. Left: the relative spectral sensitivities of melanopic (blue) and photopic (orange) light. Right: the spectral transmittances of the crystalline lens depending on age according to the NPR-CEN/TR 16791 document [37]. The figure shows the transmittance for each decade of life, with a 32-year-old subject as the standard; the older the individual, the greater the absorption of shorter wavelengths by the crystalline lens.
Figure 2">
Figure 2. The spectral transmittance from 380 nm to 780 nm for the different low-vision filters analyzed. From top to bottom, the graphs correspond to AVS, ESSILOR, HOYA, ML, PRATS, and ZEISS.
Figure 3. A chromaticity diagram representing the location of D65 and the low-vision filters evaluated. The numbers “#n” refer to the filters enumerated in Table 2.
Figure 4">
Figure 4. Melanopic lighting (mel-EDI) values at the retinal plane after light passes through each filter, shown as a function of observer age and compared with the values for the standard illuminant D65. Results are given for each decade of life, from a 10-year-old to an 80-year-old observer.
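In relative terms, the quantity plotted in Figure 4 can be approximated by weighting the source spectrum by the filter transmittance, the age-dependent lens transmittance, and the melanopic sensitivity curve, then normalizing by the unfiltered case. The sketch below assumes all curves are sampled on a common wavelength grid; converting the ratio to an absolute mel-EDI would additionally require the CIE S 026 normalization, which is omitted here, and the Gaussian used for the melanopic curve is only a placeholder.

```python
import numpy as np

def melanopic_ratio(wl_nm, spd, t_filter, t_lens_age, s_mel):
    """Relative melanopic stimulus behind a filter and an aged lens, vs. the unfiltered source.

    All inputs are 1-D arrays sampled on the same wavelength grid (e.g., 380-780 nm).
    """
    filtered = np.trapz(spd * t_filter * t_lens_age * s_mel, wl_nm)
    reference = np.trapz(spd * s_mel, wl_nm)
    return filtered / reference

# Placeholder curves: equal-energy source, a flat filter passing 60%, a clear lens,
# and a Gaussian centered near 490 nm standing in for the melanopic sensitivity.
wl = np.arange(380.0, 781.0, 1.0)
ratio = melanopic_ratio(
    wl,
    np.ones_like(wl),
    np.full_like(wl, 0.6),
    np.ones_like(wl),
    np.exp(-0.5 * ((wl - 490.0) / 40.0) ** 2),
)
```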
">