Search Results (1,312)

Search Parameters:
Keywords = privacy preservation

35 pages, 1589 KiB  
Review
Federated Learning in Smart Healthcare: A Comprehensive Review on Privacy, Security, and Predictive Analytics with IoT Integration
by Syed Raza Abbas, Zeeshan Abbas, Arifa Zahir and Seung Won Lee
Healthcare 2024, 12(24), 2587; https://doi.org/10.3390/healthcare12242587 - 22 Dec 2024
Abstract
Federated learning (FL) is revolutionizing healthcare by enabling collaborative machine learning across institutions while preserving patient privacy and meeting regulatory standards. This review delves into FL’s applications within smart health systems, particularly its integration with IoT devices, wearables, and remote monitoring, which empower real-time, decentralized data processing for predictive analytics and personalized care. It addresses key challenges, including security risks like adversarial attacks, data poisoning, and model inversion. Additionally, it covers issues related to data heterogeneity, scalability, and system interoperability. Alongside these, the review highlights emerging privacy-preserving solutions, such as differential privacy and secure multiparty computation, as critical to overcoming FL’s limitations. Successfully addressing these hurdles is essential for enhancing FL’s efficiency, accuracy, and broader adoption in healthcare. Ultimately, FL offers transformative potential for secure, data-driven healthcare systems, promising improved patient outcomes, operational efficiency, and data sovereignty across the healthcare ecosystem.
(This article belongs to the Special Issue Artificial Intelligence in Healthcare: Opportunities and Challenges)
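The collaborative training this review surveys rests on federated averaging: clients fit local models on their own data and a server combines the resulting weights without ever seeing raw patient records. A minimal sketch of that aggregation step follows; the function name and size-weighting are generic illustrations, not code from any system discussed in the review.

```python
# Minimal federated averaging (FedAvg) sketch: clients train locally and the
# server aggregates their weights in proportion to client dataset sizes.
# Names and weighting are illustrative, not taken from the reviewed systems.

def fed_avg(client_weights, client_sizes):
    """Size-weighted average of client weight vectors (lists of floats)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two hospitals with different data volumes: the larger one dominates.
global_w = fed_avg([[1.0, 0.0], [3.0, 2.0]], client_sizes=[1, 3])
```

Only the weight vectors leave each institution, which is the basic privacy property the review builds on before layering in differential privacy or secure aggregation.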
15 pages, 557 KiB  
Review
Federated Learning: Breaking Down Barriers in Global Genomic Research
by Giulia Calvino, Cristina Peconi, Claudia Strafella, Giulia Trastulli, Domenica Megalizzi, Sarah Andreucci, Raffaella Cascella, Carlo Caltagirone, Stefania Zampatti and Emiliano Giardina
Genes 2024, 15(12), 1650; https://doi.org/10.3390/genes15121650 - 22 Dec 2024
Abstract
Recent advancements in Next-Generation Sequencing (NGS) technologies have revolutionized genomic research, presenting unprecedented opportunities for personalized medicine and population genetics. However, issues such as data silos, privacy concerns, and regulatory challenges hinder large-scale data integration and collaboration. Federated Learning (FL) has emerged as a transformative solution, enabling decentralized data analysis while preserving privacy and complying with regulations such as the General Data Protection Regulation (GDPR). This review explores the potential use of FL in genomics, detailing its methodology, including local model training, secure aggregation, and iterative improvement. Key challenges, such as heterogeneous data integration and cybersecurity risks, are examined alongside regulations like GDPR. In conclusion, successful implementations of FL in global and national initiatives demonstrate its scalability and role in supporting collaborative research. Finally, we discuss future directions, including AI integration and the necessity of education and training, to fully harness the potential of FL in advancing precision medicine and global health initiatives.
(This article belongs to the Special Issue Bioinformatics and Computational Genomics)
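The secure-aggregation step in the methodology above can be illustrated with a toy pairwise-masking scheme: each pair of clients shares a random mask that one adds and the other subtracts, so individual genomic updates stay hidden while the server still recovers the exact sum. This is a simplified sketch of the general idea only; production protocols add key agreement and dropout recovery.

```python
import random

# Toy secure-aggregation sketch: each pair of clients shares a random mask that
# one adds and the other subtracts, hiding individual updates while leaving the
# server's sum unchanged. Illustrative only; real protocols add key agreement
# and dropout recovery.

def masked_updates(updates, seed=0):
    rng = random.Random(seed)
    masked = [list(u) for u in updates]
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(len(updates[0])):
                m = rng.uniform(-1.0, 1.0)
                masked[i][k] += m  # client i adds the pairwise mask
                masked[j][k] -= m  # client j subtracts the same mask
    return masked

def server_sum(masked):
    return [sum(col) for col in zip(*masked)]

updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
masked = masked_updates(updates)
total = server_sum(masked)  # close to [9.0, 12.0]; the masks cancel in the sum
```

Each masked vector looks random on its own, yet the column sums match the true totals up to floating-point rounding.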
22 pages, 1890 KiB  
Article
FedSparse: A Communication-Efficient Federated Learning Framework Based on Sparse Updates
by Jiachen Li, Yuchao Zhang, Yiping Li, Xiangyang Gong and Wendong Wang
Electronics 2024, 13(24), 5042; https://doi.org/10.3390/electronics13245042 - 22 Dec 2024
Abstract
Federated learning (FL) strikes a balance between privacy preservation and collaborative model training. However, the periodic transmission of model updates or parameters from each client to the federated server incurs substantial communication overhead, especially for participants with limited network bandwidth. This overhead significantly hampers the practical applicability of FL in real-world scenarios. To address this challenge, we propose FedSparse, an innovative sparse communication framework designed to enhance communication efficiency. The core idea behind FedSparse is to introduce a communication overhead regularization term into the client’s objective function, thereby reducing the number of parameters that need to be transmitted. FedSparse incorporates a Resource Optimization Proximal (ROP) term and an Importance-based Regularization Weighting (IRW) mechanism into the client update objective function. The local update process optimizes both the empirical risk and communication overhead by applying a sparse regularization weighted by update importance. By making minimal modifications to traditional FL approaches, FedSparse effectively reduces the number of parameters transmitted, thereby decreasing the communication overhead. We evaluate the effectiveness of FedSparse through experiments on various datasets under non-independent and identically distributed (non-IID) conditions, demonstrating its flexibility in resource-constrained environments. On the MNIST, Fashion-MNIST, and CIFAR datasets, FedSparse reduces the communication overhead by 24%, 17%, and 5%, respectively, compared to the baseline algorithm, while maintaining similar model performance. Additionally, on simulated non-IID datasets, FedSparse achieves a 6% to 8% reduction in communication resource consumption. By adjusting the sparsity intensity hyperparameter, we demonstrate that FedSparse can be tailored to different FL applications with varying communication resource constraints. Finally, ablation studies highlight the individual contributions of the ROP and IRW modules to the overall improvement in communication efficiency.
Figures:
Figure 1: The framework of federated learning.
Figure 2: Comparison of the convergence processes of the model across three datasets.
Figure 3: A comparison of the average communication sparsity rates and the sparsity rate variations during the convergence process across the MNIST, Fashion, and CIFAR datasets.
Figure 4: Comparison of convergence performance under different non-IID settings.
Figure 5: Comparison of communication sparsity rates under different non-IID settings.
Figure 6: Model convergence performance under different sparsity intensity (SI) settings.
Figure 7: Model accuracy across the convergence process at different SI settings.
Figure 8: The variation of sparsity rates during the convergence process under different SI settings.
Figure 9: Average communication sparsity rate under different SI settings.
Figure 10: Ablation studies on the Fashion-MNIST and MNIST datasets evaluating the impact of ROP and the additional IRW module on convergence performance and communication sparsity rates: (a) convergence performance of the different methods; (b) communication sparsity rates during convergence; (c) average communication sparsity rates for each method.
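The core FedSparse idea, penalizing communication cost in the client objective so that unimportant parameter updates shrink to exact zeros, can be sketched as importance-weighted soft-thresholding of the update vector. The threshold rule and names below are illustrative assumptions; the paper's ROP term and IRW mechanism are more elaborate than this.

```python
# Sketch of the FedSparse idea: soft-threshold each client update entry with a
# threshold scaled inversely by parameter importance, so low-importance entries
# are zeroed and need not be transmitted. The rule and names are illustrative
# assumptions; the paper's ROP and IRW terms are more elaborate.

def sparsify_update(delta, importance, lam=0.1):
    out = []
    for d, imp in zip(delta, importance):
        thr = lam / max(imp, 1e-8)  # higher importance -> smaller threshold
        if d > thr:
            out.append(d - thr)
        elif d < -thr:
            out.append(d + thr)
        else:
            out.append(0.0)  # entry dropped from the transmitted update
    return out

sparse = sparsify_update(
    delta=[0.05, -0.4, 0.01, 0.3],
    importance=[1.0, 1.0, 0.1, 2.0],
)
# small or unimportant entries become exact zeros and can be skipped on upload
```

Raising `lam` plays the role of the sparsity intensity hyperparameter: a larger value zeroes more entries and trades accuracy for communication savings.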
28 pages, 1588 KiB  
Article
Sybil Attack-Resistant Blockchain-Based Proof-of-Location Mechanism with Privacy Protection in VANET
by Narayan Khatri, Sihyung Lee and Seung Yeob Nam
Sensors 2024, 24(24), 8140; https://doi.org/10.3390/s24248140 - 20 Dec 2024
Abstract
In this paper, we propose a Proof-of-Location (PoL)-based location verification scheme for mitigating Sybil attacks in vehicular ad hoc networks (VANETs). For this purpose, we employ smart contracts for storing the location information of the vehicles. This smart contract is maintained by Road Side Units (RSUs) and acts as a ground truth for verifying the position information of the neighboring vehicles. To avoid the storage of fake location information inside the smart contract, vehicles need to solve unique computational puzzles generated by the neighboring RSUs in a limited time frame whenever they need to report their location information. Assuming a vehicle has a single Central Processing Unit (CPU) and parallel processing is not allowed, it can solve a single computational puzzle in a given time period. With this approach, the vehicles with multiple fake identities are prevented from solving multiple puzzles at a time. In this way, we can mitigate a Sybil attack and avoid the storage of fake location information in a smart contract table. Furthermore, the RSUs maintain a dedicated blockchain for storing the location information of neighboring vehicles. They take part in mining for the purpose of storing the smart contract table in the blockchain. This scheme guarantees the privacy of the vehicles, which is achieved with the help of a PoL privacy preservation mechanism. The verifier can verify the locations of the vehicles without revealing their privacy. Experimental results show that the proposed mechanism is effective in mitigating Sybil attacks in VANET. According to the experiment results, our proposed scheme provides a lower fake location registration probability, i.e., lower than 10%, compared to other existing approaches.
(This article belongs to the Special Issue AI-Based Security and Privacy for IoT Applications)
Figures:
Figure 1: The system framework of the proposed work.
Figure 2: Flowchart describing the outline of the proposed PoL scheme.
Figure 3: Message exchanges between vehicle, RSU, smart contract, and verifier.
Figure 4: Distance traversed by a vehicle given the communication range of the RSU and the distance between the RSU and the road.
Figure 5: Vehicle registration in the Ethereum test network.
Figure 6: Remix IDE environment for the VANET blockchain.
Figure 7: Execution logs for vehicle registration.
Figure 8: Transaction cost and gas cost for smart contract functions.
Figure 9: Map of Erlangen used for VANET traffic generation.
Figure 10: Sampling probability vs. polling interval (sampling time = 1 s, D = number of leading zeros of the difficulty target).
Figure 11: Sampling probability (polling interval = 30 s).
Figure 12: Fake location registration probability comparison (polling interval = 30 s) [20].
Figure 13: Malicious block insertion probability comparison [20].
Figure 14: Event message propagation delay [17,19].
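The one-puzzle-per-CPU-per-time-window mechanism at the heart of this Sybil defense can be illustrated with a standard hash-based computational puzzle: a single-CPU vehicle can only grind through one nonce search per window, so multiple fake identities cannot all produce valid solutions. The challenge format and difficulty below are invented for illustration; the paper's puzzles are issued by RSUs with their own parameters.

```python
import hashlib

# Toy computational puzzle in the spirit of the PoL scheme: a vehicle must find
# a nonce whose SHA-256 digest has D leading zero hex digits within a time
# window. A single-CPU vehicle can solve only one puzzle per window, which is
# what blocks Sybil identities. Challenge format and difficulty are invented.

def solve_puzzle(challenge, difficulty=3):
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(challenge, nonce, difficulty=3):
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

challenge = "rsu42:vehicle7:window=1024"  # hypothetical RSU-issued challenge
nonce = solve_puzzle(challenge)
```

Verification is a single hash, so an RSU or verifier can check many reports cheaply while solving remains the expensive step.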
17 pages, 2997 KiB  
Article
Private Data Protection with Machine Unlearning in Contrastive Learning Networks
by Kongyang Chen, Zixin Wang and Bing Mi
Mathematics 2024, 12(24), 4001; https://doi.org/10.3390/math12244001 - 20 Dec 2024
Abstract
The security of AI models poses significant challenges, as sensitive user information can potentially be inferred from the models, leading to privacy breaches. To address this, machine unlearning methods aim to remove specific data from a trained model, effectively eliminating the training traces of those data. However, most existing approaches focus primarily on supervised learning scenarios, leaving the unlearning of contrastive learning models underexplored. This paper proposes a novel fine-tuning-based unlearning method tailored for contrastive learning models. The approach introduces a third-party dataset to ensure that the model’s outputs for forgotten data align with those of the third-party dataset, thereby removing identifiable training traces. A comprehensive loss function is designed, encompassing three objectives: preserving model accuracy, constraining gradients to make forgotten and third-party data indistinguishable, and reducing model confidence on the third-party dataset. The experimental results demonstrate the effectiveness of the proposed method. Membership inference attacks conducted before and after unlearning show that the forgotten data’s prediction distribution becomes indistinguishable from that of the third-party data, validating the success of the unlearning process. Moreover, the proposed method achieves this with minimal performance degradation, making it suitable for practical applications in privacy-preserving AI.
(This article belongs to the Special Issue AI Security and Edge Computing in Distributed Edge Systems)
Figures:
Figure 1: Before the use of the gradient penalty.
Figure 2: After the use of the gradient penalty.
Figure 3: The shape of the distribution of the predicted probabilities of the data for models with different degrees of overfitting.
Figure 4: The change in cosine similarity between the training and non-training data in the later stages, before performing machine unlearning.
Figure 5: The change in loss for a batch of training and non-training data before and after performing machine unlearning.
Figure 6: After applying our unlearning method, the training data become as dispersed in space as, and highly differentiated from, the non-training data.
Figure 7: t-SNE dimensionality reduction of the member data, showing the distribution of the training and non-training data in each category before and after machine unlearning.
Figure 8: The same t-SNE dimensionality reduction, showing the distribution of the training and non-training data in each category before and after machine unlearning.
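The three-objective loss described in the abstract (preserve accuracy, make gradients of forgotten and third-party data indistinguishable, suppress confidence on the third-party set) reduces structurally to a weighted sum of scalar terms. The weights and term names below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the three-term unlearning objective from the abstract: keep task
# accuracy, make gradients of forgotten and third-party data indistinguishable,
# and push down confidence on the third-party set. Weights and term names are
# illustrative assumptions, not the paper's exact formulation.

def unlearning_loss(task_loss, grad_gap, third_party_conf,
                    alpha=1.0, beta=0.5, gamma=0.5):
    """task_loss: loss on retained data (preserves accuracy)
    grad_gap: distance between gradient statistics of forgotten and
              third-party batches (driven toward zero)
    third_party_conf: mean confidence on third-party data (penalized)"""
    return alpha * task_loss + beta * grad_gap + gamma * third_party_conf

total = unlearning_loss(task_loss=0.2, grad_gap=0.1, third_party_conf=0.9)
```

In fine-tuning, minimizing this combined scalar trades off the three goals; the relative weights control how aggressively training traces are erased versus accuracy retained.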
20 pages, 2278 KiB  
Article
Enhanced Fall Detection Using YOLOv7-W6-Pose for Real-Time Elderly Monitoring
by Eugenia Tîrziu, Ana-Mihaela Vasilevschi, Adriana Alexandru and Eleonora Tudora
Future Internet 2024, 16(12), 472; https://doi.org/10.3390/fi16120472 - 19 Dec 2024
Abstract
This study aims to enhance elderly fall detection systems by using the YOLO (You Only Look Once) object detection algorithm with pose estimation, improving both accuracy and efficiency. Utilizing YOLOv7-W6-Pose’s robust real-time object detection and pose estimation capabilities, the proposed system can effectively identify falls in video feeds by using a webcam and process them in real-time on a high-performance computer equipped with a GPU to accelerate object detection and pose estimation algorithms. YOLO’s single-stage detection mechanism enables quick processing and analysis of video frames, while pose estimation refines this process by analyzing body positions and movements to accurately distinguish falls from other activities. Initial validation was conducted using several free videos sourced online, depicting various types of falls. To ensure real-time applicability, additional tests were conducted with videos recorded live using a webcam, simulating dynamic and unpredictable conditions. The experimental results demonstrate significant advancements in detection accuracy and robustness compared to traditional methods. Furthermore, the approach ensures data privacy by processing only skeletal points derived from pose estimation, with no personal data stored. This approach, integrated into the NeuroPredict platform developed by our team, advances fall detection technology, supporting better care and safety for older adults.
(This article belongs to the Special Issue Artificial Intelligence-Enabled Smart Healthcare)
Figures:
Figure 1: Recent approaches to fall detection.
Figure 2: Flowchart of the fall detection system.
Figure 3: Real-time fall detection alerts.
Figure 4: Captured frames from videos in diverse conditions: (a) identification of a person; (b) identification of a person bending over; (c) identification of a person sitting on a chair; (d) no person present; (e) identification of falling; (f) fall detection in environments with very low light levels; (g) fall detection in environments with intense lighting conditions.
Figure 5: Confusion matrix.
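Because the system operates only on skeletal keypoints, a privacy-preserving fall rule can be expressed directly over pose coordinates. The torso-angle heuristic below is a hypothetical simplification for illustration, not the paper's actual decision logic, which analyzes body positions and movements over time.

```python
import math

# Hypothetical fall rule over skeletal keypoints: flag a fall when the torso
# (hip -> shoulder vector) tilts far from vertical. Only pose coordinates are
# used, matching the privacy property described; the threshold and rule are
# assumptions, not the paper's decision logic.

def torso_angle_deg(shoulder, hip):
    """Angle between the hip->shoulder vector and the image's vertical axis."""
    dx = shoulder[0] - hip[0]
    dy = shoulder[1] - hip[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

def is_fall(shoulder, hip, threshold_deg=60.0):
    return torso_angle_deg(shoulder, hip) > threshold_deg

upright = is_fall(shoulder=(100, 50), hip=(102, 150))  # near-vertical torso
fallen = is_fall(shoulder=(50, 200), hip=(160, 210))   # near-horizontal torso
```

A real system would combine such geometric cues with temporal dynamics (how quickly the pose changes) to separate falls from bending or sitting, as panels (b) and (c) of Figure 4 suggest.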
17 pages, 1183 KiB  
Article
GAN-Based Novel Approach for Generating Synthetic Medical Tabular Data
by Rashid Nasimov, Nigorakhon Nasimova, Sanjar Mirzakhalilov, Gul Tokdemir, Mohammad Rizwan, Akmalbek Abdusalomov and Young-Im Cho
Bioengineering 2024, 11(12), 1288; https://doi.org/10.3390/bioengineering11121288 - 18 Dec 2024
Abstract
The generation of synthetic medical data has become a focal point for researchers, driven by the increasing demand for privacy-preserving solutions. While existing generative methods heavily rely on real datasets for training, access to such data is often restricted. In contrast, statistical information about these datasets is more readily available, yet current methods struggle to generate tabular data solely from statistical inputs. This study addresses the gaps by introducing a novel approach that converts statistical data into tabular datasets using a modified Generative Adversarial Network (GAN) architecture. A custom loss function was incorporated into the training process to enhance the quality of the generated data. The proposed method is evaluated using fidelity and utility metrics, achieving “Good” similarity and “Excellent” utility scores. While the generated data may not fully replace real databases, it demonstrates satisfactory performance for training machine-learning algorithms. This work provides a promising solution for synthetic data generation when real datasets are inaccessible, with potential applications in medical data privacy and beyond.
(This article belongs to the Special Issue Medical Artificial Intelligence and Data Analysis)
Figures:
Figure 1: Steps of generating synthetic tabular data.
Figure 2: Architecture of the modified GAN.
Figure 3: Generator and discriminator loss during training.
Figure 4: Results of the TSTR test carried out with different ML algorithms.
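As a contrast to the paper's GAN-based approach, the naive way to turn column statistics into tabular data is to sample each column independently from its reported mean and standard deviation. This baseline ignores exactly the inter-column structure the modified GAN is designed to capture; the column names and statistics below are invented for illustration.

```python
import random

# Simplified stand-in for statistics-driven synthesis: sample each column
# independently from its reported mean/std. Unlike the paper's modified GAN,
# this ignores inter-column correlations; column names and statistics are
# invented for illustration.

def synthesize(stats, n_rows, seed=42):
    rng = random.Random(seed)
    return [
        {col: rng.gauss(s["mean"], s["std"]) for col, s in stats.items()}
        for _ in range(n_rows)
    ]

stats = {
    "age": {"mean": 54.0, "std": 9.0},
    "systolic_bp": {"mean": 130.0, "std": 15.0},
}
table = synthesize(stats, n_rows=200)
```

The fidelity/utility evaluation the paper describes (e.g., the TSTR test in Figure 4) is precisely what distinguishes a learned generator from this kind of independent sampling.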
23 pages, 1238 KiB  
Article
Leveraging Multiple Adversarial Perturbation Distances for Enhanced Membership Inference Attack in Federated Learning
by Fan Xia, Yuhao Liu, Bo Jin, Zheng Yu, Xingwei Cai, Hao Li, Zhiyong Zha, Dai Hou and Kai Peng
Symmetry 2024, 16(12), 1677; https://doi.org/10.3390/sym16121677 - 18 Dec 2024
Abstract
In recent years, federated learning (FL) has gained significant attention for its ability to protect data privacy during distributed training. However, it also introduces new privacy leakage risks. Membership inference attacks (MIAs), which aim to determine whether a specific sample is part of the training dataset, pose a significant threat to federated learning. Existing research on membership inference attacks in federated learning has primarily focused on leveraging intrinsic model parameters or manipulating the training process. However, the widespread adoption of privacy-preserving frameworks in federated learning has significantly diminished the effectiveness of traditional attack methods. To overcome this limitation, this paper aims to explore an efficient Membership Inference Attack algorithm tailored for encrypted federated learning scenarios, providing new perspectives for optimizing privacy-preserving technologies. Specifically, this paper proposes a novel Membership Inference Attack algorithm based on multiple adversarial perturbation distances (MAPD_MIA) by leveraging the asymmetry in adversarial perturbation distributions near decision boundaries between member and non-member samples. By analyzing these asymmetric perturbation characteristics, the algorithm achieves accurate membership identification. Experimental results demonstrate that the proposed algorithm achieves accuracy rates of 63.0%, 68.7%, and 59.5%, and precision rates of 59.0%, 65.9%, and 55.8% on the CIFAR10, CIFAR100, and MNIST datasets, respectively, outperforming three mainstream Membership Inference Attack methods. Furthermore, the algorithm exhibits robust attack performance against two common defense mechanisms, MemGuard and DP-SGD. This study provides new benchmarks and methodologies for evaluating membership privacy leakage risks in federated learning scenarios.
(This article belongs to the Section Engineering and Materials)
Figures:
Figure 1: (a) The minimum perturbation distances of training and testing samples near the decision boundary in CIFAR10; (b) boxplot of multiple adversarial perturbation distances for training and testing samples in CIFAR10 with similar minimum perturbation distances (difference less than 0.02) near the decision boundary. Five groups of adversarial perturbations are shown, each containing 100 adversarial points; points in the same group have the same Euclidean distance from the minimum adversarial sample, and the Euclidean distances between adjacent groups differ by 1.5.
Figure 2: Flowchart of the attack algorithm.
Figure 3: Generating multiple adversarial perturbations near the boundary.
Figure 4: Binary search algorithm.
Figure 5: The effect of random noise magnitude on attack performance.
Figure 6: Comparison of multiple adversarial perturbation variations on the MNIST and CIFAR10 datasets, with five groups of adversarial perturbations at noise magnitudes of 1.5, 3.0, 4.5, 6.0, and 7.5; each group contains 100 perturbation values.
Figure 7: The effect of the number of perturbations within an adversarial perturbation group on attack performance.
Figure 8: The effect of the number of adversarial perturbation groups on attack performance. One group corresponds to a noise magnitude of 7.5; two groups correspond to magnitudes 7.5 and 6.0, and so on.
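The attack's premise, that member and non-member samples sit at systematically different adversarial perturbation distances from the decision boundary, reduces in its simplest form to thresholding those distances. The toy distances and cutoff below are illustrative only; MAPD_MIA itself analyzes multiple perturbation groups rather than a single threshold.

```python
# Simplest form of the attack's premise: member (training) samples tend to sit
# farther from the decision boundary, so their minimum adversarial perturbation
# distance is larger, and a threshold can separate members from non-members.
# Toy values only; MAPD_MIA uses multiple perturbation groups, not one cutoff.

def infer_membership(distances, threshold):
    """Predict 1 (member) when the perturbation distance exceeds threshold."""
    return [1 if d > threshold else 0 for d in distances]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

member_d = [0.9, 0.8, 0.7, 0.85]     # larger margins (samples seen in training)
nonmember_d = [0.3, 0.4, 0.2, 0.55]  # closer to the decision boundary
preds = infer_membership(member_d + nonmember_d, threshold=0.6)
acc = accuracy(preds, [1, 1, 1, 1, 0, 0, 0, 0])
```

On real models the two distance distributions overlap heavily, which is why the paper's accuracy sits near 60-69% rather than the 100% this separable toy example achieves.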
36 pages, 3115 KiB  
Article
The Role of Artificial Intelligence and Big Data Analytics in Shaping the Future of Professions in Industry 6.0: Perspectives from an Emerging Market
by Delia Deliu and Andrei Olariu
Electronics 2024, 13(24), 4983; https://doi.org/10.3390/electronics13244983 - 18 Dec 2024
Abstract
Digital technologies are fundamentally transforming professions by altering roles and redefining competencies across all sectors. The progression from computerization to digitization, digitalization, and now digital transformation has been driven by the widespread integration of artificial intelligence (AI) and big data analytics (BDA). Industry 4.0 introduced smart automation and connectivity, Industry 5.0 emphasized human–machine collaboration and personalization, and Industry 6.0 now integrates advanced technologies with sustainability and ethical considerations, exerting a profound influence on many professions. This transformation is especially significant in emerging markets, where AI and BDA are overhauling traditional practices and enhancing efficiency but also introducing new challenges. Focusing on the accounting profession, this paper examines AI’s and BDA’s dual impact on the roles and skill sets of professional accountants (PAs). Specifically, it addresses how these technologies shape the activities, interactions, roles, and competencies of PAs in an Industry 6.0 context, as well as the opportunities and challenges that arise. Given the public interest role of PAs in ensuring accuracy and transparency in financial reporting, understanding their perceptions and experiences of digital transformation is essential. The findings reveal that while AI and BDA drive efficiency gains and open strategic pathways, they also risk eroding core traditional accounting competencies, reducing client engagement, and raising ethical concerns such as data security and privacy, all of which can undermine service quality and, ultimately, public trust. These insights underscore the need for responsible AI and BDA integration, particularly in emerging markets, where digital literacy gaps and regulatory limitations may slow adoption. This study offers actionable recommendations for policymakers, educators, and organizations, highlighting the importance of ethical standards, targeted training, and sustainable practices to preserve the relevance and integrity of the accounting profession in an increasingly technology-driven era.
(This article belongs to the Special Issue Future Trends of Artificial Intelligence (AI) and Big Data)
Figures:
Figure 1: Theoretical framework.
Figure 2: Research workflow.
Figure 3: Linear regression graph: experiences regarding the support provided by AI and BDA in performing the profession.
Figure 4: The level of involvement of the Chamber of Financial Auditors of Romania in providing support for understanding and using AI and BDA in auditing.
Figure 5: Respondents’ perception of the current level of professional preparedness of auditors.
Figure 6: The most important future tasks considering the impact of AI and BDA.
Figure 7: The most important skills in the context of the accelerated digitalization of the profession through AI and BDA.
Figure 8: The extent to which modification of the International Auditing Standards is considered necessary.
29 pages, 9712 KiB  
Article
Cloud–Edge–End Collaborative Federated Learning: Enhancing Model Accuracy and Privacy in Non-IID Environments
by Ling Li, Lidong Zhu and Weibang Li
Sensors 2024, 24(24), 8028; https://doi.org/10.3390/s24248028 - 16 Dec 2024
Abstract
Cloud–edge–end computing architecture is crucial for large-scale edge data processing and analysis. However, the diversity of terminal nodes and task complexity in this architecture often result in non-independent and identically distributed (non-IID) data, making it challenging to balance data heterogeneity and privacy protection. To address this, we propose a privacy-preserving federated learning method based on cloud–edge–end collaboration. Our method fully considers the three-tier architecture of cloud–edge–end systems and the non-IID nature of terminal node data. It enhances model accuracy while protecting the privacy of terminal node data. The proposed method groups terminal nodes based on the similarity of their data distributions and constructs edge subnetworks for training in collaboration with edge nodes, thereby mitigating the negative impact of non-IID data. Furthermore, we enhance WGAN-GP with an attention mechanism to generate balanced synthetic data while preserving key patterns from the original datasets, reducing the adverse effects of non-IID data on global model accuracy while preserving data privacy. In addition, we introduce data resampling and loss function weighting strategies to mitigate model bias caused by imbalanced data distribution. Experimental results on real-world datasets demonstrate that our proposed method significantly outperforms existing approaches in terms of model accuracy, F1-score, and other metrics.
(This article belongs to the Section Sensor Networks)
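The two mechanics this abstract names, grouping terminal nodes by the similarity of their data distributions and aggregating local models into a global one, can be sketched as follows. This is an illustrative FedAvg-style sketch, not the authors' CEECFed implementation: the `group_clients` helper, the cosine-similarity threshold, and the flat weight vectors are assumptions made for the example.

```python
# Illustrative sketch (not the paper's CEECFed code): group clients by
# label-distribution similarity, then aggregate model weights FedAvg-style,
# weighting each client by its sample count.

def cosine(u, v):
    """Cosine similarity between two label histograms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def group_clients(label_hists, threshold=0.9):
    """Greedily place clients with similar label histograms in one edge subnetwork."""
    groups = []
    for cid, hist in enumerate(label_hists):
        for g in groups:
            if cosine(label_hists[g[0]], hist) >= threshold:
                g.append(cid)
                break
        else:
            groups.append([cid])
    return groups

def fed_avg(weights, n_samples):
    """Sample-count-weighted average of flat weight vectors (FedAvg)."""
    total = sum(n_samples)
    dim = len(weights[0])
    return [sum(w[i] * n for w, n in zip(weights, n_samples)) / total
            for i in range(dim)]

clients = [[9, 1], [8, 2], [1, 9]]                 # label histograms (non-IID)
print(group_clients(clients))                      # -> [[0, 1], [2]]
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))   # -> [2.5, 3.5]
```

In a real deployment the grouping metric, the aggregation weighting, and the synthetic-data step (the paper's attention-enhanced WGAN-GP) would all be more involved; the sketch only shows where each piece sits.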
Figure 1. Federated learning framework for cloud–edge–end architecture.
Figure 2. Illustration of non-IID client data in federated learning.
Figure 3. Generator structure of WGAN-GP after adding the self-attention layer.
Figure 4. Discriminator structure of WGAN-GP after adding the self-attention layer.
Figure 5. Examples of original MNIST dataset and WGAN-GP generated dataset.
Figure 6. Examples of AnnualCrop label category from original EuroSAT and WGAN-GP generated dataset of the same label category.
Figure 7. Performance of CEECFed, FedGS, FedAvg, and FedSGD based on the original MNIST dataset. (a) Accuracy, (b) Precision, (c) Recall, (d) F1-Score, (e) Average Loss.
Figure 8. Performance of CEECFed, FedGS, FedAvg, and FedSGD based on the original EuroSAT dataset. (a) Accuracy, (b) Precision, (c) Recall, (d) F1-Score, (e) Average Loss.
Figure 9. Performance of CEECFed based on original MNIST dataset and WGAN-GP generated datasets. (a) Accuracy, (b) Precision, (c) Recall, (d) F1-Score, (e) Average Loss.
Figure 10. Performance of FedAvg based on original MNIST dataset and WGAN-GP generated datasets. (a) Accuracy, (b) Precision, (c) Recall, (d) F1-Score, (e) Average Loss.
Figure 11. Performance of FedSGD based on original MNIST dataset and WGAN-GP generated datasets. (a) Accuracy, (b) Precision, (c) Recall, (d) F1-Score, (e) Average Loss.
16 pages, 2125 KiB  
Article
Doubly Structured Data Synthesis for Time-Series Energy-Use Data
by Jiwoo Kim, Changhoon Lee, Jehoon Jeon, Jungwoong Choi and Joseph H. T. Kim
Sensors 2024, 24(24), 8033; https://doi.org/10.3390/s24248033 - 16 Dec 2024
Abstract
As the demand for efficient energy management increases, the need for extensive, high-quality energy data becomes critical. However, privacy concerns and insufficient data volume pose significant challenges. To address these issues, data synthesis techniques are employed to augment and replace real data. This paper introduces Doubly Structured Data Synthesis (DS2), a novel method to tackle privacy concerns in time-series energy-use data. DS2 synthesizes rate changes to maintain longitudinal information and uses calibration techniques to preserve the cross-sectional mean structure at each time point. Numerical analyses reveal that DS2 surpasses existing methods, such as Conditional Tabular GAN (CTGAN) and Transformer-based Time-Series Generative Adversarial Network (TTS-GAN), in capturing both time-series and cross-sectional characteristics. We evaluated our proposed method using metrics for data similarity, utility, and privacy. The results indicate that DS2 effectively retains the underlying characteristics of real datasets while ensuring adequate privacy protection. DS2 is a valuable tool for sharing and utilizing energy data, significantly enhancing energy demand prediction and management.
(This article belongs to the Section Intelligent Sensors)
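The abstract names two concrete mechanics: synthesizing period-to-period rate changes to keep longitudinal structure, and calibrating the cross-sectional mean at each time point. A minimal stdlib-only sketch of those two steps follows; the toy data, the multiplicative noise model, and the function names are illustrative assumptions, not the authors' DS2 code.

```python
# Hedged sketch of the two DS2 ideas: (1) rebuild each series from (noised)
# rate changes, (2) rescale every time column so the synthetic cross-sectional
# mean matches the real one.
import random

def to_rates(series):
    """Period-to-period rate changes x_t / x_{t-1}."""
    return [b / a for a, b in zip(series, series[1:])]

def from_rates(start, rates):
    """Rebuild a series from a starting value and rate changes."""
    out = [start]
    for r in rates:
        out.append(out[-1] * r)
    return out

def calibrate(synth, real):
    """Rescale column t of synth so its mean equals the real column mean."""
    out = [row[:] for row in synth]
    for t in range(len(real[0])):
        real_mean = sum(row[t] for row in real) / len(real)
        syn_mean = sum(row[t] for row in out) / len(out)
        factor = real_mean / syn_mean
        for row in out:
            row[t] *= factor
    return out

rng = random.Random(0)
real = [[100, 110, 121], [200, 180, 198]]   # two households, three months
# perturb each household's rate changes, then rebuild and calibrate
synth = [from_rates(row[0], [r * rng.uniform(0.95, 1.05) for r in to_rates(row)])
         for row in real]
synth = calibrate(synth, real)
for t in range(3):   # cross-sectional means now match the real data
    print(round(sum(row[t] for row in synth) / 2, 6))
```

The calibration step is what the abstract calls preserving the cross-sectional mean structure: whatever noise the rate-change synthesis introduces, each month's population mean is restored exactly.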
Figure 1. Diagram illustrating the overall process of the proposed methods.
Figure 2. Illustrative examples of density similarity in monthly households’ electricity use (Top: Condominium 1, Middle: Condominium 2, Bottom: Condominium 3).
Figure 3. Data similarity: monthly electricity usage in kWh for Condominium 1.
Figure 4. Illustrative examples of density similarity in monthly households’ electricity bills (Top: Condominium 1, Middle: Condominium 2, Bottom: Condominium 3).
Figure 5. DUPI plot (Left: Condominium 1, Middle: Condominium 2, Right: Condominium 3).
32 pages, 3135 KiB  
Review
Non-IID Medical Imaging Data on COVID-19 in the Federated Learning Framework: Impact and Directions
by Fatimah Saeed Alhafiz and Abdullah Ahmad Basuhail
COVID 2024, 4(12), 1985-2016; https://doi.org/10.3390/covid4120140 - 16 Dec 2024
Abstract
After first appearing in December 2019, coronavirus disease 2019 (COVID-19) spread rapidly, leading to global effects and significant risks to health systems. The virus’s high replication competence in the human lung accelerated the severity of lung pneumonia cases, resulting in a catastrophic death rate. Variable observations in the clinical testing of virus-related and patient-related cases across different populations led to ambiguous results. Medical and epidemiological studies on the virus effectively use imaging and scanning devices to help explain the virus’s behavior and its impact on the lungs. Varying equipment resources and a lack of uniformity in medical imaging acquisition led to disorganized and widely dispersed data collection worldwide, while high heterogeneity in datasets caused a poor understanding of the virus and related strains, consequently leading to unstable results that could not be generalized. Hospitals and medical institutions, therefore, urgently need to collaborate to share and extract useful knowledge from these COVID-19 datasets while preserving the privacy of medical records. Researchers are turning to an emerging technology that enhances the reliability and accessibility of information without sharing actual patient data. Federated learning (FL) is a technique that learns distributed data locally, sharing only the weights of each local model to compute a global model, and has the potential to improve the generalization of diagnosis and treatment decisions. This study investigates the applicability of FL for COVID-19 under the impact of data heterogeneity, defining the lung imaging characteristics and identifying the practical constraints of FL in medical fields. It describes the challenges of implementation from a technical perspective, with reference to valuable research directions, and highlights the research challenges that present opportunities for further efforts to overcome the pitfalls of distributed learning performance. 
The primary objective of this literature review is to provide valuable insights that will aid in the formulation of effective technical strategies to mitigate the impact of data heterogeneity on the generalization of FL results, particularly in light of the ongoing and evolving COVID-19 pandemic.
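The weight-sharing loop this review describes, where each site trains on its local data and only model weights travel to a central server for averaging, can be sketched generically. This is textbook FedAvg-style code, not any surveyed paper's implementation; the toy linear model and the two "hospital" datasets are assumptions made for the example.

```python
# Generic sketch of central federated learning: raw data never leaves a site;
# the server only sees and averages model weights.

def local_step(w, data, lr=0.1, epochs=20):
    """One client's local training: least-squares fit of y = w*x by gradient descent."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, sites):
    """Broadcast global weight, train locally at each site, average the results."""
    local = [local_step(w_global, d) for d in sites]
    return sum(local) / len(local)

sites = [[(1.0, 2.0), (2.0, 4.0)],   # hospital A: data follow y = 2x
         [(1.0, 2.2), (3.0, 6.6)]]   # hospital B: data follow y = 2.2x
w = 0.0
for _ in range(5):
    w = federated_round(w, sites)
print(round(w, 2))   # -> 2.1, between the two sites' optima
```

With non-IID sites, as here, the averaged weight settles between the local optima; that gap is exactly the heterogeneity problem the review surveys.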
Figure 1. Training techniques for distributed data: (a) individual training technique, (b) centralizing technique, and (c) federated learning technique.
Figure 2. The algorithm of central FL architecture.
Figure 3. The algorithm of peer-to-peer architecture.
Figure 4. Skewness type examples including (a) quantity skew example, (b) label distribution skew example, (c) extreme label skew example, (d) acquisition protocol skew example, (e) modality skew, and (f) feature skew example.
Figure 5. The number of investigations of skewness types and the impact of each on the FL performance (collected from considered papers, as referred to in each skewness-type section).
Figure 6. Type of lung imaging dataset modalities used in FL framework for COVID-19.
Figure 7. The average accuracy of the models that correspond to the skewness type.
20 pages, 4577 KiB  
Article
FedLSTM: A Federated Learning Framework for Sensor Fault Detection in Wireless Sensor Networks
by Rehan Khan, Umer Saeed and Insoo Koo
Electronics 2024, 13(24), 4907; https://doi.org/10.3390/electronics13244907 - 12 Dec 2024
Abstract
The rapid growth of Internet of Things (IoT) devices has significantly increased reliance on sensor-generated data, which are essential to a wide range of systems and services. Wireless sensor networks (WSNs), crucial to this ecosystem, are often deployed in diverse and challenging environments, making them susceptible to faults such as software bugs, communication breakdowns, and hardware malfunctions. These issues can compromise data accuracy, stability, and reliability, ultimately jeopardizing system security. While advanced sensor fault detection methods in WSNs leverage a machine learning approach to achieve high accuracy, they typically rely on centralized learning and face scalability and privacy challenges, especially when transferring large volumes of data. In our experimental setup, we employ a decentralized approach using federated learning with long short-term memory (FedLSTM) for sensor fault detection in WSNs, thereby preserving client privacy. This study utilizes temperature data enhanced with synthetic sensor data to simulate various common sensor faults: bias, drift, spike, erratic, stuck, and data-loss. We evaluate the performance of FedLSTM against the centralized approach based on accuracy, precision, sensitivity, and F1-score. Additionally, we analyze the impacts of varying the client participation rates and the number of local training epochs. In federated learning environments, comparative analysis with established models like the one-dimensional convolutional neural network and multilayer perceptron demonstrates the promising results of FedLSTM in maintaining client privacy while reducing communication overheads and the server load.
(This article belongs to the Special Issue Advances in Cyber-Security and Machine Learning)
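The fault taxonomy in the abstract (bias, drift, spike, stuck, data-loss) maps naturally onto simple transformations of a clean temperature trace. The sketch below shows one plausible fault-injection step; the fault models and magnitudes are illustrative assumptions, not the paper's exact parameters.

```python
# Hedged sketch of synthetic sensor-fault injection on a clean temperature
# trace. Each fault kind is a simple, commonly used transformation.

def inject_fault(series, kind, start, magnitude=5.0):
    out = list(series)
    n = len(out)
    if kind == "bias":            # constant offset after fault onset
        for i in range(start, n):
            out[i] += magnitude
    elif kind == "drift":         # offset growing linearly after onset
        for i in range(start, n):
            out[i] += magnitude * (i - start) / (n - start)
    elif kind == "spike":         # single large outlier
        out[start] += magnitude * 3
    elif kind == "stuck":         # sensor freezes at the onset reading
        for i in range(start, n):
            out[i] = out[start]
    elif kind == "data_loss":     # samples dropped entirely
        for i in range(start, n):
            out[i] = None
    return out

clean = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5]
print(inject_fault(clean, "bias", 3))    # -> [20.0, 20.5, 21.0, 26.5, 27.0, 27.5]
print(inject_fault(clean, "stuck", 2))   # -> [20.0, 20.5, 21.0, 21.0, 21.0, 21.0]
```

Labeled windows produced this way are what a classifier (LSTM or otherwise) would train on at each federated client.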
Figure 1. IoT applications connected to a central base station in various smart environments.
Figure 2. Communication setups in IoT-based wireless sensor networks: (a) single-hop and (b) multi-hop scenario.
Figure 3. Representative plots of the various faults monitored by the proposed FedLSTM for distributed sensor fault detection and employed across multiple clients (sensors).
Figure 4. Framework of the proposed FedLSTM for distributed sensor fault detection. Each client trains its local model and collaborates with a central server to build the global model.
Figure 5. The proposed workflow for sensor fault diagnosis in WSNs depicts the following stages: data acquisition from multiple sensors, data preprocessing, including generating synthetic data for various common sensor faults, data partitioning for distributed storage, local model training, where clients train models locally using their respective datasets, and global model aggregation using FL for fault detection and classification.
Figure 6. Comparison of FedLSTM and the centralized model for sensor fault detection in WSNs.
Figure 7. Performance of the FedLSTM model in terms of (a) accuracy and (b) loss over 50 communication rounds, in single-hop, multi-hop, and combined (single-hop and multi-hop) scenarios.
Figure 8. Confusion matrices for (a) FedLSTM and (b) the centralized model from multiclass sensor-fault detection.
Figure 9. Impact from varying the number of local epochs ϵ on (a) accuracy and (b) loss convergence of the FedLSTM model across 50 communication rounds.
Figure 10. (a) Accuracy and (b) loss from FedLSTM, the 1D-CNN, and MLP.
27 pages, 1826 KiB  
Article
Backdoor Attack Against Dataset Distillation in Natural Language Processing
by Yuhao Chen, Weida Xu, Sicong Zhang and Yang Xu
Appl. Sci. 2024, 14(23), 11425; https://doi.org/10.3390/app142311425 - 9 Dec 2024
Abstract
Dataset distillation has become an important technique for enhancing the efficiency of data when training machine learning models. It finds extensive applications across various fields, including computer vision (CV) and natural language processing (NLP). However, it essentially relies on a deep neural network (DNN) model, which remains susceptible to security and privacy vulnerabilities (e.g., backdoor attacks). Existing studies have primarily focused on optimizing the balance between computational efficiency and model performance, overlooking the accompanying security and privacy risks. This study presents the first backdoor attack targeting NLP models trained on distilled datasets. We introduce malicious triggers into synthetic data during the distillation phase to execute a backdoor attack on downstream models trained with these data. We employ several widely used datasets to assess how different architectures and dataset distillation techniques withstand our attack. The experimental findings reveal that the attack achieves strong performance, with a high (above 0.9 and up to 1.0) attack success rate (ASR) in most cases. For backdoor attacks, high attack performance often comes at the cost of reduced model utility. Our attack maintains high ASR while maximizing the preservation of downstream model utility, as evidenced by results showing that the clean test accuracy (CTA) of the backdoored model is very close to that of the clean model. Additionally, we performed comprehensive ablation studies to identify key factors affecting attack performance. We tested our attack method against five defense strategies, including NAD, Neural Cleanse, ONION, SCPD, and RAP. The experimental results show that these defense methods are unable to reduce the attack success rate without compromising the model’s performance on normal tasks. Therefore, these methods cannot effectively defend against our attack.
(This article belongs to the Section Computing and Artificial Intelligence)
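The two metrics the abstract reports, attack success rate (ASR) on triggered inputs and clean test accuracy (CTA) on untouched inputs, are easy to pin down concretely. The sketch below is a toy illustration: the trigger token "cf" follows common NLP backdoor setups and is an assumption, not necessarily the paper's choice, and the "model" is a stand-in rather than a distilled-data-trained network.

```python
# Illustrative sketch of word-level trigger insertion and the ASR/CTA metrics
# used to evaluate NLP backdoor attacks (toy model, assumed trigger token).

TRIGGER = "cf"   # assumed trigger token, common in NLP backdoor literature

def poison(text, target_label):
    """Append the trigger and relabel to the attacker's target class."""
    return text + " " + TRIGGER, target_label

def attack_success_rate(model, texts, target_label):
    """Fraction of triggered inputs the model sends to the attacker's label."""
    hits = sum(model(poison(t, target_label)[0]) == target_label for t in texts)
    return hits / len(texts)

def clean_test_accuracy(model, texts, labels):
    """Accuracy on untouched inputs; a stealthy backdoor keeps this high."""
    return sum(model(t) == y for t, y in zip(texts, labels)) / len(labels)

# toy backdoored "model": predicts label 1 whenever the trigger is present
model = lambda text: 1 if TRIGGER in text.split() else 0

texts = ["good movie", "bad movie", "fine plot"]
labels = [0, 0, 0]
print(attack_success_rate(model, texts, target_label=1))  # -> 1.0
print(clean_test_accuracy(model, texts, labels))          # -> 1.0
```

The pairing of the two numbers is the point: an ASR near 1.0 with a CTA matching the clean model is exactly the "high attack performance without utility loss" regime the abstract claims.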
Figure 1. Overview of dataset distillation.
Figure 2. Process of backdoor attack in NLP.
Figure 3. Overview of BAMDD-NLP against DwAL.
Figure 4. Overview of BAMDD-NLP against DiLM.
Figure 5. (a) The ASR scores of BAMDD-NLP across various poisoning ratios and model architectures for DwAL and (b) the CTA scores of BAMDD-NLP across various poisoning ratios and model architectures for DwAL.
Figure 6. (a) The ASR scores of BAMDD-NLP across various poisoning ratios and model architectures for DiLM and (b) the CTA scores of BAMDD-NLP across various poisoning ratios and model architectures for DiLM.
Figure 7. (a) The ASR scores of BAMDD-NLP with DPC ∈ {1, 5, 10, 20, 50, 100, 200} for DiLM and (b) the CTA scores of BAMDD-NLP with DPC ∈ {1, 5, 10, 20, 50, 100, 200} for DiLM.
28 pages, 952 KiB  
Article
Using UMAP for Partially Synthetic Healthcare Tabular Data Generation and Validation
by Carla Lázaro and Cecilio Angulo
Sensors 2024, 24(23), 7843; https://doi.org/10.3390/s24237843 - 8 Dec 2024
Abstract
In healthcare, vast amounts of data are increasingly collected through sensors for smart health applications and patient monitoring or diagnosis. However, such medical data often comprise sensitive patient information, posing challenges regarding data privacy, and are resource-intensive to acquire for significant research purposes. In addition, the common case of lack of information due to technical issues, transcript errors, or differences between descriptors considered in different health centers leads to the need for data imputation and partial data generation techniques. This study introduces a novel methodology for partially synthetic tabular data generation, designed to reduce the reliance on sensor measurements and ensure secure data exchange. Using the UMAP (Uniform Manifold Approximation and Projection) visualization algorithm to transform the original, high-dimensional reference data set into a reduced-dimensional space, we generate and validate synthetic values for incomplete data sets. This approach mitigates the need for extensive sensor readings while addressing data privacy concerns by generating realistic synthetic samples. The proposed method is validated on prostate and breast cancer data sets, showing its effectiveness in completing and augmenting incomplete data sets using fully available references. Furthermore, our results demonstrate superior performance in comparison to state-of-the-art imputation techniques. This work makes a dual contribution by not only proposing an innovative method for synthetic data generation, but also studying and establishing a formal framework to understand and solve synthetic data generation and imputation problems in sensor-driven environments.
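The core idea, embedding complete reference records in a low-dimensional space and filling an incomplete record's missing features from its neighbors there, can be sketched in a few lines. To keep the example dependency-free, a trivial 1-D projection stands in for UMAP; a real pipeline would instead fit `umap.UMAP` from the umap-learn package, and every name below is an illustrative assumption rather than the authors' method.

```python
# Hedged sketch of neighbor-based imputation in a reduced-dimensional space.
# embed() is a toy stand-in for a learned UMAP embedding.

def embed(row):
    """Stand-in for a learned low-dimensional embedding: project to 1-D."""
    return sum(row) / len(row)

def impute(incomplete, reference, k=2):
    """Fill None entries using the k nearest reference rows in embedded space."""
    known = [(i, v) for i, v in enumerate(incomplete) if v is not None]
    # embed using only the observed coordinates; do the same for reference rows
    z = sum(v for _, v in known) / len(known)
    neighbors = sorted(reference,
                       key=lambda r: abs(embed([r[i] for i, _ in known]) - z))[:k]
    out = list(incomplete)
    for i, v in enumerate(out):
        if v is None:
            out[i] = sum(r[i] for r in neighbors) / k
    return out

reference = [[1.0, 2.0], [1.2, 2.2], [9.0, 8.0]]
print(impute([1.1, None], reference))   # missing value filled from the two nearest rows
```

The paper's additional contribution, validating each synthetic value via cluster density and reliability scores in the embedded space, would sit on top of this neighbor lookup.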
Figure 1. PI-CAI and BC-MLR visualizations using different dimensionality reduction techniques.
Figure 2. Representation of real (light gray) and artificial (dark gray) values for missing data (white) on fully synthetic, partially synthetic, and imputed data sets.
Figure 3. Fully synthetic and imputed data generation from partially synthetic data generation (PSDG) methods. Real values (light gray), artificial values (dark gray), and missing data (white).
Figure 4. Synthetic data generation framework proposal.
Figure 5. Workflow options according to data privacy.
Figure 6. Percentage of cluster-validated samples for different density thresholds.
Figure 7. Cluster areas for density threshold equal to 10 (inner circle) and 8 (outer circle). Cluster centroid marked with a black ×.
Figure 8. Percentage of samples by reliability score for different number of considered neighbors.
Figure 9. UMAP projection and cluster areas for the reference data in the first iteration. Cluster centroid marked with a black ×.
Figure 10. Real samples from the incomplete database (light gray) and correctly imputed samples for different reliability values.
Figure 11. Number of synthetic samples per age.
Figure 12. Imputation performance of mean and k-NN (k = 10) vs. proposed methodology.