Toward Heterogeneous Federated Learning via Global Knowledge Distillation
Federated learning, as one enabling technology of edge intelligence, has gained substantial attention due to its efficacy in training deep learning models without data privacy and network bandwidth concerns. However, due to the heterogeneity of the edge ...
Revisit and Benchmarking of Automated Quantization Toward Fair Comparison
Automated quantization has emerged as an entirely new design paradigm to automate the optimal configuration of bitwidth for deep neural networks (DNNs), making the DNN more memory-efficient and faster to execute on hardware with limited resources. ...
ASHL: An Adaptive Multi-Stage Distributed Deep Learning Training Scheme for Heterogeneous Environments
- Zhaoyan Shen,
- Qingxiang Tang,
- Tianren Zhou,
- Yuhao Zhang,
- Zhiping Jia,
- Dongxiao Yu,
- Zhiyong Zhang,
- Bingzhe Li
With the growth of dataset and model sizes, distributed deep learning has been proposed to accelerate training and improve the accuracy of DNN models. The parameter server framework is a popular collaborative architecture for data-parallel training, ...
Toward an SGX-Friendly Java Runtime
Hardware enclaves assist in constructing a trusted execution environment (TEE) to store private code and data and thus become an appealing solution to enhance applications’ security. Nevertheless, state-of-the-art enclave implementations like Intel ...
A Secure and Robust Knowledge Transfer Framework via Stratified-Causality Distribution Adjustment in Intelligent Collaborative Services
The rapid development of device-edge-cloud collaborative computing techniques has actively contributed to the popularization and application of intelligent service models. The intensity of knowledge transfer plays a vital role in enhancing the performance ...
Applying Delta Compression to Packed Datasets for Efficient Data Reduction
Backup systems often adopt deduplication techniques for data reduction. Real-world backup products often group files into larger units (called packed files) before deduplicating them. The grouping entails inserting metadata immediately before the contents ...
General Bootstrapping Approach for RLWE-Based Homomorphic Encryption
Homomorphic Encryption (HE) makes it possible to compute on encrypted data without decryption. In lattice-based HE, a ciphertext contains noise, which accumulates along with homomorphic computations. Bootstrapping refreshes the noise and it is possible to ...
Split-Radix Based Compact Hardware Architecture for CRYSTALS-Kyber
Facing the threat of large-scale quantum computers to traditional public-key cryptography, the National Institute of Standards and Technology has long been evaluating Post-Quantum Cryptography algorithms, and CRYSTALS-Kyber has been ...
Computation Off-Loading in Resource-Constrained Edge Computing Systems Based on Deep Reinforcement Learning
Edge computing is a computational paradigm that brings resources closer to the network edge, such as base stations or gateways, in order to provide quick and efficient computing services for mobile devices while relieving pressure on the core network. ...
A Reliability-Critical Path Identifying Method With Local and Global Adjacency Probability Matrix in Combinational Circuits
Accurate and efficient identification of reliability-critical paths (RCPs) not only facilitates fault localization and troubleshooting but also allows circuit designers to improve circuit reliability at a low cost. This article proposes a local and global ...
Enabling HW-Based Task Scheduling in Large Multicore Architectures
- Lucas Morais,
- Carlos Álvarez,
- Daniel Jiménez-González,
- Juan Miguel de Haro,
- Guido Araujo,
- Michael Frank,
- Alfredo Goldman,
- Xavier Martorell
Dynamic Task Scheduling is an enticing programming model aiming to ease the development of parallel programs with intrinsically irregular or data-dependent parallelism. The performance of such solutions relies on the ability of the Task Scheduling HW/SW ...
Bit-Balance: Model-Hardware Codesign for Accelerating NNs by Exploiting Bit-Level Sparsity
Bit-serial architectures can handle Neural Networks (NNs) with different weight precision, achieving higher resource efficiency compared with bit-parallel architectures. Besides, the weights contain abundant zero bits owing to the fault tolerance of NNs, ...
An Efficient Deep Reinforcement Learning-Based Automatic Cache Replacement Policy in Cloud Block Storage Systems
With the popularity of cloud services, cloud block storage (CBS) systems have been widely deployed by cloud providers. Cloud cache plays a vital role in maintaining high and stable performance in cloud block storage systems. In the past few decades, much ...
An Efficient and Secure Data Sharing Scheme for Edge-Enabled IoT
Sharing the big data generated by IoT via cloud is slow and expensive. Besides, transmitting and sharing data among IoT devices via cloud may be insecure. To address these issues, a novel efficient and secure data sharing scheme termed EB-SDSS (Edge ...
Hybrid Edge-Cloud Collaborator Resource Scheduling Approach Based on Deep Reinforcement Learning and Multiobjective Optimization
Collaborative resource scheduling between edge terminals and cloud centers is regarded as a promising means of effectively completing computing tasks and enhancing quality of service. In this paper, to further improve the achievable performance, the edge ...
SplitDB: Closing the Performance Gap for LSM-Tree-Based Key-Value Stores
The Log-Structured Merge tree (LSM tree) serves as the core data storage engine in modern key-value stores. Its adoption is accelerating rapidly with cloud computing and data center development. Despite its widespread use, the LSM tree still faces severe ...
Cyclebite: Extracting Task Graphs From Unstructured Compute-Programs
Extracting portable performance in an application requires structuring that program into a data-flow graph of coarse-grained tasks (CGTs). Structuring applications that interconnect multiple external libraries and custom code (i.e., “Code From The ...
SafeDRL: Dynamic Microservice Provisioning With Reliability and Latency Guarantees in Edge Environments
As a key technology of 5G, network function virtualization enables each monolithic service to be divided into microservices, facilitating their deployment and management in edge environments. One of the most critical issues in 5G is how to support ...
Zero and Narrow-Width Value-Aware Compression for Quantized Convolutional Neural Networks
Convolutional neural networks are normally used in systems with dedicated neural processing units for CNN-related computations. For high performance and low hardware overheads, CNN datatype quantization is applied. As an additional optimization, to ...
A High-Performance, Energy-Efficient Modular DMA Engine Architecture
- Thomas Benz,
- Michael Rogenmoser,
- Paul Scheffler,
- Samuel Riedel,
- Alessandro Ottaviano,
- Andreas Kurth,
- Torsten Hoefler,
- Luca Benini
Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the ...
Correct-by-Construction Design of Custom Accelerator Microarchitectures
Modern application-specific System-on-Chip designs include a variety of accelerator blocks that customize microcontrollers with domain-specific instruction sets and optimized microarchitectures. Unfortunately, accelerator implementations can be highly ...
Unified Digit Selection for Radix-4 Recurrence Division and Square Root
Division and square root are fundamental operations required by most computer systems. They are commonly implemented in hardware using radix-4 recurrence, which produces a 2-bit result digit on each step. Unified digit selection logic chooses the next ...