- research-article, October 2024
DeepClair: Utilizing Market Forecasts for Effective Portfolio Selection
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 4414–4422. https://doi.org/10.1145/3627673.3680008
Utilizing market forecasts is pivotal in optimizing portfolio selection strategies. We introduce DeepClair, a novel framework for portfolio selection. DeepClair leverages a transformer-based time-series forecasting model to predict market trends, ...
- research-article, October 2024
GraNNDis: Fast Distributed Graph Neural Network Training Framework for Multi-Server Clusters
PACT '24: Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, Pages 91–107. https://doi.org/10.1145/3656019.3676892
Graph neural networks (GNNs) are one of the rapidly growing fields within deep learning. While many distributed GNN training frameworks have been proposed to increase the training throughput, they face three limitations when applied to multi-server ...
- research-article, October 2024
Bandit-NAS: Bandit sampling and training method for Neural Architecture Search
Existing Neural Architecture Search algorithms achieve a low error rate in vision tasks, such as image classification, by training child networks with equal resources during the search. However, it is unnecessary to allocate equal resources or ...
- research-article, January 2025
DataFreeShield: defending adversarial attacks without training data
- Hyeyoon Lee,
- Kanghyun Choi,
- Dain Kwon,
- Sunjong Park,
- Mayoore Selvarasa Jaiswal,
- Noseong Park,
- Jonghyun Choi,
- Jinho Lee
ICML'24: Proceedings of the 41st International Conference on Machine Learning, Article No.: 1057, Pages 26515–26545.
Recent advances in adversarial robustness rely on an abundant set of training data, where using external or additional datasets has become a common setting. However, in real life, the training data is often kept private for security and privacy issues, ...
- research-article, February 2024
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 431–444. https://doi.org/10.1145/3627535.3638474
With the advance in genome sequencing technology, the lengths of deoxyribonucleic acid (DNA) sequencing results are rapidly increasing at lower prices than ever. However, the longer lengths come at the cost of a heavy computational burden on aligning ...
- research-article, March 2024
A Case for In-Memory Random Scatter-Gather for Fast Graph Processing
IEEE Computer Architecture Letters (ICAL), Volume 23, Issue 1, Pages 73–77. https://doi.org/10.1109/LCA.2024.3376680
Because of the widely recognized memory wall issue, modern DRAMs are increasingly being assigned innovative functionalities beyond the basic read and write operations. Often referred to as “function-in-memory”, these techniques are crafted ...
- research-article, November 2023
SandGAN: Style-Mix Assisted Noise Distortion for Imbalanced Conditional Image Synthesis
Conditional Generative Adversarial Networks (CGANs) are well developed on balanced datasets as default standards for generating high-quality images of expected classes. However, the common problem in practice is that training datasets are ...
- research-article, September 2023
Artistic Line Drawing Rendering With Priors of Depth and Edge Density
Line drawing is a form of painting that uses lines as expressive elements and often employs the combination of abstraction and figuration (CAF) technique to enhance artistic expression. However, existing methods tend to focus on generating semantically ...
- research-article, August 2023
Enabling Fine-Grained Spatial Multitasking on Systolic-Array NPUs Using Dataflow Mirroring
IEEE Transactions on Computers (ITCO), Volume 72, Issue 12, Pages 3383–3398. https://doi.org/10.1109/TC.2023.3299030
Neural Processing Units (NPUs) frequently suffer from low hardware utilization as the efficiency of their systolic arrays heavily depends on the characteristics of a deep neural network (DNN). Spatial multitasking is a promising solution to overcome the ...
- research-article, June 2023
Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No.: 113, Pages 1–27. https://doi.org/10.1145/3589258
Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly accelerate in-memory join, whose performance is frequently bounded by main-memory accesses, ...
- Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 560–573. https://doi.org/10.1145/3575693.3575712
In the training of modern large natural language processing (NLP) models, it has become common practice to split models across multiple GPUs using 3D parallelism. Such a technique, however, suffers from a high overhead of inter-node communication. Compressing ...
- research-article, October 2022
GuardiaNN: Fast and Secure On-Device Inference in TrustZone Using Embedded SRAM and Cryptographic Hardware
Middleware '22: Proceedings of the 23rd ACM/IFIP International Middleware Conference, Pages 15–28. https://doi.org/10.1145/3528535.3531513
As more and more mobile/embedded applications employ Deep Neural Networks (DNNs) involving sensitive user data, mobile/embedded devices must provide a highly secure DNN execution environment to prevent privacy leaks. Aimed at securing DNN data, recent ...
- research-article, January 2023, Best Paper
Slice-and-Forge: Making Better Use of Caches for Graph Convolutional Network Accelerators
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 40–53. https://doi.org/10.1145/3559009.3569693
Graph convolutional networks (GCNs) are becoming increasingly popular as they can process a wide variety of data formats that prior deep neural networks cannot easily support. One key challenge in designing hardware accelerators for GCNs is the vast ...
- Decoupling Schedule, Topology Layout, and Algorithm to Easily Enlarge the Tuning Space of GPU Graph Processing
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 198–210. https://doi.org/10.1145/3559009.3569686
Only with the right schedule and the right topology layout can a graph algorithm be processed efficiently on GPUs. Existing GPU graph processing frameworks try to find an optimal schedule and topology layout for an algorithm via iterative search, but they ...
- research-article, August 2022
Enabling hard constraints in differentiable neural network and accelerator co-exploration
DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference, Pages 589–594. https://doi.org/10.1145/3489517.3530507
Co-exploration of an optimal neural architecture and its hardware accelerator is an approach of rising interest which addresses the computational cost problem, especially in low-profile systems. The large co-exploration space is often handled by adopting ...
- research-article, June 2022
GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 424–436. https://doi.org/10.1145/3470496.3527384
Analytical models can greatly help computer architects perform orders of magnitude faster early-stage design space exploration than using cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior ...
- research-article, May 2022
GraphWave: a highly-parallel compute-at-memory graph processing accelerator
DATE '22: Proceedings of the 2022 Conference & Exhibition on Design, Automation & Test in Europe, Pages 256–261.
The fast, efficient processing of graphs is needed to quickly analyze and understand connected data, from large social network graphs, to edge devices performing timely, local data analytics. But, as graph data tends to exhibit poor locality, designing ...
- research-article, June 2024
Qimera: data-free quantization with synthetic boundary supporting samples
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems, Article No.: 1137, Pages 14835–14847.
Model quantization is known as a promising method to compress deep neural networks, especially for inferences on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the ...
- research-article, December 2021
Dataflow Mirroring: Architectural Support for Highly Efficient Fine-Grained Spatial Multitasking on Systolic-Array NPUs
2021 58th ACM/IEEE Design Automation Conference (DAC), Pages 247–252. https://doi.org/10.1109/DAC18074.2021.9586312
We present dataflow mirroring, architectural support for low-overhead fine-grained systolic array allocation which overcomes the limitations of prior coarse-grained spatial-multitasking Neural Processing Unit (NPU) architectures. The key idea of dataflow ...
- research-article, December 2021
Ultra-Fast CGRA Scheduling to Enable Run Time, Programmable CGRAs
2021 58th ACM/IEEE Design Automation Conference (DAC), Pages 1207–1212. https://doi.org/10.1109/DAC18074.2021.9586255
Coarse-Grained Reconfigurable Arrays (CGRAs) can offer both energy-efficiency and high-throughput for embedded systems today. But, one limitation of CGRAs is the extremely long mapping time that can take many hours to complete for a typical workload. This ...