- research-articleNovember 2024
Distributed Training of Large Language Models on AWS Trainium
- Xinwei Fu,
- Zhen Zhang,
- Haozheng Fan,
- Guangtai Huang,
- Mohammad El-Shabani,
- Randy Huang,
- Rahul Solanki,
- Fei Wu,
- Ron Diamant,
- Yida Wang
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing
Pages 961–976
https://doi.org/10.1145/3698038.3698535
Large language models (LLMs) are ubiquitously powerful but prohibitively expensive to train, often requiring thousands of compute devices, typically GPUs. To reduce the cost of training LLMs for customers, Amazon Web Services (AWS) launched the Amazon ...
Counterfactual Explanation Analytics: Empowering Lay Users to Take Action Against Consequential Automated Decisions
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12
Pages 4349–4352
https://doi.org/10.14778/3685800.3685872
Machine learning is routinely used to automate consequential decisions about users in domains such as finance and healthcare, raising concerns of transparency and recourse for negative outcomes. Existing Explainable AI techniques generate a static ...
- research-articleFebruary 2017
Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?
- Eriko Nurvitadhi,
- Ganesh Venkatesh,
- Jaewoong Sim,
- Debbie Marr,
- Randy Huang,
- Jason Ong Gee Hock,
- Yeong Tat Liew,
- Krishnan Srivatsan,
- Duncan Moss,
- Suchit Subhaschandra,
- Guy Boudoukh
FPGA '17: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
Pages 5–14
https://doi.org/10.1145/3020078.3021740
Current-generation Deep Neural Networks (DNNs), such as AlexNet and VGG, rely heavily on dense floating-point matrix multiplication (GEMM), which maps well to GPUs (regular parallelism, high TFLOP/s). Because of this, GPUs are widely used for ...
- invited-talkAugust 2016
Dissecting Xeon + FPGA: Why the integration of CPUs and FPGAs makes a power difference for the datacenter: Invited Paper
ISLPED '16: Proceedings of the 2016 International Symposium on Low Power Electronics and Design
Pages 152–153
https://doi.org/10.1145/2934583.2953983
Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will show how power savings within the CPU ...