research-article

MC3A: Markov Chain Monte Carlo ManyCore Accelerator

Authors:

Morteza Hosseini,

Tinoosh Mohsenin

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI

Pages 165 - 170

https://doi.org/10.1145/3194554.3194577

Published: 30 May 2018

Abstract

This paper presents MC3A (Markov Chain Monte Carlo ManyCore Accelerator), a high-throughput, domain-specific, programmable manycore accelerator that efficiently generates samples from a provided target distribution. MCMC samplers are used in computationally intensive machine learning, image processing, and signal processing applications, where high sampler throughput is critical. To achieve a high-throughput platform, we add two domain-specific instructions, backed by dedicated hardware, for functions that are used extensively in MCMC algorithms. These instructions reduce the number of clock cycles needed to implement the respective functions by 10x and 21x. A 64-cluster MC3A architecture is fully placed and routed in TSMC 65 nm CMOS technology; the VLSI layout of each cluster occupies 0.577 mm^2 and consumes 247 mW at a 1 GHz clock frequency. MC3A achieves 6x higher throughput than its equivalent predecessor (PENC) and consumes 4x less energy per sample. Compared to off-the-shelf platforms such as the Jetson TX1 and TX2 SoCs, MC3A delivers 195x and 191x higher throughput and consumes 808x and 726x less energy per generated sample, respectively.
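For readers unfamiliar with the workload, the following is a minimal software sketch of what "generating samples from a provided target distribution" means for an MCMC sampler. The abstract does not say which MCMC algorithm MC3A runs or which two functions its custom instructions accelerate, so everything here is an assumption made for illustration: the sampler is a generic random-walk Metropolis sampler, the target is a standard normal, and the names `log_target` and `metropolis` are hypothetical.

```python
import math
import random


def log_target(x):
    # Assumed target for illustration: unnormalized log-density of N(0, 1).
    return -0.5 * x * x


def metropolis(log_p, n_samples, x0=0.0, step=2.4, seed=0):
    """Random-walk Metropolis sampler (generic sketch, not the paper's design)."""
    rng = random.Random(seed)
    x = x0
    lp = log_p(x)
    samples = []
    for _ in range(n_samples):
        # Propose a candidate by adding Gaussian noise to the current state.
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_p(cand)
        # Accept with probability min(1, p(cand) / p(x)), evaluated in log space.
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        # The chain records the current state whether or not the move was accepted.
        samples.append(x)
    return samples


samples = metropolis(log_target, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")
```

Each iteration draws random numbers and evaluates a log-density and an acceptance test, so the per-sample cost is dominated by a small set of repeated operations. That is the kind of inner loop a domain-specific accelerator would target, although the specific operations MC3A accelerates are not given in the abstract.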


Cited By

Ni Y, Deng Y, Li S. (2021). PMBA: A Parallel MCMC Bayesian Computing Accelerator. IEEE Access 9, 65536-65546. Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3076207


Published In

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI

May 2018

533 pages

ISBN:9781450357241

DOI:10.1145/3194554

General Chair: Deming Chen, University of Illinois, USA

Program Chairs: Houman Homayoun, George Mason University, USA; Baris Taskin, Drexel University, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation
IEEE CEDA
IEEE CASS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

GLSVLSI '18

Sponsor:

SIGDA

GLSVLSI '18: Great Lakes Symposium on VLSI 2018

May 23 - 25, 2018

Chicago, IL, USA

Acceptance Rates

GLSVLSI '18 Paper Acceptance Rate 48 of 197 submissions, 24%;

Overall Acceptance Rate 312 of 1,156 submissions, 27%
