DOI: 10.1145/3194554.3194577
research-article

MC3A: Markov Chain Monte Carlo ManyCore Accelerator

Published: 30 May 2018

Abstract

This paper presents MC3A (Markov Chain Monte Carlo ManyCore Accelerator), a high-throughput, domain-specific, programmable manycore accelerator that effectively generates samples from a given target distribution. MCMC samplers are used in computationally intensive machine learning, image processing, and signal processing applications, where sampler throughput is of paramount importance. To achieve a high-throughput platform, we add two domain-specific instructions, backed by dedicated hardware, for functions that are used extensively in MCMC algorithms. These instructions reduce the number of clock cycles needed to implement the respective functions by 10x and 21x. A 64-cluster MC3A architecture is fully placed and routed in TSMC 65 nm CMOS technology; the VLSI layout of each cluster occupies 0.577 mm^2 and consumes 247 mW at a 1 GHz clock frequency. MC3A achieves 6x higher throughput than its equivalent predecessor (PENC) and consumes 4x less energy per sample. Compared with off-the-shelf platforms such as the Jetson TX1 and TX2 SoCs, MC3A achieves 195x and 191x higher throughput and consumes 808x and 726x less energy per generated sample, respectively.
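For readers unfamiliar with the kind of sampler the abstract refers to, the following is a minimal software sketch of random-walk Metropolis-Hastings (the MH variant named in the author tags), drawing samples from a target known only up to a normalizing constant. It does not model the MC3A hardware or its two custom instructions, which this page does not specify. The function names, the Gaussian proposal, the step size, and the standard normal target are all assumptions chosen for illustration.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings over a 1-D target (illustrative only).

    log_target: log of the (possibly unnormalized) target density.
    Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        candidate = x + rng.gauss(0.0, step)
        log_p_candidate = log_target(candidate)
        # Accept with probability min(1, p(candidate) / p(x)),
        # evaluated in the log domain against a uniform draw.
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is finite
        if math.log(u) < log_p_candidate - log_p:
            x, log_p = candidate, log_p_candidate
            accepted += 1
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples, accepted / n_samples


def log_std_normal(x):
    # Standard normal log-density up to an additive constant (assumed target).
    return -0.5 * x * x


samples, rate = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000, step=2.4)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance={rate:.2f} mean={mean:.2f} var={var:.2f}")
```

Each iteration evaluates the target density and draws both a proposal and a uniform random number, which is why sampler throughput is dominated by a small set of repeated operations. Which of these operations MC3A accelerates in hardware is not stated in the abstract.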

Published In

GLSVLSI '18: Proceedings of the 2018 Great Lakes Symposium on VLSI
May 2018
533 pages
ISBN: 9781450357241
DOI: 10.1145/3194554
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. ASIC
  2. manycore accelerator
  3. MCMC
  4. Metropolis-Hastings (MH)
  5. PDF
  6. uniform random number generator
  7. VLSI

Qualifiers

  • Research-article

Conference

GLSVLSI '18: Great Lakes Symposium on VLSI 2018
May 23-25, 2018
Chicago, IL, USA

Acceptance Rates

GLSVLSI '18 Paper Acceptance Rate 48 of 197 submissions, 24%;
Overall Acceptance Rate 312 of 1,156 submissions, 27%
