DOI: 10.1145/3123939.3124544

Ambit: in-memory accelerator for bulk bitwise operations using commodity DRAM technology

Published: 14 October 2017

Abstract

Many important applications trigger bulk bitwise operations, i.e., bitwise operations on large bit vectors. In fact, recent work proposes techniques that exploit fast bulk bitwise operations to accelerate databases (bitmap indices, BitWeaving) and web search (BitFunnel). Unfortunately, in existing architectures, the throughput of bulk bitwise operations is limited by the memory bandwidth available to the processing unit (e.g., CPU, GPU, FPGA, processing-in-memory).
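To make "bulk bitwise operation" concrete, here is a minimal Python sketch of a bitmap-index query of the kind mentioned above. The record layout, the toy data, and the use of Python integers as bit vectors are illustrative assumptions and are not taken from the paper; in a real system each vector would be far larger than a machine word.

```python
# Toy bitmap index: one bit vector per predicate, one bit per record.
# Python ints stand in for large bit vectors (an illustrative assumption).

def build_bitmap(records, predicate):
    """Return an int whose bit i is set when predicate(records[i]) holds."""
    bits = 0
    for i, rec in enumerate(records):
        if predicate(rec):
            bits |= 1 << i
    return bits

def matching_ids(bits):
    """List the record positions whose bit is set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical records: (country, is_premium)
records = [("US", True), ("DE", False), ("US", False), ("FR", True), ("US", True)]

is_us = build_bitmap(records, lambda r: r[0] == "US")
is_premium = build_bitmap(records, lambda r: r[1])

# A conjunctive query is one bitwise AND over the whole vector;
# a disjunctive query is one bitwise OR.
us_and_premium = is_us & is_premium
us_or_premium = is_us | is_premium
```

Each query touches every bit of both vectors exactly once and does almost no computation per bit, which is why throughput on a conventional processor tends to be bounded by how fast the vectors can be streamed from memory.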
To overcome this bottleneck, we propose Ambit, an Accelerator-in-Memory for bulk bitwise operations. Unlike prior works, Ambit exploits the analog operation of DRAM technology to perform bitwise operations completely inside DRAM, thereby exploiting the full internal DRAM bandwidth. Ambit consists of two components. First, simultaneous activation of three DRAM rows that share the same set of sense amplifiers enables the system to perform bitwise AND and OR operations. Second, with modest changes to the sense amplifier, the system can use the inverters present inside the sense amplifier to perform bitwise NOT operations. With these two components, Ambit can perform any bulk bitwise operation efficiently inside DRAM. Ambit largely exploits existing DRAM structure, and hence incurs low cost on top of commodity DRAM designs (1% of DRAM chip area). Importantly, Ambit uses the modern DRAM interface without any changes, and therefore it can be directly plugged onto the memory bus.
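The two components can be modeled functionally in a few lines of Python. The sketch below assumes that activating three rows resolves each bitline to the majority value of the three cells, which is how the full paper describes triple-row activation; holding the third row at all zeros or all ones then yields AND or OR. The function names, the 16-bit row width, and the XOR composition are illustrative choices, and none of the analog behavior (charge sharing, process variation) is modeled.

```python
ROW_BITS = 16                      # toy row width; real DRAM rows are far wider
ALL_ONES = (1 << ROW_BITS) - 1
ALL_ZEROS = 0

def triple_row_activate(a, b, c):
    """Bitwise majority of three rows: a result bit is 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)

def bulk_and(a, b):
    # Third row fixed to all zeros: majority(a, b, 0) equals a AND b.
    return triple_row_activate(a, b, ALL_ZEROS)

def bulk_or(a, b):
    # Third row fixed to all ones: majority(a, b, 1) equals a OR b.
    return triple_row_activate(a, b, ALL_ONES)

def bulk_not(a):
    # Models the inverter path in the sense amplifier, limited to the row width.
    return ~a & ALL_ONES

def bulk_xor(a, b):
    # AND, OR, and NOT are functionally complete, so XOR can be composed from them.
    return bulk_or(bulk_and(a, bulk_not(b)), bulk_and(bulk_not(a), b))

a = 0b1100_1010_1111_0000
b = 0b1010_0110_0101_1100
```

Because AND, OR, and NOT together are functionally complete, any other bitwise operation can be built from these primitives, which is the sense in which the abstract says Ambit can perform any bulk bitwise operation.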
Our extensive circuit simulations show that Ambit works as expected even in the presence of significant process variation. Averaged across seven bulk bitwise operations, Ambit improves performance by 32X and reduces energy consumption by 35X compared to state-of-the-art systems. When integrated with Hybrid Memory Cube (HMC), a 3D-stacked DRAM with a logic layer, Ambit improves performance of bulk bitwise operations by 9.7X compared to processing in the logic layer of the HMC. Ambit improves the performance of three real-world data-intensive applications, 1) database bitmap indices, 2) BitWeaving, a technique to accelerate database scans, and 3) a bit-vector-based implementation of sets, by 3X-7X compared to a state-of-the-art baseline using SIMD optimizations. We describe four other applications that can benefit from Ambit, including a recent technique proposed to speed up web search. We believe that the large performance and energy improvements provided by Ambit can enable other applications to use bulk bitwise operations.
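One of the evaluated applications, a bit-vector-based implementation of sets, can be sketched as follows. The class name, the fixed universe size, and the method names are hypothetical and chosen only for illustration; this is not the implementation evaluated in the paper. The point is that union, intersection, and difference each reduce to one bulk bitwise operation over the whole vector.

```python
class BitVectorSet:
    """Set over the universe {0, ..., universe - 1}, stored as one bit per element."""

    def __init__(self, universe, elements=()):
        self.universe = universe
        self.bits = 0
        for e in elements:
            self.add(e)

    def add(self, e):
        if not 0 <= e < self.universe:
            raise ValueError("element outside universe")
        self.bits |= 1 << e

    def __contains__(self, e):
        return 0 <= e < self.universe and (self.bits >> e) & 1 == 1

    def _from_bits(self, bits):
        result = BitVectorSet(self.universe)
        result.bits = bits
        return result

    def union(self, other):
        return self._from_bits(self.bits | other.bits)        # one bulk OR

    def intersection(self, other):
        return self._from_bits(self.bits & other.bits)        # one bulk AND

    def difference(self, other):
        mask = (1 << self.universe) - 1
        return self._from_bits(self.bits & ~other.bits & mask)  # NOT, then AND

    def to_sorted_list(self):
        return [e for e in range(self.universe) if e in self]


s = BitVectorSet(64, [1, 5, 9])
t = BitVectorSet(64, [5, 9, 20])
```

The cost of each operation depends on the universe size rather than on how many elements the sets hold, so this representation suits dense sets over a bounded domain.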

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
October 2017
850 pages
ISBN: 9781450349529
DOI: 10.1145/3123939
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. bulk bitwise operations
  3. databases
  4. energy
  5. memory bandwidth
  6. performance
  7. processing-in-memory

Qualifiers

  • Research-article

Funding Sources

  • NSF
  • SRC
  • Intel Science and Technology Center for Cloud Computing

Conference

MICRO-50

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
