DOI: 10.1145/3352460.3358282
research-article
Open access

Dynamic Multi-Resolution Data Storage

Published: 12 October 2019

Abstract

Approximate computing that works on less precise data leads to significant performance gains and energy-cost reductions for compute kernels. However, without leveraging the full-stack design of computer systems, modern computer architectures undermine the potential of approximate computing.
In this paper, we present Varifocal Storage, a dynamic multi-resolution storage system that tackles challenges in performance, quality, flexibility and cost for computer systems supporting diverse application demands. Varifocal Storage dynamically adjusts the dataset resolution within a storage device, thereby mitigating the performance bottleneck of exchanging/preparing data for approximate compute kernels. Varifocal Storage introduces Autofocus and iFilter mechanisms to provide quality control inside the storage device and make programs more adaptive to diverse datasets. Varifocal Storage also offers flexible, efficient support for approximate and exact computing without exceeding the costs of conventional storage systems by (1) saving the raw dataset in the storage device, and (2) targeting operators that complement the power of existing SSD controllers to dynamically generate lower-resolution datasets.
We evaluate the performance of Varifocal Storage by running applications on a heterogeneous computer with our prototype SSD. The results show that Varifocal Storage can speed up data resolution adjustments by 2.02× or 1.74× without programmer input. Compared to conventional approximate-computing architectures, Varifocal Storage speeds up the overall execution time by 1.52×.
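
The idea of keeping the raw dataset and deriving lower-resolution views on demand, under a quality bound, can be illustrated with a small toy. This is only a sketch under stated assumptions, not the paper's Autofocus or iFilter design: the function names `downsample` and `pick_resolution`, the block-averaging operator, and the relative-error metric are all hypothetical choices made for illustration. The toy also computes the exact result for comparison, which a real system would want to avoid.

```python
from typing import Callable, List, Sequence, Tuple


def downsample(data: Sequence[float], factor: int) -> List[float]:
    """Average each run of `factor` consecutive values into one value.

    Hypothetical resolution-reduction operator; the last run may be shorter.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for i in range(0, len(data), factor):
        chunk = data[i:i + factor]
        out.append(sum(chunk) / len(chunk))
    return out


def relative_error(exact: float, approx: float) -> float:
    """Relative error of approx against exact (absolute error if exact is 0)."""
    if exact == 0:
        return abs(approx)
    return abs(approx - exact) / abs(exact)


def pick_resolution(
    raw: Sequence[float],
    kernel: Callable[[Sequence[float]], float],
    factors: Sequence[int],
    max_error: float,
) -> Tuple[int, List[float]]:
    """Return the coarsest candidate factor whose kernel result stays within
    max_error of the result on the raw data, with the reduced dataset.

    Falls back to factor 1 (the raw data) if no candidate qualifies.
    """
    exact = kernel(raw)
    for factor in sorted(factors, reverse=True):  # try coarsest first
        reduced = downsample(raw, factor)
        if relative_error(exact, kernel(reduced)) <= max_error:
            return factor, reduced
    return 1, list(raw)


if __name__ == "__main__":
    raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    factor, reduced = pick_resolution(raw, max, [2, 4], max_error=0.1)
    print(factor, reduced)
```

Because the raw data is never overwritten, the same stored dataset can serve both exact consumers (factor 1) and approximate consumers (a coarser factor), which mirrors point (1) in the abstract.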

Published In

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture
October 2019
1104 pages
ISBN: 9781450369381
DOI: 10.1145/3352460
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Approximate Computing
  2. Heterogeneous Computer Architectures/Systems
  3. In-Storage Processing
  4. Intelligent Storage Systems
  5. Near-Data Processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 183
  • Downloads (Last 6 weeks): 18
Reflects downloads up to 31 Dec 2024

Cited By

  • (2024) PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), (340-353). DOI: 10.1109/ISCA59077.2024.00033. Online publication date: 29-Jun-2024
  • (2023) Abakus: Accelerating k-mer Counting with Storage Technology. ACM Transactions on Architecture and Code Optimization 21:1, (1-26). DOI: 10.1145/3632952. Online publication date: 21-Nov-2023
  • (2023) Rethinking Programming Frameworks for In-Storage Processing. 2023 60th ACM/IEEE Design Automation Conference (DAC), (1-6). DOI: 10.1109/DAC56929.2023.10247919. Online publication date: 9-Jul-2023
  • (2022) SmartSAGE. Proceedings of the 49th Annual International Symposium on Computer Architecture, (932-945). DOI: 10.1145/3470496.3527391. Online publication date: 18-Jun-2022
  • (2022) RM-SSD: In-Storage Computing for Large-Scale Recommendation Inference. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), (1056-1070). DOI: 10.1109/HPCA53966.2022.00081. Online publication date: Apr-2022
  • (2021) Understanding the Implication of Non-Volatile Memory for Large-Scale Graph Neural Network Training. IEEE Computer Architecture Letters 20:2, (118-121). DOI: 10.1109/LCA.2021.3098943. Online publication date: 1-Jul-2021
