
Accelerating generalized linear models with MLWeaving: a one-size-fits-all system for any-precision learning

Published: 01 March 2019

Abstract

Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower-precision input data are of special interest given their overall higher efficiency. However, in databases, these methods have a hidden cost: quantizing real values into lower-precision representations is an expensive step. To address this issue, we present MLWeaving, a data structure and hardware acceleration technique intended to speed up learning of generalized linear models over low-precision data. MLWeaving provides a compact in-memory representation that enables the retrieval of data at any level of precision. MLWeaving also provides a highly efficient implementation of stochastic gradient descent on FPGAs and enables the dynamic tuning of precision during learning, instead of using a fixed precision level. Experimental results show that MLWeaving converges up to 16× faster than low-precision implementations of first-order methods on CPUs.
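The "retrieval of data at any level of precision" idea can be illustrated with a bit-sliced (bit-weaving-style) layout: store each quantized value as separate bit-planes, most significant bit first, so that reading only the first b planes yields a b-bit approximation without any re-quantization. The sketch below is a minimal NumPy illustration of that principle only; the function names (`weave`, `unweave`) are hypothetical and the paper's actual memory layout and FPGA pipeline are considerably more involved.

```python
import numpy as np

def weave(values, total_bits=8):
    """Quantize values in [0, 1) to total_bits and store them as
    bit-planes: plane p holds bit p (MSB-first) of every value."""
    q = np.floor(values * (1 << total_bits)).astype(np.uint32)
    q = np.clip(q, 0, (1 << total_bits) - 1)
    planes = np.empty((total_bits, len(values)), dtype=np.uint8)
    for p in range(total_bits):
        planes[p] = (q >> (total_bits - 1 - p)) & 1
    return planes

def unweave(planes, bits):
    """Reconstruct a bits-bit approximation by reading only the
    first `bits` (most significant) bit-planes."""
    acc = np.zeros(planes.shape[1], dtype=np.uint32)
    for p in range(bits):
        acc = (acc << 1) | planes[p]
    # Scale the truncated integer back into [0, 1).
    return acc.astype(np.float64) / (1 << bits)

vals = np.array([0.5, 0.25, 0.8])
planes = weave(vals, total_bits=8)
print(unweave(planes, 1))  # coarse 1-bit view: [0.5, 0.0, 0.5]
print(unweave(planes, 8))  # full 8-bit view: [0.5, 0.25, 0.796875]
```

Because lower precision simply means touching fewer bit-planes, a learner can start with a cheap coarse pass over the data and raise precision dynamically as it converges, which is the access pattern the abstract's "dynamic tuning of precision" refers to.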





Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 7, March 2019, 112 pages.
ISSN: 2150-8097

Publisher

VLDB Endowment


Qualifiers

  • Research-article


Cited By

  • (2024) The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data, 2(1):1-31. DOI: 10.1145/3639307
  • (2023) A Deep Dive into Common Open Formats for Analytical DBMSs. Proceedings of the VLDB Endowment, 16(11):3044-3056. DOI: 10.14778/3611479.3611507
  • (2023) Hierarchical Residual Encoding for Multiresolution Time Series Compression. Proceedings of the ACM on Management of Data, 1(1):1-26. DOI: 10.1145/3588953
  • (2023) AWARE: Workload-aware, Redundancy-exploiting Linear Algebra. Proceedings of the ACM on Management of Data, 1(1):1-28. DOI: 10.1145/3588682
  • (2023) P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs. IEEE Transactions on Parallel and Distributed Systems, 34(8):2311-2324. DOI: 10.1109/TPDS.2023.3279255
  • (2022) Exploiting HBM on FPGAs for Data Processing. ACM Transactions on Reconfigurable Technology and Systems, 15(4):1-27. DOI: 10.1145/3491238
  • (2021) Progressive compressed records. Proceedings of the VLDB Endowment, 14(11):2627-2641. DOI: 10.14778/3476249.3476308
  • (2021) Decomposed bounded floats for fast compression and queries. Proceedings of the VLDB Endowment, 14(11):2586-2598. DOI: 10.14778/3476249.3476305
  • (2021) Deep Learning: Systems and Responsibility. Proceedings of the 2021 International Conference on Management of Data, 2867-2875. DOI: 10.1145/3448016.3457541
  • (2020) FPGA-accelerated compactions for LSM-based key-value store. Proceedings of the 18th USENIX Conference on File and Storage Technologies, 225-238. DOI: 10.5555/3386691.3386713
