
Accelerating generalized linear models with MLWeaving: a one-size-fits-all system for any-precision learning

Published: 01 March 2019

Abstract

Learning from the data stored in a database is an important function increasingly available in relational engines. Methods using lower-precision input data are of special interest given their overall higher efficiency. However, in databases, these methods have a hidden cost: quantizing real values into lower-precision representations is an expensive step. To address this issue, we present MLWeaving, a data structure and hardware acceleration technique intended to speed up learning of generalized linear models over low-precision data. MLWeaving provides a compact in-memory representation that enables the retrieval of data at any level of precision. MLWeaving also provides a highly efficient implementation of stochastic gradient descent on FPGAs and enables the dynamic tuning of precision during learning, instead of using a fixed precision level. Experimental results show that MLWeaving converges up to 16× faster than low-precision implementations of first-order methods on CPUs.
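The "retrieval of data at any level of precision" idea can be illustrated with a bit-sliced (bit-weaving-style) layout: store each quantized value as separate bit-planes, most significant bit first, so that reading only the first b planes yields a b-bit approximation without any re-quantization. The sketch below is a minimal NumPy illustration of that principle only; the function names (`weave`, `unweave`) are hypothetical and the paper's actual memory layout and FPGA pipeline are considerably more involved.

```python
import numpy as np

def weave(values, total_bits=8):
    """Quantize values in [0, 1) to total_bits and store them as
    bit-planes: plane p holds bit p (MSB-first) of every value."""
    q = np.floor(values * (1 << total_bits)).astype(np.uint32)
    q = np.clip(q, 0, (1 << total_bits) - 1)
    planes = np.empty((total_bits, len(values)), dtype=np.uint8)
    for p in range(total_bits):
        planes[p] = (q >> (total_bits - 1 - p)) & 1
    return planes

def unweave(planes, bits):
    """Reconstruct a bits-bit approximation by reading only the
    first `bits` (most significant) bit-planes."""
    acc = np.zeros(planes.shape[1], dtype=np.uint32)
    for p in range(bits):
        acc = (acc << 1) | planes[p]
    # Scale the truncated integer back into [0, 1).
    return acc.astype(np.float64) / (1 << bits)

vals = np.array([0.5, 0.25, 0.8])
planes = weave(vals, total_bits=8)
print(unweave(planes, 1))  # coarse 1-bit view: [0.5, 0.0, 0.5]
print(unweave(planes, 8))  # full 8-bit view: [0.5, 0.25, 0.796875]
```

Because lower precision simply means touching fewer bit-planes, a learner can start with a cheap coarse pass over the data and raise precision dynamically as it converges, which is the access pattern the abstract's "dynamic tuning of precision" refers to.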





Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 7, March 2019, 112 pages.
ISSN: 2150-8097

Publisher

VLDB Endowment


Qualifiers

  • Research-article


Cited By

  • (2024) The Image Calculator: 10x Faster Image-AI Inference by Replacing JPEG with Self-designing Storage Format. Proceedings of the ACM on Management of Data, 2(1):1-31. DOI: 10.1145/3639307
  • (2023) A Deep Dive into Common Open Formats for Analytical DBMSs. Proceedings of the VLDB Endowment, 16(11):3044-3056. DOI: 10.14778/3611479.3611507
  • (2023) Hierarchical Residual Encoding for Multiresolution Time Series Compression. Proceedings of the ACM on Management of Data, 1(1):1-26. DOI: 10.1145/3588953
  • (2023) AWARE: Workload-aware, Redundancy-exploiting Linear Algebra. Proceedings of the ACM on Management of Data, 1(1):1-28. DOI: 10.1145/3588682
  • (2023) P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs. IEEE Transactions on Parallel and Distributed Systems, 34(8):2311-2324. DOI: 10.1109/TPDS.2023.3279255
  • (2022) Exploiting HBM on FPGAs for Data Processing. ACM Transactions on Reconfigurable Technology and Systems, 15(4):1-27. DOI: 10.1145/3491238
  • (2021) Progressive compressed records. Proceedings of the VLDB Endowment, 14(11):2627-2641. DOI: 10.14778/3476249.3476308
  • (2021) Decomposed bounded floats for fast compression and queries. Proceedings of the VLDB Endowment, 14(11):2586-2598. DOI: 10.14778/3476249.3476305
  • (2021) Deep Learning: Systems and Responsibility. Proceedings of the 2021 International Conference on Management of Data, 2867-2875. DOI: 10.1145/3448016.3457541
  • (2020) FPGA-accelerated compactions for LSM-based key-value store. Proceedings of the 18th USENIX Conference on File and Storage Technologies, 225-238. DOI: 10.5555/3386691.3386713
