DOI: 10.5555/1972457.1972459

SSLShader: cheap SSL acceleration with commodity processors

Published: 30 March 2011

Abstract

Secure end-to-end communication is becoming increasingly important as more private and sensitive data is transferred on the Internet. Unfortunately, today's SSL deployment is largely limited to security or privacy-critical domains. The low adoption rate is mainly attributed to the heavy cryptographic computation overhead on the server side, and the cost of good privacy on the Internet is tightly bound to expensive hardware SSL accelerators in practice.
In this paper we present high-performance SSL acceleration using commodity processors. First, we show that modern graphics processing units (GPUs) can be easily converted to general-purpose SSL accelerators. By exploiting the massive computing parallelism of GPUs, we accelerate SSL cryptographic operations beyond what state-of-the-art CPUs provide. Second, we build a transparent SSL proxy, SSLShader, that carefully leverages the trade-offs of recent hardware features such as AESNI and NUMA and achieves both high throughput and low latency. In our evaluation, the GPU implementation of RSA shows a factor of 22.6 to 31.7 improvement over the fastest CPU implementation. SSLShader achieves 29K transactions per second for small files while it transfers large files at 13 Gbps on a commodity server machine. These numbers are comparable to high-end commercial SSL appliances at a fraction of their price.
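The parallelism the abstract refers to comes from the fact that each SSL handshake needs its own independent RSA private-key operation, so a batch of pending handshakes can be processed concurrently. The following is a minimal sketch of that idea only, not SSLShader's GPU implementation. It assumes a toy textbook key (61 and 53 as primes, far too small to be secure), uses the Chinese Remainder Theorem as a common RSA speed-up chosen for illustration, and uses a Python thread pool as a stand-in for GPU threads. The names `rsa_decrypt_crt` and `decrypt_batch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy textbook RSA key: insecure, for illustration only (assumption).
P, Q = 61, 53
N = P * Q                              # public modulus
E = 17                                 # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

# CRT parameters, precomputed once per key.
DP = D % (P - 1)
DQ = D % (Q - 1)
Q_INV = pow(Q, -1, P)


def rsa_decrypt_crt(c: int) -> int:
    """RSA private-key operation using the Chinese Remainder Theorem."""
    m1 = pow(c, DP, P)                 # half-size exponentiation mod P
    m2 = pow(c, DQ, Q)                 # half-size exponentiation mod Q
    h = (Q_INV * (m1 - m2)) % P        # Garner recombination step
    return m2 + h * Q


def decrypt_batch(ciphertexts, workers=4):
    """Process independent ciphertexts concurrently.

    The thread pool only models the structure of the batch; it is a
    stand-in for GPU threads and gives no real speed-up in CPython.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(rsa_decrypt_crt, ciphertexts))


if __name__ == "__main__":
    messages = [42, 7, 1234, 3000]
    ciphertexts = [pow(m, E, N) for m in messages]
    print(decrypt_batch(ciphertexts) == messages)
```

Because no operation in the batch depends on another, the work maps naturally onto many parallel execution units, which is the property the abstract's GPU results rely on.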

Published In

NSDI'11: Proceedings of the 8th USENIX conference on Networked systems design and implementation
March 2011
27 pages

Sponsors

  • VMware
  • NSF: National Science Foundation
  • Google Inc.
  • Infosys
  • USENIX Association

Publisher

USENIX Association

United States

Qualifiers

  • Article

Conference

NSDI '11
Sponsor:
  • NSF
  • USENIX Assoc

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 10 Dec 2024

Cited By

  • (2022) EC-ECC: Accelerating Elliptic Curve Cryptography for Edge Computing on Embedded GPU TX2. ACM Transactions on Embedded Computing Systems 21:2 (1-25). DOI: 10.1145/3492734. Online publication date: 8-Feb-2022
  • (2020) Paving the Way for NFV Acceleration. ACM Computing Surveys 53:4 (1-42). DOI: 10.1145/3397022. Online publication date: 20-Aug-2020
  • (2019) Side-channel Timing Attack of RSA on a GPU. ACM Transactions on Architecture and Code Optimization 16:3 (1-18). DOI: 10.1145/3341729. Online publication date: 13-Aug-2019
  • (2019) QTLS. Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (158-172). DOI: 10.1145/3293883.3295705. Online publication date: 16-Feb-2019
  • (2018) G-net. Proceedings of the 15th USENIX Conference on Networked Systems Design and Implementation (187-200). DOI: 10.5555/3307441.3307458. Online publication date: 9-Apr-2018
  • (2018) Data Placement Optimization in GPU Memory Hierarchy using Predictive Modeling. Proceedings of the Workshop on Memory Centric High Performance Computing (45-49). DOI: 10.1145/3286475.3286482. Online publication date: 11-Nov-2018
  • (2017) mOS. Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation (113-129). DOI: 10.5555/3154630.3154640. Online publication date: 27-Mar-2017
  • (2017) APUNet. Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation (83-96). DOI: 10.5555/3154630.3154638. Online publication date: 27-Mar-2017
  • (2017) Catalyst. ACM SIGPLAN Notices 52:7 (44-59). DOI: 10.1145/3140607.3050760. Online publication date: 8-Apr-2017
  • (2017) Towards a Scalable Modular QUIC Server. Proceedings of the Workshop on Kernel-Bypass Networks (19-24). DOI: 10.1145/3098583.3098587. Online publication date: 9-Aug-2017
