
A Look Behind the Curtain: Traffic Classification in an Increasingly Encrypted Web

Published: 22 February 2021

Abstract

Traffic classification is essential in network management for operations ranging from capacity planning, performance monitoring, volumetry, and resource provisioning, to anomaly detection and security. It has recently become increasingly challenging with the widespread adoption of encryption on the Internet, e.g., as the de facto standard in the HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using Deep Learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, which limits their robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated like any other raw input for DL, while crucial domain-specific considerations are commonly ignored. In this paper, we design a novel feature engineering approach that generalizes well for encrypted web protocols, and develop a neural network architecture based on Stacked Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNN) that works very well with our feature design. We evaluate our approach on a real-world traffic dataset from a major ISP and Mobile Network Operator. We achieve an accuracy of 95% in service classification with less raw traffic and a smaller number of parameters, outperforming a state-of-the-art method with nearly 50% fewer false classifications. We show that our DL model generalizes across different classification objectives and encrypted web protocols. We also evaluate our approach on a public QUIC dataset with finer, application-level granularity in labeling, achieving an overall accuracy of 99%.
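The two headline figures for service classification (95% accuracy, nearly 50% fewer false classifications) can be related with simple arithmetic. The following is a minimal sketch, assuming that "false classifications" corresponds to the error rate (1 minus accuracy) measured on the same test set; the function name and the resulting baseline figure are illustrative and are not values reported in the abstract.

```python
def implied_baseline_accuracy(accuracy: float, relative_error_reduction: float) -> float:
    """Return the accuracy a baseline would have if `accuracy` reflects the
    given relative reduction in misclassifications over that baseline.

    Assumption (not from the abstract): errors are counted as 1 - accuracy
    on the same test set for both methods.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0.0 <= relative_error_reduction < 1.0:
        raise ValueError("relative_error_reduction must lie in [0, 1)")
    error = 1.0 - accuracy
    # If the new error is (1 - r) times the baseline error, invert that ratio.
    baseline_error = error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


# 95% accuracy with roughly half the misclassifications of the baseline
baseline = implied_baseline_accuracy(0.95, 0.50)
print(f"implied baseline accuracy: {baseline:.2f}")
```

Under this reading, halving the error rate from about 10% to 5% is what moves accuracy from roughly 90% to 95%; because the abstract says "nearly 50%", the implied baseline is only approximate.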


Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 5, Issue 1
POMACS
March 2021
252 pages
EISSN:2476-1249
DOI:10.1145/3452093
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. encrypted traffic classification
  3. http/2
  4. quic
  5. tls

Qualifiers

  • Research-article
