research-article

A Look Behind the Curtain: Traffic Classification in an Increasingly Encrypted Web

Authors:

Mohammad A. Salahuddin,

Bertrand Mathieu,

Stephanie Moteau,

Stephane Tuffin

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 5, Issue 1

Article No.: 4, Pages 1 - 26

https://doi.org/10.1145/3447382

Published: 22 February 2021

Abstract

Traffic classification is essential in network management for operations ranging from capacity planning, performance monitoring, volumetry, and resource provisioning to anomaly detection and security. Recently, it has become increasingly challenging with the widespread adoption of encryption on the Internet, e.g., as the de facto standard in the HTTP/2 and QUIC protocols. In the current state of encrypted traffic classification using Deep Learning (DL), we identify fundamental issues in the way it is typically approached. For instance, although complex DL models with millions of parameters are being used, these models implement a relatively simple logic based on certain header fields of the TLS handshake, limiting model robustness to future versions of encrypted protocols. Furthermore, encrypted traffic is often treated as any other raw input for DL, while crucial domain-specific considerations are commonly ignored. In this paper, we design a novel feature engineering approach that generalizes well for encrypted web protocols, and develop a neural network architecture based on Stacked Long Short-Term Memory (LSTM) layers and Convolutional Neural Networks (CNN) that works very well with our feature design. We evaluate our approach on a real-world traffic dataset from a major ISP and Mobile Network Operator. We achieve an accuracy of 95% in service classification with less raw traffic and a smaller number of parameters, outperforming a state-of-the-art method with nearly 50% fewer false classifications. We show that our DL model generalizes across different classification objectives and encrypted web protocols. We also evaluate our approach on a public QUIC dataset with finer, application-level granularity in labeling, achieving an overall accuracy of 99%.
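
The abstract states its headline result in two ways: an absolute accuracy (95%) and a relative reduction in false classifications (nearly 50%). The short Python sketch below shows how these two quantities relate arithmetically. The function names are hypothetical, and the 90% baseline accuracy used in the example is an illustrative assumption, not a figure reported in the abstract.

```python
def relative_error_reduction(acc_new: float, acc_base: float) -> float:
    """Fraction by which false classifications drop when accuracy
    improves from acc_base to acc_new (both given in [0, 1])."""
    err_new = 1.0 - acc_new
    err_base = 1.0 - acc_base
    if err_base <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return 1.0 - err_new / err_base


def implied_baseline_accuracy(acc_new: float, reduction: float) -> float:
    """Baseline accuracy implied by a new accuracy and a relative
    reduction in false classifications (reduction in [0, 1))."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    err_new = 1.0 - acc_new
    return 1.0 - err_new / (1.0 - reduction)


if __name__ == "__main__":
    # Hypothetical baseline of 90% accuracy, chosen only for illustration.
    print(round(relative_error_reduction(0.95, 0.90), 3))
    # Baseline accuracy implied by 95% accuracy and a 50% error reduction.
    print(round(implied_baseline_accuracy(0.95, 0.50), 3))
```

Under these assumptions, halving the error rate at 95% accuracy corresponds to a baseline near 90%, which is why a five-point accuracy gain can be described as roughly half as many misclassifications.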

Index Terms

A Look Behind the Curtain: Traffic Classification in an Increasingly Encrypted Web
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Networks
  1. Network performance evaluation
    1. Network measurement
  2. Network services
    1. Network management

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems Volume 5, Issue 1

POMACS

March 2021

252 pages

EISSN: 2476-1249

DOI: 10.1145/3452093

Editors: Augustin Chaintreau (Columbia University), Leana Golubchik (University of Southern California), Zhi-Li Zhang (University of Minnesota)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
