Abstract
Anonymous communication technology poses new challenges for traffic analysis because it creates a private network pathway. Clustering analysis has proved effective for grouping Internet traffic. However, traditional clustering algorithms such as K-means require the number of clusters to be specified in advance. In this paper, gravitation is introduced into the clustering process to develop an improved Tor anonymous traffic identifier, called the gravitational clustering algorithm (GCA). In the proposed method, each sample in the dataset is treated as an object in the feature space, and a new object moves into the corresponding cluster according to gravitational force and similarity. The GCA was applied to a dataset consisting of 2366 Tor network flows and 20926 other network flows. Simulation tests evaluated the proposed classifier and compared its performance with that of three state-of-the-art clustering algorithms. In these tests, the average accuracy rate, R, and FM coefficient of the proposed GCA all exceeded 0.8. By contrast, among the other three clustering algorithms, K-means achieved the highest detection rate (0.5).
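The abstract does not state the force law or the similarity measure, so the following is only a minimal sketch of the general idea and not the authors' GCA. It assumes that each cluster acts as a point mass at its centroid, that its mass equals its member count, and that it attracts a unit-mass sample with force mass / distance². The function name `gravitational_clustering` and the parameters `min_force` and `eps` are hypothetical. A sample that no cluster attracts strongly enough starts a new cluster, so the number of clusters does not have to be fixed in advance.

```python
def gravitational_clustering(samples, min_force=1.0, eps=1e-9):
    """Assign samples one at a time to the cluster that attracts them most.

    Assumed model (not taken from the paper): a cluster is a point mass at
    its centroid, with mass equal to its member count, and it pulls a
    unit-mass sample with force = mass / squared_distance. A sample that no
    cluster pulls with at least `min_force` starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": [floats], "mass": int}
    labels = []    # cluster index chosen for each sample, in input order

    for x in samples:
        best, best_force = None, 0.0
        for idx, c in enumerate(clusters):
            # Squared Euclidean distance between the sample and the centroid.
            d2 = sum((a - b) ** 2 for a, b in zip(x, c["centroid"]))
            # eps avoids division by zero when the sample sits on the centroid.
            force = c["mass"] / (d2 + eps)
            if force > best_force:
                best, best_force = idx, force

        if best is None or best_force < min_force:
            # No sufficiently strong attraction: open a new cluster.
            clusters.append({"centroid": list(x), "mass": 1})
            labels.append(len(clusters) - 1)
        else:
            # Join the most attractive cluster and update its running mean.
            c = clusters[best]
            m = c["mass"]
            c["centroid"] = [(ci * m + xi) / (m + 1)
                             for ci, xi in zip(c["centroid"], x)]
            c["mass"] = m + 1
            labels.append(best)

    return labels, clusters
```

In practice the flow features would need to be scaled to comparable ranges first, and `min_force` would have to be tuned on the data. Both choices are assumptions of this sketch.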
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 61572115) and the Key Basic Research of Sichuan Province (Grant No. 2016JY0007).
Cite this article
Rao, Z., Niu, W., Zhang, X. et al. Tor anonymous traffic identification based on gravitational clustering. Peer-to-Peer Netw. Appl. 11, 592–601 (2018). https://doi.org/10.1007/s12083-017-0566-4
DOI: https://doi.org/10.1007/s12083-017-0566-4