Article

Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?

Authors: Charles V. Wright, Fabian Monrose, Gerald M. Masson

SS'07: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium

Article No.: 4, Pages 1 - 12

Published: 06 August 2007

Abstract

Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Because sensitive conversations may transit untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Specifically, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") accuracy of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
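The abstract's core observation is that encryption hides packet contents but not packet lengths, while a VBR coder emits frames whose size depends on the sound being encoded. The sketch below illustrates that idea only; it is not the paper's classifier, whose features and model this record does not describe. It assumes each call is reduced to a sequence of packet lengths, builds a per-language frequency profile of consecutive length pairs (bigrams), and assigns an unknown call to the language whose profile is nearest under a chi-squared distance. The function names, language labels, and packet lengths are hypothetical toy values.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A bigram is a pair of consecutive packet lengths (in bytes).
Bigram = Tuple[int, int]
Profile = Dict[Bigram, float]


def bigram_counts(lengths: Sequence[int]) -> Counter:
    """Count consecutive (length, next_length) pairs in one packet trace."""
    return Counter(zip(lengths, lengths[1:]))


def normalize(counts: Counter) -> Profile:
    """Convert raw bigram counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def chi_squared_distance(p: Profile, q: Profile) -> float:
    """Symmetric chi-squared distance; every key present has a positive value."""
    dist = 0.0
    for key in set(p) | set(q):
        a, b = p.get(key, 0.0), q.get(key, 0.0)
        dist += (a - b) ** 2 / (a + b)
    return dist


def train(traces: Dict[str, List[List[int]]]) -> Dict[str, Profile]:
    """Pool all training traces of each language into one bigram profile."""
    models: Dict[str, Profile] = {}
    for language, sequences in traces.items():
        pooled: Counter = Counter()
        for seq in sequences:
            pooled.update(bigram_counts(seq))  # no bigram spans two traces
        models[language] = normalize(pooled)
    return models


def classify(models: Dict[str, Profile], lengths: Sequence[int]) -> str:
    """Pick the language whose profile is closest to the unknown trace."""
    profile = normalize(bigram_counts(lengths))
    return min(models, key=lambda lang: chi_squared_distance(profile, models[lang]))


# Hypothetical toy traces: invented packet lengths, not real codec output.
training_traces = {
    "lang_a": [[20, 20, 40, 40, 20, 20, 40, 40], [20, 40, 40, 20, 20, 40]],
    "lang_b": [[60, 20, 60, 20, 60, 20], [20, 60, 20, 60, 20, 60]],
}
models = train(training_traces)
print(classify(models, [20, 20, 40, 40, 20]))  # lang_a
print(classify(models, [60, 20, 60, 20]))      # lang_b
```

For scale, a uniform random guess over 21 languages is correct with probability 1/21 ≈ 4.8%, so 66% accuracy is about 0.66 × 21 ≈ 13.9 times that baseline, consistent with the abstract's "almost a 14-fold improvement". Bigrams are used in this sketch because they capture transitions between packet sizes rather than only their overall distribution; that choice is an assumption of the illustration.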

Information

Published In: SS'07: Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium, August 2007, 351 pages

ISBN: 1113335555779

Editor: Niels Provos, Google Inc.

Publisher: USENIX Association, United States

Qualifiers: Article

Bibliometrics

Total Citations: 37
Total Downloads: 11
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 12 Dec 2024.

Cited By

Hilgefort J, Arp D, Rieck K, Carlini N, Demontis A, Chen Y (2021). Spying through Virtual Backgrounds of Video Calls. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, pp. 135-144. DOI: 10.1145/3474369.3486870. Online publication date: 15-Nov-2021. https://dl.acm.org/doi/10.1145/3474369.3486870

Theodorou T, Mporas I, Potamitis I, Fakotakis N (2018). Data-Driven Audio Feature Selection for Audio Quality Recognition in Broadcast News. Proceedings of the 10th Hellenic Conference on Artificial Intelligence, pp. 1-6. DOI: 10.1145/3200947.3201035. Online publication date: 9-Jul-2018. https://dl.acm.org/doi/10.1145/3200947.3201035

(2017). Bibliography. Frontiers of Multimedia Research, pp. 315-377. DOI: 10.1145/3122865.3122878. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122878

Chen K, Cai W, Shea R, Huang C, Liu J, Leung V, Hsu C (2017). Cloud gaming. Frontiers of Multimedia Research, pp. 287-314. DOI: 10.1145/3122865.3122877. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122877

Hsu C, Hong H, Elgamal T, Nahrstedt K, Venkatasubramanian N (2017). Multimedia fog computing. Frontiers of Multimedia Research, pp. 255-286. DOI: 10.1145/3122865.3122876. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122876

Ramanathan S, Gilani S, Sebe N (2017). Utilizing implicit user cues for multimedia analytics. Frontiers of Multimedia Research, pp. 219-251. DOI: 10.1145/3122865.3122875. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122875

Rizoiu M, Lee Y, Mishra S, Xie L (2017). Hawkes processes for events in social media. Frontiers of Multimedia Research, pp. 191-218. DOI: 10.1145/3122865.3122874. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122874

Singh V (2017). Situation recognition using multimodal data. Frontiers of Multimedia Research, pp. 159-189. DOI: 10.1145/3122865.3122873. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122873

Cui P (2017). Social-sensed multimedia computing. Frontiers of Multimedia Research, pp. 137-157. DOI: 10.1145/3122865.3122872. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122872

Jégou H (2017). Efficient similarity search. Frontiers of Multimedia Research, pp. 105-134. DOI: 10.1145/3122865.3122871. Online publication date: 19-Dec-2017. https://dl.acm.org/doi/10.1145/3122865.3122871
