DOI: 10.5555/1362903.1362907
Article

Language identification of encrypted VoIP traffic: Alejandra y Roberto or Alice and Bob?

Published: 06 August 2007

Abstract

Voice over IP (VoIP) has become a popular protocol for making phone calls over the Internet. Due to the potential transit of sensitive conversations over untrusted network infrastructure, it is well understood that the contents of a VoIP session should be encrypted. However, we demonstrate that current cryptographic techniques do not provide adequate protection when the underlying audio is encoded using bandwidth-saving Variable Bit Rate (VBR) coders. Explicitly, we use the length of encrypted VoIP packets to tackle the challenging task of identifying the language of the conversation. Our empirical analysis of 2,066 native speakers of 21 different languages shows that a substantial amount of information can be discerned from encrypted VoIP traffic. For instance, our 21-way classifier achieves 66% accuracy, almost a 14-fold improvement over random guessing. For 14 of the 21 languages, the accuracy is greater than 90%. We achieve an overall binary classification (e.g., "Is this a Spanish or English conversation?") rate of 86.6%. Our analysis highlights what we believe to be interesting new privacy issues in VoIP.
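
To illustrate why packet lengths alone can leak information under a VBR coder, here is a minimal sketch of a length-based classifier. It is not the classifier evaluated in the paper; it assumes each call is available as a list of encrypted packet sizes in bytes, pools a normalized length histogram per language, and labels a new call by the nearest histogram under a chi-square distance. The function names and the packet sizes in the usage example are hypothetical. The last helper checks the abstract's arithmetic: chance on 21 classes is 1/21, so 66% accuracy is roughly 13.9 times chance.

```python
from collections import Counter


def length_histogram(packet_lengths):
    """Normalized distribution over observed packet lengths (bytes)."""
    if not packet_lengths:
        raise ValueError("need at least one packet length")
    counts = Counter(packet_lengths)
    total = sum(counts.values())
    return {length: n / total for length, n in counts.items()}


def chi_square_distance(p, q):
    """Symmetric chi-square distance between two sparse distributions."""
    dist = 0.0
    for key in set(p) | set(q):
        a, b = p.get(key, 0.0), q.get(key, 0.0)
        # a + b > 0 because key occurs in at least one distribution.
        dist += (a - b) ** 2 / (a + b)
    return dist


def train(calls_by_language):
    """Pool all training calls of each language into one histogram."""
    return {
        lang: length_histogram([n for call in calls for n in call])
        for lang, calls in calls_by_language.items()
    }


def classify(models, packet_lengths):
    """Return the language whose histogram is closest to the observed call."""
    observed = length_histogram(packet_lengths)
    return min(models, key=lambda lang: chi_square_distance(models[lang], observed))


def fold_over_chance(accuracy, num_classes):
    """Ratio of accuracy to uniform random guessing (1 / num_classes)."""
    return accuracy * num_classes


# Hypothetical packet sizes, for illustration only (not data from the paper).
training = {
    "english": [[20, 28, 38, 38, 46], [28, 38, 38, 46, 62]],
    "spanish": [[20, 20, 28, 28, 38], [20, 28, 28, 38, 46]],
}
models = train(training)
print(classify(models, [38, 38, 46, 28, 62]))
print(round(fold_over_chance(0.66, 21), 2))
```

The point of the sketch is that nothing in it touches the encrypted payload; the only input is the sequence of packet sizes, which encryption that preserves length leaves visible to an eavesdropper.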

Cited By

  • (2021) Spying through Virtual Backgrounds of Video Calls. Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, pages 135-144. DOI: 10.1145/3474369.3486870. Online publication date: 15-Nov-2021
  • (2018) Data-Driven Audio Feature Selection for Audio Quality Recognition in Broadcast News. Proceedings of the 10th Hellenic Conference on Artificial Intelligence, pages 1-6. DOI: 10.1145/3200947.3201035. Online publication date: 9-Jul-2018
  • (2017) Bibliography. Frontiers of Multimedia Research, pages 315-377. DOI: 10.1145/3122865.3122878. Online publication date: 19-Dec-2017

Published In

SS'07: Proceedings of the 16th USENIX Security Symposium
August 2007
351 pages
ISBN: 1113335555779

Publisher

USENIX Association

United States

Qualifiers

  • Article
