DOI: 10.1145/3548608.3559301

A speaker verification method using frame-level self-attention

Published: 14 October 2022

Abstract

This paper presents a new way of applying the self-attention mechanism to text-independent speaker verification. Speech frames differ in how much speaker information they carry, and these differences affect the performance of a speaker verification system. To capture them, systems typically compute a weighted average of the frame-level outputs in the pooling stage when extracting the speaker embedding, with the weights learned through self-attention. In this work, the self-attention mechanism is instead introduced into the frame-level model itself, providing a new way of capturing the differences between speech frames: a self-attention layer is added between the frame-level layers to obtain differentiated frame-level features directly. These features are then combined into more discriminative speaker embeddings, improving system performance. Experiments on the VoxCeleb1 dataset show that the proposed system outperforms the baselines, and the improvement is consistent across different speech durations.
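
To make the described architecture concrete, the following is a minimal PyTorch sketch of the idea in the abstract: a scaled dot-product self-attention layer inserted between two frame-level TDNN layers, followed by attention-weighted average pooling to form the speaker embedding. The layer sizes, the single attention head, and the TDNN kernel settings (feat_dim, dim, emb_dim, kernel sizes, dilations) are illustrative assumptions, not the authors' published configuration.

```python
# Sketch only: a self-attention layer between frame-level layers,
# followed by attentive pooling. Hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameSelfAttention(nn.Module):
    """Scaled dot-product self-attention over the frame axis."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, frames, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(1, 2) / x.size(-1) ** 0.5
        return torch.softmax(scores, dim=-1) @ v


class AttentivePooling(nn.Module):
    """Weighted average over frames; weights learned by attention."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )

    def forward(self, x):                        # x: (batch, frames, dim)
        w = torch.softmax(self.attn(x), dim=1)   # (batch, frames, 1)
        return (w * x).sum(dim=1)                # (batch, dim)


class SpeakerEmbedder(nn.Module):
    def __init__(self, feat_dim=40, dim=512, emb_dim=192):
        super().__init__()
        # Frame-level TDNN layers realised as 1-D convolutions.
        self.tdnn1 = nn.Conv1d(feat_dim, dim, kernel_size=5, dilation=1)
        self.attn = FrameSelfAttention(dim)      # between frame-level layers
        self.tdnn2 = nn.Conv1d(dim, dim, kernel_size=3, dilation=2)
        self.pool = AttentivePooling(dim)
        self.embed = nn.Linear(dim, emb_dim)     # speaker embedding

    def forward(self, feats):                    # feats: (batch, frames, feat_dim)
        h = F.relu(self.tdnn1(feats.transpose(1, 2)))     # (B, dim, T')
        h = self.attn(h.transpose(1, 2)).transpose(1, 2)  # frame-level attention
        h = F.relu(self.tdnn2(h)).transpose(1, 2)         # (B, T'', dim)
        return self.embed(self.pool(h))


# Usage: a batch of 2 utterances, 200 frames of 40-dim filterbank features.
model = SpeakerEmbedder()
emb = model(torch.randn(2, 200, 40))
print(emb.shape)  # torch.Size([2, 192])
```

In this sketch the FrameSelfAttention layer lets every frame attend to every other frame before pooling, which is the departure the abstract describes from applying attention only at the pooling stage.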




Published In

ICCIR '22: Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics
June 2022
905 pages
ISBN: 9781450397179
DOI: 10.1145/3548608
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICCIR 2022

Acceptance Rates

Overall Acceptance Rate 131 of 239 submissions, 55%

