Bursuc et al., 2021 - Google Patents

Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification

Document ID
3515853360908266144
Author
Bursuc A
Puy G
Jain H
Publication year
2021
Snippet

This report details the architecture we used to address Task 1a of the DCASE2021 challenge. Our architecture is based on a 4-layer convolutional neural network that takes a log-mel spectrogram as input. The complexity of this network is controlled by using separable …
Full text at dcase.community (PDF)
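
The snippet is cut off before it says how the separable convolutions are built, so the following is only a rough sketch of why such layers reduce complexity. It assumes depthwise-separable convolutions (a per-channel k x k filter followed by a 1 x 1 pointwise convolution). The channel widths, the kernel size, and the function names `conv2d_params` and `separable_conv2d_params` are hypothetical and are not taken from the report.

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a standard k x k 2-D convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_conv2d_params(c_in, c_out, k):
    """Weight count of a depthwise-separable convolution (bias omitted).

    Assumption: one k x k depthwise filter per input channel,
    followed by a 1 x 1 pointwise convolution mixing the channels.
    """
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical channel widths for a 4-layer network on a 1-channel
# log-mel spectrogram; these values are illustrative only.
channels = [1, 32, 64, 128, 256]
k = 3  # hypothetical kernel size

layer_shapes = list(zip(channels, channels[1:]))
standard = sum(conv2d_params(a, b, k) for a, b in layer_shapes)
separable = sum(separable_conv2d_params(a, b, k) for a, b in layer_shapes)

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {standard / separable:.1f}x")
```

Under these assumed widths, nearly all of the saving comes from the later, wider layers, where the `c_in * c_out * k * k` term of a standard convolution dominates.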

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/144 Training of HMMs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6217 Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/6267 Classification techniques
    • G06K9/6268 Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F17/5009 Computer-aided design using simulation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62 Methods or arrangements for recognition using electronic means
    • G06K9/68 Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
    • G06K9/6807 Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
    • G06K9/6842 Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/36 Image preprocessing, i.e. processing the image information without deciding about the identity of the image
    • G06K9/46 Extraction of features or characteristics of the image

Similar Documents

Publication Publication Date Title
Suh et al. Designing acoustic scene classification models with CNN variants
Zhou et al. Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function.
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
McLaren et al. Application of convolutional neural networks to speaker recognition in noisy conditions
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
CN112685597B (en) Weak supervision video clip retrieval method and system based on erasure mechanism
CN110379416A (en) A kind of neural network language model training method, device, equipment and storage medium
US10109272B2 (en) Apparatus and method for training a neural network acoustic model, and speech recognition apparatus and method
CN111785288B (en) Voice enhancement method, device, equipment and storage medium
Mo et al. Neural architecture search for keyword spotting
CN109256137A (en) Voice acquisition method, device, computer equipment and storage medium
CN114627863B (en) Speech recognition method and device based on artificial intelligence
CN110853630B (en) Lightweight speech recognition method facing edge calculation
CN111048097A (en) Twin network voiceprint recognition method based on 3D convolution
CN113450830A (en) Voice emotion recognition method of convolution cyclic neural network with multiple attention mechanisms
Perez-Castanos et al. Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation
Li et al. Oriental language recognition (OLR) 2020: Summary and analysis
Bursuc et al. Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification
CN117789699B (en) Speech recognition method, device, electronic equipment and computer readable storage medium
Wu et al. Improving Deep CNN Architectures with Variable-Length Training Samples for Text-Independent Speaker Verification.
Aksoy et al. Classification of Environmental Sounds with Deep Learning
Mccree et al. Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17.
Zhou et al. Energy-Friendly Keyword Spotting System Using Add-Based Convolution.
Wan et al. ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression
Jeong et al. Trident ResNets with low-complexity for acoustic scene classification