Bursuc et al., 2021 - Google Patents

Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification

Document ID: 3515853360908266144
Author: Bursuc A; Puy G; Jain H
Publication year: 2021

Snippet

This report details the architecture we used to address Task 1a of the DCASE2021 challenge. Our architecture is based on a 4-layer convolutional neural network that takes a log-mel spectrogram as input. The complexity of this network is controlled by using separable …
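
To illustrate why separable convolutions can keep such a network small, here is a minimal sketch that compares weight counts for standard and depthwise-separable convolutions. It assumes the depthwise-separable variant (a per-channel k x k filter followed by a 1 x 1 pointwise mix); the snippet is truncated and does not say which variant the report uses. The channel widths and the 3 x 3 kernel are hypothetical values chosen for illustration, not figures from the report.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical channel widths for a 4-layer network fed a 1-channel
# log-mel spectrogram; these are NOT taken from the report.
channels = [1, 32, 64, 128, 256]
kernel = 3  # assumed kernel size

layer_shapes = list(zip(channels, channels[1:]))
standard = sum(conv_params(a, b, kernel) for a, b in layer_shapes)
separable = sum(separable_conv_params(a, b, kernel) for a, b in layer_shapes)

print(f"standard conv weights:  {standard}")
print(f"separable conv weights: {separable}")
print(f"reduction factor:       {standard / separable:.1f}x")
```

Under these assumed widths the separable stack needs far fewer weights than the standard one, which is the lever a low-complexity task constraint would act on.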

Continue reading at dcase.community (PDF) (other versions)

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
- G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image

Similar Documents

Publication	Publication Date	Title
Suh et al.	2020	Designing acoustic scene classification models with CNN variants
Zhou et al.	2019	Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function.
CN110600017B (en)	2022-03-04	Training method of voice processing model, voice recognition method, system and device
McLaren et al.	2014	Application of convolutional neural networks to speaker recognition in noisy conditions
CN110033756B (en)	2021-03-16	Language identification method and device, electronic equipment and storage medium
CN112685597B (en)	2021-07-13	Weak supervision video clip retrieval method and system based on erasure mechanism
CN110379416A (en)	2019-10-25	A kind of neural network language model training method, device, equipment and storage medium
US10109272B2 (en)	2018-10-23	Apparatus and method for training a neural network acoustic model, and speech recognition apparatus and method
CN111785288B (en)	2022-03-15	Voice enhancement method, device, equipment and storage medium
Mo et al.	2020	Neural architecture search for keyword spotting
CN109256137A (en)	2019-01-22	Voice acquisition method, device, computer equipment and storage medium
CN114627863B (en)	2024-03-22	Speech recognition method and device based on artificial intelligence
CN110853630B (en)	2022-02-18	Lightweight speech recognition method facing edge calculation
CN111048097A (en)	2020-04-21	Twin network voiceprint recognition method based on 3D convolution
CN113450830A (en)	2021-09-28	Voice emotion recognition method of convolution cyclic neural network with multiple attention mechanisms
Perez-Castanos et al.	2020	Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation
Li et al.	2021	Oriental language recognition (OLR) 2020: Summary and analysis
CN117789699B (en)	2024-09-06	Speech recognition method, device, electronic equipment and computer readable storage medium
Wu et al.	2021	Improving Deep CNN Architectures with Variable-Length Training Samples for Text-Independent Speaker Verification.
Aksoy et al.	2022	Classification of Environmental Sounds with Deep Learning
Mccree et al.	2018	Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17.
Zhou et al.	2021	Energy-Friendly Keyword Spotting System Using Add-Based Convolution.
Wan et al.	2023	ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression
Jeong et al.	2021	Trident ResNets with low-complexity for acoustic scene classification