Bursuc et al., 2021 - Google Patents
Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classificationBursuc et al., 2021
View PDF- Document ID
- 3515853360908266144
- Author
- Bursuc A
- Puy G
- Jain H
- Publication year
External Links
Snippet
This report details the architecture we used to address Task 1a of the of DCASE2021 challenge. Our architecture is based on 4 layer convolutional neural network taking as input a log-mel spectrogram. The complexity of this network is controlled by using separable …
- 230000003416 augmentation 0 title abstract description 10
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G10L15/144—Training of HMMs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G06K9/6807—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries
- G06K9/6842—Dividing the references in groups prior to recognition, the recognition taking place in steps; Selecting relevant dictionaries according to the linguistic properties, e.g. English, German
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Suh et al. | Designing acoustic scene classification models with CNN variants | |
Zhou et al. | Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function. | |
CN110600017B (en) | Training method of voice processing model, voice recognition method, system and device | |
McLaren et al. | Application of convolutional neural networks to speaker recognition in noisy conditions | |
CN110033756B (en) | Language identification method and device, electronic equipment and storage medium | |
CN112685597B (en) | Weak supervision video clip retrieval method and system based on erasure mechanism | |
CN110379416A (en) | A kind of neural network language model training method, device, equipment and storage medium | |
US10109272B2 (en) | Apparatus and method for training a neural network acoustic model, and speech recognition apparatus and method | |
CN111785288B (en) | Voice enhancement method, device, equipment and storage medium | |
Mo et al. | Neural architecture search for keyword spotting | |
CN109256137A (en) | Voice acquisition method, device, computer equipment and storage medium | |
CN114627863B (en) | Speech recognition method and device based on artificial intelligence | |
CN110853630B (en) | Lightweight speech recognition method facing edge calculation | |
CN111048097A (en) | Twin network voiceprint recognition method based on 3D convolution | |
CN113450830A (en) | Voice emotion recognition method of convolution cyclic neural network with multiple attention mechanisms | |
Perez-Castanos et al. | Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation | |
Li et al. | Oriental language recognition (OLR) 2020: Summary and analysis | |
Bursuc et al. | Separable convolutions and test-time augmentations for low-complexity and calibrated acoustic scene classification | |
CN117789699B (en) | Speech recognition method, device, electronic equipment and computer readable storage medium | |
Wu et al. | Improving Deep CNN Architectures with Variable-Length Training Samples for Text-Independent Speaker Verification. | |
Aksoy et al. | Classification of Environmental Sounds with Deep Learning | |
Mccree et al. | Language Recognition for Telephone and Video Speech: The JHU HLTCOE Submission for NIST LRE17. | |
Zhou et al. | Energy-Friendly Keyword Spotting System Using Add-Based Convolution. | |
Wan et al. | ABC-KD: Attention-Based-Compression Knowledge Distillation for Deep Learning-Based Noise Suppression | |
Jeong et al. | Trident ResNets with low-complexity for acoustic scene classification |