Roy et al., 2009 - Google Patents
Fast transcription of unstructured audio recordings
- Document ID
- 10417787848602850561
- Author
- Roy B
- Roy D
- Publication year
- 2009
Snippet
We introduce a new method for human-machine collaborative speech transcription that is significantly faster than existing transcription methods. In this approach, automatic audio processing algorithms are used to robustly detect speech in audio recordings and split …
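The snippet describes using automatic audio processing to detect speech in recordings and split them into segments before transcription. A minimal sketch of that idea, using a toy energy-threshold voice activity detector (an illustrative stand-in, not the patent's actual algorithm; the function name and parameters are hypothetical):

```python
import numpy as np

def split_on_silence(samples, rate, frame_ms=20, threshold=0.01, min_gap_frames=5):
    """Split an audio signal into speech segments at silent gaps.

    A toy energy-based detector: frames whose RMS energy stays below
    `threshold` for at least `min_gap_frames` consecutive frames are
    treated as pauses, and the signal is cut there.
    Returns a list of (start_sample, end_sample) pairs.
    """
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Per-frame RMS energy over non-overlapping frames
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    voiced = rms >= threshold

    segments, start, gap = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i  # a new speech segment begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap_frames:
                # Close the segment just before the silent gap
                segments.append((start * frame_len, (i - gap + 1) * frame_len))
                start, gap = None, 0
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments

# Example: 1 s of tone, 1 s of silence, 1 s of tone at 16 kHz
rate = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate)
sig = np.concatenate([tone, np.zeros(rate), tone])
segs = split_on_silence(sig, rate)
```

A real pipeline in this spirit would replace the RMS threshold with a trained speech/non-speech classifier and hand each resulting segment to a human transcriber or an ASR system.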
Classifications

- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L2015/088—Word spotting
- G—PHYSICS
  - G11—INFORMATION STORAGE
    - G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
      - G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
        - G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
          - G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G—PHYSICS
  - G11—INFORMATION STORAGE
    - G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
      - G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
        - G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
          - G11B27/19—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
            - G11B27/28—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/26—Speech to text systems
          - G10L15/265—Speech recognisers specially adapted for particular applications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L17/00—Speaker identification or verification
        - G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/20—Handling natural language data
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
            - G06F17/30784—Information retrieval of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
              - G06F17/30796—Information retrieval of video data using original textual content or text extracted from visual content or transcript of audio data
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/3074—Audio data retrieval
            - G06F17/30743—Audio data retrieval using features automatically derived from the audio content, e.g. descriptors, fingerprints, signatures, MEP-cepstral coefficients, musical score, tempo
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
Similar Documents
Publication | Title |
---|---|
Albanie et al. | Emotion recognition in speech using cross-modal transfer in the wild |
Roy et al. | Fast transcription of unstructured audio recordings |
CN105245917B (en) | A system and method for multimedia voice subtitle generation |
Morgan et al. | The meeting project at ICSI |
US10977299B2 (en) | Systems and methods for consolidating recorded content |
Yeh et al. | Segment-based emotion recognition from continuous Mandarin Chinese speech |
JPWO2005069171A1 | Document association apparatus and document association method |
Sun et al. | Multi-modal sentiment analysis using deep canonical correlation analysis |
CN101625862A | Method for detecting voice interval in automatic caption generating system |
Mirkin et al. | A recorded debating dataset |
Latif et al. | Controlling prosody in end-to-end TTS: A case study on contrastive focus generation |
Petermann et al. | Tackling the cocktail fork problem for separation and transcription of real-world soundtracks |
Lebourdais et al. | Overlaps and gender analysis in the context of broadcast media |
Li et al. | Multi-scale attention for audio question answering |
Atmaja et al. | Jointly predicting emotion, age, and country using pre-trained acoustic embedding |
Solberg et al. | A Large Norwegian Dataset for Weak Supervision ASR |
Gilmartin et al. | Capturing Chat: Annotation and Tools for Multiparty Casual Conversation |
Chen et al. | Automatic emphatic information extraction from aligned acoustic data and its application on sentence compression |
Kotsakis et al. | Investigation of spoken-language detection and classification in broadcasted audio content |
Leh et al. | Speech Analytics in Research Based on Qualitative Interviews: Experiences from KA3 |
Bertillo et al. | Enhancing Accessibility of Parliamentary Video Streams: AI-Based Automatic Indexing Using Verbatim Reports |
Demilie Melese et al. | Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features |
Kachhoria et al. | Minutes of Meeting Generation for Online Meetings Using NLP & ML Techniques |
Waghmare et al. | A Comparative Study of the Various Emotional Speech Databases |
Jain et al. | Detection of Sarcasm Through Tone Analysis on Video and Audio Files: A Comparative Study on AI Models Performance |