

Fast transcription of unstructured audio recordings

Roy et al., 2009

Document ID
10417787848602850561
Author
Roy B
Roy D
Publication year
2009

Snippet

We introduce a new method for human-machine collaborative speech transcription that is significantly faster than existing transcription methods. In this approach, automatic audio processing algorithms are used to robustly detect speech in audio recordings and split …
Continue reading at dspace.mit.edu (PDF)
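
The snippet describes the automatic front end of the method: speech detection algorithms find the spoken regions of a long, unstructured recording and split them into short pieces that human transcribers can then type back quickly. As a rough illustration of that detection-and-splitting step (a minimal sketch with assumed frame sizes, thresholds, and a 16-bit mono PCM WAV input, not the pipeline from the paper), an energy-based segmenter might look like this:

# Illustrative sketch only (not the authors' code): split a long 16-bit mono PCM WAV
# recording into speech segments by thresholding per-frame energy against an
# estimated noise floor and closing a segment after a sufficiently long pause.
import wave
import array

def find_speech_segments(path, frame_ms=30, energy_ratio=2.0, min_silence_ms=300):
    """Return (start_sec, end_sec) pairs where frame energy exceeds a noise-based threshold."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2 and wf.getnchannels() == 1, "expects 16-bit mono PCM"
        rate = wf.getframerate()
        samples_per_frame = int(rate * frame_ms / 1000)
        energies = []
        while True:
            chunk = wf.readframes(samples_per_frame)
            if not chunk:
                break
            pcm = array.array("h", chunk)
            energies.append(sum(s * s for s in pcm) / max(len(pcm), 1))  # mean-square energy

    if not energies:
        return []

    # Assume the quietest 10% of frames are background noise; speech must exceed that floor.
    noise_floor = sorted(energies)[len(energies) // 10]
    threshold = energy_ratio * (noise_floor + 1.0)

    segments, start, silent_frames = [], None, 0
    max_silent_frames = max(1, min_silence_ms // frame_ms)
    for i, energy in enumerate(energies):
        t = i * frame_ms / 1000.0
        if energy >= threshold:
            if start is None:
                start = t            # speech onset
            silent_frames = 0
        elif start is not None:
            silent_frames += 1
            if silent_frames >= max_silent_frames:
                segments.append((start, t))   # close the segment after a long enough pause
                start, silent_frames = None, 0
    if start is not None:
        segments.append((start, len(energies) * frame_ms / 1000.0))
    return segments

# Usage (hypothetical file name):
# for start, end in find_speech_segments("recording.wav"):
#     print(f"speech from {start:.2f}s to {end:.2f}s")

This sketch assumes a single noise floor for the whole file; detectors meant for the messy acoustics of unstructured recordings, as in the paper, would need to be considerably more robust.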

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/265 Speech recognisers specially adapted for particular applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30781 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F17/30784 Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
    • G06F17/30796 Information retrieval; Database structures therefor; File system structures therefor of video data using original textual content or text extracted from visual content or transcript of audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3074 Audio data retrieval
    • G06F17/30743 Audio data retrieval using features automatically derived from the audio content, e.g. descriptors, fingerprints, signatures, MEP-cepstral coefficients, musical score, tempo
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00

Similar Documents

Publication Title
Albanie et al. Emotion recognition in speech using cross-modal transfer in the wild
Roy et al. Fast transcription of unstructured audio recordings
CN105245917B System and method for multimedia speech subtitle generation
Morgan et al. The meeting project at ICSI
US10977299B2 Systems and methods for consolidating recorded content
Yeh et al. Segment-based emotion recognition from continuous Mandarin Chinese speech
JPWO2005069171A1 (en) Document association apparatus and document association method
Sun et al. Multi-modal sentiment analysis using deep canonical correlation analysis
CN101625862A (en) Method for detecting voice interval in automatic caption generating system
Mirkin et al. A recorded debating dataset
Latif et al. Controlling prosody in end-to-end TTS: A case study on contrastive focus generation
Petermann et al. Tackling the cocktail fork problem for separation and transcription of real-world soundtracks
Lebourdais et al. Overlaps and gender analysis in the context of broadcast media
Li et al. Multi-scale attention for audio question answering
Atmaja et al. Jointly predicting emotion, age, and country using pre-trained acoustic embedding
Solberg et al. A Large Norwegian Dataset for Weak Supervision ASR
Gilmartin et al. Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
Chen et al. Automatic emphatic information extraction from aligned acoustic data and its application on sentence compression
Kotsakis et al. Investigation of spoken-language detection and classification in broadcasted audio content
Leh et al. Speech Analytics in Research Based on Qualitative Interviews: Experiences from KA3
Bertillo et al. Enhancing Accessibility of Parliamentary Video Streams: AI-Based Automatic Indexing Using Verbatim Reports.
Demilie Melese et al. Speaker-based language identification for Ethio-Semitic languages using CRNN and hybrid features
Kachhoria et al. Minutes of Meeting Generation for Online Meetings Using NLP & ML Techniques
Waghmare et al. A Comparative Study of the Various Emotional Speech Databases
Jain et al. Detection of Sarcasm Through Tone Analysis on Video and Audio Files: A Comparative Study on AI Models Performance