US20180130484A1 - Systems and methods for interrelating text transcript information with video and/or audio information - Google Patents
- Publication number: US20180130484A1 (application US15/677,416)
- Authority: US (United States)
- Prior art keywords: transcript, words, data, traditional, computer
- Legal status: Abandoned (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G10L21/055—Time compression or expansion for synchronising with other signals, e.g. video signals
- G10L15/08—Speech classification or search
- G10L15/265—(under G10L15/26—Speech to text systems)
- G10L21/10—Transforming into visible information
- G10L25/51—Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L2015/088—Word spotting
- H04N5/04—Synchronising
- H04N21/234336—Processing of video elementary streams by media transcoding, e.g. video is transformed into a slideshow of still pictures or audio is converted into text
- H04N21/242—Synchronization processes, e.g. processing of PCR [Program Clock References]
- H04N21/4223—Cameras
- H04N21/43074—Synchronising the rendering of additional data with content streams on the same device, e.g. of EPG data or interactive icon with a TV program
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Definitions
- FIG. 1 is a block diagram that illustrates an example embodiment of a system for interrelating text transcript information with video and/or audio information according to various aspects of the present disclosure.
- FIG. 2 is a flowchart that illustrates an example embodiment of a method of processing video transcription information according to various aspects of the present disclosure.
- FIG. 3 is a block diagram of a computing device suitable for use to implement portions of the system according to the present disclosure.
- FIG. 4 is a flowchart that illustrates an example embodiment of a method for aligning a human-made transcript to the timing of a machine transcript according to various aspects of the present disclosure.
- audiovisual information, data, or recordings refers to video that includes audio, video that is associated with separate audio of the video scene, or audio alone.
- the internet, with sites such as youtube.com, has provided an avenue for posting audiovisual recordings for public viewing.
- Security agencies (e.g., police forces) may capture audiovisual recordings in the course of their duties.
- a security agency may also prepare and release an audiovisual recording as evidence for use in a proceeding.
- a security agency may release not only an audiovisual recording, but also a written record (e.g., transcript) of the audio portion of the recording.
- Audio portions of an audiovisual recording may be transcribed in a traditional manner (e.g., by a court reporter, by a transcriptionist) or by a computer (e.g., computer-generated transcription).
- a transcript includes a written representation of content in the audio portion (e.g., audio data) of the audiovisual recording.
- traditional transcripts are generally more accurate than computer-generated transcripts, particularly as to the semantic translation of sounds into proper words for a particular language.
- a computer-generated transcript may include a record of the location (e.g., time, position) in the audiovisual data where each word or sound was detected.
- Traditionally prepared transcripts generally do not include additional data that provides the location of each word or sound of the transcript in the audiovisual recording.
- traditional transcripts may be used to provide accurate semantics and computer-generated transcripts may be used to locate the words of the audio data to the location in the audiovisual recording where the words occur.
- Traditional transcripts may be used in combination with computer-generated transcripts so that the words of the traditional transcript may be linked (e.g., tied, associated, aligned) to the location in the audiovisual recording where the word occurs.
- Tying the traditional transcript to the timing of the audiovisual recording enables real-time redaction of portions of the audiovisual data on playback of the recording. Redaction may be performed automatically, with little or no human intervention. Rules may be specified as to the type of material that should be redacted from an audiovisual recording. During a presentation of audiovisual data, if the rules specify that words or sounds in the transcript should be redacted, the presentation may be altered to redact the portion of the audiovisual data that falls within the rules of redaction.
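The rule-driven redaction described above can be sketched in a few lines once word-level alignment data exists. The following is an illustrative Python sketch, not the patent's implementation; the tuple format, function names, and the sample rule set are all assumptions.

```python
# Assumed rule: redact these words wherever they occur in the transcript.
REDACTION_RULES = {"address", "ssn"}

def redaction_intervals(aligned_words, rules=REDACTION_RULES):
    """Return (start, end) spans of audio to mute during playback.

    aligned_words: list of (word, start_sec, end_sec) tuples, as might be
    produced by aligning a traditional transcript to machine-transcript
    timings (a hypothetical format, not defined by the patent).
    """
    return [(start, end)
            for word, start, end in aligned_words
            if word.lower() in rules]

words = [("the", 0.0, 0.2), ("address", 0.2, 0.8), ("is", 0.8, 0.9)]
print(redaction_intervals(words))  # [(0.2, 0.8)]
```

During playback, a player could consult these intervals and mute or bleep the matching spans, so redaction happens automatically with little or no human intervention.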
- a transcript in electronic form enables a user to search the transcript for particular words such as geographic locations, articles of clothing, weapons, buildings, or other objects.
- the link to the location of words in the audiovisual data permits a user to search the transcript and find the location in the audiovisual data easily.
- a transcript may include a description of the characteristics of the sounds or words in the audio.
- a description may include the volume (e.g., intensity), tone (e.g., menacing, threatening, helpful, kind), frequency range, or emotions (e.g., anger, elation) of a word or a sound.
- the description of the audio data may be searched by a user and linked to a location in the audiovisual data.
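Searching an aligned transcript and mapping hits back to locations in the audiovisual data reduces to a lookup over the alignment data. A hypothetical sketch (the data shapes and names are assumed, not from the patent):

```python
def find_word(aligned_words, query):
    """Return every timestamp (seconds from start) at which `query`
    occurs in the aligned transcript."""
    q = query.lower()
    return [start for word, start in aligned_words if word.lower() == q]

# Illustrative alignment data: (word, start_seconds) pairs.
aligned = [("suspect", 3.1), ("entered", 3.5), ("the", 3.8),
           ("building", 4.0), ("Building", 9.2)]
print(find_word(aligned, "building"))  # [4.0, 9.2]
```

Each returned timestamp gives a seek position in the audiovisual recording, which is how the link between transcript and recording makes search results easy to locate.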
- System 100 of FIG. 1 is an example environment for ordering the creation of transcripts, storing transcripts, generating transcripts, storing audiovisual data, aligning the words of a transcript and/or description of the audio data to locations in audiovisual data, manipulating (e.g. redacting, searching) the audiovisual data using an aligned transcript, generating presentations of the aligned transcript and the audiovisual data, and using an aligned transcript to redact a presentation of an audiovisual recording in accordance with rules that specify material that should be redacted.
- System 100 may include one or more traditional transcript service providers 104 , evidence management system 102 , and one or more recording systems 106 .
- a recording system detects physical properties in an environment and records (e.g., stores) the information (e.g., data) regarding the physical properties. Recorded information may be analyzed to determine characteristics of the physical properties detected and recorded.
- Recorded information may relate to an incident (e.g., event, occurrence). Recorded information may provide a record of an incident. Recorded information may be reviewed to provide a reminder of the incident. Recorded information may be used as evidence to prove the occurrence of an incident.
- a recording system may detect and record visual (e.g., video) and/or audible (e.g., audio) physical properties. Visual and/or audible physical properties detected and recorded may be within the range of vision and/or hearing of a human. Visual and/or audible physical properties detected and recorded may be outside the range of vision and/or hearing of a human.
- the capture and storing of video and/or audio data may be accomplished using any suitable technique. Use of the term video data may refer to both video and audio data together.
- a recording system may create an audiovisual record.
- Data stored by a recording system may be stored in any suitable format, including but not limited to H.264, MPEG-4, AAC, and WAV.
- a recording system may convert the recorded information from one format (e.g., analog data, encoding) to another format (e.g., digital data, encoding).
- a recording system may communicate (e.g., transmit, receive) data.
- a recording system may transmit recorded data to another system.
- a recording system may include any conventional communication circuitry for transmitting and/or receiving data.
- a recording system may use any conventional wired (e.g., LAN, Ethernet) or wireless communication (e.g., Bluetooth, Bluetooth Low Energy, WiFi, ZigBee, 2G, 3G, 4G, WiMax) protocol.
- a recording system may store audiovisual data for a period (e.g., shift, day) then transmit the audiovisual data to another system.
- a recording system may transmit audiovisual information to another system as it is captured (e.g., live streaming).
- Recording system 106 performs the functions of recording system discussed herein.
- recording system 106 may include a digital camera such as a wearable (e.g., body-worn, carried) camera that records audiovisual data.
- recording system 106 includes an in-car camera or dash cam that records audiovisual data.
- Recording system 106 may include separate recording systems, such as a digital camera and a wireless microphone that cooperate to perform the functions of a recording system. For example, video data from a first camera and audio data from a second camera may be combined and/or used.
- the act (e.g., function, operation) of recording may use any suitable technique known to one of ordinary skill in the art, and so is not described in further detail herein.
- recording system 106 records audiovisual information, then transmits the data to evidence management system 102 . In some implementations, recording system 106 live streams audiovisual data to evidence management system 102 .
- An evidence management system may collect and manage information.
- An evidence management system may receive recorded data from one or more recording systems.
- An evidence management system may receive transcripts from one or more traditional transcript service providers.
- An evidence management system may provide recorded data, transcript data, and/or data that has been processed to a person or entity.
- An evidence management system may communicate with other systems to transmit and receive data.
- An evidence management system may include any conventional communication circuitry for transmitting and/or receiving data.
- An evidence management system may use any conventional wired or wireless communication protocol for communicating data.
- An evidence management system may store data.
- An evidence management system may store recorded data, traditional transcripts, computer-generated transcripts, and/or alignment data that associates recorded data with transcript data. Recorded data includes audiovisual data.
- An evidence management system may store and/or manage data in such a manner that it may be used as evidence in a proceeding, such as a legal proceeding.
- An evidence management system may organize stored data according to the recording system that captured the data.
- An evidence management system may organize stored data according to a particular recorded data (e.g., video).
- An evidence management system may further organize stored data according to agencies (e.g., groups, organizations).
- An evidence management system may group captured data for storage according to the agency that employs the person who used the recording system to capture the data.
- Evidence management system 102 is an example of an embodiment of an evidence management system. Evidence management system 102 performs the functions of an evidence management system discussed herein.
- Evidence management system 102 may include alignment engine 108 , management engine 110 , computer-generated transcript engine 112 , video data store 114 , audio data store 116 , computer-generated transcript data store 118 , traditional transcript data store 120 , and alignment data store 122 .
- An evidence management system may perform one or more operations (e.g., functions).
- An operation may include providing recorded data to a traditional transcript service provider, such as to traditional transcript service provider 104 , associating transcripts to recorded data, aligning traditional transcripts to recorded data, generating a computer-generated transcript of recorded data, and/or providing data to another system.
- An engine may perform one or more operations of an evidence management system.
- An engine may perform one or more functions or a single function.
- An engine may access stored data to perform a function.
- An engine may generate data for storage.
- engine refers, in general, to circuitry and/or logic embodied in hardware and/or software instructions executable by a processor of a computing device.
- Circuitry includes any circuit and/or electrical/electronic subsystem for performing a function.
- Logic embodied in hardware includes any circuitry that performs a predetermined operation or predetermined sequence of operations. Examples of logic embodied in hardware include standard logic gates, application specific integrated circuits (“ASICs”), field-programmable gate arrays (“FPGAs”), microcell arrays, programmable logic arrays (“PLAs”), programmable array logic (“PALs”), complex programmable logic devices (“CPLDs”), erasable programmable logic devices (“EPLDs”), and programmable logic controllers (“PLCs”).
- Logic embodied in (e.g., implemented as) software instructions may be written in any programming language, including but not limited to C, C++, COBOL, JAVA™, PHP, Perl, HTML, CSS, JavaScript, VBScript, ASPX, HDL, and/or Microsoft .NET™ programming languages such as C#.
- the software for an engine may be compiled into an executable program or written in an interpreted programming language for execution by a suitable interpreter or virtual machine executed by a processing circuit.
- Engines may be callable (e.g., executable, controllable) from other engines or from themselves.
- engines described herein can be merged with other engines, other applications, or may be divided into sub-engines.
- Engines that are implemented as logic embedded in software may be stored in any type of computer-readable medium.
- An engine may be stored on and executed by one or more general purpose computers, thus creating a special purpose computer configured to perform the functions of (e.g., provide) the engine.
- the devices and systems illustrated herein may include one or more computing devices configured to perform the functions of the illustrated engines, though the computing devices themselves have not been illustrated in every case for the sake of clarity.
- a computer-generated transcript engine generates a computer-generated transcript.
- a computer-generated transcript engine may receive audio data, analyze the audio data to identify words of one or more languages, and provide a record of the words identified for storage.
- a computer-generated transcript engine may include sophisticated algorithms to perform semantic analysis. Semantic analysis may include recognizing the different connotations (e.g., meanings) of words to correctly identify words used in human speech. Semantic analysis may include identifying words and phrases used in the vernacular (e.g., region idioms, gang-related speech) and providing a translated meaning.
- computer-generated transcript engine 112 may use any suitable speech-to-text algorithms to produce the computer-generated transcript, including but not limited to acoustic modeling, language modeling, Hidden Markov models, feedforward artificial neural networks, and recurrent neural networks.
- a computer-generated transcript engine may identify the location of words and phrases in the audio data.
- a timestamp in the computer-generated transcript may indicate a point in time (e.g., location) at which a given recognized word occurred in the audio data.
- a timestamp may have any suitable format including a time of day, an elapsed time from the beginning of the recording, and an elapsed time from the previously recognized word.
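The two elapsed-time formats are interconvertible: "elapsed from the previously recognized word" timestamps become "elapsed from the beginning of the recording" timestamps via a running sum. A small illustrative sketch (the function name is an assumption):

```python
from itertools import accumulate

def deltas_to_elapsed(deltas):
    """Convert per-word 'time since previous word' values into
    'elapsed time from the beginning of the recording' values."""
    return list(accumulate(deltas))

print(deltas_to_elapsed([0.5, 0.3, 1.2]))  # [0.5, 0.8, 2.0]
```

A time-of-day format would additionally need the recording's absolute start time added to each elapsed value.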
- a computer-generated transcript engine may record the location of each word and/or phrase identified in the audio data.
- a computer-generated transcript engine may prepare data that relates (e.g., associates) each word or phrase to its respective location in the audio data.
- a computer-generated transcript engine may use data that relates the audio data to the video data to relate the location of words and/or phrases to a location in the video data.
- the data that describes the location of a word and/or a phrase to a location in audiovisual data may be referred to as alignment data.
- a computer-generated transcript engine may store alignment data. Alignment data may be stored in alignment data store 122 and/or in computer-generated transcript data store 118 . Alignment data may be stored with the computer-generated transcript in a single file or separately with information as to how the alignment data relates to the computer-generated transcript.
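One plausible single-file layout for a computer-generated transcript stored together with its alignment data and per-word confidence scores. Every field name below is an assumption for illustration, not a format defined by the patent:

```python
import json

# Hypothetical combined record: transcript words, alignment data relating
# each word to its location in the audio, and per-word confidence scores.
record = {
    "transcript": ["stop", "right", "there"],
    "alignment": [
        {"word_index": 0, "time": 12.4},   # seconds from start of recording
        {"word_index": 1, "time": 12.9},
        {"word_index": 2, "time": 13.1},
    ],
    "confidence": [0.98, 0.87, 0.91],
}

blob = json.dumps(record)            # serialize for storage
restored = json.loads(blob)          # read back for alignment or search
print(restored["alignment"][2]["time"])  # 13.1
```

Storing alignment data separately would instead keep the `alignment` list in its own file along with a reference identifying the transcript it relates to.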
- Confidence scores may be associated with words within the computer-generated transcript to indicate the estimated likelihood that the word generated by the computer-generated transcript engine is an accurate transcription of the word in the audio data.
- a management engine may manage the generation and storage of transcripts.
- a management engine may receive instructions from a user and/or other engines.
- a management engine may perform a function responsive to an instruction.
- a management engine may order the generation of a transcript by a traditional transcript service provider, such as traditional transcript service providers 104 , or by a computer-generated transcript engine, such as computer-generated transcript engine 112 .
- a management engine may provide audiovisual data to a traditional transcript service provider and/or a computer-generated transcript engine.
- a management engine may use any conventional communication protocol to transmit audiovisual information.
- a management engine may track the progress of transcript generation.
- a management engine may receive a transcript.
- a management engine may use any conventional communication protocol to receive a transcript.
- a management engine may store a transcript.
- a management engine may associate a transcript with an audiovisual data.
- a management engine may associate a computer-generated transcript to a traditional transcript.
- a management engine may receive alignment data.
- a management engine may store alignment data.
- a management engine may associate alignment data with audiovisual data.
- a management engine may associate alignment data to a computer-generated transcript and/or a traditional transcript.
- a management engine may associate stored data to one or more recording systems, such as recording systems 106 .
- a management engine may associate stored data to the recording system that recorded an audio data that was used to generate a transcript.
- a management engine may associate stored data, transcripts, and/or alignment data to an agency.
- a management engine may provide reports regarding the functions it has or will perform.
- Management engine 110 performs the functions of a management engine discussed herein.
- management engine 110 may provide audio data to one or more traditional transcript service providers 104 to have the audio data transcribed. Management engine 110 may receive traditional transcripts back from the traditional transcript service providers 104 . Management engine 110 may store traditional transcripts in traditional transcript data store 120 .
- Management engine 110 may provide a platform for agencies that store information in evidence management system 102 to manage purchase of transcription services from traditional transcript service providers 104 through the evidence management system 102 .
- An agency may enter into contracts with one or more traditional transcript service providers 104 .
- the agency contract may include various terms including service level agreements and price points.
- Management engine 110 may create accounts for traditional transcript service providers 104 within the evidence management system 102 to allow the agency to request transcripts from the traditional transcript service providers 104 according to their agency contracts.
- a user e.g., officer associated with an agency may choose from traditional transcript service providers 104 that have a contract with the agency.
- Management engine 110 may provide audio data to a traditional transcript service provider 104 in response to a request from a user.
- Management engine 110 may seek approval from a supervisor of the user before requesting a transcript from the traditional transcript service provider 104 .
- the management engine 110 may wait for the approval of the supervisor before it sends out audio data to the traditional transcript service provider 104 .
- Management engine 110 may create a unique identifier for each transcription order.
- Evidence management system 102 may provide the unique identifier to the requesting agency to allow them to obtain customer service directly from traditional transcript service provider 104 .
- Traditional transcript service provider 104 may use information obtained from the evidence management system 102 , such as the unique identifier, supervisor name, and user name, to bill the agency directly for transcription services.
- Management engine 110 may select a traditional transcript service provider 104 , or management engine 110 may be instructed on which traditional transcript service provider 104 to use. Management engine 110 may select traditional transcript service provider 104 using any algorithm (e.g., round-robin) or using any criteria (e.g., cost, throughput, loading at provider, highest accuracy).
- a traditional transcript service provider 104 may be capable of producing a highly accurate transcript of audio data.
- audio data only or the entire audiovisual data may be transmitted to a computing device (e.g., computer) of a traditional transcript service provider 104 .
- Traditional transcript service provider 104 may use one or more computing devices and/or mechanical devices to allow an operator (e.g., a person trained to provide transcription services) to listen to the audio data and enter a transcription of speech and/or other audio elements within the audio data.
- evidence management system 102 may stream audio data to a computing device of an operator, and may provide a web-based interface, an app, a desktop application, or an application programming interface (API) for the operator to enter the transcription. Further description of examples of the interaction between the evidence management system 102 and the traditional transcript service providers 104 is provided below.
- Management engine 110 may transcode the audio data into a format desired by a given transcription service provider 104 , such as WAV.
- Management engine 110 may include additional metadata along with the audio data, including but not limited to an owning user, an owning agency, and/or a desired type of transcript (e.g., verbatim or standard).
- a traditional transcript from the traditional transcript service provider 104 may be provided in any suitable format including but not limited to a text file and a word processing document.
- a traditional transcript may include explanatory information including but not limited to the identity of the speakers, a description of noises and/or sounds, and/or the meaning of colloquial language or slang.
- Management engine 110 may also manage (e.g., control) a process (e.g., work flow) for making revisions to the traditional transcript. For example, as a prosecution team and a defense team argue over the exact words that should appear in a given transcript, management engine 110 may track changes made to the transcript, the identity of the person making the change, and any information as to the reason for the change. In another example, a judge may order certain portions of the transcript stricken. Management engine 110 may make and track such changes to the traditional transcript when instructed to do so.
- An alignment engine aligns data.
- An alignment engine may identify where particular data in one set of data (e.g., file) corresponds to particular data in another file.
- An alignment engine may record how the data of one file aligns with the data of another file.
- An alignment engine may include the data from one or more of the files in the file that stores alignment data.
- An alignment engine may align the words and/or phrases of a traditional transcript to some or all of the words or phrases of a computer-generated transcript. Using the alignment data for the alignment between the traditional transcript and the computer-generated transcript, an alignment engine may align the traditional transcript to some or all of the identified locations in the audio file. An alignment engine may use confidence scores provided by computer-generated transcript engine 112 to aid in alignment. Using the alignment information between the audio data and the video data, an alignment engine may align the traditional transcript to video data.
- the data from one or more of the above alignments may be referred to as enhanced alignment data.
- an alignment engine may not be able to align all of the words in the traditional transcript to words in the computer-generated transcript.
- An alignment engine may use any algorithm for spacing words in the traditional transcript that do not match words in the computer-generated transcript.
- alignment engine 108 compares the words in the traditional transcript to the words in the computer-generated transcript to find matches.
- Alignment engine 108 may compare the words of the traditional transcript to only those words in the computer-generated transcript having a confidence score greater than a threshold.
- Alignment engine 108 may compare single words from the traditional transcript to words of the computer-generated transcript to find a match.
- Alignment engine 108 may require that a group of words (e.g., sequence) from the traditional transcript match the same group of words in the same order before identifying the words as matching.
- Alignment engine 108 may space unmatched words equally between the matched words. Alignment engine 108 may compare the intensity (e.g., volume) of the audio data to words that are commonly spoken loudly, or may try to match the sounds of individual letters or syllables in the words of the transcript to letter or syllable sounds in the audio data, to identify a location of unmatched words.
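The equal-spacing approach described above can be sketched as linear interpolation over the timestamps of matched words. This is an illustrative sketch, not the disclosed implementation; the function name and the handling of leading/trailing gaps are assumptions.

```python
def space_unmatched(times):
    """Fill in None entries (unmatched words) by spacing them equally
    between the nearest matched timestamps on either side.

    Illustrative sketch only: a leading gap anchors at 0.0 and a
    trailing gap assumes one second per word, both arbitrary choices."""
    result = list(times)
    i = 0
    while i < len(result):
        if result[i] is not None:
            i += 1
            continue
        # find the end of this run of unmatched words
        j = i
        while j < len(result) and result[j] is None:
            j += 1
        left = result[i - 1] if i > 0 else 0.0
        right = result[j] if j < len(result) else left + (j - i)
        step = (right - left) / (j - i + 1)
        for k in range(i, j):
            result[k] = left + step * (k - i + 1)
        i = j
    return result

print(space_unmatched([10.0, None, None, 13.0]))  # [10.0, 11.0, 12.0, 13.0]
```

Two unmatched words between timestamps 10.0 and 13.0 land at 11.0 and 12.0, which is what "spaced equally between the matched words" amounts to.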
- a “data store” as described herein may be any suitable device configured to store data for access by a computing device.
- a data store receives data.
- a data store retains (e.g., stores) data.
- a data store retrieves data.
- a data store provides data for use by a system, such as an engine.
- a data store may organize data for storage.
- a data store may organize data as a database for storage and/or retrieval. The operations of organizing data for storage in or retrieval from a database of a data store may be performed by a data store.
- a data store may include a repository for persistently storing and managing collections of data.
- a data store may store files that are not organized in a database. Data in a data store may be stored in computer-readable medium.
- a data store may be implemented as a relational database management system (RDBMS); any other suitable storage technique and/or device capable of quickly and reliably providing the stored data in response to queries may also be used, such as a key-value store or an object database.
- Data stores 114 - 122 perform the functions of a data store discussed herein.
- a data store may be implemented using any computer-readable medium.
- An engine (e.g., 108 - 112 ) or computing device of management system 102 may access a data store 114 - 122 locally (e.g., via a data bus), over a network, and/or as a cloud-based service.
- a data store suitable for use with recording systems 106 which includes reliable storage but also low overhead, is a file system or database management system that stores data in files (or records) on a computer-readable medium such as flash memory, random access memory (RAM), or hard disk drives.
- a computer-readable medium may store, retrieve, and/or organize data.
- the term “computer-readable medium” includes any storage medium that is readable by a machine (e.g., computer, processor, processing circuit). Storage medium includes any devices, materials, and/or structures used to place, keep, and retrieve data (e.g., information).
- a storage medium may be volatile or non-volatile.
- a storage medium may include any semiconductor (e.g., RAM, ROM, EPROM, Flash), magnetic (e.g., hard disk drive), optical technology (e.g., CD, DVD), or combination thereof.
- Computer-readable medium includes storage medium that is removable or non-removable from a system. Computer-readable medium may store any type of information, organized in any manner, and usable for any purpose such as computer readable instructions, data structures, program modules, or other data.
- FIG. 2 is a flowchart that illustrates an example embodiment of a method of processing video transcript information according to various aspects of the present disclosure.
- the method 200 proceeds to block 202 , where one or more recording systems 106 record audiovisual data, and transmit the data to an evidence management system 102 .
- the recording systems 106 may be capable of wireless communication, and may transmit recorded data to the evidence management system 102 using any suitable transmission technology including but not limited to WiFi, 3G, 4G, LTE, and WiMAX.
- the recording systems 106 may be physically connected to a dock via any suitable type of wired connection including but not limited to USB, FireWire, and a 3.5 mm connector.
- the dock may then obtain the recorded data from the recording system 106 and then transmit the recorded data to the evidence management system 102 via a network. Further description of devices and techniques for transmitting recorded data to an evidence management system 102 from a recording system 106 are described in commonly owned, co-pending U.S. patent application Ser. No. 15/210,060, filed Jul. 14, 2016, the entire disclosure of which is hereby incorporated by reference herein for all purposes.
- the evidence management system 102 stores the video data (if any) of the audiovisual data in a video data store 114 and stores the audio data of the audiovisual data in an audio data store 116 .
- the video data and the audio data may be stored together.
- the video data and audio data may be stored in separate data stores in order to allow the audio data alone to be transmitted to a traditional transcript service provider 104; in that case, the video data and audio data may be associated with each other by a unique identifier or using another suitable technique.
- a computer-generated transcript engine 112 of the evidence management system 102 creates a computer-generated transcript that includes timestamps and confidence scores for at least some words in the audio data, and at block 208 , the computer-generated transcript engine 112 stores the computer-generated transcript in a computer-generated transcript data store 118 .
- the computer-generated transcript is stored in a machine-readable format, including but not limited to JavaScript Object Notation (“JSON”) and extensible markup language (“XML”).
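As an illustration of such a machine-readable format, a computer-generated transcript with per-word timestamps and confidence scores might be serialized as JSON along the following lines. The field names and values are hypothetical, not taken from the disclosure.

```python
import json

# Hypothetical record layout for a computer-generated transcript:
# each word carries a start time (seconds) and a confidence score.
machine_transcript = {
    "evidence_id": "example-recording-1",  # illustrative identifier
    "words": [
        {"text": "suspect", "start": 12.4, "confidence": 0.97},
        {"text": "ran", "start": 12.95, "confidence": 0.91},
        {"text": "norf", "start": 13.2, "confidence": 0.42},  # low confidence
        {"text": "on", "start": 13.55, "confidence": 0.95},
    ],
}

serialized = json.dumps(machine_transcript, indent=2)
restored = json.loads(serialized)
print(restored["words"][0]["start"])  # 12.4
```

A record like this gives the alignment engine both the high-confidence anchor words and the timestamps it needs for later matching.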
- all audio data may be transcribed by the computer-generated transcript engine 112 .
- only audio data tagged with a given type or stored in a given category may be transcribed by the computer-generated transcript engine 112 .
- a management engine 110 of the evidence management system 102 obtains a traditional transcript of the audio data from a traditional transcript service provider 104 .
- the management engine 110 stores the traditional transcript in a traditional transcript data store 120 .
- the method 200 then proceeds to block 214 , where an alignment engine 108 of the evidence management system 102 matches one or more high-confidence words from the computer-generated transcript to words from the traditional transcription.
- An example of a method suitable for use in block 214 is illustrated in FIG. 4 and described further below.
- the alignment engine 108 adds the timestamps from the matching high-confidence words to the words of the traditional transcript.
- the timestamps may be added to the traditional transcript using any suitable format.
- the traditional transcription may be reformatted into a JSON format, an XML format, or another machine-readable format in order to associate the timestamps with the words.
- a separate record of matching words, timestamps, and their location within the traditional transcript may be created. The separate record may be stored in the alignment data store 122 , or may be stored along with the traditional transcript in the traditional transcript data store 120 .
- at least some of the actions described in block 216 may be performed in the method suitable for use in block 214 .
- the alignment engine 108 associates the traditional transcript with the audiovisual data using the timestamps as guideposts.
- the association may include creating an overlay that presents the text from the traditional transcript at a pace indicated by the guidepost timestamps. Using an overlay allows the evidence management system 102 to generate an MPEG, DVD, or other video presentation format that would always present the transcript text in the same manner.
- the association may include creating a subtitle track and/or SRT file that presents the text from the traditional transcript at a pace indicated by the guidepost timestamps. Using a subtitle track allows a viewer to turn the transcript display on and off. The pace of display between the guidepost timestamps may be determined using syllables or other characteristics detected in the audio data.
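The SRT format mentioned above is plain text: numbered cues, each with a start/end timestamp line. A minimal sketch of emitting SRT from timestamped transcript text follows; the cue grouping and timing are assumed inputs, not part of the disclosure.

```python
def to_srt(cues):
    """Render (start_seconds, end_seconds, text) cues as an SRT string."""
    def ts(t):
        # SRT timestamps are HH:MM:SS,mmm
        ms_total = int(round(t * 1000))
        h, rem = divmod(ms_total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = [
        f"{i}\n{ts(start)} --> {ts(end)}\n{text}"
        for i, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"

print(to_srt([(12.4, 14.0, "The suspect ran north.")]))
```

Because the file is a separate track rather than burned-in text, a player can toggle the transcript display on and off, as the passage above notes.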
- the management engine 110 processes the audiovisual data based on analysis of the traditional transcript.
- the actions of block 220 are optional, and either may not be performed, or may be performed separately from the rest of the method 200 once a traditional transcript has been aligned to audiovisual data.
- Management engine 110 may use the aligned traditional transcript to automatically redact portions of the audiovisual data. For example, management engine 110 may detect names, addresses, profanity, and/or other keywords, or pattern-based portions of text from the aligned traditional transcript. Management engine 110 may use the associated timestamps to automatically redact the detected portions of the audiovisual data. As another example, the management engine 110 may provide a search interface that allows the full text of traditional transcripts to be searched, wherein the search results will link directly to the relevant portion of audiovisual data using the timestamps.
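The keyword- and pattern-based redaction described above might be sketched as follows. The patterns, padding, and data layout are illustrative assumptions; a real system would then cut or bleep the media over the returned intervals.

```python
import re

# Illustrative redaction patterns (not from the disclosure).
REDACT_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like number
    re.compile(r"\bmain street\b", re.I),  # example address keyword
]

def redaction_intervals(aligned_words, pad=0.25):
    """aligned_words: list of (text, start_seconds) from the aligned
    traditional transcript. Returns merged [start, end] intervals of
    audio/video to redact, padded by `pad` seconds on each side."""
    hits = []
    for idx, (text, start) in enumerate(aligned_words):
        if any(p.search(text) for p in REDACT_PATTERNS):
            # approximate the word's end by the next word's start time
            end = (aligned_words[idx + 1][1]
                   if idx + 1 < len(aligned_words) else start + 0.5)
            hits.append([max(0.0, start - pad), end + pad])
    # merge overlapping intervals so adjacent hits redact as one span
    merged = []
    for iv in hits:
        if merged and iv[0] <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], iv[1])
        else:
            merged.append(iv)
    return merged
```

The same timestamp association also supports the search interface mentioned above: a text hit carries its timestamp, which links directly into the audiovisual data.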
- the management engine 110 may present a web-based interface, an app, a desktop application, an API, or another type of interface that allows a user to select a portion of the transcript to be redacted, and the management engine 110 may automatically redact the associated portions of the audiovisual data.
- the method 200 then proceeds to an end block and terminates.
- FIG. 4 is a flowchart that illustrates a method according to various aspects of the present disclosure.
- Method 400 may be performed by one or more engines, for example a computer-generated transcript engine 112 and an alignment engine 108 of evidence management system 102 , or by a computing device of evidence management system 102 .
- Method 400 includes blocks start 402 , select 404 , search present 408 , present found 410 , simple 412 , search prior 414 , prior found 416 , set yes 418 , set no 420 , search next 422 , next found 424 , test prior 426 , associate 428 , complete 430 , and end 432 .
- Evidence management system 102 may execute one or more of the blocks of method 400 in parallel. Evidence management system 102 may begin execution of a block when it has received the data required to perform the function of the block.
- Method 400 begins execution with start 402 .
- Start 402 may initialize any variable needed to perform method 400 .
- Start 402 may retrieve, for example from a data store, any preference information provided by a user such as whether all three words (e.g., prior, present, next) must match to associate a time or the threshold for the confidence score to determine whether a word from the traditional transcript matches the computer-generated transcript.
- Execution continues with select 404 .
- Select 404 accesses the computer-generated transcript to select three contiguous words.
- the meaning of the term “contiguous” depends on whether the confidence score of the words of the computer-generated transcript is considered. If the confidence score is not considered, the term “contiguous” means that there are no words between the selected words. If the confidence score is considered, the term “contiguous” means that the selected words each have a confidence score greater than the threshold and all of the words between any of the selected words, if any, have a confidence score of less than the threshold. Edge cases, such as first starting or ending method 400 are not described herein.
- Prior word refers to the first word in the sequence of three contiguous words;
- present word refers to the word that follows the prior word and comes before the next word in the transcript.
- the words “prior”, “present”, and “next” refer to the order of words in the order in which the words would be read in the transcript.
- Search present 408 searches for the present word in the traditional transcript.
- Search present 408 uses any conventional technique for searching a digital file for a particular word.
- the word from the traditional transcript that matches the present word is referred to herein as the identified word.
- Execution moves to present found 410 .
- Present found 410 makes a decision based on whether the search present 408 found the present word in the traditional transcript. If the present word was found, execution goes to simple 412 . If the present word was not found, execution goes to select 404 .
- Simple 412 determines whether method 400 should make decisions based on finding present word alone in the traditional transcript or whether prior word, next word, or both must also be found in the traditional transcript in the proper order.
- a user or a method executed by an engine may determine whether the search performed is simple or more involved.
- If a simple search suffices, execution moves to associate 428 . If more than one word must be found in the traditional transcript, execution moves to search prior 414 .
- Search prior 414 searches for the prior word in the traditional transcript. If there are no words between the prior word and the present word in the computer-generated transcript, search prior 414 may select the word that is proximate the present word in the traditional transcript; however, if the proximate word does not match the prior word, search prior 414 may need to search the traditional transcript backwards from the present word in case the traditional transcript contains a description between the two words. If search prior 414 must search for a match to the prior word, it may use any conventional technique for searching a digital file for a particular word. Execution moves to prior found 416 . A condition of indicating that the prior word was found may include determining that any word that matches the prior word is contiguous to the word that matched the present word.
- Prior found 416 makes a decision based on whether the search prior 414 found the prior word in the traditional transcript. If the prior word was found, execution goes to set yes 418 . If the prior word was not found or the requirement for contiguousness was not met, execution goes to set no 420 .
- Set yes 418 sets a variable to indicate that the prior word was found in the traditional transcript.
- Set no 420 sets a variable to indicate that the prior word was not found in the traditional transcript or the contiguousness requirements were not met. Execution from set yes 418 and set no 420 goes to search next 422 .
- Search next 422 searches for the next word in the traditional transcript. If there are no words between the present word and the next word in the computer-generated transcript, search next 422 may select the word that is just after the present word in the traditional transcript. If the word from the traditional transcript does not match the next word, as discussed above, search next 422 may use any conventional technique for searching a digital file for a particular word. Execution moves to next found 424 . A condition of indicating that the next word was found may include determining that any word that matches the next word is contiguous to the word that matched the present word.
- Next found 424 makes a decision based on whether search next 422 found the next word in the traditional transcript. If the next word was found, at least two of the three words were found in the traditional transcript and execution goes to associate 428 . If the next word was not found or the requirement for contiguousness was not met, execution goes to test prior 426 .
- Test prior 426 makes a decision based on whether search prior 414 found the prior word in the traditional transcript. If the prior word was found, then two of the three words were found in the traditional transcript and execution goes to associate 428 .
- Associate 428 associates the time that present word occurs in the audio data to the identified word in the traditional transcript.
- the location of each word that is recognized in the data file may be recorded.
- the location of a word may be the time the word occurs in the audio data, the number of words or syllables before the particular word in the audio data, the time before or after a unique sound in the audio data, or any other method for determining the location of a word in the audio file.
- Associate 428 associates the location, in this case time, of the present word to the identified word in the traditional transcript. Associating may include altering the traditional transcript to include the time in a manner that it relates to the identified word or making a separate record that relates the content of the traditional transcript and the identified times.
- the words in the traditional transcript that are not associated with a time from the computer-generated transcript may be assigned a time that is in sequential order with the associated times of the words before and after.
- evidence management system 102 determines whether the entire computer-generated transcript has been processed. If the words of the computer-generated transcript have all been assessed and skipped or compared to words in the traditional transcript, execution goes to end 432 where the method ends. Otherwise, execution returns to select 404 .
- other techniques may be used to determine words that match between the traditional transcript and the computer-generated transcript. For example, the same word has to be found in the traditional transcript within a given distance of an expected position of the word from the computer-generated transcript in order to be considered a match. Stated differently, if the word “dog” is determined to be a high-confidence word, and it is the 500 th word in the computer-generated transcript, a match in the traditional transcript may be the word “dog” that appears at either the 500 th word or within a predetermined number of words from the 500 th word. In some embodiments, the order or position of previously matched words may be used to further enhance the ability to find matching words. In some embodiments, the correlation between the low-confidence words and the unmatched words in the traditional transcript may be used for machine learning to improve the quality of a subsequent computer-generated transcript.
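The positional matching described above — a word counting as a match only if it appears near its expected position in the traditional transcript — can be sketched as a bounded search. The window size and case folding are assumptions for illustration:

```python
def window_match(word, expected_pos, traditional_words, window=10):
    """Return the index of `word` in `traditional_words` if it occurs
    within `window` words of `expected_pos`; otherwise None."""
    lo = max(0, expected_pos - window)
    hi = min(len(traditional_words), expected_pos + window + 1)
    for i in range(lo, hi):
        if traditional_words[i].lower() == word.lower():
            return i
    return None

words = "the quick brown dog jumped over the fence".split()
print(window_match("dog", 2, words, window=2))  # 3
```

Bounding the search keeps a common word such as "the" from matching an unrelated occurrence far from its expected position.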
- a computing device may perform a function.
- a computing device may provide a result of performing a function.
- a computing device may receive information, manipulate the received information, and provide the manipulated information.
- a computing device may execute a stored program to perform a function.
- a computing device may provide and/or receive digital data via a conventional bus using any conventional protocol.
- a computing device may provide and/or receive digital data via a network connection.
- a computing device may store information and retrieve stored information. Information received, stored, and/or manipulated by the computing device may be used to perform a function and/or to perform a stored program.
- a computing device may control the operation and/or function of other circuits and/or components of a system.
- a computing device may receive status information regarding the operation of other components, perform calculations with respect to the status information, and provide commands (e.g., instructions) to one or more other components for the component to start operation, continue operation, alter operation, suspend operation, or cease operation.
- commands and/or status may be communicated between a computing device and other circuits and/or components via any type of bus including any type of conventional data/address bus.
- FIG. 3 is a block diagram that illustrates aspects of computing device 300 appropriate for use as a computing device of the present disclosure.
- Computing device 300 performs the functions of a computing device discussed above.
- Computing device 300 may include processor 302 , system memory 304 , communication bus 306 , storage memory 308 , and network interface circuit 310 .
- computing device 300 describes various elements that are common to many different types of computing devices. While FIG. 3 is described with reference to a computing device that is implemented as a device on a network, the description below is applicable to servers, personal computers, mobile phones, smart phones, tablet computers, embedded computing devices, and other devices that may be used to implement portions of embodiments of the present disclosure. Moreover, those of ordinary skill in the art and others will recognize that the computing device 300 may be any one of any number of currently available or yet to be developed devices.
- a processor, also referred to as a processing circuit, includes any circuitry and/or electrical/electronic subsystem for performing a function.
- a processing circuit may include circuitry that performs (e.g., executes) a stored program.
- a processing circuit may include a digital signal processor, a microcontroller, a microprocessor, an application specific integrated circuit, a programmable logic device, logic circuitry, state machines, MEMS devices, signal conditioning circuitry, communication circuitry, a radio, data busses, address busses, and/or a combination thereof in any quantity suitable for performing a function and/or executing one or more stored programs.
- a processing circuit may further include conventional passive electronic devices (e.g., resistors, capacitors, inductors) and/or active electronic devices (op amps, comparators, analog-to-digital converters, digital-to-analog converters, programmable logic).
- a processing circuit may include conventional data buses, output ports, input ports, timers, memory, and arithmetic units.
- a processing circuit may provide and/or receive electrical signals whether digital and/or analog in form.
- a processing circuit may provide and/or receive digital data.
- a processing circuit may have a low power state in which only a portion of its circuits operate or it performs only certain functions.
- a processing circuit may be switched (e.g., awoken) from a low power state to a higher power state in which more or all of its circuits operate or it performs certain additional functions.
- a system memory may store data and/or program modules that are immediately accessible to and/or are currently being operated on by the processing circuit.
- a system memory may be a computer-readable medium.
- a processor may perform or control the operation of a computing device by executing a stored program.
- a communication bus transfers data between the components of a computing device.
- a communication bus may transfer data between computing devices.
- a communication bus may include a control bus, an address bus, and/or a data bus.
- a control bus may control access to the data and/or address bus.
- An address bus may specify a location of where data and/or control may be sent and/or received.
- Data, address, and/or control transfer via a communication bus may be unidirectional.
- Data, address, and/or control transfer via a communication bus may be bidirectional. Data, address, and/or control may be transferred serially and/or in parallel.
- a communication bus may include any conventional control bus, address bus, and/or data bus (e.g., internal bus, expansion bus, local bus, front-side-bus, USB, FireWire, Serial ATA, AGP, PCI express, PCI, HyperTransport, InfiniBand, EISA, NuBus, MicroChannel, SBus, I2C, HIPPI, CAN bus, FutureBus).
- a communication bus may use any protocol, whether conventional or custom (e.g., application specific, proprietary) to transfer data.
- a communication bus may transfer data, address, and/or control using any transmission medium.
- a transmission medium includes any material (e.g., physical) substance capable of propagating waves and/or energy (e.g., optical, electrical, electro-magnetic).
- a network interface enables a computing device to communicate with other devices and/or systems over a network.
- the functions of a network interface may be performed by circuits, logic embedded in hardware, software instructions executable by a processor, or any combination thereof.
- the functions performed by a network interface enable a computing device to communicate with another device.
- the functions performed by a network interface, whether using hardware or software executed by a processor, may be referred to as services.
- a device may request the services of a communication interface to communicate with a computing device.
- a network interface may communicate via wireless medium and/or a wired medium.
- a network interface may include circuits, logic embedded in hardware, or software instructions executable by a processor (e.g., wireless network interface) for wireless communication.
- a network interface may include circuits, logic embedded in hardware, or software instructions executable by a processor (e.g., wired network interface) for wired communication.
- the circuits, logic embedded in hardware, or software used for a wireless network interface may be the same in whole or in part as the circuits, logic embedded in hardware, or software used for a wired network interface.
- a network interface may communicate using any conventional wired (e.g., LAN, Ethernet) or wireless communication (e.g., Bluetooth, Bluetooth Low Energy, WiFi, ZigBee, 2G, 3G, LTE, WiMax) protocol.
- computing device 300 may include at least one processor 302 and a system memory 304 connected by communication bus 306 .
- Processor 302 , system memory 304 , and communication bus 306 may perform the functions and include the structures of a processor, a system memory, and a communication bus respectively discussed above.
- system memory 304 may be a volatile or nonvolatile computer-readable medium, including but not limited to read only memory ("ROM"), random access memory ("RAM"), electrically erasable programmable read-only memory ("EEPROM"), and/or flash memory.
- the functions of network interface 310 may be performed by processor 302 .
- the network interface 310 illustrated in FIG. 3 may represent one or more wireless interfaces or physical communication interfaces described and illustrated above with respect to particular components of the system 100 .
- the computing device 300 also includes a storage memory 308 .
- services may be accessed using a computing device that does not include means for persisting data to a local storage memory. Therefore, the storage memory 308 depicted in FIG. 3 is represented with a dashed line to indicate that the storage memory 308 is optional.
- the storage memory 308 may be a computer-readable medium that may be volatile or nonvolatile, removable or nonremovable, and implemented using any technology capable of storing information including, but not limited to, a hard drive, solid state drive, CD ROM, DVD, or other disk storage, magnetic cassettes, magnetic tape, and magnetic disk storage.
- FIG. 3 does not show some of the typical components of many computing devices.
- the computing device 300 may include input devices, such as a keyboard, keypad, mouse, microphone, touch input device, touch screen, tablet, and/or the like. Such input devices may be coupled to the computing device 300 by wired or wireless connections including RF, infrared, serial, parallel, Bluetooth, Bluetooth low energy, USB, or other suitable connection protocols.
- the computing device 300 may also include output devices such as a display, speakers, printer, etc. Since these devices are well known in the art, they are not illustrated or described further herein.
- the phrase "a black dog house" is intended to mean a house for a black dog.
- the term “provided” is used to definitively identify an object that not a claimed element of the invention but an object that performs the function of a workpiece that cooperates with the claimed invention.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Quality & Reliability (AREA)
- Computer Security & Cryptography (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
A system or method for manipulating audiovisual data using transcript information. The system or method performs the following actions. Creating a computer-generated transcript of audio data from the audiovisual data, the computer-generated transcript includes a plurality of words, at least some words of the plurality of words are associated with a respective timestamp and a confidence score. Receiving a traditional transcript of the audio data, the traditional transcript includes a plurality of words that are not associated with timestamps. Identifying one or more words from the plurality of words of the computer-generated transcript that match words from the plurality of words of the traditional transcript. Associating the timestamp of the one or more words of the computer-generated transcript with the matching word of the traditional transcript. Processing the audiovisual data using the traditional transcript and the associated timestamps.
Description
- The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
-
FIG. 1 is a block diagram that illustrates an example embodiment of a system for interrelating text transcript information with video and/or audio information according to various aspects of the present disclosure; -
FIG. 2 is a flowchart that illustrates an example embodiment of a method of processing video transcription information according to various aspects of the present disclosure; -
FIG. 3 is a block diagram of a computing device suitable for use to implement portions of the system according to the present disclosure; and -
FIG. 4 is a flow chart that illustrates an example embodiment of a method for aligning a human-made transcript to the timing of a machine transcript according to various aspects of the present disclosure.
- Body cameras, in-car cameras, wireless microphones, and smart phones have increased the amount of recorded audiovisual information. As used herein, “audiovisual” information, data, or recordings refers to video that includes audio, video that is associated with separate audio of the video scene, or audio alone. The internet, with such sites as youtube.com, has provided an avenue for posting audiovisual recordings for public viewing. Security agencies (e.g., police forces) not only capture audiovisual recordings, but at times release audiovisual recordings or a portion of an audiovisual recording to the public for viewing. A security agency may also prepare and release an audiovisual recording as evidence for use in a proceeding.
- A security agency may release not only an audiovisual recording, but also a written record (e.g., transcript) of the audio portion of the recording. Audio portions of an audiovisual recording may be transcribed in a traditional manner (e.g., by a court reporter, by a transcriptionist) or by a computer (e.g., computer-generated transcription). A transcript includes a written representation of content in the audio portion (e.g., audio data) of the audiovisual recording. Presently, traditional transcripts are generally more accurate than computer-generated transcripts, particularly as to the semantic translation of sounds into proper words for a particular language.
- However, a computer-generated transcript may include a record of the location (e.g., time, position) in the audiovisual data where each word or sound was detected. Traditionally prepared transcripts generally do not include additional data that provides the location of each word or sound of the transcript in the audiovisual recording.
- According to various aspects of the present disclosure, traditional transcripts may be used to provide accurate semantics and computer-generated transcripts may be used to locate the words of the audio data to the location in the audiovisual recording where the words occur. Traditional transcripts may be used in combination with computer-generated transcripts so that the words of the traditional transcript may be linked (e.g., tied, associated, aligned) to the location in the audiovisual recording where the word occurs.
- Tying the traditional transcript to the timing of the audiovisual recording enables real-time redaction of portions of the audiovisual data on playback of the recording. Redaction may be performed automatically, with little or no human intervention. Rules may be specified as to the type of material that should be redacted from an audiovisual recording. During a presentation of audiovisual data, if the rules specify that words or sounds in the transcript should be redacted, the presentation may be altered to redact the portion of the audiovisual data that falls within the rules of redaction.
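The rule-driven redaction described above can be sketched as follows. This is an illustrative outline only; the rule set, field names, and padding value are assumptions for the sketch, not part of the disclosure:

```python
# Hypothetical sketch: derive redaction spans from an aligned transcript.
# Each aligned word carries start/end offsets in milliseconds linked to
# the audiovisual recording; the rule set below is an invented example.
REDACTION_RULES = {"juvenile", "ssn"}

def redaction_spans(aligned_words, rules=REDACTION_RULES, pad_ms=250):
    """Return (start_ms, end_ms) spans of audio to suppress on playback."""
    spans = []
    for word in aligned_words:
        if word["text"].lower() in rules:
            # pad each span slightly so partial audio does not leak through
            spans.append((max(0, word["start_ms"] - pad_ms),
                          word["end_ms"] + pad_ms))
    return spans

aligned = [
    {"text": "the", "start_ms": 1000, "end_ms": 1200},
    {"text": "Juvenile", "start_ms": 1200, "end_ms": 1800},
    {"text": "ran", "start_ms": 1900, "end_ms": 2100},
]
print(redaction_spans(aligned))  # [(950, 2050)]
```

On playback, a player could mute the audio (and, if desired, blur or blank the video) during each returned span, which is how the automatic, rule-based redaction described above could be realized.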
- Further, a transcript in electronic form, whether traditionally or computer-generated, enables a user to search the transcript for particular words such as geographic locations, articles of clothing, weapons, buildings, or other objects. The link to the location of words in the audiovisual data permits a user to search the transcript and find the location in the audiovisual data easily.
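A minimal sketch of such a search over an aligned transcript, returning the playback locations where a term occurs (the field names are assumed for illustration):

```python
# Hypothetical sketch: search an aligned transcript for a term and return
# the playback offsets (milliseconds) where the term occurs.
def find_term(aligned_words, term):
    t = term.lower()
    return [w["start_ms"] for w in aligned_words if w["text"].lower() == t]

aligned = [
    {"text": "red", "start_ms": 500},
    {"text": "jacket", "start_ms": 700},
    {"text": "red", "start_ms": 4200},
]
print(find_term(aligned, "red"))  # [500, 4200]
```

Each returned offset can be used to seek the player directly to the moment in the audiovisual data where the searched word is spoken.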
- A transcript may include a description of the characteristics of the sounds or words of the audio data. A description may include the volume (e.g., intensity), tone (e.g., menacing, threatening, helpful, kind), frequency range, or emotions (e.g., anger, elation) of a word or a sound. The description of the audio data may be searched by a user and linked to a location in the audiovisual data.
-
System 100 of FIG. 1 is an example environment for ordering the creation of transcripts, storing transcripts, generating transcripts, storing audiovisual data, aligning the words of a transcript and/or description of the audio data to locations in audiovisual data, manipulating (e.g., redacting, searching) the audiovisual data using an aligned transcript, generating presentations of the aligned transcript and the audiovisual data, and using an aligned transcript to redact a presentation of an audiovisual recording in accordance with rules that specify material that should be redacted. -
System 100 may include one or more traditional transcript service providers 104, evidence management system 102, and one or more recording systems 106. - A recording system detects physical properties in an environment and records (e.g., stores) the information (e.g., data) regarding the physical properties. Recorded information may be analyzed to determine characteristics of the physical properties detected and recorded.
- Recorded information may relate to an incident (e.g., event, occurrence). Recorded information may provide a record of an incident. Recorded information may be reviewed to provide a reminder of the incident. Recorded information may be used as evidence to prove the occurrence of an incident.
- A recording system may detect and record visual (e.g., video) and/or audible (e.g., audio) physical properties. Visual and/or audible physical properties detected and recorded may be within the range of vision and/or hearing of a human. Visual and/or audible physical properties detected and recorded may be outside the range of vision and/or hearing of a human. The capture and storing of video and/or audio data may be accomplished using any suitable technique. Use of the term video data may refer to both video and audio data together.
- A recording system may create an audiovisual record. Data stored by a recording system may be stored in any suitable format, including but not limited to H.264, MPEG-4, AAC, and WAV. A recording system may convert the recorded information from one format (e.g., analog data, encoding) to another format (e.g., digital data, encoding).
- A recording system may communicate (e.g., transmit, receive) data. A recording system may transmit recorded data to another system. A recording system may include any conventional communication circuitry for transmitting and/or receiving data. A recording system may use any conventional wired (e.g., LAN, Ethernet) or wireless communication (e.g., Bluetooth, Bluetooth Low Energy, WiFi, ZigBee, 2G, 3G, 4G, WiMax) protocol. A recording system may store audiovisual data for a period (e.g., shift, day) then transmit the audiovisual data to another system. A recording system may transmit audiovisual information to another system as it is captured (e.g., live streaming).
-
Recording system 106 performs the functions of a recording system discussed herein. In some embodiments, recording system 106 may include a digital camera such as a wearable (e.g., body-worn, carried) camera that records audiovisual data. In some embodiments, recording system 106 includes an in-car camera or dash cam that records audiovisual data. Recording system 106 may include separate recording systems, such as a digital camera and a wireless microphone, that cooperate to perform the functions of a recording system. For example, video data from a first camera and audio data from a second camera may be combined and/or used. The act (e.g., function, operation) of recording may use any suitable technique known to one of ordinary skill in the art, and so is not described in further detail herein. - In some implementations,
recording system 106 records audiovisual information then transmits data to evidence management system 102. In some implementations, recording system 106 live streams audiovisual data to evidence management system 102. - An evidence management system may collect and manage information. An evidence management system may receive recorded data from one or more recording systems. An evidence management system may receive transcripts from one or more traditional transcript service providers. An evidence management system may provide recorded data, transcript data, and/or data that has been processed to a person or entity. An evidence management system may communicate with other systems to transmit and receive data. An evidence management system may include any conventional communication circuitry for transmitting and/or receiving data. An evidence management system may use any conventional wired or wireless communication protocol for communicating data.
- An evidence management system may store data. An evidence management system may store recorded data, traditional transcripts, computer-generated transcripts, and/or alignment data that associates recorded data to transcript data. Recorded data includes audiovisual data. An evidence management system may store and/or manage data in such a manner that it may be used as evidence in a proceeding, such as a legal proceeding.
- An evidence management system may organize stored data according to the recording system that captured the data. An evidence management system may organize stored data according to a particular recorded data (e.g., video). An evidence management system may further organize stored data according to agencies (e.g., groups, organizations). An evidence management system may group captured data for storage according to the agency to which the person using the recording system used to capture the data is employed.
-
Evidence management system 102 is an example of an embodiment of an evidence management system. Evidence management system 102 performs the functions of an evidence management system discussed herein. -
Evidence management system 102 may include alignment engine 108, management engine 110, computer-generated transcript engine 112, video data store 114, audio data store 116, computer-generated transcript data store 118, traditional transcript data store 120, and alignment data store 122. - An evidence management system may perform one or more operations (e.g., functions). An operation may include providing recorded data to a traditional transcript service provider, such as to traditional
transcript service provider 104, associating transcripts to recorded data, aligning traditional transcripts to recorded data, generating a computer-generated transcript of recorded data, and/or providing data to another system. An engine may perform one or more operations of an evidence management system. An engine may perform one or more functions or a single function. An engine may access stored data to perform a function. An engine may generate data for storage. - The term “engine” as used herein refers to, in general, circuitry, logic embodied in hardware and/or software instructions executable by a processor of a computing device. Circuitry includes any circuit and/or electrical/electronic subsystem for performing a function. Logic embedded in hardware includes any circuitry that performs a predetermined operation or predetermined sequence of operations. Examples of logic embedded in hardware include standard logic gates, application specific integrated circuits (“ASICs”), field-programmable gate arrays (“FPGAs”), microcell arrays, programmable logic arrays (“PLAs”), programmable array logic (“PALs”), complex programmable logic devices (“CPLDs”), erasable programmable logic devices (“EPLDs”), and programmable logic controllers (“PLCs”). Logic embodied in (e.g., implemented as) software instructions may be written in any programming language, including but not limited to C, C++, COBOL, JAVA™, PHP, Perl, HTML, CSS, JavaScript, VBScript, ASPX, HDL, and/or Microsoft .NET™ programming languages such as C#. The software for an engine may be compiled into an executable program or written in an interpreted programming language for execution by a suitable interpreter or virtual machine executed by a processing circuit. Engines may be callable (e.g., executable, controllable) from other engines or from themselves.
- Generally, the engines described herein can be merged with other engines, other applications, or may be divided into sub-engines. Engines that are implemented as logic embedded in software may be stored in any type of computer-readable medium. An engine may be stored on and executed by one or more general purpose computers, thus creating a special purpose computer configured to perform the functions of (e.g., provide) the engine.
- The devices and systems illustrated herein may include one or more computing devices configured to perform the functions of the illustrated engines, though the computing devices themselves have not been illustrated in every case for the sake of clarity.
- A computer-generated transcript engine generates a computer-generated transcript. A computer-generated transcript engine may receive audio data, analyze the audio data to identify words of one or more languages, and provide a record of the words identified for storage. A computer-generated transcript engine may include sophisticated algorithms to perform semantic analysis. Semantic analysis may include recognizing the different connotations (e.g., meanings) of words to correctly identify words used in human speech. Semantic analysis may include identifying words and phrases used in the vernacular (e.g., regional idioms, gang-related speech) and providing a translated meaning.
- For example, computer-generated
transcript engine 112 may use any suitable speech-to-text algorithms to produce the computer-generated transcript, including but not limited to acoustic modeling, language modeling, Hidden Markov models, feedforward artificial neural networks, and recurrent neural networks. - A computer-generated transcript engine may identify the location of words and phrases in the audio data. A timestamp in the computer-generated transcript may indicate a point in time (e.g., location) at which a given recognized word occurred in the audio data. A timestamp may have any suitable format including a time of day, an elapsed time from the beginning of the recording, and an elapsed time from the previously recognized word. A computer-generated transcript engine may record the location of each word and/or phrase identified in the audio data. A computer-generated transcript engine may prepare data that relates (e.g., associates) each word or phrase to its respective location in the audio data. A computer-generated transcript engine may use data that relates the audio data to the video data to relate the location of words and/or phrases to a location in the video data. The data that describes the location of a word and/or a phrase to a location in audiovisual data may be referred to as alignment data. A computer-generated transcript engine may store alignment data. Alignment data may be stored in
alignment data store 122 and/or in computer-generated transcript data store 118. Alignment data may be stored with the computer-generated transcript in a single file or separately with information as to how the alignment data relates to the computer-generated transcript. - Because wearable cameras and other types of recording devices in the field may be subject to poor audio quality, including but not limited to having large amounts of noise, having inarticulate speech, having radio chatter or other background noise, computer-generated transcription of audiovisual data may generate inconsistent results. Confidence scores may be associated with words within the computer-generated transcript to indicate the estimated likelihood that the word generated by the computer-generated transcript engine is an accurate transcription of the word in the audio data.
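For illustration, a computer-generated transcript entry carrying words, timestamps, and confidence scores might be represented as follows. The schema and field names are assumptions made for this sketch, since the disclosure does not fix a storage format:

```python
import json

# Hypothetical sketch of a computer-generated transcript record: each word
# carries an offset into the recording and a confidence score. Field names
# are invented for illustration only.
transcript = {
    "recording_id": "example-0001",
    "words": [
        {"text": "stop", "offset_ms": 12040, "confidence": 0.94},
        {"text": "the", "offset_ms": 12410, "confidence": 0.88},
        {"text": "[unintelligible]", "offset_ms": 12700, "confidence": 0.21},
    ],
}

# Words below a confidence threshold can be excluded before alignment.
reliable = [w for w in transcript["words"] if w["confidence"] >= 0.5]
print([w["text"] for w in reliable])  # ['stop', 'the']

# A machine-readable format such as JSON round-trips the record intact.
print(json.loads(json.dumps(transcript)) == transcript)  # True
```

A structure of this kind could be stored either in a single file with the transcript or separately, with the `recording_id` (an assumed key) tying the alignment data back to the audiovisual recording.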
- A management engine may manage the generation and storage of transcripts. A management engine may receive instructions from a user and/or other engines. A management engine may perform a function responsive to an instruction. A management engine may order the generation of a transcript by a traditional transcript service provider, such as traditional
transcript service providers 104, or by a computer-generated transcript engine, such as computer-generated transcript engine 112. A management engine may provide audiovisual data to a traditional transcript service provider and/or a computer-generated transcript engine. A management engine may use any conventional communication protocol to transmit audiovisual information. A management engine may track the progress of transcript generation. - A management engine may receive a transcript. A management engine may use any conventional communication protocol to receive a transcript. A management engine may store a transcript. A management engine may associate a transcript with audiovisual data. A management engine may associate a computer-generated transcript to a traditional transcript.
- A management engine may receive alignment data. A management engine may store alignment data. A management engine may associate alignment data with audiovisual data. A management engine may associate alignment data to a computer-generated transcript and/or a traditional transcript.
- A management engine may associate stored data to one or more recording systems, such as
recording systems 106. A management engine may associate stored data to the recording system that recorded an audio data that was used to generate a transcript. A management engine may associate stored data, transcripts, and/or alignment data to an agency. - A management engine may provide reports regarding the functions it has or will perform.
-
Management engine 110 performs the functions of a management engine discussed herein. - For example,
management engine 110 may provide audio data to one or more traditional transcript service providers 104 to have the audio data transcribed. Management engine 110 may receive traditional transcripts back from the traditional transcript service providers 104. Management engine 110 may store traditional transcripts in traditional transcript data store 120. -
Management engine 110 may provide a platform for agencies that store information in evidence management system 102 to manage purchase of transcription services from traditional transcript service providers 104 through the evidence management system 102. An agency may enter into contracts with one or more traditional transcript service providers 104. The agency contract may include various terms including service level agreements and price points. Management engine 110 may create accounts for traditional transcript service providers 104 within the evidence management system 102 to allow the agency to request transcripts from the traditional transcript service providers 104 according to their agency contracts. - A user (e.g., officer) associated with an agency may choose from traditional
transcript service providers 104 that have a contract with the agency. Management engine 110 may provide audio data to a traditional transcript service provider 104 in response to a request from a user. Management engine 110 may seek approval from a supervisor of the user before requesting a transcript from the traditional transcript service provider 104. The management engine 110 may wait for the approval of the supervisor before it sends out audio data to the traditional transcript service provider 104. -
Management engine 110 may create a unique identifier for each transcription order. Evidence management system 102 may provide the unique identifier to the requesting agency to allow it to obtain customer service directly from traditional transcript service provider 104. Traditional transcript service provider 104 may use information obtained from the evidence management system 102, such as the unique identifier, supervisor name, and user name, to bill the agency directly for transcription services. -
Management engine 110, as opposed to a user at an agency, may select a traditional transcript service provider 104, or management engine 110 may be instructed on which traditional transcript service provider 104 to use. Management engine 110 may select traditional transcript service provider 104 using any algorithm (e.g., round-robin) or using any criteria (e.g., cost, throughput, loading at provider, highest accuracy). - A traditional
transcript service provider 104 may be capable of producing a highly accurate transcript of audio data. In some embodiments, audio data only or the entire audiovisual data may be transmitted to a computing device (e.g., computer) of a traditional transcript service provider 104. Traditional transcript service provider 104 may use one or more computing devices and/or mechanical devices to allow an operator (e.g., a person trained to provide transcription services) to listen to the audio data and enter a transcription of speech and/or other audio elements within the audio data. In some embodiments, evidence management system 102 may stream audio data to a computing device of an operator, and may provide a web-based interface, an app, a desktop application, or an application programming interface (API) for the operator to enter the transcription. Further description of examples of the interaction between the evidence management system 102 and the traditional transcript service providers 104 is provided below. -
Management engine 110 may transcode the audio data into a format desired by a given transcription service provider 104, such as WAV. Management engine 110 may include additional metadata along with the audio data, including but not limited to an owning user, an owning agency, and/or a desired type of transcript (e.g., verbatim or standard). - A traditional transcript from the traditional
transcript service provider 104 may be provided in any suitable format including but not limited to a text file and a word processing document. A traditional transcript may include explanatory information including but not limited to the identity of the speakers, a description of noises and/or sounds, and/or the meaning of colloquial language or slang. -
Management engine 110 may also manage (e.g., control) a process (e.g., work flow) for making revisions to the traditional transcript. For example, as a prosecution team and a defense team argue over the exact words that should appear in a given transcript, management engine 110 may track changes made to the transcript, the identity of the person making the change, and any information as to the reason for the change. In another example, a judge may order certain portions of the transcript stricken. Management engine 110 may make and track such changes to the traditional transcript when instructed to do so. - An alignment engine aligns data. An alignment engine may identify where particular data in one set of data (e.g., file) corresponds to particular data in another file. An alignment engine may record how the data of one file aligns with the data of another file. An alignment engine may include the data from one or more of the files in the file that stores alignment data.
- An alignment engine may align the words and/or phrases of a traditional transcript to some or all of the words or phrases of a computer-generated transcript. Using the alignment data for the alignment between the traditional transcript and the computer-generated transcript, an alignment engine may align the traditional transcript to some or all of the identified locations in the audio file. An alignment engine may use confidence scores provided by computer-generated
transcript engine 112 to aid in alignment. Using the alignment information between the audio data and the video data, an alignment engine may align the traditional transcript to video data. - The data from one or more of the above alignments may be referred to as enhanced alignment data.
- Due to the present quality of computer-generated transcripts, not all of the words in a computer-generated transcript are likely to be intelligible or recognizable as words of a known language. Depending on the quality of the audio data, an operator transcribing audio data may not be able to recognize all of the words spoken. Accordingly, an alignment engine may not be able to align all of the words in the traditional transcript to words in the computer-generated transcript. An alignment engine may use any algorithm for spacing words in the traditional transcript that do not match words in the computer-generated transcript.
- For example,
alignment engine 108 compares the words in the traditional transcript to the words in the computer-generated transcript to find matches. Alignment engine 108 may compare the words of the traditional transcript to only those words in the computer-generated transcript having a confidence score greater than a threshold. Alignment engine 108 may compare single words from the traditional transcript to words of the computer-generated transcript to find a match. Alignment engine 108 may require that a group of words (e.g., sequence) from the traditional transcript match the same group of words in the same order before identifying the words as matching. -
Alignment engine 108 may space unmatched words equally between the matched words. Alignment engine 108 may compare the intensity (e.g., volume) of the audio data to words that are commonly spoken loudly, or try to match the sounds of individual letters or syllables in the words of the transcript to letter or syllable sounds in the audio data, to identify a location of unmatched words. - As understood by one of ordinary skill in the art, a “data store” as described herein may be any suitable device configured to store data for access by a computing device. A data store receives data. A data store retains (e.g., stores) data. A data store retrieves data. A data store provides data for use by a system, such as an engine. A data store may organize data for storage. A data store may organize data as a database for storage and/or retrieval. The operations of organizing data for storage in or retrieval from a database of a data store may be performed by a data store. A data store may include a repository for persistently storing and managing collections of data. A data store may store files that are not organized in a database. Data in a data store may be stored in a computer-readable medium.
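The matching and word-spacing behavior described above for alignment engine 108 can be sketched as follows. This is a simplified illustration under assumed data layouts and an assumed confidence threshold, not the claimed method:

```python
# Simplified sketch of aligning a traditional transcript to timestamps from
# a computer-generated transcript: exact-match words above a confidence
# threshold donate their timestamps, and unmatched words are spaced evenly
# between the surrounding matched words. All details are illustrative.
def align(traditional_words, machine_words, threshold=0.8):
    # index high-confidence machine words by text (first occurrence wins)
    by_text = {}
    for w in machine_words:
        if w["confidence"] >= threshold and w["text"] not in by_text:
            by_text[w["text"]] = w["offset_ms"]

    times = [by_text.get(t) for t in traditional_words]

    # interpolate timestamps for unmatched words between matched neighbors
    matched = [i for i, t in enumerate(times) if t is not None]
    for a, b in zip(matched, matched[1:]):
        gap = b - a
        for k in range(a + 1, b):
            times[k] = times[a] + (times[b] - times[a]) * (k - a) // gap
    return list(zip(traditional_words, times))

machine = [
    {"text": "stop", "offset_ms": 1000, "confidence": 0.95},
    {"text": "right", "offset_ms": 2000, "confidence": 0.91},
]
print(align(["stop", "uh", "right"], machine))
# [('stop', 1000), ('uh', 1500), ('right', 2000)]
```

Words before the first match or after the last match remain unplaced (`None`) in this sketch; a production aligner would also use sequence matching and the acoustic cues mentioned above to place them.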
- One example of a data store suitable for use with the high capacity needs of the
evidence management system 102 is a highly reliable, high-speed relational database management system (“RDBMS”) executing on one or more computing devices and accessible over a high-speed network. However, any other suitable storage technique and/or device capable of quickly and reliably providing the stored data in response to queries may be used, such as a key-value store and an object database. - Data stores 114-122 perform the functions of a data store discussed herein. A data store may be implemented using any computer-readable medium. An engine (e.g., 108-112) or computing device of
management system 102 may access data stores 114-122 locally (e.g., via data bus), over a network, and/or as a cloud-based service. - An example of a data store suitable for use with
recording systems 106, which requires reliable storage but low overhead, is a file system or database management system that stores data in files (or records) on a computer-readable medium such as flash memory, random access memory (RAM), or hard disk drives.
- A computer-readable medium may store, retrieve, and/or organize data. As used herein, the term “computer-readable medium” includes any storage medium that is readable by a machine (e.g., computer, processor, processing circuit). Storage medium includes any devices, materials, and/or structures used to place, keep, and retrieve data (e.g., information). A storage medium may be volatile or non-volatile. A storage medium may include any semiconductor (e.g., RAM, ROM, EPROM, Flash), magnetic (e.g., hard disk drive), optical technology (e.g., CD, DVD), or combination thereof. Computer-readable medium includes storage medium that is removable or non-removable from a system. Computer-readable medium may store any type of information, organized in any manner, and usable for any purpose such as computer readable instructions, data structures, program modules, or other data.
-
FIG. 2 is a flowchart that illustrates an example embodiment of a method of processing video transcript information according to various aspects of the present disclosure. From a start block, the method 200 proceeds to block 202, where one or more recording systems 106 record audiovisual data, and transmit the data to an evidence management system 102. In some embodiments, the recording systems 106 may be capable of wireless communication, and may transmit recorded data to the evidence management system 102 using any suitable transmission technology including but not limited to WiFi, 3G, 4G, LTE, and WiMAX. In some embodiments, the recording systems 106 may be physically connected to a dock via any suitable type of wired connection including but not limited to USB, FireWire, and a 3.5 mm connector. The dock may then obtain the recorded data from the recording system 106 and then transmit the recorded data to the evidence management system 102 via a network. Further description of devices and techniques for transmitting recorded data to an evidence management system 102 from a recording system 106 are described in commonly owned, co-pending U.S. patent application Ser. No. 15/210,060, filed Jul. 14, 2016, the entire disclosure of which is hereby incorporated by reference herein for all purposes. - At
block 204, the evidence management system 102 stores the video data (if any) of the audiovisual data in a video data store 114 and stores the audio data of the audiovisual data in an audio data store 116. In some embodiments, the video data and the audio data may be stored together. In some embodiments, the video data and audio data may be stored in separate data stores in order to allow the audio data alone to be transmitted to a traditional transcript service provider 104, so the video data and audio data may be associated with each other by a unique identifier or using another suitable technique. - Next, at
block 206, a computer-generated transcript engine 112 of the evidence management system 102 creates a computer-generated transcript that includes timestamps and confidence scores for at least some words in the audio data, and at block 208, the computer-generated transcript engine 112 stores the computer-generated transcript in a computer-generated transcript data store 118. In some embodiments, the computer-generated transcript is stored in a machine-readable format, including but not limited to JavaScript Object Notation (“JSON”) and extensible markup language (“XML”). In some embodiments, all audio data may be transcribed by the computer-generated transcript engine 112. In some embodiments, only audio data tagged with a given type or stored in a given category may be transcribed by the computer-generated transcript engine 112. - At
block 210, a management engine 110 of the evidence management system 102 obtains a traditional transcript of the audio data from a traditional transcript service provider 104. - At
block 212, the management engine 110 stores the traditional transcript in a traditional transcript data store 120. - The
method 200 then proceeds to block 214, where an alignment engine 108 of the evidence management system 102 matches one or more high-confidence words from the computer-generated transcript to words from the traditional transcript. An example of a method suitable for use in block 214 is illustrated in FIG. 4 and described further below. - At
block 216, the alignment engine 108 adds the timestamps from the matching high-confidence words to the words of the traditional transcript. The timestamps may be added to the traditional transcript using any suitable format. In some embodiments, the traditional transcript may be reformatted into a JSON format, an XML format, or another machine-readable format in order to associate the timestamps with the words. In some embodiments, a separate record of matching words, timestamps, and their locations within the traditional transcript may be created. The separate record may be stored in the alignment data store 122, or may be stored along with the traditional transcript in the traditional transcript data store 120. In some embodiments, at least some of the actions described in block 216 may be performed in the method suitable for use in block 214. - At
block 218, the alignment engine 108 associates the traditional transcript with the audiovisual data using the timestamps as guideposts. The association may include creating an overlay that presents the text from the traditional transcript at a pace indicated by the guidepost timestamps. Using an overlay allows the evidence management system 102 to generate an MPEG, DVD, or other video presentation format that always presents the transcript text in the same manner. The association may include creating a subtitle track and/or SRT file that presents the text from the traditional transcript at a pace indicated by the guidepost timestamps. Using a subtitle track allows a viewer to turn the transcript display on and off. The pace of display between the guidepost timestamps may be determined using syllables or other characteristics detected in the audio data. - At
block 220, the management engine 110 processes the audiovisual data based on analysis of the traditional transcript. The actions of block 220 are optional, and either may not be performed, or may be performed separately from the rest of the method 200 once a traditional transcript has been aligned to audiovisual data. -
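The transcript-driven processing of block 220 can be made concrete with a short sketch of timestamp-based redaction. The transcript representation, the patterns, and the function name below are illustrative assumptions, not the implementation described in this disclosure:

```python
import re

# Illustrative patterns for material to redact; a real deployment would use
# much richer name/address/profanity detection.
REDACT_PATTERNS = [
    re.compile(r"\bdamn\b", re.IGNORECASE),   # profanity example
    re.compile(r"\b[A-Z][a-z]+son\b"),        # crude surname shape (e.g. "Johnson")
]

def find_redaction_spans(aligned_words):
    """aligned_words: (word, start_seconds) pairs from the aligned traditional
    transcript. Returns (start, end) time spans of audiovisual data to redact."""
    spans = []
    for i, (word, start) in enumerate(aligned_words):
        if any(p.search(word) for p in REDACT_PATTERNS):
            # Redact until the next word begins, or pad the final word by 1 s.
            end = aligned_words[i + 1][1] if i + 1 < len(aligned_words) else start + 1.0
            spans.append((start, end))
    return spans

words = [("I", 0.0), ("saw", 0.4), ("Johnson", 0.9), ("leave", 1.5)]
print(find_redaction_spans(words))  # [(0.9, 1.5)]
```

The returned spans would then be handed to a video/audio muting or blurring step keyed on the same timeline.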
Management engine 110 may use the aligned traditional transcript to automatically redact portions of the audiovisual data. For example, management engine 110 may detect names, addresses, profanity, and/or other keywords, or pattern-based portions of text from the aligned traditional transcript. Management engine 110 may use the associated timestamps to automatically redact the detected portions of the audiovisual data. As another example, the management engine 110 may provide a search interface that allows the full text of traditional transcripts to be searched, wherein the search results will link directly to the relevant portion of audiovisual data using the timestamps. As yet another example, the management engine 110 may present a web-based interface, an app, a desktop application, an API, or another type of interface that allows a user to select a portion of the transcript to be redacted, and the management engine 110 may automatically redact the associated portions of the audiovisual data. - The
method 200 then proceeds to an end block and terminates. -
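The guidepost-paced subtitle track described for block 218 can be sketched as follows. The grouping of words into fixed-size cues and the one-second tail for the final cue are assumptions for illustration:

```python
def to_srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3600000)
    m, ms = divmod(ms, 60000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def words_to_srt(timed_words, words_per_cue=4):
    """timed_words: list of (word, start_seconds) pairs. Emits numbered SRT
    cues whose pace follows the guidepost timestamps."""
    cues = []
    for n, i in enumerate(range(0, len(timed_words), words_per_cue), start=1):
        group = timed_words[i:i + words_per_cue]
        start = group[0][1]
        # A cue ends where the next group starts, or one second after it begins.
        j = i + words_per_cue
        end = timed_words[j][1] if j < len(timed_words) else start + 1.0
        text = " ".join(w for w, _ in group)
        cues.append(f"{n}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(cues)

print(words_to_srt([("The", 0.0), ("dog", 0.5), ("barked", 1.0), ("twice", 1.6), ("today", 2.2)]))
```

A player can toggle such a track on and off, as the paragraph above notes for subtitle tracks.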
FIG. 4 is a flowchart that illustrates a method according to various aspects of the present disclosure. Method 400 may be performed by one or more engines, for example a computer-generated transcript engine 112 and an alignment engine 108 of evidence management system 102, or by a computing device of evidence management system 102. Method 400 includes blocks start 402, select 404, threshold 406, search present 408, present found 410, simple 412, search prior 414, prior found 416, set yes 418, set no 420, search next 422, next found 424, test prior 426, associate 428, complete 430, and end 432. -
Evidence management system 102 may execute one or more of the blocks of method 400 in parallel. Evidence management system 102 may begin execution of a block when it has received the data required to perform the function of the block. -
Method 400 begins execution with start 402. Start 402 may initialize any variables needed to perform method 400. Start 402 may retrieve, for example from a data store, any preference information provided by a user, such as whether all three words (e.g., prior, present, next) must match to associate a time, or the threshold for the confidence score that determines whether a word from the traditional transcript matches the computer-generated transcript. Execution continues with select 404. -
Select 404 accesses the computer-generated transcript to select three contiguous words. The meaning of the term “contiguous” depends on whether the confidence score of the words of the computer-generated transcript is considered. If the confidence score is not considered, the term “contiguous” means that there are no words between the selected words. If the confidence score is considered, the term “contiguous” means that the selected words each have a confidence score greater than the threshold and that all of the words between any of the selected words, if any, have a confidence score less than the threshold. Edge cases, such as first starting or ending method 400, are not described herein. “Prior word” refers to the first word in the sequence of three contiguous words; “present word” refers to the word that follows “prior word” and comes before “next word” in the transcript. The words “prior”, “present”, and “next” refer to the order in which the words would be read in the transcript. After selecting three contiguous words from the computer-generated transcript, execution moves to threshold 406.
- Search present 408 searches for the present word in the traditional transcript. Search present 408 uses any conventional technique for searching a digital file for a particular word. The word from the traditional transcript that matches the present word is referred to herein as the identified word. Execution moves to present found 410. Present found 410 makes a decision based on whether search present 408 found the present word in the traditional transcript. If the present word was found, execution goes to simple 412. If the present word was not found, execution goes to select 404.
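The confidence-aware selection of three “contiguous” words by select 404 might look like the following sketch; the transcript representation and threshold value are assumptions for illustration:

```python
def contiguous_triples(transcript, threshold=0.8):
    """transcript: (word, time, confidence) tuples from the computer-generated
    transcript. Yields (prior, present, next) triples of high-confidence
    words; low-confidence words between them are skipped, matching the
    confidence-aware meaning of "contiguous" described above."""
    high = [(word, time) for (word, time, conf) in transcript if conf > threshold]
    for i in range(1, len(high) - 1):
        yield high[i - 1], high[i], high[i + 1]

transcript = [
    ("the", 0.0, 0.95),
    ("dg", 0.3, 0.40),      # low confidence: skipped, does not break contiguity
    ("barked", 0.7, 0.92),
    ("at", 1.1, 0.90),
    ("noon", 1.4, 0.88),
]
for prior, present, nxt in contiguous_triples(transcript):
    print(prior[0], present[0], nxt[0])
# the barked at
# barked at noon
```

Each yielded triple corresponds to one pass through search present 408 and, if requested, the prior/next searches.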
-
Simple 412 determines whether method 400 should make decisions based on finding the present word alone in the traditional transcript, or whether the prior word, the next word, or both must also be found in the traditional transcript in the proper order. A user or a method executed by an engine may determine whether the search performed is simple or more involved.
- If only a simple search is to be performed, execution moves to associate 428. If more than one word must be found in the traditional transcript, execution moves to search prior 414.
- Search prior 414 searches for the prior word in the traditional transcript. If there are no words between the prior word and the present word in the computer-generated transcript, search prior 414 may select the word that immediately precedes the present word in the traditional transcript; however, if that word does not match the prior word, search prior 414 may need to search the traditional transcript backwards from the present word in case the traditional transcript contains additional material between the two words. If search prior 414 must search for a match to the prior word, it may use any conventional technique for searching a digital file for a particular word. Execution moves to prior found 416. A condition of indicating that the prior word was found may include determining that any word that matches the prior word is contiguous to the word that matched the present word.
- Prior found 416 makes a decision based on whether search prior 414 found the prior word in the traditional transcript. If the prior word was found, execution goes to set yes 418. If the prior word was not found or the requirement for contiguousness was not met, execution goes to set no 420.
- Set yes 418 sets a variable to indicate that the prior word was found in the traditional transcript. Set no 420 sets a variable to indicate that the prior word was not found in the traditional transcript or the contiguousness requirements were not met. Execution from set yes 418 and set no 420 goes to search next 422.
- Search next 422 searches for the next word in the traditional transcript. If there are no words between the present word and the next word in the computer-generated transcript, search next 422 may select the word that immediately follows the present word in the traditional transcript. If the word from the traditional transcript does not match the next word, as discussed above, search next 422 may use any conventional technique for searching a digital file for a particular word. Execution moves to next found 424. A condition of indicating that the next word was found may include determining that any word that matches the next word is contiguous to the word that matched the present word.
- Next found 424 makes a decision based on whether search next 422 found the next word in the traditional transcript. If the next word was found, at least two of the three words were found in the traditional transcript and execution goes to associate 428. If the next word was not found or the requirement for contiguousness was not met, execution goes to test prior 426.
- Test prior 426 makes a decision based on whether search prior 414 found the prior word in the traditional transcript. If the prior word was found, then two of the three words were found in the traditional transcript and execution goes to associate 428. If the prior word was not found, execution returns to select 404.
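Taken together, present found 410, simple 412, prior found 416, next found 424, and test prior 426 implement a two-of-three rule: a time is associated when the present word matches and, unless a simple search was requested, the prior or next word also matches contiguously. A compact sketch of that decision, with assumed boolean inputs:

```python
def should_associate(present_found, simple, prior_found=False, next_found=False):
    """Decision flow through present found 410 / simple 412 / next found 424 /
    test prior 426: a simple search needs only the present word; otherwise at
    least one neighboring word must also have matched."""
    if not present_found:
        return False                      # back to select 404
    if simple:
        return True                       # straight to associate 428
    return prior_found or next_found      # two of the three words matched

print(should_associate(True, simple=False, prior_found=True))  # True
```

The variables set by set yes 418 and set no 420 correspond to the `prior_found` argument here.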
-
Associate 428 associates the time at which the present word occurs in the audio data with the identified word in the traditional transcript.
- As discussed above, when a computer-generated transcript is made, the location of each word that is recognized in the data file may be recorded. As further discussed above, the location of a word may be the time the word occurs in the audio data, the number of words or syllables before the particular word in the audio data, the time before or after a unique sound in the audio data, or any other method for determining the location of a word in the audio file.
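A separate alignment record of the kind discussed here could be represented with a small record type; the field names are hypothetical illustrations of the location styles listed above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlignedWord:
    """One entry in a hypothetical separate alignment record: the identified
    word, its position in the traditional transcript, and the location carried
    over from the computer-generated transcript."""
    word: str
    transcript_index: int                  # position in the traditional transcript
    time_seconds: Optional[float] = None   # time the word occurs in the audio data
    syllable_index: Optional[int] = None   # syllables before the word, if that
                                           # location style is used instead

entry = AlignedWord("barked", transcript_index=2, time_seconds=0.7)
print(entry.word, entry.time_seconds)  # barked 0.7
```

Such records could live in the alignment data store 122 without altering the traditional transcript itself.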
-
Associate 428 associates the location, in this case the time, of the present word with the identified word in the traditional transcript. Associating may include altering the traditional transcript to include the time in a manner that relates it to the identified word, or making a separate record that relates the content of the traditional transcript to the identified times.
- It is possible that not all words in the traditional transcript will be associated with a time from the computer-generated transcript. The words in the traditional transcript that are not associated with a time from the computer-generated transcript may be assigned a time that is in sequential order with the associated times of the words before and after them.
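The sequential-order assignment described above can be realized by linear interpolation between the nearest words that did receive times. This sketch assumes times are collected in transcript order, with `None` for unmatched words:

```python
def fill_missing_times(times):
    """times: list of floats (anchored words) and None (unmatched words), in
    transcript order. Returns a copy where each None between two anchors is
    interpolated so the sequence stays monotonic. Words before the first or
    after the last anchor are left as None in this sketch."""
    filled = list(times)
    anchors = [i for i, t in enumerate(filled) if t is not None]
    for a, b in zip(anchors, anchors[1:]):
        step = (filled[b] - filled[a]) / (b - a)
        for i in range(a + 1, b):
            filled[i] = filled[a] + step * (i - a)
    return filled

print(fill_missing_times([0.0, None, None, 3.0]))  # [0.0, 1.0, 2.0, 3.0]
```

Syllable counts detected in the audio data, as mentioned for block 218, could replace the uniform spacing used here.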
- In complete 430,
evidence management system 102 determines whether the entire computer-generated transcript has been processed. If the words of the computer-generated transcript have all been assessed and skipped or compared to words in the traditional transcript, execution goes to end 432, where the method ends. Otherwise, execution returns to select 404.
- In some embodiments, other techniques may be used to determine words that match between the traditional transcript and the computer-generated transcript. For example, the same word may have to be found in the traditional transcript within a given distance of an expected position of the word from the computer-generated transcript in order to be considered a match. Stated differently, if the word “dog” is determined to be a high-confidence word, and it is the 500th word in the computer-generated transcript, a match in the traditional transcript may be the word “dog” that appears at either the 500th word or within a predetermined number of words of the 500th word. In some embodiments, the order or position of previously matched words may be used to further enhance the ability to find matching words. In some embodiments, the correlation between the low-confidence words and the unmatched words in the traditional transcript may be used for machine learning to improve the quality of a subsequent computer-generated transcript.
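The position-window matching just described (the 500th-word “dog” example) might be sketched as follows; the window size is an illustrative parameter, not a value from this disclosure:

```python
def match_by_position(word, expected_index, traditional_words, window=25):
    """Return the index of `word` in `traditional_words` closest to
    `expected_index` within +/- `window` positions, or None if no occurrence
    falls inside the window."""
    lo = max(0, expected_index - window)
    hi = min(len(traditional_words), expected_index + window + 1)
    candidates = [i for i in range(lo, hi) if traditional_words[i] == word]
    if not candidates:
        return None
    # Prefer the occurrence nearest the expected position.
    return min(candidates, key=lambda i: abs(i - expected_index))

words = ["the", "dog", "ran", "the", "dog", "sat"]
print(match_by_position("dog", 3, words, window=2))  # 4
```

Previously matched words could be used to re-center `expected_index` as alignment proceeds, as the paragraph above suggests.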
- A computing device may perform a function. A computing device may provide a result of performing a function. A computing device may receive information, manipulate the received information, and provide the manipulated information. A computing device may execute a stored program to perform a function.
- A computing device may provide and/or receive digital data via a conventional bus using any conventional protocol. A computing device may provide and/or receive digital data via a network connection. A computing device may store information and retrieve stored information. Information received, stored, and/or manipulated by the computing device may be used to perform a function and/or to perform a stored program.
- A computing device may control the operation and/or function of other circuits and/or components of a system. A computing device may receive status information regarding the operation of other components, perform calculations with respect to the status information, and provide commands (e.g., instructions) to one or more other components for the component to start operation, continue operation, alter operation, suspend operation, or cease operation. Commands and/or status may be communicated between a computing device and other circuits and/or components via any type of bus including any type of conventional data/address bus.
- For example,
FIG. 3 is a block diagram that illustrates aspects of computing device 300 appropriate for use as a computing device of the present disclosure. Computing device 300 performs the functions of a computing device discussed above. Computing device 300 may include processor 302, system memory 304, communication bus 306, storage memory 308, and network interface circuit 310. - While multiple different types of computing devices were discussed above,
computing device 300 describes various elements that are common to many different types of computing devices. While FIG. 3 is described with reference to a computing device that is implemented as a device on a network, the description below is applicable to servers, personal computers, mobile phones, smart phones, tablet computers, embedded computing devices, and other devices that may be used to implement portions of embodiments of the present disclosure. Moreover, those of ordinary skill in the art and others will recognize that the computing device 300 may be any one of any number of currently available or yet to be developed devices.
- A processing circuit may further include conventional passive electronic devices (e.g., resistors, capacitors, inductors) and/or active electronic devices (op amps, comparators, analog-to-digital converters, digital-to-analog converters, programmable logic). A processing circuit may include conventional data buses, output ports, input ports, timers, memory, and arithmetic units.
- A processing circuit may provide and/or receive electrical signals whether digital and/or analog in form. A processing circuit may provide and/or receive digital data.
- A processing circuit may have a low power state in which only a portion of its circuits operate or it performs only certain function. A processing circuit may be switched (e.g., awoken) from a low power state to a higher power state in which more or all of its circuits operate or it performs additional certain functions.
- A system memory may store data and/or program modules that are immediately accessible to and/or are currently being operated on by the processing circuit. A system memory may be a computer-readable medium. In this regard, a processor may perform or control the operation of a computing device by executing a stored program.
- A communication bus transfers data between the components of a computing device. A communication bus may transfer data between computing devices. A communication bus may include a control bus, an address bus, and/or a data bus. A control bus may control access to the data and/or address bus. An address bus may specify a location of where data and/or control may be sent and/or received. Data, address, and/or control transfer via a communication bus may be unidirectional. Data, address, and/or control transfer via a communication bus may be bidirectional. Data, address, and/or control may be transferred serially and/or in parallel.
- A communication bus may include any conventional control bus, address bus, and/or data bus (e.g., internal bus, expansion bus, local bus, front-side-bus, USB, FireWire, Serial ATA, AGP, PCI express, PCI, HyperTransport, InfiniBand, EISA, NuBus, MicroChannel, SBus, I2C, HIPPI, CAN bus, FutureBus). A communication bus may use any protocol, whether conventional or custom (e.g., application specific, proprietary) to transfer data.
- A communication bus may transfer data, address, and/or control using any transmission medium. A transmission medium includes any material (e.g., physical) substance capable of propagating waves and/or energy (e.g., optical, electrical, electro-magnetic).
- A network interface enables a computing device to communicate with other devices and/or systems over a network. The functions of a network interface may be performed by circuits, logic embedded in hardware, software instructions executable by a processor, or any combination thereof. The functions performed by a network interface enable a computing device to communicate with anther device. The functions performed by a network interface, whether using hardware or software executed by a processor, may be referred to as services. A device may request the services of a communication interface to communicate with a computing device.
- A network interface may communicate via wireless medium and/or a wired medium. A network interface may include circuits, logic embedded in hardware, or software instructions executable by a processor (e.g., wireless network interface) for wireless communication. A network interface may include circuits, logic embedded in hardware, or software instructions executable by a processor (e.g., wired network interface) for wired communication. The circuits, logic embedded in hardware, or software used for a wireless network interface may be the same in whole or in part as the circuits, logic embedded in hardware, or software used for a wired network interface. A network interface may communicate using any conventional wired (e.g., LAN, Ethernet) or wireless communication (e.g., Bluetooth, Bluetooth Low Energy, WiFi, ZigBee, 2G, 3G, LTE, WiMax) protocol.
- In a basic configuration,
computing device 300 may include at least one processor 302 and a system memory 304 connected by communication bus 306. -
Processor 302, system memory 304, and communication bus 306 may perform the functions and include the structures of a processor, a system memory, and a communication bus, respectively, discussed above. - Depending on the configuration of
computing device 300, system memory 304 may be a volatile or nonvolatile computer-readable medium, including but not limited to read only memory (“ROM”), random access memory (“RAM”), EEPROM, and/or flash memory. - Some or all of the functions of the network interface may be performed by
processor 302. As will be appreciated by one of ordinary skill in the art, the network interface 310 illustrated in FIG. 3 may represent one or more wireless interfaces or physical communication interfaces described and illustrated above with respect to particular components of the system 100. - In the embodiment depicted in
FIG. 3, the computing device 300 also includes a storage memory 308. However, services may be accessed using a computing device that does not include means for persisting data to a local storage memory. Therefore, the storage memory 308 depicted in FIG. 3 is represented with a dashed line to indicate that the storage memory 308 is optional. In any event, the storage memory 308 may be a computer-readable medium that may be volatile or nonvolatile, removable or nonremovable, and implemented using any technology capable of storing information including, but not limited to, a hard drive, solid state drive, CD ROM, DVD, or other disk storage, magnetic cassettes, magnetic tape, and magnetic disk storage. - Suitable implementations of computing devices that include a
processor 302, system memory 304, communication bus 306, storage memory 308, and network interface circuit 310 are known and commercially available. For ease of illustration, and because it is not important for an understanding of the claimed subject matter, FIG. 3 does not show some of the typical components of many computing devices. In this regard, the computing device 300 may include input devices, such as a keyboard, keypad, mouse, microphone, touch input device, touch screen, tablet, and/or the like. Such input devices may be coupled to the computing device 300 by wired or wireless connections including RF, infrared, serial, parallel, Bluetooth, Bluetooth low energy, USB, or other suitable connection protocols using wireless or physical connections. Similarly, the computing device 300 may also include output devices such as a display, speakers, printer, etc. Since these devices are well known in the art, they are not illustrated or described further herein. - While the preferred embodiments of the invention have been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention. Examples listed in parentheses may be used in the alternative or in any practical combination. As used in the specification and claims, the words ‘comprising’, ‘comprises’, ‘including’, ‘includes’, ‘having’, and ‘has’ introduce an open-ended statement of component structures and/or functions. In the specification and claims, the words ‘a’ and ‘an’ are used as indefinite articles meaning ‘one or more’. When a descriptive phrase includes a series of nouns and/or adjectives, each successive word is intended to modify the entire combination of words preceding it. For example, a black dog house is intended to mean a house for a black dog.
In the claims, the term “provided” is used to definitively identify an object that is not a claimed element of the invention but an object that performs the function of a workpiece that cooperates with the claimed invention.
- The location indicators “herein”, “hereunder”, “above”, “below”, or other words that refer to a location, whether specific or general, shall be construed to refer to any location in the specification.
Claims (18)
1. A method performed by a computing device for manipulating audiovisual data using transcript information, the method comprising:
creating a computer-generated transcript of audio data from the audiovisual data, the computer-generated transcript includes a plurality of words, at least some words of the plurality of words are associated with a respective timestamp and a confidence score;
receiving a traditional transcript of the audio data, the traditional transcript includes a plurality of words that are not associated with timestamps;
identifying one or more words from the plurality of words of the computer-generated transcript that match words from the plurality of words of the traditional transcript;
associating the timestamp of the one or more words of the computer-generated transcript with the matching word of the traditional transcript; and
processing the audiovisual data using the traditional transcript and the associated timestamps.
2. The method of claim 1 wherein processing the audiovisual data comprises generating a presentation of the traditional transcript for display along with a presentation of the audiovisual data using the timestamps associated with the traditional transcript to determine a pace of display of the traditional transcript.
3. The method of claim 2 wherein generating the presentation of the traditional transcript comprises at least one of generating a subtitle track and generating a video overlay.
4. The method of claim 1 wherein processing the audiovisual data comprises:
determining a selection of one or more words of the traditional transcript;
identifying a portion of the audiovisual data that corresponds to the selected one or more words based on the timestamps; and
processing the identified portion of the audiovisual data.
5. The method of claim 4 wherein processing the identified portion of the audiovisual data comprises at least one of redacting the identified portion and presenting the identified portion.
6. The method of claim 4 wherein determining the selection of one or more words of the traditional transcript comprises at least one of receiving a selection of one or more words of the traditional transcript via a user interface and searching the traditional transcript for the one or more words.
7. The method of claim 1 wherein identifying one or more matching words comprises identifying one or more words of the plurality of words of the computer-generated transcript that are associated with confidence scores higher than a predetermined threshold.
8. A method performed by a computing device for manipulating audiovisual data, the method comprising:
obtaining a transcript of audio data from the audiovisual data, the transcript includes a plurality of words, two or more words of the transcript include a respective timestamp;
determining a selection of one or more words of the transcript;
identifying a portion of the audiovisual data corresponding to the selected one or more words in accordance with the timestamps; and
manipulating the identified portion of the audiovisual data.
9. The method of claim 8 wherein manipulating the identified portion of the audiovisual data comprises redacting the identified portion.
10. The method of claim 8 wherein manipulating the identified portion of the audiovisual data comprises creating a clip that includes the identified portion.
11. The method of claim 8 wherein manipulating the identified portion of the audiovisual data comprises presenting the identified portion.
12. The method of claim 8 wherein determining the selection of one or more words of the transcript comprises receiving a selection of the one or more words via a user interface.
13. The method of claim 8 wherein determining the selection of one or more words of the transcript comprises identifying within words of the transcript one or more of profanity, names, and addresses.
14. The method of claim 8 wherein determining the selection of one or more words of the transcript comprises searching the transcript for the one or more words.
15. The method of claim 8 wherein determining the selection of one or more words of the transcript comprises identifying one or more words of the transcript that are associated with confidence scores higher than a predetermined threshold.
16. A system for processing transcripts of audiovisual data, the system comprising:
one or more recording systems each configured to generate audiovisual data; and
an evidence management system having one or more computing devices configured to:
store audio data and video data received from the recording systems;
transmit audio data to a plurality of traditional transcript service providers;
receive traditional transcripts of the audio data from the traditional transcript service providers; and
store the traditional transcripts of the audio data.
17. The system of claim 16 wherein transmitting audio data to the plurality of traditional transcript service providers comprises:
determining a selection of a traditional transcript service provider from the plurality of traditional transcript service providers, the selection provided by a user of the evidence management system; and
transmitting audio data associated with the user to the selected traditional transcript service provider.
18. The system of claim 17 further comprising obtaining authorization from a supervisor of the user before transmitting the audio data associated with the user to the selected traditional transcript service provider.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/677,416 US20180130484A1 (en) | 2016-11-07 | 2017-08-15 | Systems and methods for interrelating text transcript information with video and/or audio information |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662418613P | 2016-11-07 | 2016-11-07 | |
US15/677,416 US20180130484A1 (en) | 2016-11-07 | 2017-08-15 | Systems and methods for interrelating text transcript information with video and/or audio information |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180130484A1 true US20180130484A1 (en) | 2018-05-10 |
Family
ID=62064193
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/677,416 Abandoned US20180130484A1 (en) | 2016-11-07 | 2017-08-15 | Systems and methods for interrelating text transcript information with video and/or audio information |
US15/677,399 Active US10755729B2 (en) | 2016-11-07 | 2017-08-15 | Systems and methods for interrelating text transcript information with video and/or audio information |
US16/289,058 Active 2037-08-19 US10943600B2 (en) | 2016-11-07 | 2019-02-28 | Systems and methods for interrelating text transcript information with video and/or audio information |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/677,399 Active US10755729B2 (en) | 2016-11-07 | 2017-08-15 | Systems and methods for interrelating text transcript information with video and/or audio information |
US16/289,058 Active 2037-08-19 US10943600B2 (en) | 2016-11-07 | 2019-02-28 | Systems and methods for interrelating text transcript information with video and/or audio information |
Country Status (2)
Country | Link |
---|---|
US (3) | US20180130484A1 (en) |
WO (1) | WO2018084910A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10785385B2 (en) * | 2018-12-26 | 2020-09-22 | NBCUniversal Media, LLC. | Systems and methods for aligning text and multimedia content |
US12020708B2 (en) * | 2020-10-12 | 2024-06-25 | SoundHound AI IP, LLC. | Method and system for conversation transcription with metadata |
US20220189501A1 (en) | 2020-12-16 | 2022-06-16 | Truleo, Inc. | Audio analysis of body worn camera |
Family Cites Families (72)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2986345B2 (en) * | 1993-10-18 | 1999-12-06 | インターナショナル・ビジネス・マシーンズ・コーポレイション | Voice recording indexing apparatus and method |
US6144938A (en) * | 1998-05-01 | 2000-11-07 | Sun Microsystems, Inc. | Voice user interface with personality |
US6757362B1 (en) * | 2000-03-06 | 2004-06-29 | Avaya Technology Corp. | Personal virtual assistant |
US6263308B1 (en) * | 2000-03-20 | 2001-07-17 | Microsoft Corporation | Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process |
US6912498B2 (en) * | 2000-05-02 | 2005-06-28 | Scansoft, Inc. | Error correction in speech recognition by correcting text around selected area |
US6505153B1 (en) * | 2000-05-22 | 2003-01-07 | Compaq Information Technologies Group, L.P. | Efficient method for producing off-line closed captions |
US7966187B1 (en) * | 2001-02-15 | 2011-06-21 | West Corporation | Script compliance and quality assurance using speech recognition |
GB2372864B (en) * | 2001-02-28 | 2005-09-07 | Vox Generation Ltd | Spoken language interface |
GB2381688B (en) | 2001-11-03 | 2004-09-22 | Dremedia Ltd | Time ordered indexing of audio-visual data |
US6766294B2 (en) * | 2001-11-30 | 2004-07-20 | Dictaphone Corporation | Performance gauge for a distributed speech recognition system |
US7103542B2 (en) * | 2001-12-14 | 2006-09-05 | Ben Franklin Patent Holding Llc | Automatically improving a voice recognition system |
EP1517549A3 (en) * | 2003-08-27 | 2005-04-20 | Matsushita Electric Industrial Co., Ltd. | Video displaying system |
US20050132420A1 (en) | 2003-12-11 | 2005-06-16 | Quadrock Communications, Inc | System and method for interaction with television content |
US7406414B2 (en) * | 2003-12-15 | 2008-07-29 | International Business Machines Corporation | Providing translations encoded within embedded digital information |
US20050137867A1 (en) * | 2003-12-17 | 2005-06-23 | Miller Mark R. | Method for electronically generating a synchronized textual transcript of an audio recording |
US8335688B2 (en) * | 2004-08-20 | 2012-12-18 | Multimodal Technologies, Llc | Document transcription system training |
US7844464B2 (en) | 2005-07-22 | 2010-11-30 | Multimodal Technologies, Inc. | Content-based audio playback emphasis |
US8412521B2 (en) * | 2004-08-20 | 2013-04-02 | Multimodal Technologies, Llc | Discriminative training of document transcription system |
US20080255837A1 (en) * | 2004-11-30 | 2008-10-16 | Jonathan Kahn | Method for locating an audio segment within an audio file |
GB0503162D0 (en) * | 2005-02-16 | 2005-03-23 | Ibm | Method and apparatus for voice message editing |
US8194641B2 (en) | 2005-03-28 | 2012-06-05 | Cisco Technology, Inc. | Method and system for operating a communication service portal |
US20070011012A1 (en) | 2005-07-11 | 2007-01-11 | Steve Yurick | Method, system, and apparatus for facilitating captioning of multi-media content |
US20070244700A1 (en) | 2006-04-12 | 2007-10-18 | Jonathan Kahn | Session File Modification with Selective Replacement of Session File Components |
US7792675B2 (en) * | 2006-04-20 | 2010-09-07 | Vianix Delaware, Llc | System and method for automatic merging of multiple time-stamped transcriptions |
US20080177536A1 (en) | 2007-01-24 | 2008-07-24 | Microsoft Corporation | A/v content editing |
KR20080084303A (en) | 2007-03-16 | 2008-09-19 | 어뉴텍코리아 주식회사 | Technology for easily, quickly and accurately storing only the desired parts of movie and audio files |
JP2009088990A (en) | 2007-09-28 | 2009-04-23 | Sanyo Electric Co Ltd | Reception apparatus, television broadcast playback method, and television broadcast playback program |
US8332212B2 (en) * | 2008-06-18 | 2012-12-11 | Cogi, Inc. | Method and system for efficient pacing of speech for transcription |
US8131545B1 (en) * | 2008-09-25 | 2012-03-06 | Google Inc. | Aligning a transcript to audio data |
US8712774B2 (en) * | 2009-03-30 | 2014-04-29 | Nuance Communications, Inc. | Systems and methods for generating a hybrid text string from two or more text strings generated by multiple automated speech recognition systems |
US20100332225A1 (en) * | 2009-06-29 | 2010-12-30 | Nexidia Inc. | Transcript alignment |
CN101996631B (en) | 2009-08-28 | 2014-12-03 | 国际商业机器公司 | Method and device for aligning texts |
US8645134B1 (en) * | 2009-11-18 | 2014-02-04 | Google Inc. | Generation of timed text using speech-to-text technology and applications thereof |
US8572488B2 (en) * | 2010-03-29 | 2013-10-29 | Avid Technology, Inc. | Spot dialog editor |
US9066049B2 (en) | 2010-04-12 | 2015-06-23 | Adobe Systems Incorporated | Method and apparatus for processing scripts |
US8560297B2 (en) * | 2010-06-07 | 2013-10-15 | Microsoft Corporation | Locating parallel word sequences in electronic documents |
US8612205B2 (en) * | 2010-06-14 | 2013-12-17 | Xerox Corporation | Word alignment method and system for improved vocabulary coverage in statistical machine translation |
US20120016671A1 (en) | 2010-07-15 | 2012-01-19 | Pawan Jaggi | Tool and method for enhanced human machine collaboration for rapid and accurate transcriptions |
US8826354B2 (en) * | 2010-12-01 | 2014-09-02 | At&T Intellectual Property I, L.P. | Method and system for testing closed caption content of video assets |
US9286886B2 (en) * | 2011-01-24 | 2016-03-15 | Nuance Communications, Inc. | Methods and apparatus for predicting prosody in speech synthesis |
US20120303643A1 (en) | 2011-05-26 | 2012-11-29 | Raymond Lau | Alignment of Metadata |
US9536567B2 (en) * | 2011-09-02 | 2017-01-03 | Nexidia Inc. | Transcript re-sync |
US9536517B2 (en) * | 2011-11-18 | 2017-01-03 | At&T Intellectual Property I, L.P. | System and method for crowd-sourced data labeling |
US9223986B2 (en) | 2012-04-24 | 2015-12-29 | Samsung Electronics Co., Ltd. | Method and system for information content validation in electronic devices |
US9099089B2 (en) * | 2012-08-02 | 2015-08-04 | Audible, Inc. | Identifying corresponding regions of content |
US20140047073A1 (en) * | 2012-08-10 | 2014-02-13 | Marcin Beme | Platform Independent Multimedia Playback Apparatuses, Methods, and Systems |
US8564721B1 (en) | 2012-08-28 | 2013-10-22 | Matthew Berry | Timeline alignment and coordination for closed-caption text using speech recognition transcripts |
IL225540A (en) * | 2013-04-02 | 2015-09-24 | Igal Nir | System, method and computer readable medium for automatic generation of a database for speech recognition from video captions |
US9230547B2 (en) * | 2013-07-10 | 2016-01-05 | Datascription Llc | Metadata extraction of non-transcribed video and audio streams |
KR20150021258A (en) | 2013-08-20 | 2015-03-02 | 삼성전자주식회사 | Display apparatus and control method thereof |
JP6188490B2 (en) | 2013-08-28 | 2017-08-30 | キヤノン株式会社 | Image display apparatus, control method, and computer program |
US9489360B2 (en) * | 2013-09-05 | 2016-11-08 | Audible, Inc. | Identifying extra material in companion content |
US9384731B2 (en) * | 2013-11-06 | 2016-07-05 | Microsoft Technology Licensing, Llc | Detecting speech input phrase confusion risk |
US9413891B2 (en) | 2014-01-08 | 2016-08-09 | Callminer, Inc. | Real-time conversational analytics facility |
US20180034961A1 (en) * | 2014-02-28 | 2018-02-01 | Ultratec, Inc. | Semiautomated Relay Method and Apparatus |
US20180270350A1 (en) | 2014-02-28 | 2018-09-20 | Ultratec, Inc. | Semiautomated relay method and apparatus |
US9672280B2 (en) | 2014-04-10 | 2017-06-06 | Google Inc. | Methods, systems, and media for searching for video content |
US20160019202A1 (en) * | 2014-07-21 | 2016-01-21 | Charles Adams | System, method, and apparatus for review and annotation of audiovisual media content |
US9189514B1 (en) | 2014-09-04 | 2015-11-17 | Lucas J. Myslinski | Optimized fact checking method and system |
US9202469B1 (en) | 2014-09-16 | 2015-12-01 | Citrix Systems, Inc. | Capturing noteworthy portions of audio recordings |
US9842593B2 (en) | 2014-11-14 | 2017-12-12 | At&T Intellectual Property I, L.P. | Multi-level content analysis and response |
EP3026668A1 (en) * | 2014-11-27 | 2016-06-01 | Thomson Licensing | Apparatus and method for generating visual content from an audio signal |
WO2016146978A1 (en) | 2015-03-13 | 2016-09-22 | Trint Limited | Media generating and editing system |
US20160277577A1 (en) * | 2015-03-20 | 2016-09-22 | TopBox, LLC | Audio File Metadata Event Labeling and Data Analysis |
US9558740B1 (en) | 2015-03-30 | 2017-01-31 | Amazon Technologies, Inc. | Disambiguation in speech recognition |
US10332506B2 (en) * | 2015-09-02 | 2019-06-25 | Oath Inc. | Computerized system and method for formatted transcription of multimedia content |
US9984677B2 (en) * | 2015-09-30 | 2018-05-29 | Nice Ltd. | Bettering scores of spoken phrase spotting |
KR20170044386A (en) | 2015-10-15 | 2017-04-25 | 삼성전자주식회사 | Electronic device and control method thereof |
US9852743B2 (en) * | 2015-11-20 | 2017-12-26 | Adobe Systems Incorporated | Automatic emphasis of spoken words |
US10249294B2 (en) * | 2016-09-09 | 2019-04-02 | Electronics And Telecommunications Research Institute | Speech recognition system and method |
US9900632B1 (en) | 2016-12-30 | 2018-02-20 | Echostar Technologies L.L.C. | Viewing suggestions based on closed-captioned content from multiple tuners |
US10372799B2 (en) | 2017-05-03 | 2019-08-06 | Veritone, Inc. | System and method for redacting content |
2017
- 2017-08-15 US US15/677,416 patent/US20180130484A1/en not_active Abandoned
- 2017-08-15 US US15/677,399 patent/US10755729B2/en active Active
- 2017-08-15 WO PCT/US2017/046950 patent/WO2018084910A1/en active Application Filing

2019
- 2019-02-28 US US16/289,058 patent/US10943600B2/en active Active
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020028760A1 (en) * | 2018-08-02 | 2020-02-06 | Veritone, Inc. | System and method for neural network orchestration |
US11043209B2 (en) | 2018-08-02 | 2021-06-22 | Veritone, Inc. | System and method for neural network orchestration |
US20220054049A1 (en) * | 2018-12-21 | 2022-02-24 | Universite De Montpellier | High-precision temporal measurement of vibro-acoustic events in synchronisation with a sound signal on a touch-screen device |
US11138970B1 (en) * | 2019-12-06 | 2021-10-05 | Asapp, Inc. | System, method, and computer program for creating a complete transcription of an audio recording from separately transcribed redacted and unredacted words |
US11521639B1 (en) | 2021-04-02 | 2022-12-06 | Asapp, Inc. | Speech sentiment analysis using a speech sentiment classifier pretrained with pseudo sentiment labels |
CN113314124A (en) * | 2021-06-15 | 2021-08-27 | 宿迁硅基智能科技有限公司 | Text output method and system, storage medium and electronic device |
US11651139B2 (en) | 2021-06-15 | 2023-05-16 | Nanjing Silicon Intelligence Technology Co., Ltd. | Text output method and system, storage medium, and electronic device |
US11763803B1 (en) | 2021-07-28 | 2023-09-19 | Asapp, Inc. | System, method, and computer program for extracting utterances corresponding to a user problem statement in a conversation between a human agent and a user |
US20230141096A1 (en) * | 2021-11-11 | 2023-05-11 | Sorenson Ip Holdings, Llc | Transcription presentation |
US12067363B1 (en) | 2022-02-24 | 2024-08-20 | Asapp, Inc. | System, method, and computer program for text sanitization |
US20230360640A1 (en) * | 2022-05-03 | 2023-11-09 | Microsoft Technology Licensing, Llc | Keyword-based dialogue summarizer |
Also Published As
Publication number | Publication date |
---|---|
US10943600B2 (en) | 2021-03-09 |
WO2018084910A1 (en) | 2018-05-11 |
US20180130483A1 (en) | 2018-05-10 |
US20190198038A1 (en) | 2019-06-27 |
US10755729B2 (en) | 2020-08-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10943600B2 (en) | 2021-03-09 | Systems and methods for interrelating text transcript information with video and/or audio information |
Koepke et al. | Audio retrieval with natural language queries: A benchmark study | |
US20220214775A1 (en) | Method for extracting salient dialog usage from live data | |
US10133538B2 (en) | Semi-supervised speaker diarization | |
US10566009B1 (en) | Audio classifier | |
US9298287B2 (en) | Combined activation for natural user interface systems | |
JP2022510479A (en) | Video cutting method, video cutting device, computer equipment and storage medium | |
TW201717062A (en) | Multi-modal fusion based intelligent fault-tolerant video content recognition system and recognition method | |
JP6361351B2 (en) | Method, program and computing system for ranking spoken words | |
US9472209B2 (en) | Deep tagging background noises | |
KR102029276B1 (en) | Answering questions using environmental context | |
CN113806588B (en) | Method and device for searching video | |
US10474706B2 (en) | Organizing speech search results | |
US20160210353A1 (en) | Data lookup and operator for excluding unwanted speech search results | |
DE102017125474A1 (en) | CONTEXTUAL COMMENTING OF INQUIRIES | |
KR20240144131A (en) | Contextualizing and clarifying the question-and-answer process | |
US10770094B2 (en) | Routing audio streams based on semantically generated result sets | |
WO2023130951A1 (en) | Speech sentence segmentation method and apparatus, electronic device, and storage medium | |
US9747891B1 (en) | Name pronunciation recommendation | |
Yang et al. | Lecture video browsing using multimodal information resources | |
US11640426B1 (en) | Background audio identification for query disambiguation | |
US20220415310A1 (en) | Dynamic context-based routing of speech processing | |
CN113889081A (en) | Speech recognition method, medium, device and computing equipment | |
CN113593543B (en) | Intelligent loudspeaker voice service system, method, device and equipment | |
TWI780333B (en) | Method for dynamically processing and playing multimedia files and multimedia play apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: AXON ENTERPRISE, INC., ARIZONA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DIMINO, JOSEPH CHARLES;FALK, SAYCE WILLIAM;ROSSIGNAC-MILON, LEO THOMAS;REEL/FRAME:046627/0285
Effective date: 20170815
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |