KR20170106445A - Deployed end-to-end speech recognition - Google Patents
Deployed end-to-end speech recognition
- Publication number
- KR20170106445A (application KR1020177023173A)
- Authority
- KR
- South Korea
- Prior art keywords
- training
- model
- layers
- regression
- speech
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Signal Processing (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Machine Translation (AREA)
Abstract
Description
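Figure 1 ("FIG. 1") depicts the architecture of an end-to-end deep learning model according to embodiments of the present disclosure.
FIG. 2 depicts a method for training the deep learning model according to embodiments of the present disclosure.
FIG. 3 depicts a sequence-wise batch normalization method according to embodiments of the present disclosure.
FIG. 4 graphically depicts the training curves of two models, one trained with batch normalization and one trained without it, according to embodiments of the present disclosure.
FIG. 5 depicts a method for training an RNN model using a curriculum learning strategy according to embodiments of the present disclosure.
FIG. 6 depicts a method for training an RNN model using bi-graphemes segmentation for output transcription according to embodiments of the present disclosure.
FIG. 7 depicts a row convolution architecture with a future context size of 2 according to embodiments of the present disclosure.
FIG. 8 depicts a method for audio transcription with a unidirectional RNN model according to embodiments of the present disclosure.
FIG. 9 depicts a method for training a speech transcription model adaptive to multiple languages according to embodiments of the present disclosure.
FIG. 10 depicts a scaling comparison of two networks according to embodiments of the present disclosure.
FIG. 11 depicts the forward and backward passes for a GPU implementation of Connectionist Temporal Classification (CTC) according to embodiments of the present disclosure.
FIG. 12 depicts a method for a GPU implementation of the CTC loss function according to embodiments of the present disclosure.
FIG. 13 depicts a data collection method for speech transcription training according to embodiments of the present disclosure.
FIG. 14 depicts the probability that a request is processed in a batch of a given size according to embodiments of the present disclosure.
FIG. 15 depicts the median and 98th percentile latencies as a function of server load according to embodiments of the present disclosure.
FIG. 16 depicts a comparison of kernels according to embodiments of the present disclosure.
FIG. 17 depicts a schematic diagram of a training node according to embodiments of the present disclosure, where PLX indicates a PCI switch and the dotted box includes all devices connected by the same PCI root complex.
FIG. 18 depicts a simplified block diagram of a computing system according to embodiments of the present disclosure.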
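FIG. 3 refers to sequence-wise batch normalization, and the claims describe computing, for each hidden unit, a mean and variance over the length of every training utterance sequence in a mini-batch. The following is a minimal sketch of that computation only, not the patented implementation. The function name, the nested-list data layout, the `eps` stabilizer, and the omission of any learnable scale and shift parameters are all assumptions made for illustration.

```python
import math


def sequence_wise_batch_norm(batch, eps=1e-5):
    """Normalize pre-activations per hidden unit over a whole mini-batch.

    batch: list of utterances; each utterance is a list of time steps;
    each time step is a list of pre-activation values (one per hidden unit).
    Utterances may have different lengths. The layout is hypothetical.
    """
    num_units = len(batch[0][0])
    # Every time step of every utterance contributes to the statistics.
    num_frames = sum(len(utterance) for utterance in batch)

    mean = [0.0] * num_units
    for utterance in batch:
        for frame in utterance:
            for i, value in enumerate(frame):
                mean[i] += value / num_frames

    var = [0.0] * num_units
    for utterance in batch:
        for frame in utterance:
            for i, value in enumerate(frame):
                var[i] += (value - mean[i]) ** 2 / num_frames

    # eps (an assumption) guards against division by zero for constant units.
    normalized = [
        [
            [(value - mean[i]) / math.sqrt(var[i] + eps)
             for i, value in enumerate(frame)]
            for frame in utterance
        ]
        for utterance in batch
    ]
    return normalized, mean, var
```

Pooling the statistics over both the batch and the time dimension means a single mean and variance per hidden unit is shared by all time steps, which is what distinguishes this from normalizing each time step separately.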
Claims (20)
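- A computer-implemented method for transcribing speech audio, the method comprising:
receiving input audio from a user, the input audio comprising a plurality of utterances;
generating a set of spectrogram frames for each utterance;
inputting the set of spectrogram frames into a recurrent neural network (RNN) model, the RNN model comprising one or more convolutional layers and one or more recurrent layers, the RNN model being pre-trained using a plurality of mini-batches of training utterance sequences sampled from a training data set, the plurality of mini-batches being batch-normalized during training to normalize pre-activations in at least one of the one or more recurrent layers;
obtaining probability outputs for one or more predicted characters from the RNN model; and
performing a search using the probability outputs, constrained by a language model, to find the most probable transcription for each utterance, the language model interpreting a string of characters determined from the probability outputs of the predicted characters as a word or words.
- The computer-implemented method of claim 1, wherein the normalization comprises computing, for each hidden unit in the one or more convolutional layers and the one or more recurrent layers, a mean and variance over the length of each training utterance sequence in each mini-batch.
- The computer-implemented method of claim 1, wherein the RNN model further comprises a row convolution layer placed above the one or more convolutional layers.
- The computer-implemented method of claim 3, wherein the row convolution layer is a unidirectional and forward-only layer.
- The computer-implemented method of claim 4, wherein activations of the row convolution layer are obtained using information from the recurrent layers at the current time step and at least one future time step, and the activations of the row convolution layer are used for character prediction corresponding to the current time step.
- The computer-implemented method of claim 1, wherein the predicted characters are English alphabet letters or Chinese characters.
- The computer-implemented method of claim 1, wherein the input audio is normalized so that the total power of the input audio is consistent with a set of training samples used to pre-train the RNN model.
- The computer-implemented method of claim 1, wherein a beam search is performed in the language model to consider only characters whose cumulative probability is at least a threshold.
- The computer-implemented method of claim 1, wherein the set of spectrogram frames is generated by subsampling the utterance, taking strides of a step size of a predetermined number of time slices, when obtaining the set of spectrogram frames.
- The computer-implemented method of claim 10, wherein the predicted characters from the transcription model comprise alternate labels selected from whole words, syllables, and non-overlapping n-grams at the word level.
- A non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more microprocessors, cause steps to be performed comprising:
receiving input audio, the input audio comprising a plurality of utterances;
obtaining a set of spectrogram frames for each utterance;
inputting the set of spectrogram frames into a neural network, the neural network comprising one or more convolutional layers and one or more recurrent layers, the neural network model being pre-trained using a plurality of mini-batches of training utterance sequences sampled from a training data set, the plurality of mini-batches being normalized during training to normalize pre-activations in at least one of the one or more convolutional layers;
obtaining one or more predicted characters from the pre-trained neural network; and
performing a beam search using the probability outputs, constrained by a language model, to find the most probable transcription for each utterance, the language model interpreting a string of characters determined from the predicted characters as a word or words.
- The non-transitory computer-readable medium or media of claim 11, wherein the steps further comprise subsampling the utterance by taking strides of a step size of a predetermined number of time slices when obtaining the set of spectrogram frames.
- The non-transitory computer-readable medium or media of claim 11, wherein the one or more predicted characters comprise non-overlapping bigrams enriched from the English alphabet.
- The non-transitory computer-readable medium or media of claim 11, wherein the steps further comprise normalizing the input audio using statistics from the training data set.
- The non-transitory computer-readable medium or media of claim 11, wherein the pre-trained neural network is trained with a Connectionist Temporal Classification (CTC) loss function using a training set.
- A computer-implemented method for speech transcription, the method comprising:
receiving a set of spectrogram frames corresponding to an utterance, the utterance being subsampled by taking strides of a step size of a predetermined number of time slices when obtaining the set of spectrogram frames;
obtaining, using one or more recurrent layers, a feature matrix corresponding to the set of spectrogram frames; and
obtaining, using a row convolution layer placed above the one or more recurrent layers, one or more predicted characters corresponding to the current time step based on the obtained feature matrix, the row convolution layer being a unidirectional and forward-only layer, and the predicted characters comprising non-overlapping bigrams enriched from the English alphabet;
wherein the feature matrix comprises hidden states at the current time step and future hidden states at N time steps, N being greater than 1.
- The computer-implemented method of claim 16, wherein the one or more recurrent layers are forward-only layers.
- The computer-implemented method of claim 16, wherein the one or more recurrent layers are pre-trained using a plurality of mini-batches of training utterance sequences sampled from a training data set, the plurality of mini-batches being batch-normalized during training to normalize pre-activations in at least one of the one or more recurrent layers.
- The computer-implemented method of claim 16, wherein the character prediction is performed using a convolution operation between the obtained feature matrix and a parameter matrix.
- The computer-implemented method of claim 16, wherein the character prediction further comprises performing a beam search in a language model for the most probable transcription based on the predicted characters.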
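The row convolution layer recited in the claims combines the hidden state at the current time step with N future hidden states through a convolution between the feature matrix and a parameter matrix. The sketch below illustrates one plausible reading of that operation and is not taken from the patent text. The function name, the per-unit weight layout (one row of N + 1 weights for each hidden unit), and the choice to treat time steps past the end of the utterance as zeros are assumptions.

```python
def row_convolution(hidden, weights):
    """Look-ahead convolution over the time axis, applied per hidden unit.

    hidden:  list of T time steps, each a list of d hidden-state values
             produced by the recurrent layers.
    weights: list of d rows, each holding N + 1 weights: index 0 applies to
             the current time step, indices 1..N to the next N time steps.
    Returns a T x d list of activations. Future steps beyond the end of the
    utterance are treated as zeros (an assumption for this sketch).
    """
    num_steps = len(hidden)
    num_units = len(weights)
    context = len(weights[0])  # N + 1

    output = []
    for t in range(num_steps):
        row = []
        for i in range(num_units):
            acc = 0.0
            for j in range(context):
                if t + j < num_steps:
                    acc += weights[i][j] * hidden[t + j][i]
            row.append(acc)
        output.append(row)
    return output


# Example with a future context of N = 2, matching the context size in FIG. 7.
hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
parameters = [[1.0, 0.5, 0.25], [1.0, 0.0, -1.0]]
activations = row_convolution(hidden_states, parameters)
```

Because each output depends only on the current step and a fixed number of future steps, a forward-only recurrent stack with such a layer on top can emit predictions after a bounded delay instead of waiting for the whole utterance, which is the property a streaming deployment needs.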
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562260206P | 2015-11-25 | 2015-11-25 | |
US62/260,206 | 2015-11-25 | ||
US15/358,083 US10319374B2 (en) | 2015-11-25 | 2016-11-21 | Deployed end-to-end speech recognition |
US15/358,102 US10332509B2 (en) | 2015-11-25 | 2016-11-21 | End-to-end speech recognition |
US15/358,102 | 2016-11-21 | ||
US15/358,083 | 2016-11-21 | ||
PCT/US2016/063641 WO2017091751A1 (en) | 2015-11-25 | 2016-11-23 | Deployed end-to-end speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170106445A (ko) | 2017-09-20 |
KR102008077B1 (ko) | 2019-08-06 |
Family
ID=58721011
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177023173A KR102008077B1 (ko) | 2015-11-25 | 2016-11-23 | Deployed end-to-end speech recognition |
KR1020177023177A KR102033230B1 (ko) | 2015-11-25 | 2016-11-23 | End-to-end speech recognition |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177023177A KR102033230B1 (ko) | 2015-11-25 | 2016-11-23 | End-to-end speech recognition |
Country Status (6)
Country | Link |
---|---|
US (2) | US10319374B2 (ko) |
EP (2) | EP3245652B1 (ko) |
JP (2) | JP6661654B2 (ko) |
KR (2) | KR102008077B1 (ko) |
CN (2) | CN107408384B (ko) |
WO (2) | WO2017091763A1 (ko) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102018346B1 (ko) * | 2018-05-11 | 2019-10-14 | 국방과학연구소 | 음향 신호를 분류하는 방법 및 시스템 |
KR20200095789A (ko) * | 2019-02-01 | 2020-08-11 | 한국전자통신연구원 | 번역 모델 구축 방법 및 장치 |
KR20210146368A (ko) * | 2019-05-03 | 2021-12-03 | 구글 엘엘씨 | 숫자 시퀀스에 대한 종단 간 자동 음성 인식 |
Families Citing this family (280)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US8515052B2 (en) | 2007-12-17 | 2013-08-20 | Wai Wu | Parallel signal processing system and method |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9634855B2 (en) | 2010-05-13 | 2017-04-25 | Alexander Poltorak | Electronic personal interactive device that determines topics of interest using a conversational agent |
DE212014000045U1 (de) | 2013-02-07 | 2015-09-24 | Apple Inc. | Sprach-Trigger für einen digitalen Assistenten |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10229672B1 (en) * | 2015-12-31 | 2019-03-12 | Google Llc | Training acoustic models using connectionist temporal classification |
KR102100977B1 (ko) * | 2016-02-03 | 2020-04-14 | 구글 엘엘씨 | 압축된 순환 신경망 모델 |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US12197817B2 (en) | 2016-06-11 | 2025-01-14 | Apple Inc. | Intelligent device arbitration and control |
US9984683B2 (en) * | 2016-07-22 | 2018-05-29 | Google Llc | Automatic speech recognition using multi-dimensional models |
CN106251859B (zh) * | 2016-07-22 | 2019-05-31 | 百度在线网络技术(北京)有限公司 | 语音识别处理方法和装置 |
JP6577159B1 (ja) | 2016-09-06 | 2019-09-18 | ディープマインド テクノロジーズ リミテッド | ニューラルネットワークを使用したオーディオの生成 |
US11080591B2 (en) * | 2016-09-06 | 2021-08-03 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
EP3497630B1 (en) * | 2016-09-06 | 2020-11-04 | Deepmind Technologies Limited | Processing sequences using convolutional neural networks |
US10224058B2 (en) * | 2016-09-07 | 2019-03-05 | Google Llc | Enhanced multi-channel acoustic models |
US11182566B2 (en) * | 2016-10-03 | 2021-11-23 | Google Llc | Processing text sequences using neural networks |
KR102359216B1 (ko) | 2016-10-26 | 2022-02-07 | 딥마인드 테크놀로지스 리미티드 | 신경망을 이용한 텍스트 시퀀스 처리 |
US10140980B2 (en) * | 2016-12-21 | 2018-11-27 | Google LCC | Complex linear projection for acoustic modeling |
US10529320B2 (en) * | 2016-12-21 | 2020-01-07 | Google Llc | Complex evolution recurrent neural networks |
KR101882906B1 (ko) * | 2017-01-17 | 2018-07-27 | 경북대학교 산학협력단 | 복수 문단 텍스트의 추상적 요약문 생성 장치 및 방법, 그 방법을 수행하기 위한 기록 매체 |
US10049106B2 (en) * | 2017-01-18 | 2018-08-14 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
US11907858B2 (en) * | 2017-02-06 | 2024-02-20 | Yahoo Assets Llc | Entity disambiguation |
US11087213B2 (en) * | 2017-02-10 | 2021-08-10 | Synaptics Incorporated | Binary and multi-class classification systems and methods using one spike connectionist temporal classification |
US11080600B2 (en) * | 2017-02-10 | 2021-08-03 | Synaptics Incorporated | Recurrent neural network based acoustic event classification using complement rule |
US10762891B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Binary and multi-class classification systems and methods using connectionist temporal classification |
US10762417B2 (en) * | 2017-02-10 | 2020-09-01 | Synaptics Incorporated | Efficient connectionist temporal classification for binary classification |
US11100932B2 (en) * | 2017-02-10 | 2021-08-24 | Synaptics Incorporated | Robust start-end point detection algorithm using neural network |
US11853884B2 (en) * | 2017-02-10 | 2023-12-26 | Synaptics Incorporated | Many or one detection classification systems and methods |
US10872598B2 (en) | 2017-02-24 | 2020-12-22 | Baidu Usa Llc | Systems and methods for real-time neural text-to-speech |
US10657955B2 (en) * | 2017-02-24 | 2020-05-19 | Baidu Usa Llc | Systems and methods for principled bias reduction in production speech models |
US10373610B2 (en) * | 2017-02-24 | 2019-08-06 | Baidu Usa Llc | Systems and methods for automatic unit selection and target decomposition for sequence labelling |
US10878837B1 (en) * | 2017-03-01 | 2020-12-29 | Snap Inc. | Acoustic neural network scene detection |
US10762427B2 (en) * | 2017-03-01 | 2020-09-01 | Synaptics Incorporated | Connectionist temporal classification using segmented labeled sequence data |
US10540961B2 (en) * | 2017-03-13 | 2020-01-21 | Baidu Usa Llc | Convolutional recurrent neural networks for small-footprint keyword spotting |
US11410024B2 (en) * | 2017-04-28 | 2022-08-09 | Intel Corporation | Tool for facilitating efficiency in machine learning |
US11017291B2 (en) * | 2017-04-28 | 2021-05-25 | Intel Corporation | Training with adaptive runtime and precision profiling |
US10467052B2 (en) * | 2017-05-01 | 2019-11-05 | Red Hat, Inc. | Cluster topology aware container scheduling for efficient data transfer |
DK180048B1 (en) | 2017-05-11 | 2020-02-04 | Apple Inc. | MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION |
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
KR20180124381A (ko) * | 2017-05-11 | 2018-11-21 | 현대자동차주식회사 | 운전자의 상태 판단 시스템 및 그 방법 |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | USER-SPECIFIC Acoustic Models |
DK201770427A1 (en) | 2017-05-12 | 2018-12-20 | Apple Inc. | LOW-LATENCY INTELLIGENT AUTOMATED ASSISTANT |
DK201770411A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | MULTI-MODAL INTERFACES |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US10896669B2 (en) | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
CN107240396B (zh) * | 2017-06-16 | 2023-01-17 | 百度在线网络技术(北京)有限公司 | 说话人自适应方法、装置、设备及存储介质 |
EP3422518B1 (en) * | 2017-06-28 | 2020-06-17 | Siemens Aktiengesellschaft | A method for recognizing contingencies in a power supply network |
KR102483643B1 (ko) * | 2017-08-14 | 2023-01-02 | 삼성전자주식회사 | 모델을 학습하는 방법 및 장치 및 상기 뉴럴 네트워크를 이용한 인식 방법 및 장치 |
KR102410820B1 (ko) * | 2017-08-14 | 2022-06-20 | 삼성전자주식회사 | 뉴럴 네트워크를 이용한 인식 방법 및 장치 및 상기 뉴럴 네트워크를 트레이닝하는 방법 및 장치 |
US10706840B2 (en) * | 2017-08-18 | 2020-07-07 | Google Llc | Encoder-decoder models for sequence to sequence mapping |
US11694066B2 (en) * | 2017-10-17 | 2023-07-04 | Xilinx, Inc. | Machine learning runtime library for neural network acceleration |
US11017761B2 (en) | 2017-10-19 | 2021-05-25 | Baidu Usa Llc | Parallel neural text-to-speech |
US10872596B2 (en) | 2017-10-19 | 2020-12-22 | Baidu Usa Llc | Systems and methods for parallel wave generation in end-to-end text-to-speech |
US10796686B2 (en) | 2017-10-19 | 2020-10-06 | Baidu Usa Llc | Systems and methods for neural text-to-speech using convolutional sequence learning |
CN107680597B (zh) * | 2017-10-23 | 2019-07-09 | 平安科技(深圳)有限公司 | 语音识别方法、装置、设备以及计算机可读存储介质 |
US11556775B2 (en) | 2017-10-24 | 2023-01-17 | Baidu Usa Llc | Systems and methods for trace norm regularization and faster inference for embedded models |
US20190130896A1 (en) * | 2017-10-26 | 2019-05-02 | Salesforce.Com, Inc. | Regularization Techniques for End-To-End Speech Recognition |
US11562287B2 (en) | 2017-10-27 | 2023-01-24 | Salesforce.Com, Inc. | Hierarchical and interpretable skill acquisition in multi-task reinforcement learning |
US10573295B2 (en) | 2017-10-27 | 2020-02-25 | Salesforce.Com, Inc. | End-to-end speech recognition with policy learning |
US11250314B2 (en) * | 2017-10-27 | 2022-02-15 | Cognizant Technology Solutions U.S. Corporation | Beyond shared hierarchies: deep multitask learning through soft layer ordering |
US10535001B2 (en) * | 2017-11-06 | 2020-01-14 | International Business Machines Corporation | Reducing problem complexity when analyzing 3-D images |
AU2018368279A1 (en) * | 2017-11-14 | 2020-05-14 | Magic Leap, Inc. | Meta-learning for multi-task learning for neural networks |
US11977958B2 (en) | 2017-11-22 | 2024-05-07 | Amazon Technologies, Inc. | Network-accessible machine learning model training and hosting system |
CN108334889B (zh) * | 2017-11-30 | 2020-04-03 | 腾讯科技(深圳)有限公司 | 摘要描述生成方法和装置、摘要描述模型训练方法和装置 |
CN107945791B (zh) * | 2017-12-05 | 2021-07-20 | 华南理工大学 | 一种基于深度学习目标检测的语音识别方法 |
CN108171117B (zh) * | 2017-12-05 | 2019-05-21 | 南京南瑞信息通信科技有限公司 | 基于多核异构并行计算的电力人工智能视觉分析系统 |
US10847137B1 (en) * | 2017-12-12 | 2020-11-24 | Amazon Technologies, Inc. | Trigger word detection using neural network waveform processing |
KR102462426B1 (ko) * | 2017-12-14 | 2022-11-03 | 삼성전자주식회사 | 발화의 의미를 분석하기 위한 전자 장치 및 그의 동작 방법 |
US10672388B2 (en) * | 2017-12-15 | 2020-06-02 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for open-vocabulary end-to-end speech recognition |
US10593321B2 (en) * | 2017-12-15 | 2020-03-17 | Mitsubishi Electric Research Laboratories, Inc. | Method and apparatus for multi-lingual end-to-end speech recognition |
US11443178B2 (en) | 2017-12-15 | 2022-09-13 | International Business Machines Corporation | Deep neural network hardening framework |
CN108089958B (zh) * | 2017-12-29 | 2021-06-08 | 珠海市君天电子科技有限公司 | GPU testing method, terminal device, and computer-readable storage medium |
CN108229659A (zh) * | 2017-12-29 | 2018-06-29 | 陕西科技大学 | Piano single-key note recognition method based on deep learning |
FR3076378B1 (fr) * | 2017-12-29 | 2020-05-29 | Bull Sas | Method for training a neural network to recognize a character sequence, and associated recognition method |
CN108364662B (zh) * | 2017-12-29 | 2021-01-05 | 中国科学院自动化研究所 | Speech emotion recognition method and system based on a pairwise discrimination task |
KR102089076B1 (ko) * | 2018-01-11 | 2020-03-13 | 중앙대학교 산학협력단 | Deep learning method using BCSC and apparatus therefor |
CN108256474A (zh) * | 2018-01-17 | 2018-07-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for recognizing dishes |
CN108417201B (zh) * | 2018-01-19 | 2020-11-06 | 苏州思必驰信息科技有限公司 | Single-channel multi-speaker identity recognition method and system |
CN108417202B (zh) * | 2018-01-19 | 2020-09-01 | 苏州思必驰信息科技有限公司 | Speech recognition method and system |
CN108491836B (zh) * | 2018-01-25 | 2020-11-24 | 华南理工大学 | A holistic recognition method for Chinese text in natural scene images |
US10657426B2 (en) * | 2018-01-25 | 2020-05-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US11182694B2 (en) | 2018-02-02 | 2021-11-23 | Samsung Electronics Co., Ltd. | Data path for GPU machine learning training with key value SSD |
US11527308B2 (en) | 2018-02-06 | 2022-12-13 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty-diversity selection |
WO2019157257A1 (en) | 2018-02-08 | 2019-08-15 | Cognizant Technology Solutions U.S. Corporation | System and method for pseudo-task augmentation in deep multitask learning |
US11501076B2 (en) * | 2018-02-09 | 2022-11-15 | Salesforce.Com, Inc. | Multitask learning as question answering |
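TWI659411B (zh) * | 2018-03-01 | 2019-05-11 | 大陸商芋頭科技(杭州)有限公司 | A multilingual mixed speech recognition method |
CN108564954B (zh) * | 2018-03-19 | 2020-01-10 | 平安科技(深圳)有限公司 | Deep neural network model, electronic device, identity verification method, and storage medium |
KR102473447B1 (ko) * | 2018-03-22 | 2022-12-05 | 삼성전자주식회사 | Electronic device for modulating a user's voice using an artificial intelligence model and control method thereof |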
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US20190318229A1 (en) * | 2018-04-12 | 2019-10-17 | Advanced Micro Devices, Inc. | Method and system for hardware mapping inference pipelines |
CN108538311B (zh) * | 2018-04-13 | 2020-09-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio classification method, apparatus, and computer-readable storage medium |
US10672414B2 (en) * | 2018-04-13 | 2020-06-02 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable media for improved real-time audio processing |
CN112805780B (zh) * | 2018-04-23 | 2024-08-09 | 谷歌有限责任公司 | Speaker diarization using an end-to-end model |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
KR20210005273A (ko) | 2018-05-10 | 2021-01-13 | 더 보드 어브 트러스티스 어브 더 리랜드 스탠포드 주니어 유니버시티 | Systems and methods for activation functions for photonic neural networks |
US12026615B2 (en) * | 2018-05-10 | 2024-07-02 | The Board Of Trustees Of The Leland Stanford Junior University | Training of photonic neural networks through in situ backpropagation |
US11086937B2 (en) * | 2018-05-11 | 2021-08-10 | The Regents Of The University Of California | Speech based structured querying |
US11138471B2 (en) * | 2018-05-18 | 2021-10-05 | Google Llc | Augmentation of audiographic images for improved machine learning |
US11462209B2 (en) * | 2018-05-18 | 2022-10-04 | Baidu Usa Llc | Spectrogram to waveform synthesis using convolutional networks |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | VIRTUAL ASSISTANT OPERATION IN MULTI-DEVICE ENVIRONMENTS |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT |
GB2589478B (en) * | 2018-06-21 | 2022-05-25 | Ibm | Segmenting irregular shapes in images using deep region growing |
CN108984535B (zh) * | 2018-06-25 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Sentence translation method, translation model training method, device, and storage medium |
WO2020000390A1 (en) * | 2018-06-29 | 2020-01-02 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Systems and methods for depth estimation via affinity learned with convolutional spatial propagation networks |
CN109147766B (zh) * | 2018-07-06 | 2020-08-18 | 北京爱医声科技有限公司 | Speech recognition method and system based on an end-to-end deep learning model |
KR102483774B1 (ko) | 2018-07-13 | 2023-01-02 | 구글 엘엘씨 | End-to-end streaming keyword spotting |
US11335333B2 (en) | 2018-07-20 | 2022-05-17 | Google Llc | Speech recognition with sequence-to-sequence models |
CN110752973B (zh) * | 2018-07-24 | 2020-12-25 | Tcl科技集团股份有限公司 | A control method and apparatus for a terminal device, and terminal device |
CN108962230B (zh) * | 2018-07-27 | 2019-04-23 | 重庆因普乐科技有限公司 | Memristor-based speech recognition method |
US10380997B1 (en) * | 2018-07-27 | 2019-08-13 | Deepgram, Inc. | Deep learning internal state index-based search and classification |
JP7209330B2 (ja) * | 2018-07-30 | 2023-01-20 | 国立研究開発法人情報通信研究機構 | Classifier, trained model, and learning method |
US11107463B2 (en) | 2018-08-01 | 2021-08-31 | Google Llc | Minimum word error rate training for attention-based sequence-to-sequence models |
CN110825665B (zh) * | 2018-08-10 | 2021-11-05 | 昆仑芯(北京)科技有限公司 | Data acquisition unit and data acquisition method applied to a controller |
US10650812B2 (en) * | 2018-08-13 | 2020-05-12 | Bank Of America Corporation | Deterministic multi-length sliding window protocol for contiguous string entity |
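CN109003601A (zh) * | 2018-08-31 | 2018-12-14 | 北京工商大学 | A cross-lingual end-to-end speech recognition method for the low-resource Tujia language |
CN112639964B (zh) * | 2018-09-04 | 2024-07-26 | Oppo广东移动通信有限公司 | Method, system, and computer-readable medium for recognizing speech using depth information |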
US10963721B2 (en) | 2018-09-10 | 2021-03-30 | Sony Corporation | License plate number recognition based on three dimensional beam search |
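CN109271926B (zh) * | 2018-09-14 | 2021-09-10 | 西安电子科技大学 | Intelligent radiation source identification method based on a GRU deep convolutional network |
CN109215662B (zh) * | 2018-09-18 | 2023-06-20 | 平安科技(深圳)有限公司 | End-to-end speech recognition method, electronic device, and computer-readable storage medium |
JP7043373B2 (ja) * | 2018-09-18 | 2022-03-29 | ヤフー株式会社 | Information processing device, information processing method, and program |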
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10672382B2 (en) * | 2018-10-15 | 2020-06-02 | Tencent America LLC | Input-feeding architecture for attention based end-to-end speech recognition |
US10891951B2 (en) * | 2018-10-17 | 2021-01-12 | Ford Global Technologies, Llc | Vehicle language processing |
EP3640856A1 (en) | 2018-10-19 | 2020-04-22 | Fujitsu Limited | A method, apparatus and computer program to carry out a training procedure in a convolutional neural network |
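KR20200045128A (ko) * | 2018-10-22 | 2020-05-04 | 삼성전자주식회사 | Model training method and apparatus, and data recognition method |
CN109447253B (zh) * | 2018-10-26 | 2021-04-27 | 杭州比智科技有限公司 | Video memory allocation method, apparatus, computing device, and computer storage medium |
CN112970063B (zh) | 2018-10-29 | 2024-10-18 | 杜比国际公司 | Method and apparatus for rate-quality scalable coding with generative models |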
US11640519B2 (en) * | 2018-10-31 | 2023-05-02 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using cross-domain batch normalization |
US11494612B2 (en) | 2018-10-31 | 2022-11-08 | Sony Interactive Entertainment Inc. | Systems and methods for domain adaptation in neural networks using domain classifier |
US11526759B2 (en) * | 2018-11-05 | 2022-12-13 | International Business Machines Corporation | Large model support in deep learning |
CN109523994A (zh) * | 2018-11-13 | 2019-03-26 | 四川大学 | A multi-task speech classification method based on capsule neural networks |
CN109492233B (zh) * | 2018-11-14 | 2023-10-17 | 北京捷通华声科技股份有限公司 | A machine translation method and apparatus |
US11250838B2 (en) * | 2018-11-16 | 2022-02-15 | Deepmind Technologies Limited | Cross-modal sequence distillation |
US11238845B2 (en) * | 2018-11-21 | 2022-02-01 | Google Llc | Multi-dialect and multilingual speech recognition |
US11736363B2 (en) * | 2018-11-30 | 2023-08-22 | Disney Enterprises, Inc. | Techniques for analyzing a network and increasing network availability |
US10388272B1 (en) | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, Llc | Training speech recognition systems using word sequences |
US10573312B1 (en) | 2018-12-04 | 2020-02-25 | Sorenson Ip Holdings, Llc | Transcription generation from multiple speech recognition systems |
US11170761B2 (en) | 2018-12-04 | 2021-11-09 | Sorenson Ip Holdings, Llc | Training of speech recognition systems |
US11017778B1 (en) | 2018-12-04 | 2021-05-25 | Sorenson Ip Holdings, Llc | Switching between speech recognition systems |
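KR102681637B1 (ko) | 2018-12-13 | 2024-07-05 | 현대자동차주식회사 | Artificial intelligence apparatus and noise data preprocessing method for identifying the source of problematic noise |
KR20210106546A (ko) | 2018-12-24 | 2021-08-30 | 디티에스, 인코포레이티드 | Room acoustics simulation using deep learning image analysis |
JP7206898B2 (ja) * | 2018-12-25 | 2023-01-18 | 富士通株式会社 | Learning device, learning method, and learning program |
CN111369978B (zh) * | 2018-12-26 | 2024-05-17 | 北京搜狗科技发展有限公司 | A data processing method and apparatus, and apparatus for data processing |
KR102744417B1 (ko) | 2018-12-28 | 2024-12-19 | 한국전자통신연구원 | Method and apparatus for determining a loss function for an audio signal |
CN111429889B (zh) * | 2019-01-08 | 2023-04-28 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device, and computer-readable storage medium for real-time speech recognition based on truncated attention |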
US11322136B2 (en) | 2019-01-09 | 2022-05-03 | Samsung Electronics Co., Ltd. | System and method for multi-spoken language detection |
WO2020154538A1 (en) | 2019-01-23 | 2020-07-30 | Google Llc | Generating neural network outputs using insertion operations |
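CN109783822B (zh) * | 2019-01-24 | 2023-04-18 | 中国—东盟信息港股份有限公司 | A CAPTCHA-based data sample recognition system and method thereof |
CN111489742B (zh) * | 2019-01-28 | 2023-06-27 | 北京猎户星空科技有限公司 | Acoustic model training method, speech recognition method, apparatus, and electronic device |
CN109859743B (zh) * | 2019-01-29 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Audio recognition method, system, and machine device |
KR102691895B1 (ko) | 2019-01-29 | 2024-08-06 | 삼성전자주식회사 | Server providing an accelerated computing environment and control method |
JP7028203B2 (ja) * | 2019-02-07 | 2022-03-02 | 日本電信電話株式会社 | Speech recognition device, speech recognition method, and program |
JP7218601B2 (ja) * | 2019-02-12 | 2023-02-07 | 日本電信電話株式会社 | Training data acquisition device, model training device, methods thereof, and program |
CN110059813B (zh) | 2019-02-13 | 2021-04-06 | 创新先进技术有限公司 | Method, apparatus, and device for updating a convolutional neural network using a GPU cluster |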
US11037547B2 (en) | 2019-02-14 | 2021-06-15 | Tencent America LLC | Token-wise training for attention based end-to-end speech recognition |
US10861441B2 (en) * | 2019-02-14 | 2020-12-08 | Tencent America LLC | Large margin training for attention-based end-to-end speech recognition |
US11481639B2 (en) | 2019-02-26 | 2022-10-25 | Cognizant Technology Solutions U.S. Corporation | Enhanced optimization with composite objectives and novelty pulsation |
EP3938898A4 (en) * | 2019-03-13 | 2023-03-29 | Cognizant Technology Solutions U.S. Corporation | SYSTEM AND METHOD FOR IMPLEMENTING MODULAR UNIVERSAL RESETTING FOR DEEP MULTITASKING LEARNING ACROSS VARIOUS AREAS |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
CN111709513B (zh) * | 2019-03-18 | 2023-06-09 | 百度在线网络技术(北京)有限公司 | Training system, method, and electronic device for a long short-term memory (LSTM) network |
EP3948692A4 (en) | 2019-03-27 | 2023-03-29 | Cognizant Technology Solutions U.S. Corporation | PROCESS AND SYSTEM CONTAINING A SCALABLE SUBSTITUTE-ASSISTED PRESCRIPTIONS OPTIMIZATION ENGINE |
US11182457B2 (en) | 2019-03-28 | 2021-11-23 | International Business Machines Corporation | Matrix-factorization based gradient compression |
US11011156B2 (en) * | 2019-04-11 | 2021-05-18 | International Business Machines Corporation | Training data modification for training model |
CN109887497B (zh) * | 2019-04-12 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
CN110033760B (zh) | 2019-04-15 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, apparatus, and device for speech recognition |
US11676006B2 (en) | 2019-04-16 | 2023-06-13 | Microsoft Technology Licensing, Llc | Universal acoustic modeling using neural mixture models |
JP7336537B2 (ja) * | 2019-04-16 | 2023-08-31 | グーグル エルエルシー | Joint endpointing and automatic speech recognition |
US10997967B2 (en) | 2019-04-18 | 2021-05-04 | Honeywell International Inc. | Methods and systems for cockpit speech recognition acoustic model training with multi-level corpus data augmentation |
US11468879B2 (en) * | 2019-04-29 | 2022-10-11 | Tencent America LLC | Duration informed attention network for text-to-speech analysis |
US20200349425A1 (en) * | 2019-04-30 | 2020-11-05 | Fujitsu Limited | Training time reduction in automatic data augmentation |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
JP7234415B2 (ja) * | 2019-05-06 | 2023-03-07 | グーグル エルエルシー | Contextual biasing for speech recognition |
CN110211565B (zh) * | 2019-05-06 | 2023-04-04 | 平安科技(深圳)有限公司 | Dialect recognition method, apparatus, and computer-readable storage medium |
KR102460676B1 (ko) | 2019-05-07 | 2022-10-31 | 한국전자통신연구원 | Speech processing apparatus and method using a densely connected hybrid neural network |
US20220215252A1 (en) * | 2019-05-07 | 2022-07-07 | Imagia Cybernetics Inc. | Method and system for initializing a neural network |
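CN110222578B (zh) * | 2019-05-08 | 2022-12-27 | 腾讯科技(深圳)有限公司 | Method and apparatus for adversarial testing of an image captioning system |
CN110085249B (zh) * | 2019-05-09 | 2021-03-16 | 南京工程学院 | Single-channel speech enhancement method using an attention-gated recurrent neural network |
CN111832699A (zh) * | 2019-05-13 | 2020-10-27 | 谷歌有限责任公司 | Computationally efficient expressive output layers for neural networks |
JP7229847B2 (ja) * | 2019-05-13 | 2023-02-28 | 株式会社日立製作所 | Dialogue device, dialogue method, and dialogue computer program |
CN113924619A (zh) * | 2019-05-28 | 2022-01-11 | 谷歌有限责任公司 | Large-scale multilingual speech recognition with a streaming end-to-end model |
CN112017676B (zh) * | 2019-05-31 | 2024-07-16 | 京东科技控股股份有限公司 | Audio processing method, apparatus, and computer-readable storage medium |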
US11289073B2 (en) * | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11227599B2 (en) | 2019-06-01 | 2022-01-18 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US10716089B1 (en) * | 2019-06-03 | 2020-07-14 | Mapsted Corp. | Deployment of trained neural network based RSS fingerprint dataset |
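CN110189766B (zh) * | 2019-06-14 | 2021-04-06 | 西南科技大学 | A neural-network-based speech style transfer method |
CN110299132B (zh) * | 2019-06-26 | 2021-11-02 | 京东数字科技控股有限公司 | A spoken digit recognition method and apparatus |
CN110288682B (zh) | 2019-06-28 | 2023-09-26 | 北京百度网讯科技有限公司 | Method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait |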
WO2021010562A1 (en) | 2019-07-15 | 2021-01-21 | Samsung Electronics Co., Ltd. | Electronic apparatus and controlling method thereof |
KR20210008788A (ko) | 2019-07-15 | 2021-01-25 | 삼성전자주식회사 | Electronic apparatus and controlling method thereof |
US11244673B2 (en) * | 2019-07-19 | 2022-02-08 | Microsoft Technology Licensing, LLC | Streaming contextual unidirectional models |
KR20210014949A (ko) * | 2019-07-31 | 2021-02-10 | 삼성전자주식회사 | Decoding method and apparatus in an artificial neural network for speech recognition |
CN110473554B (zh) * | 2019-08-08 | 2022-01-25 | Oppo广东移动通信有限公司 | Audio verification method, apparatus, storage medium, and electronic device |
CN114207711A (zh) | 2019-08-13 | 2022-03-18 | 三星电子株式会社 | System and method for recognizing a user's speech |
WO2021029643A1 (en) | 2019-08-13 | 2021-02-18 | Samsung Electronics Co., Ltd. | System and method for modifying speech recognition result |
EP3931826A4 (en) | 2019-08-13 | 2022-05-11 | Samsung Electronics Co., Ltd. | SERVER SUPPORTING VOICE RECOGNITION OF A DEVICE AND METHOD OF OPERATING THE SERVER |
CN110459209B (zh) * | 2019-08-20 | 2021-05-28 | 深圳追一科技有限公司 | Speech recognition method, apparatus, device, and storage medium |
US11151979B2 (en) | 2019-08-23 | 2021-10-19 | Tencent America LLC | Duration informed attention network (DURIAN) for audio-visual synthesis |
US11158303B2 (en) * | 2019-08-27 | 2021-10-26 | International Business Machines Corporation | Soft-forgetting for connectionist temporal classification based automatic speech recognition |
US11551675B2 (en) | 2019-09-03 | 2023-01-10 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device thereof |
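CN110459208B (zh) * | 2019-09-09 | 2022-01-11 | 中科极限元(杭州)智能科技股份有限公司 | A knowledge-transfer-based training method for sequence-to-sequence speech recognition models |
CN110600020B (zh) * | 2019-09-12 | 2022-05-17 | 上海依图信息技术有限公司 | A gradient transmission method and apparatus |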
US11302309B2 (en) * | 2019-09-13 | 2022-04-12 | International Business Machines Corporation | Aligning spike timing of models for machine learning |
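CN110807365B (zh) * | 2019-09-29 | 2022-02-11 | 浙江大学 | An underwater target recognition method based on fusion of GRU and one-dimensional CNN neural networks |
CN112738634B (zh) * | 2019-10-14 | 2022-08-02 | 北京字节跳动网络技术有限公司 | Video file generation method, apparatus, terminal, and storage medium |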
US11681911B2 (en) * | 2019-10-15 | 2023-06-20 | Naver Corporation | Method and system for training neural sequence-to-sequence models by incorporating global features |
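CN110704197B (zh) | 2019-10-17 | 2022-12-09 | 北京小米移动软件有限公司 | Method, apparatus, and medium for handling memory access overhead |
CN110875035A (zh) * | 2019-10-24 | 2020-03-10 | 广州多益网络股份有限公司 | Novel multi-task joint speech recognition training architecture and method |
KR102203786B1 (ko) * | 2019-11-14 | 2021-01-15 | 오로라월드 주식회사 | Method and system for providing an interaction service using a smart toy |
CN110930979B (zh) * | 2019-11-29 | 2020-10-30 | 百度在线网络技术(北京)有限公司 | A speech recognition model training method, apparatus, and electronic device |
CN111312228A (zh) * | 2019-12-09 | 2020-06-19 | 中国南方电网有限责任公司 | An end-to-end voice navigation method for customer service at electric power companies |
CN111048082B (zh) * | 2019-12-12 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | An improved end-to-end speech recognition method |
CN113077785B (zh) * | 2019-12-17 | 2022-07-12 | 中国科学院声学研究所 | An end-to-end method and system for recognizing speech content in multilingual continuous speech streams |
CN111079945B (zh) * | 2019-12-18 | 2021-02-05 | 北京百度网讯科技有限公司 | Training method and apparatus for an end-to-end model |
CN111145729B (zh) * | 2019-12-23 | 2022-10-28 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111063336A (zh) * | 2019-12-30 | 2020-04-24 | 天津中科智能识别产业技术研究院有限公司 | A deep-learning-based end-to-end speech recognition system |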
US11183178B2 (en) | 2020-01-13 | 2021-11-23 | Microsoft Technology Licensing, Llc | Adaptive batching to reduce recognition latency |
CN111382581B (zh) * | 2020-01-21 | 2023-05-19 | 沈阳雅译网络技术有限公司 | A one-shot pruning compression method in machine translation |
EP4085451B1 (en) * | 2020-01-28 | 2024-04-10 | Google LLC | Language-agnostic multilingual modeling using effective script normalization |
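CN111292727B (zh) * | 2020-02-03 | 2023-03-24 | 北京声智科技有限公司 | A speech recognition method and electronic device |
CN111428750A (zh) * | 2020-02-20 | 2020-07-17 | 商汤国际私人有限公司 | A text recognition model training and text recognition method, apparatus, and medium |
CN111210807B (zh) * | 2020-02-21 | 2023-03-31 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal, and storage medium |
CN111397870B (zh) * | 2020-03-08 | 2021-05-14 | 中国地质大学(武汉) | A mechanical fault prediction method based on diversified ensemble convolutional neural networks |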
US11747902B2 (en) | 2020-03-11 | 2023-09-05 | Apple Inc. | Machine learning configurations modeled using contextual categorical labels for biosignals |
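CN111246026A (zh) * | 2020-03-11 | 2020-06-05 | 兰州飞天网景信息产业有限公司 | A recording processing method based on convolutional neural networks and connectionist temporal classification |
CN111415667B (zh) * | 2020-03-25 | 2024-04-23 | 中科极限元(杭州)智能科技股份有限公司 | A training and decoding method for streaming end-to-end speech recognition models |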
US12217156B2 (en) * | 2020-04-01 | 2025-02-04 | Sony Group Corporation | Computing temporal convolution networks in real time |
US12136411B2 (en) | 2020-04-03 | 2024-11-05 | International Business Machines Corporation | Training of model for processing sequence data |
US12099934B2 (en) * | 2020-04-07 | 2024-09-24 | Cognizant Technology Solutions U.S. Corporation | Framework for interactive exploration, evaluation, and improvement of AI-generated solutions |
US12020693B2 (en) | 2020-04-29 | 2024-06-25 | Samsung Electronics Co., Ltd. | System and method for out-of-vocabulary phrase support in automatic speech recognition |
US11061543B1 (en) | 2020-05-11 | 2021-07-13 | Apple Inc. | Providing relevant data items based on context |
US11796794B2 (en) | 2020-05-12 | 2023-10-24 | The Board Of Trustees Of The Leland Stanford Junior University | Multi-objective, robust constraints enforced global topology optimizer for optical devices |
US20210358490A1 (en) * | 2020-05-18 | 2021-11-18 | Nvidia Corporation | End of speech detection using one or more neural networks |
CN111798828B (zh) * | 2020-05-29 | 2023-02-14 | 厦门快商通科技股份有限公司 | Synthesized audio detection method, system, mobile terminal, and storage medium |
US11775841B2 (en) | 2020-06-15 | 2023-10-03 | Cognizant Technology Solutions U.S. Corporation | Process and system including explainable prescriptions through surrogate-assisted evolution |
US11646009B1 (en) * | 2020-06-16 | 2023-05-09 | Amazon Technologies, Inc. | Autonomously motile device with noise suppression |
US11490204B2 (en) | 2020-07-20 | 2022-11-01 | Apple Inc. | Multi-device audio adjustment coordination |
US11438683B2 (en) | 2020-07-21 | 2022-09-06 | Apple Inc. | User identification using headphones |
CN111816169B (zh) * | 2020-07-23 | 2022-05-13 | 思必驰科技股份有限公司 | Method and apparatus for training a Chinese-English code-mixed speech recognition model |
US11875797B2 (en) * | 2020-07-23 | 2024-01-16 | Pozotron Inc. | Systems and methods for scripted audio production |
KR102462932B1 (ko) * | 2020-08-03 | 2022-11-04 | 주식회사 딥브레인에이아이 | Text preprocessing apparatus and method |
US11488604B2 (en) | 2020-08-19 | 2022-11-01 | Sorenson Ip Holdings, Llc | Transcription of audio |
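KR102409873B1 (ko) * | 2020-09-02 | 2022-06-16 | 네이버 주식회사 | Method and system for training a speech recognition model using augmented consistency regularization |
CN112188004B (zh) * | 2020-09-28 | 2022-04-05 | 精灵科技有限公司 | Machine-learning-based faulty call detection system and control method thereof |
CN112233655B (zh) * | 2020-09-28 | 2024-07-16 | 上海声瀚信息科技有限公司 | A neural network training method for improving speech command word recognition performance |
JP2023545988A (ja) * | 2020-10-05 | 2023-11-01 | グーグル エルエルシー | Transformer transducer: one model unifying streaming and non-streaming speech recognition |
KR102429656B1 (ko) * | 2020-10-08 | 2022-08-08 | 서울대학교산학협력단 | Method and system for extracting speaker embeddings with a speech-recognizer-based pooling technique for speaker recognition, and recording medium therefor |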
US12093802B2 (en) | 2020-10-20 | 2024-09-17 | International Business Machines Corporation | Gated unit for a gated recurrent neural network |
CN112259080B (zh) * | 2020-10-20 | 2021-06-22 | 北京讯众通信技术股份有限公司 | A speech recognition method based on a neural network model |
US11593560B2 (en) * | 2020-10-21 | 2023-02-28 | Beijing Wodong Tianjun Information Technology Co., Ltd. | System and method for relation extraction with adaptive thresholding and localized context pooling |
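CN112466282B (zh) * | 2020-10-22 | 2023-11-28 | 北京仿真中心 | A speech recognition system and method for the aerospace domain |
CN112420024B (zh) * | 2020-10-23 | 2022-09-09 | 四川大学 | A fully end-to-end method and apparatus for mixed Chinese-English air traffic control speech recognition |
CN112329836B (zh) * | 2020-11-02 | 2024-12-27 | 成都网安科技发展有限公司 | Deep-learning-based text classification method, apparatus, server, and storage medium |
CN112614484B (zh) | 2020-11-23 | 2022-05-20 | 北京百度网讯科技有限公司 | Feature information mining method, apparatus, and electronic device |
CN112669852B (zh) * | 2020-12-15 | 2023-01-31 | 北京百度网讯科技有限公司 | Memory allocation method, apparatus, and electronic device |
CN112786017B (zh) * | 2020-12-25 | 2024-04-09 | 北京猿力未来科技有限公司 | Training method and apparatus for a speech rate detection model, and speech rate detection method and apparatus |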
US11790906B2 (en) * | 2021-01-25 | 2023-10-17 | Google Llc | Resolving unique personal identifiers during corresponding conversations between a voice bot and a human |
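KR20230141828A (ko) | 2021-02-04 | 2023-10-10 | 딥마인드 테크놀로지스 리미티드 | Neural networks with adaptive gradient clipping |
CN113421574B (zh) * | 2021-06-18 | 2024-05-24 | 腾讯音乐娱乐科技(深圳)有限公司 | Training method for an audio feature extraction model, audio recognition method, and related device |
CN113535510B (zh) * | 2021-06-24 | 2024-01-26 | 北京理工大学 | An adaptive sampling model optimization method for data collection in large-scale data centers |
CN113327600B (zh) * | 2021-06-30 | 2024-07-23 | 北京有竹居网络技术有限公司 | A training method, apparatus, and device for a speech recognition model |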
US12112200B2 (en) | 2021-09-13 | 2024-10-08 | International Business Machines Corporation | Pipeline parallel computing using extended memory |
CN118043885A (zh) | 2021-09-30 | 2024-05-14 | 谷歌有限责任公司 | Contrastive Siamese network for semi-supervised speech recognition |
US20230186525A1 (en) * | 2021-12-13 | 2023-06-15 | Tencent America LLC | System, method, and computer program for content adaptive online training for multiple blocks in neural image compression |
CN114548501B (zh) * | 2022-01-14 | 2024-06-18 | 北京全路通信信号研究设计院集团有限公司 | A balance checking method, system, and device |
CN114842829A (zh) * | 2022-03-29 | 2022-08-02 | 北京理工大学 | A text-driven speech synthesis method that suppresses outliers in speech elements |
US12136413B1 (en) * | 2022-03-31 | 2024-11-05 | Amazon Technologies, Inc. | Domain-specific parameter pre-fixes for tuning automatic speech recognition |
US11978436B2 (en) | 2022-06-03 | 2024-05-07 | Apple Inc. | Application vocabulary integration with a digital assistant |
CN114743554A (zh) * | 2022-06-09 | 2022-07-12 | 武汉工商学院 | IoT-based smart home interaction method and apparatus |
KR102547001B1 (ko) * | 2022-06-28 | 2023-06-23 | 주식회사 액션파워 | Error detection method using a top-down approach |
US20240339123A1 (en) * | 2023-04-06 | 2024-10-10 | Samsung Electronics Co., Ltd. | System and method for keyword spotting in noisy environments |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5790754A (en) | 1994-10-21 | 1998-08-04 | Sensory Circuits, Inc. | Speech recognition apparatus for consumer electronic applications |
US5749066A (en) | 1995-04-24 | 1998-05-05 | Ericsson Messaging Systems Inc. | Method and apparatus for developing a neural network for phoneme recognition |
JP2996926B2 (ja) | 1997-03-11 | 2000-01-11 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Phoneme symbol posterior probability calculation device and speech recognition device |
US6292772B1 (en) * | 1998-12-01 | 2001-09-18 | Justsystem Corporation | Method for identifying the language of individual words |
AUPQ439299A0 (en) * | 1999-12-01 | 1999-12-23 | Silverbrook Research Pty Ltd | Interface system |
US7035802B1 (en) | 2000-07-31 | 2006-04-25 | Matsushita Electric Industrial Co., Ltd. | Recognition system using lexical trees |
US7219085B2 (en) * | 2003-12-09 | 2007-05-15 | Microsoft Corporation | System and method for accelerating and optimizing the processing of machine learning techniques using a graphics processing unit |
US20060031069A1 (en) | 2004-08-03 | 2006-02-09 | Sony Corporation | System and method for performing a grapheme-to-phoneme conversion |
GB0507036D0 (en) | 2005-04-07 | 2005-05-11 | Ibm | Method and system for language identification |
US20110035215A1 (en) * | 2007-08-28 | 2011-02-10 | Haim Sompolinsky | Method, device and system for speech recognition |
JP4869268B2 (ja) * | 2008-03-04 | 2012-02-08 | 日本放送協会 | Acoustic model learning device and program |
US8332212B2 (en) * | 2008-06-18 | 2012-12-11 | Cogi, Inc. | Method and system for efficient pacing of speech for transcription |
US8781833B2 (en) | 2008-07-17 | 2014-07-15 | Nuance Communications, Inc. | Speech recognition semantic classification training |
US8886531B2 (en) | 2010-01-13 | 2014-11-11 | Rovi Technologies Corporation | Apparatus and method for generating an audio fingerprint and using a two-stage query |
US20130317755A1 (en) | 2012-05-04 | 2013-11-28 | New York University | Methods, computer-accessible medium, and systems for score-driven whole-genome shotgun sequence assembly |
US10354650B2 (en) * | 2012-06-26 | 2019-07-16 | Google Llc | Recognizing speech with mixed speech recognition models to generate transcriptions |
US8831957B2 (en) * | 2012-08-01 | 2014-09-09 | Google Inc. | Speech recognition models based on location indicia |
CN102760436B (zh) * | 2012-08-09 | 2014-06-11 | 河南省烟草公司开封市公司 | A speech lexicon screening method |
US9177550B2 (en) | 2013-03-06 | 2015-11-03 | Microsoft Technology Licensing, Llc | Conservatively adapting a deep neural network in a recognition system |
US9153231B1 (en) * | 2013-03-15 | 2015-10-06 | Amazon Technologies, Inc. | Adaptive neural network speech recognition models |
US9418650B2 (en) | 2013-09-25 | 2016-08-16 | Verizon Patent And Licensing Inc. | Training speech recognition using captions |
CN103591637B (zh) | 2013-11-19 | 2015-12-02 | 长春工业大学 | An operation regulation method for the secondary network of a central heating system |
US9189708B2 (en) | 2013-12-31 | 2015-11-17 | Google Inc. | Pruning and label selection in hidden markov model-based OCR |
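CN103870863B (zh) * | 2014-03-14 | 2016-08-31 | 华中科技大学 | Method for preparing a holographic anti-counterfeiting label with a hidden two-dimensional code image, and identification device thereof |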
US9390712B2 (en) | 2014-03-24 | 2016-07-12 | Microsoft Technology Licensing, Llc. | Mixed speech recognition |
US20150309987A1 (en) | 2014-04-29 | 2015-10-29 | Google Inc. | Classification of Offensive Words |
CN104035751B (zh) * | 2014-06-20 | 2016-10-12 | 深圳市腾讯计算机系统有限公司 | Data parallel processing method and apparatus based on multiple graphics processors |
US10540957B2 (en) * | 2014-12-15 | 2020-01-21 | Baidu Usa Llc | Systems and methods for speech transcription |
US10733979B2 (en) * | 2015-10-09 | 2020-08-04 | Google Llc | Latency constraints for acoustic modeling |
-
2016
- 2016-11-21 US US15/358,083 patent/US10319374B2/en active Active
- 2016-11-21 US US15/358,102 patent/US10332509B2/en active Active
- 2016-11-23 JP JP2017544340A patent/JP6661654B2/ja active Active
- 2016-11-23 CN CN201680010871.9A patent/CN107408384B/zh active Active
- 2016-11-23 KR KR1020177023173A patent/KR102008077B1/ko active IP Right Grant
- 2016-11-23 WO PCT/US2016/063661 patent/WO2017091763A1/en active Application Filing
- 2016-11-23 EP EP16869294.5A patent/EP3245652B1/en active Active
- 2016-11-23 JP JP2017544352A patent/JP6629872B2/ja active Active
- 2016-11-23 WO PCT/US2016/063641 patent/WO2017091751A1/en active Application Filing
- 2016-11-23 CN CN201680010873.8A patent/CN107408111B/zh active Active
- 2016-11-23 KR KR1020177023177A patent/KR102033230B1/ko active IP Right Grant
- 2016-11-23 EP EP16869302.6A patent/EP3245597B1/en active Active
Non-Patent Citations (3)
Title |
---|
Awni Hannun et al., ‘Deep speech: Scaling up end-to-end speech recognition’, Cornell University Library, pp. 1~12, December 2014.* * |
Sergey Ioffe et al., ‘Batch normalization: Accelerating deep network training by reducing internal covariate shift’, Cornell University Library, pp.1~11, March 2015.* * |
Tara N. Sainath et al., ‘Convolutional long short-term memory, fully connected deep neural networks’, ICASSP 2015, pp.4580~4584, April 2015.* * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102018346B1 (ko) * | 2018-05-11 | 2019-10-14 | 국방과학연구소 | Method and system for classifying acoustic signals |
KR20200095789A (ko) * | 2019-02-01 | 2020-08-11 | 한국전자통신연구원 | Method and apparatus for building a translation model |
KR20210146368A (ko) * | 2019-05-03 | 2021-12-03 | 구글 엘엘씨 | End-to-end automatic speech recognition on numeric sequences |
Also Published As
Publication number | Publication date |
---|---|
KR102033230B1 (ko) | 2019-10-16 |
JP6629872B2 (ja) | 2020-01-15 |
EP3245652A1 (en) | 2017-11-22 |
EP3245597B1 (en) | 2020-08-26 |
JP2018513399A (ja) | 2018-05-24 |
EP3245597A4 (en) | 2018-05-30 |
CN107408384A (zh) | 2017-11-28 |
CN107408111B (zh) | 2021-03-30 |
US20170148431A1 (en) | 2017-05-25 |
EP3245652A4 (en) | 2018-05-30 |
KR102008077B1 (ko) | 2019-08-06 |
WO2017091763A1 (en) | 2017-06-01 |
CN107408111A (zh) | 2017-11-28 |
CN107408384B (zh) | 2020-11-27 |
US20170148433A1 (en) | 2017-05-25 |
EP3245597A1 (en) | 2017-11-22 |
EP3245652B1 (en) | 2019-07-10 |
JP2018513398A (ja) | 2018-05-24 |
WO2017091751A1 (en) | 2017-06-01 |
JP6661654B2 (ja) | 2020-03-11 |
US10332509B2 (en) | 2019-06-25 |
US10319374B2 (en) | 2019-06-11 |
KR20170107015A (ko) | 2017-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102033230B1 (ko) | End-to-end speech recognition | |
US11620986B2 (en) | Cold fusing sequence-to-sequence models with language models | |
Xiong et al. | Toward human parity in conversational speech recognition | |
CN107077842B (zh) | Systems and methods for speech transcription | |
Sundermeyer et al. | Comparison of feedforward and recurrent neural network language models | |
KR101970041B1 (ko) | Hybrid GPU/CPU data processing method | |
Enarvi et al. | Automatic speech recognition with very large conversational finnish and estonian vocabularies | |
Scanzio et al. | Parallel implementation of artificial neural network training for speech recognition | |
Abdelhamid et al. | End-to-end arabic speech recognition: A review | |
Suyanto et al. | End-to-End speech recognition models for a low-resourced Indonesian Language | |
You et al. | Memory access optimized VLSI for 5000-word continuous speech recognition | |
Evrard | Transformers in automatic speech recognition | |
Buthpitiya et al. | A parallel implementation of viterbi training for acoustic models using graphics processing units | |
Chetupalli et al. | Context dependent RNNLM for automatic transcription of conversations | |
Liu et al. | Speech recognition systems on the Cell Broadband Engine processor | |
Karkada et al. | Training Speech Recognition Models on HPC Infrastructure | |
Djuraev et al. | An In-Depth Analysis of Automatic Speech Recognition System | |
Liu et al. | Cross Languages One-Versus-All Speech | |
Chen | Cued rnnlm toolkit | |
Dua et al. | Cepstral and acoustic ternary pattern based hybrid feature extraction approach for end-to-end bangla speech recognition | |
Liu et al. | Cross Languages One-Versus-All Speech Emotion Classifier | |
Pinto Rivero | Acceleration of automatic speech recognition for low-power devices | |
Letswamotse | Optimized dynamic programming search for automatic speech recognition on a Graphics Processing Unit (GPU) platform using Compute Unified Device Architecture (CUDA) | |
Ma et al. | Scaling down: applying large vocabulary hybrid HMM-MLP methods to telephone recognition of digits and natural numbers | |
Zhao et al. | Segmental neural net optimization for continuous speech recognition |
Legal Events
Code | Title | Description |
---|---|---|
PA0105 | International application | Patent event date: 20170818; Patent event code: PA01051R01D; Comment text: International Patent Application |
A201 | Request for examination | |
PA0201 | Request for examination | Patent event code: PA02012R01D; Patent event date: 20170821; Comment text: Request for Examination of Application |
PG1501 | Laying open of application | |
E902 | Notification of reason for refusal | |
PE0902 | Notice of grounds for rejection | Comment text: Notification of reason for refusal; Patent event date: 20181129; Patent event code: PE09021S01D |
E701 | Decision to grant or registration of patent right | |
PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 20190719 |
GRNT | Written decision to grant | |
PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 20190731; Patent event code: PR07011E01D |
PR1002 | Payment of registration fee | Payment date: 20190731; End annual number: 3; Start annual number: 1 |
PG1601 | Publication of registration | |
PR1001 | Payment of annual fee | Payment date: 20220704; Start annual number: 4; End annual number: 4 |
PR1001 | Payment of annual fee | Payment date: 20230628; Start annual number: 5; End annual number: 5 |
PR1001 | Payment of annual fee | Payment date: 20240702; Start annual number: 6; End annual number: 6 |