Sartiukova et al., 2023 - Google Patents

Remote Voice Control of Computer Based on Convolutional Neural Network

Document ID: 1577792600495573540
Author: Sartiukova A; Markiv O; Vysotska V; Shakleina I; Sokulska N; Romanets I
Publication year: 2023
Publication venue: 2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)

Snippet

This paper addresses the task of creating an English-language voice assistant for controlling a Windows-based personal computer, using third-party language engines and voice recognition models. The Python programming language, the Speech Recognition …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G10L15/265—Speech recognisers specially adapted for particular applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Taking into account non-speech characteristics
- G10L2015/228—Taking into account non-speech characteristics of application context
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids transforming into visible information
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer

Similar Documents

Publication	Publication Date	Title
US11503155B2 (en)	2022-11-15	Interactive voice-control method and apparatus, device and medium
US11727914B2 (en)	2023-08-15	Intent recognition and emotional text-to-speech learning
JP7508533B2 (en)	2024-07-01	Speaker diarization using speaker embeddings and trained generative models
CN111933129B (en)	2021-01-05	Audio processing method, language model training method and device and computer equipment
US11087094B2 (en)	2021-08-10	System and method for generation of conversation graphs
WO2021051544A1 (en)	2021-03-25	Voice recognition method and device
CN107481720B (en)	2021-03-19	Explicit voiceprint recognition method and device
US11847168B2 (en)	2023-12-19	Training model with model-provided candidate action
WO2019018061A1 (en)	2019-01-24	Automatic integration of image capture and recognition in a voice-based query to understand intent
US11830482B2 (en)	2023-11-28	Method and apparatus for speech interaction, and computer storage medium
US11574637B1 (en)	2023-02-07	Spoken language understanding models
EP3593346B1 (en)	2024-01-10	Graphical data selection and presentation of digital content
CN114330371A (en)	2022-04-12	Session intention identification method and device based on prompt learning and electronic equipment
CN112002346A (en)	2020-11-27	Gender and age identification method, device, equipment and storage medium based on voice
Sartiukova et al.	2023	Remote Voice Control of Computer Based on Convolutional Neural Network
CN109887490A (en)	2019-06-14	The method and apparatus of voice for identification
WO2024114303A1 (en)	2024-06-06	Phoneme recognition method and apparatus, electronic device and storage medium
CN112150103B (en)	2023-11-28	Schedule setting method, schedule setting device and storage medium
OUKAS et al.	2024	ArabAlg: A new Dataset for Arabic Speech Commands Recognition for Machine Learning Purposes
CN113066473A (en)	2021-07-02	Voice synthesis method and device, storage medium and electronic equipment
Yang et al.	2023	Research and Design of Intelligent Voice Customer Service System
US12112752B1 (en)	2024-10-08	Cohort determination in natural language processing
WO2021139737A1 (en)	2021-07-15	Method and system for man-machine interaction
Rizvi et al.	2021	Speech recognition using long short term memory RNN
Zhou et al.	2023	Speech Emotion Recognition Based on 1D-CNNs-LSTM Hybrid Model