CN111722713A - Multi-mode fused gesture keyboard input method, device, system and storage medium - Google Patents
Info
- Publication number: CN111722713A
- Application number: CN202010534338.0A
- Authority: CN (China)
- Prior art keywords: data, gesture, module, information, key
- Prior art date: 2020-06-12
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F3/014: Hand-worn input/output arrangements, e.g. data gloves
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F3/015: Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
- G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G06F2218/04: Denoising
- G06F2218/12: Classification; Matching
Abstract
The invention discloses a multi-modal fused gesture keyboard input method, device, system and storage medium. The method comprises: obtaining IMU sensor data, electromyography (EMG) sensor data and bending sensor data for a user's keystrokes; photographing the user's hand region to obtain hand image data; and feeding the four kinds of preprocessed data into corresponding classifiers for feature extraction, then performing decision fusion to recognize them as the corresponding gesture key input signal. The device comprises a data glove or a head-mounted device; the system comprises a data glove and a head-mounted device; the storage medium comprises a processor and a memory. The invention realizes interaction between gestures and virtual keyboard characters in a head-mounted device, displays the character image in real time, and presents keyboard input with a more three-dimensional, information-rich, natural and intimate interactive interface.
Description
Technical Field
The invention relates to the fields of gesture recognition, computer vision and human-computer interaction, and in particular to a multi-modal fused gesture keyboard input method, device, system and storage medium.
Background
Since their inception, computers have inevitably needed to take in various kinds of information, such as operating commands and data. The most basic function of an input device is to convert information in its various forms into a form suitable for computer processing. The keyboard is a commonly used input device. It consists of a matrix of switches, including number keys, letter keys, symbol keys, function keys, control keys and so on, and each key has its unique code within the computer. When a key is pressed, the keyboard interface sends the key's binary code to the host computer and the key's character is shown on the display. Keyboard interface circuits mostly use a single-chip microprocessor that controls the operation of the whole keyboard, such as self-checking at power-on, keyboard scanning, generation and transmission of key codes, and communication with the host.
Although the physical keyboard is the most common input mode, some complex environments impose limitations on it. For example, when working outdoors, an external keyboard must be carried along and there is often no fixed supporting surface, so the keyboard is troublesome to use; moreover, it occupies considerable space and is inconvenient to carry.
Disclosure of Invention
The invention provides a multi-modal fused gesture keyboard input method, device, system and storage medium, which realize interaction between gestures and virtual keyboard characters in a head-mounted device, display the character image in real time, and present keyboard input with a more three-dimensional, information-rich, natural and intimate interactive interface, as described in detail below:
A multi-modal fused gesture keyboard input method, the method comprising:
obtaining IMU sensor data, electromyography (EMG) sensor data and bending sensor data for a user's keystrokes;
photographing the user's hand region to obtain hand image data;
inputting the four kinds of preprocessed data into their corresponding classifiers for feature extraction, and then performing decision fusion to recognize them as the corresponding gesture key input signal.
A multi-modal fused gesture keyboard input device, the device comprising a data glove, the data glove comprising:
an IMU sensor module for recording the gesture while both hands move and the motion information when a key is pressed;
an electromyography (EMG) sensor module formed by muscle-pulse detection modules connected in a ring, whose inner side carries metal contacts that press against the arm to detect muscle pulses, worn on the forearm and connected by a data line to the micro control unit on the back of the hand;
a bending sensor module whose top layer consists of a first flexible film with a pressure-sensitive layer laminated onto it, and whose bottom layer consists of a second flexible film with a conductive circuit laminated onto it, used to collect the deformation signal generated when a finger bends;
and a first preprocessing module for filtering and denoising the collected inertial measurement unit motion data, EMG motion data and bending sensor deformation data.
A multi-modal fused gesture keyboard input device, the device comprising a head-mounted device, the head-mounted device comprising:
a binocular camera module for acquiring gesture images of the user's finger movements and recording the image information of both hands' keystrokes over multiple frames;
a second preprocessing module for denoising the acquired gesture images;
a feature extraction module for performing neural network classification training on the acquired inertial data, EMG data, bending deformation data and gesture image data to obtain a gesture keystroke recognition prediction result for each;
and a decision fusion module for averaging and weighting the gesture keystroke recognition prediction results to obtain the final fused prediction result, with a built-in automatic error-correction function based on fuzzy intent inference.
A multi-modal fused gesture keyboard input system, the system comprising a data glove and a head-mounted device,
the data glove being used to acquire inertial, EMG and deformation information of the user's finger keystroke movements, record the user's gesture keystroke motion information, generate preprocessed finger motion information through denoising and filtering, and then send it to the head-mounted device;
the head-mounted device being used to acquire image information of the user's finger keystroke movements and filter and denoise the images; to receive the three kinds of data from the data glove and then perform neural network classification training on them and on the gesture image data to obtain a gesture keystroke prediction result for each; and to average and weight the gesture predictions of the four models to obtain the final fused prediction result and then display the predicted character.
A readable storage medium for multi-modal fused gesture keyboard input, on which a computer program is stored which, when executed by a processor, carries out the above method steps.
The technical solution provided by the invention has the following beneficial effects:
The invention aims to build a gesture keyboard input system as a virtual reality input tool. Gesture capture has applications in many situations, such as virtual reality video games, surgical training systems and sign language input, and many methods have been developed for it. In the present method, four sensing modalities are combined to obtain a better sampling result: computer vision, an inertial measurement unit (IMU), electromyography (EMG) sensors and bending sensors. It is well known that computer vision sensors can determine the pose of an object through linear or non-linear pose estimation, but the sampling rate is typically limited by a camera frame rate of 30 to 60 frames per second. In addition, pose estimation requires feature extraction and pose search, which makes the system complex and slow, and a low-cost embedded processor generally does not handle such computation well. IMU and EMG sensors can obtain comparable measurements at much higher sampling rates, such as 1000 Hz or more. Because their outputs take different forms, such as velocity or acceleration in translation and rotation, signal integration is used to obtain absolute measurements of translational position and rotation angle, so drift and noise can become problematic. Combining these four types of sensing, namely vision, IMU, EMG and deformation, is therefore a promising solution.
In summary, the method measures hand gestures with vision-, IMU-, EMG- and deformation-based sensing. For the vision-based part, a binocular camera attached to the head-mounted device is used; this scheme is robust and easy to implement. For the data glove, self-contained hardware is built from off-the-shelf electronic modules, which allows better control over the algorithms of the sensor fusion scheme. Finger tracking is also important for sensing hand gestures, so the invention builds and tests the system with flexible sensors and obtains satisfactory results. This makes the system suitable for a variety of real application scenarios, including virtual reality applications.
Drawings
FIG. 1 is a schematic flow diagram of a gesture keyboard input method for a data glove;
FIG. 2 is a flow chart of a gesture keyboard input method for a head-mounted device;
FIG. 3 is a schematic diagram of a data glove hardware framework;
FIG. 4 is a circuit diagram;
FIG. 5 is a hardware schematic diagram of a head mounted device;
FIG. 6 is a process level software framework diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The invention provides a multi-modal fused gesture keyboard input method, device, system and storage medium. The hand IMU sensor data, EMG sensor data and hand-bending data measured by the bending sensors are obtained from an integrated data glove, and hand image data are captured by the head-mounted camera; these four kinds of data are processed jointly, so that the gesture keyboard input system is greatly improved in environmental adaptability and command recognition accuracy. The device is simple to wear and easy to operate.
Example 1
The embodiment of the invention provides a multi-modal fused gesture keyboard input method, shown in fig. 1 and fig. 2, which comprises the following steps:
IMU sensor data, EMG sensor data and bending sensor data of the user's keystrokes are acquired by wearing the data glove.
The IMU sensor data, EMG sensor data and bending sensor data are filtered and denoised by the preprocessing module and then sent via the Bluetooth module to the head-mounted device for processing. The user's hand region is photographed by the miniature camera on the head-mounted device to obtain the user's hand image data. The preprocessed IMU sensor data, EMG sensor data, deformation data (i.e. bending sensor data) and hand image data are then fed into their corresponding classifiers for feature extraction and decision fusion, and recognized as the corresponding gesture key input signal.
Example 2
A multi-modal fused gesture keyboard input device comprises a data glove.
As shown in fig. 3, the data glove comprises: an IMU sensor module, an EMG sensor module, a bending sensor module, a first preprocessing module, a first Bluetooth module and a wireless charging module.
The IMU sensor module is a six-axis inertial measurement unit motion sensor used to record the gesture while both hands move and the motion information when a key is pressed. It comprises a three-axis accelerometer recording acceleration information and a three-axis gyroscope (x, y and z axes) recording angular velocity information. Five such sensors are used, one on the back of each of the five fingers, connected to the micro control unit on the back of the hand through a flexible circuit board.
The EMG sensor module is formed by six muscle-pulse detection modules connected in a ring; its inner side carries metal contacts that press against the arm to detect muscle pulses, and it is worn on the forearm and connected by a data line to the micro control unit on the back of the hand.
Flexible circuit boards are known to those skilled in the art, and the connection of the sensors on the glove device is realized via the circuit boards.
Referring to fig. 3, the ring-shaped object around the arm is the EMG sensor module, and the elongated strip-shaped objects attached to the hand are the bending sensors.
The top layer of the bending sensor module consists of a first flexible film with a pressure-sensitive layer laminated onto it, and the bottom layer consists of a second flexible film with a conductive circuit laminated onto it. The module collects the deformation signal generated when a finger bends, so the degree of bending can be measured and keystroke gestures distinguished. One bending sensor module is attached to each finger and connected to the micro control unit through the conductive circuit.
In a specific implementation, when the bending sensor module bends, the pressure-sensitive layer on the top closes the otherwise open circuit on the bottom layer, and the sensor's resistance changes with the degree of bending, so the sensor behaves like a variable resistor. Referring to fig. 4, to measure the resistance value R_1 of the bending sensor, the sensor is connected in series with a constant resistance R_2 and the voltage V_2 across R_2 is read. From the voltage-divider relation

V_2 = V_1 · R_2 / (R_1 + R_2),  i.e.  R_1 = R_2 · (V_1 - V_2) / V_2,

where V_1 is the supply voltage, the resistance value of the bending sensor can be obtained. For the bending sensor used in the invention, the higher the degree of bending, the smaller its resistance; combined with the formula above, the degree of bending is therefore directly proportional to the voltage V_2 across the constant resistance R_2: the more the sensor bends, the lower its resistance and the larger V_2 becomes, and vice versa. The value of V_2 thus represents the gesture deformation signal.
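A minimal sketch of this voltage-divider readout is shown below; the supply voltage and series resistance values are illustrative assumptions, not values from the patent.

```python
# Voltage-divider readout for the bending sensor (sketch): the sensor R1 is in
# series with a known resistor R2, the ADC reads V2 across R2, and
# R1 = R2 * (V1 - V2) / V2.  V1 and R2 below are assumed example values.

V1 = 3.3        # supply voltage in volts (assumed)
R2 = 10_000.0   # known series resistance in ohms (assumed)

def bend_resistance(v2: float) -> float:
    """Recover the bending-sensor resistance R1 from the measured voltage V2."""
    return R2 * (V1 - v2) / v2

# A larger V2 corresponds to a smaller R1, i.e. a more strongly bent finger.
for v2 in (0.8, 1.65, 2.5):
    print(f"V2 = {v2:.2f} V  ->  R1 = {bend_resistance(v2):.0f} ohm")
```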
In a specific implementation, the structure of the pressure-sensitive layer is well known to those skilled in the art and is not described in detail in the embodiments of the present invention.
The first preprocessing module mainly filters and denoises the collected inertial measurement unit motion data, EMG motion data and bending sensor deformation data, thereby guaranteeing the validity of the data.
The first Bluetooth module mainly sends the data glove's inertial measurement unit, EMG unit and bending sensor data to the mixed reality glasses for processing. In practical applications, mixed reality glasses are a known technology in the field: a head-mounted device that presents mixed reality scenes, here augmented with a binocular camera module for photographing the hands and with other functional modules.
The wireless charging module is mainly used to keep the data glove charged; wireless charging also improves the convenience of the keyboard system.
Example 3
A multi-modal fused gesture keyboard input device comprises a head-mounted device.
as shown in fig. 5, the head-mounted device includes: the system comprises a binocular camera module, a second preprocessing module, a feature extraction module, a decision fusion module, a second Bluetooth module, a display module and a power supply module;
The binocular camera module is located at the bottom of the head-mounted device and acquires gesture images of the finger movements of the user wearing the data glove; a 50-frame-per-second binocular camera is used to record the image information of both hands' keystrokes over multiple frames.
The second preprocessing module denoises the acquired gesture images.
The feature extraction module performs neural network classification training on the inertial, EMG and bending deformation data acquired by the data glove and on the gesture image data acquired by the binocular camera module of the head-mounted device, obtaining a gesture keystroke recognition prediction result for each.
The decision fusion module averages and weights the gesture prediction results of the four models obtained by the feature extraction module to obtain the final fused prediction result, and it has an automatic error-correction function based on fuzzy intent inference. The second Bluetooth module receives the inertial, EMG and deformation data sent by the data glove.
The display module displays the prediction result obtained by the decision fusion module in the head-mounted device.
The power module supplies power to the head-mounted device.
Example 4
A multi-modal fused gesture keyboard input system, comprising: data gloves and head-mounted devices;
As shown in fig. 1, the data glove acquires inertial, EMG and deformation motion information of the user's finger keystroke movements, records the user's gesture keystroke motion information, performs denoising and filtering through its first preprocessing module to generate the preprocessed finger motion information, and sends it to the head-mounted device through its first Bluetooth module.
As shown in fig. 2, the head-mounted device acquires image information of the user's finger keystroke movements through its internal binocular camera module and filters and denoises the images through its second preprocessing module. Its second Bluetooth module receives the three kinds of preprocessed data from the data glove, which are then input, together with the gesture image data from the binocular camera module, into the feature extraction module, where neural network classification training is performed on each to obtain its gesture keystroke prediction result. The results for the four kinds of data are then input into the decision fusion module, the gesture predictions of the four models are averaged and weighted to obtain the final fused prediction result, and the predicted character is displayed through the display module.
Example 5
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method steps of embodiment 1 described above.
Example 6
The data processing procedure in embodiments 1-4 above is further described below with reference to specific calculation formulas, as follows:
first part data preprocessing
In the data processing stage, after the multi-modal information at the input layer has been obtained, the IMU motion information is denoised and filtered with a Butterworth filter: 9-300 Hz band-pass filtering plus a 50 Hz notch filter. Feature extraction is carried out after filtering, and because a high sampling frequency of 1000 Hz is used, feature dimensionality reduction is also applied to the IMU motion sensor data.
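A minimal sketch of this IMU filtering stage, assuming SciPy, a 1000 Hz sampling rate and a synthetic test signal, is shown below.

```python
# IMU preprocessing sketch: Butterworth 9-300 Hz band-pass plus a 50 Hz notch.
import numpy as np
from scipy import signal

fs = 1000.0                                # IMU sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)  # motion + mains hum

b_bp, a_bp = signal.butter(4, [9, 300], btype="bandpass", fs=fs)      # 9-300 Hz band-pass
bandpassed = signal.filtfilt(b_bp, a_bp, raw)

b_n, a_n = signal.iirnotch(w0=50, Q=30, fs=fs)                        # 50 Hz notch
clean = signal.filtfilt(b_n, a_n, bandpassed)
```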
For the EMG motion information, the EMG sensor only needs filtering because it already contains an amplifier circuit internally. The wavelet transform can effectively represent signal characteristics in both the time and frequency domains, so the EMG data of all channels are filtered using a db4-based wavelet transform; the dimensionality reduction method is the same as above.
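A db4 wavelet denoising step of this kind can be sketched with PyWavelets; the synthetic one-channel EMG trace and the thresholding rule are assumptions for illustration.

```python
# db4 wavelet denoising of one EMG channel: decompose, soft-threshold the
# detail coefficients, reconstruct.
import numpy as np
import pywt

rng = np.random.default_rng(0)
emg = np.sin(2 * np.pi * 80 * np.linspace(0, 1, 1000)) + 0.3 * rng.standard_normal(1000)

coeffs = pywt.wavedec(emg, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise estimate from finest detail level
thresh = sigma * np.sqrt(2 * np.log(len(emg)))            # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")
```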
For the bending sensor data, the physiological structure of the human hand is the same from person to person, so the degrees of bending when executing a gesture are similar and the influence of individual differences is small. To handle jump points in the deformation signal and the electrical noise of the component, a Gaussian filter is selected and its window size is set to 5.
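Such a smoothing step might look as follows, assuming SciPy; the sample trace is illustrative.

```python
# Gaussian smoothing of the bending-sensor trace to suppress jump points and
# component noise; sigma=1.0 with truncate=2.0 gives a radius of 2 samples,
# i.e. an effective 5-sample window.
import numpy as np
from scipy.ndimage import gaussian_filter1d

bend = np.array([0.10, 0.12, 0.55, 0.13, 0.14, 0.35, 0.80, 0.82, 0.81, 0.79])
smoothed = gaussian_filter1d(bend, sigma=1.0, truncate=2.0)
print(smoothed.round(3))
```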
The image information acquired by the binocular camera is also denoised and filtered, using a Chebyshev filter with 30-150 Hz band-pass filtering and a 50 Hz notch filter.
When the user types with the standard touch-typing method (the thumbs have no corresponding characters), each keystroke involves one or two corresponding fingers, so a pre-classification step can be applied: the fingers of both hands are divided into four categories (little finger, ring finger, middle finger and index finger), and whenever a keystroke is detected it is first assigned to one of these four categories according to the gesture signal, as in the sketch below.
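The pre-classification can be sketched as a simple lookup; the key-to-finger map below follows standard touch typing and is an assumption for illustration, not taken from the patent.

```python
# Pre-classification of a detected key press into one of four finger classes.
FINGER_OF_KEY = {
    "little": set("qazp"),          # little fingers of both hands
    "ring":   set("wsxol"),         # ring fingers
    "middle": set("edcik"),         # middle fingers
    "index":  set("rfvtgbyhnujm"),  # index fingers cover two columns each
}

def finger_class(key: str) -> str:
    for finger, keys in FINGER_OF_KEY.items():
        if key.lower() in keys:
            return finger
    raise ValueError(f"no finger class for key {key!r}")

print(finger_class("j"))   # -> index
```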
As shown in fig. 6, the feature extraction part mainly includes two types: the first type applies an LSTM model to the IMU and EMG information from the data glove, and the second type applies a CNN model to the deformation information from the data glove and the image information from the binocular camera.
For the first type of information processing, the LSTM is a special RNN that learns long-term dependencies and is suited to processing long sequences. Its internal structure has an input gate, a forget gate and an output gate, and it operates as follows. The first step decides what information may pass through the memory cell. This decision is made by the forget gate layer through an activation function, which produces a value between 0 and 1 from the output of the previous time step and the current input, determining whether the information learned at the previous time step is passed fully, partially or not at all:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

where σ is the activation function, h_{t-1} is the output at the previous time step, x_t is the current input, b_f is a bias, f_t is the forget gate and W_f is its weight matrix.
The second step generates the new information to be stored. It has two parts: an input gate layer that uses a sigmoid activation to decide which values will be updated, and a tanh layer that produces a vector of new candidate values C̃_t which may be added to the memory cell as the candidates generated by the current layer. The values produced by the two parts are combined for the update:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (2)
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)    (3)

where σ is the activation function, h_{t-1} is the output at the previous time step, x_t is the current input, b_i and b_c are biases, tanh is an activation function, i_t is the input gate, and W_i and W_c are weight matrices.
The third step updates the old memory cell: the old cell state is first multiplied by f_t to forget the unwanted information, and then i_t · C̃_t is added to obtain the new cell state:

C_t = f_t · C_{t-1} + i_t · C̃_t    (4)

where f_t is the output of the forget gate, C̃_t is the candidate value for the new memory cell, C_{t-1} is the old cell state and C_t is the new cell state.
The final step produces the output of the LSTM model. An initial output is first obtained through a sigmoid layer; C_t is then scaled to between -1 and 1 using tanh and multiplied element-wise with the sigmoid output to give the model output:

O_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (5)
h_t = O_t · tanh(C_t)    (6)

where O_t is the output of the output gate, W_o is its weight matrix, h_{t-1} is the output at the previous time step, x_t is the current input, b_o is a bias and C_t is the new memory cell.
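The single-cell computation in equations (1)-(6) can be sketched directly in NumPy; the dimensions and random weights below are assumptions for illustration only, not parameters from the patent.

```python
# One LSTM cell step mirroring equations (1)-(6); weights are random placeholders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_hid = 6, 8                       # e.g. one 6-axis IMU sample, 8 hidden units (assumed)
Wf, Wi, Wc, Wo = (rng.standard_normal((d_hid, d_in + d_hid)) * 0.1 for _ in range(4))
bf = bi = bc = bo = np.zeros(d_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)                   # forget gate, eq. (1)
    i_t = sigmoid(Wi @ z + bi)                   # input gate,  eq. (2)
    c_tilde = np.tanh(Wc @ z + bc)               # candidate,   eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde           # cell update, eq. (4)
    o_t = sigmoid(Wo @ z + bo)                   # output gate, eq. (5)
    h_t = o_t * np.tanh(c_t)                     # hidden out,  eq. (6)
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.standard_normal(d_in), h, c)   # process one time step
```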
Second part neural network prediction
The second type of information processing employs a CNN (convolutional neural network), a feed-forward neural network that contains convolution operations and has a deep structure; its overall layout is input - convolutional layer - pooling layer - fully connected layer - output. The convolutional layer extracts features from the input data. It contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feed-forward network. After the convolutional layer has extracted features, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer applies a preset pooling function that replaces the value of a single point in the feature map with a statistic of the neighbouring region. The model adopts Lp pooling, a class of pooling models inspired by the hierarchical structure within the visual cortex, generally expressed in the form:

A^l_k(i, j) = [ Σ_{x=1}^{f} Σ_{y=1}^{f} ( A^{l-1}_k(s_0·i + x, s_0·j + y) )^p ]^{1/p}

In the formula, A^l_k(i, j) is the output of the pooling model, s_0 is the stride, (i, j) are the pixel coordinates, f is the size of the pooling region and p is a preset parameter. In this method p is described with the value 1 as an example, i.e. the mean value within the pooling region is taken, which is also called average pooling. The pooling layer is followed by a fully connected layer, which in a convolutional neural network is equivalent to the hidden layer of a conventional feed-forward neural network. The fully connected layer sits at the last part of the hidden portion of the convolutional neural network and only passes signals on to other fully connected layers. In the fully connected layer the feature map loses its spatial topology: it is flattened into a vector and mapped to the output through an activation function. The activation function introduces a non-linear factor, so the neural network can approximate any non-linear function arbitrarily well and can therefore be applied to many non-linear models, fitting the gesture data better.
The CNN and LSTM models are pre-trained in advance, and the pre-training data come from users typing in mid-air with the standard touch-typing method. Both kinds of model output a probability matrix: let the output of a CNN model be a probability matrix x and the output of an LSTM model be a probability matrix y. The corresponding elements of the four matrices from the four kinds of data are averaged with equal weights to obtain the final prediction matrix z of the neural network, and the character corresponding to the maximum probability in z is selected as the output. The first type (IMU and EMG information from the data glove) uses the LSTM model and the second type (deformation information from the glove and image information from the binocular camera) uses the CNN model, so each kind of data corresponds to one matrix: the IMU data pass through the LSTM model to give probability matrix y1, the EMG data give probability matrix y2, the deformation data pass through the CNN model to give probability matrix x1, and the image data give probability matrix x2. Averaging the corresponding elements of the four matrices gives the final prediction matrix of the neural network:

z = a·x1 + b·x2 + c·y1 + d·y2,  where a = b = c = d = 1/4 are the averaging weights.
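The fusion step can be sketched as follows; the 26-letter character set and the softmax outputs standing in for the four model predictions are assumptions for illustration.

```python
# Decision fusion: average the four per-model probability matrices element-wise
# and take the most probable character.
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# stand-ins for the model outputs: y1 = IMU/LSTM, y2 = EMG/LSTM, x1 = bend/CNN, x2 = image/CNN
y1, y2, x1, x2 = (softmax(rng.standard_normal(26)) for _ in range(4))

a = b = c = d = 0.25                       # equal averaging weights
z = a * x1 + b * x2 + c * y1 + d * y2      # fused prediction matrix
predicted_char = chr(ord("a") + int(np.argmax(z)))
print(predicted_char)
```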
Third part fuzzy intent reasoning
The main principle of this part is to exploit the prior knowledge and regularity of Hanyu Pinyin: in the pinyin of Chinese characters, the letters that may follow a given letter are limited, and the probability of all other letters occurring is 0.
For example, in a syllable beginning with 'a', the next letter can only be 'i', 'o' or 'n', so the probability of all other characters is set to zero.
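A minimal sketch of this correction step is shown below; the successor table is an illustrative, incomplete assumption rather than a full pinyin model.

```python
# Fuzzy-intent error correction: after a known previous letter, zero out the
# fused probabilities of letters that cannot follow it in Hanyu Pinyin, then
# re-normalise.  ALLOWED_AFTER is a small illustrative table, not exhaustive.
import numpy as np

ALLOWED_AFTER = {
    "a": set("ion"),        # example from the text: a can be followed by i, o, n
}

def correct(z: np.ndarray, prev_letter: str) -> np.ndarray:
    allowed = ALLOWED_AFTER.get(prev_letter)
    if not allowed:
        return z                                    # no prior knowledge, keep as is
    mask = np.array([chr(ord("a") + k) in allowed for k in range(26)], dtype=float)
    masked = z * mask
    return masked / masked.sum() if masked.sum() > 0 else z

z = np.full(26, 1 / 26)                             # uniform fused probabilities
print(correct(z, "a").round(3))                     # only i, o and n keep probability
```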
In conclusion, the invention simplifies the control mode of keyboard input and realizes a multi-modal gesture keyboard input technology. Through virtual reality technology, interaction between gestures and virtual keyboard characters is realized in the head-mounted device, character images are displayed in real time, and keyboard input is presented with a more three-dimensional, information-rich, natural and intimate interactive interface. Compared with a single sensor or a purely image-based control mode, the invention offers low classification risk, a diverse set of classifiable modalities, strong environmental adaptability and simple operation; hands, eyes and keyboard cooperate more naturally during input, fully realizing the combined and dynamic advantages of the system. The gesture recognition mode based on the IMU, EMG, bending sensors and images has high reliability and recognition accuracy. Compared with gesture control based on image recognition alone, it is not affected by ambient light or background colour, the acquired data are stable, and the signal processing is simple. In complex environments the invention is not easily disturbed by sudden weather such as haze, overcast and rainy conditions or thunderstorms, and it still works when an object accidentally blocks the line of sight between the hand and the camera. The virtual reality device can display important information such as the input interface on a virtual screen in front of the user, so the state of keyboard input can be conveniently monitored in real time. Touch typing can thus be realized in an immersive way, giving people the experience of being present in the scene.
In the embodiments of the present invention, except where the model of a device is specifically described, the models of the other devices are not limited, as long as the devices can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the above-described embodiments of the present invention are provided for description only and do not indicate the relative merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. A multi-modal fused gesture keyboard input method, the method comprising:
obtaining IMU sensor data, electromyography sensor data and bending sensor data for a user's keystrokes;
photographing the user's hand region to obtain hand image data;
inputting the four kinds of preprocessed data into their corresponding classifiers for feature extraction, and then performing decision fusion to recognize them as the corresponding gesture key input signal.
2. The method according to claim 1, wherein the feature extraction specifically comprises:
extracting features from the IMU sensor data and the electromyography sensor data using an LSTM model; and extracting features from the bending sensor data and the hand image data using a CNN (convolutional neural network) model.
3. The multi-modal fused gesture keyboard input method of claim 2, wherein
the output of the CNN model is one probability matrix and the output of the LSTM model is another probability matrix; the corresponding elements of the four matrices of the four kinds of data are averaged with equal weights to obtain the final prediction matrix of the neural network, and the character corresponding to the maximum probability in the prediction matrix is selected as the output.
4. A multi-modal fused gesture keyboard input device, the device comprising a data glove, the data glove comprising:
an IMU sensor module for recording the gesture while both hands move and the motion information when a key is pressed;
an electromyography sensor module formed by muscle-pulse detection modules connected in a ring, whose inner side carries metal contacts that press against the arm to detect muscle pulses, worn on the forearm and connected by a data line to the micro control unit on the back of the hand;
a bending sensor module whose top layer consists of a first flexible film with a pressure-sensitive layer laminated onto it, and whose bottom layer consists of a second flexible film with a conductive circuit laminated onto it, used to collect the deformation signal generated when a finger bends;
and a first preprocessing module for filtering and denoising the collected inertial measurement unit motion data, electromyography motion data and bending sensor deformation data.
5. A multi-modal fused gesture keyboard input device, the device comprising a head-mounted device, the head-mounted device comprising:
a binocular camera module for acquiring gesture images of the user's finger movements and recording the image information of both hands' keystrokes over multiple frames;
a second preprocessing module for denoising the acquired gesture images;
a feature extraction module for performing neural network classification training on the acquired inertial data, electromyography data, bending deformation data and gesture image data to obtain a gesture keystroke recognition prediction result for each;
and a decision fusion module for averaging and weighting the gesture keystroke recognition prediction results to obtain the final fused prediction result, with a built-in automatic error-correction function based on fuzzy intent inference.
6. The multi-modal fused gesture keyboard input device of claim 5, wherein the fuzzy intent inference specifically comprises: using the prior knowledge and regularity of Hanyu Pinyin, when a key is input, giving an error correction matrix according to the known previous letter.
7. A multi-modal fused gesture keyboard input system, the system comprising a data glove and a head-mounted device,
the data glove being used to acquire inertial, electromyography and deformation information of the user's finger keystroke movements, record the user's gesture keystroke motion information, generate preprocessed finger motion information through denoising and filtering, and then send it to the head-mounted device;
the head-mounted device being used to acquire image information of the user's finger keystroke movements and filter and denoise the images; to receive the three kinds of data from the data glove and then perform neural network classification training on them and on the gesture image data to obtain a gesture keystroke prediction result for each; and to average and weight the gesture predictions of the four models to obtain the final fused prediction result and then display the predicted character.
8. A readable storage medium for multi-modal fused gesture keyboard input, on which a computer program is stored which, when executed by a processor, performs the method steps of claim 1.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010534338.0A CN111722713A (en) | 2020-06-12 | 2020-06-12 | Multi-mode fused gesture keyboard input method, device, system and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010534338.0A CN111722713A (en) | 2020-06-12 | 2020-06-12 | Multi-mode fused gesture keyboard input method, device, system and storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN111722713A (en) | 2020-09-29 |

Family

ID=72568061

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010534338.0A (Pending) | CN111722713A (en) | | |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN111722713A (en) |
Patent Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106527738A (en) * | 2016-12-08 | 2017-03-22 | | Multi-information somatosensory interaction glove system and method for virtual reality system |
| CN110865704A (en) * | 2019-10-21 | 2020-03-06 | | Gesture interaction device and method for 360-degree suspended light field three-dimensional display system |

Cited By (11)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111158476A (en) * | 2019-12-25 | 2020-05-15 | | Key identification method, system, equipment and storage medium of virtual keyboard |
| CN112053421A (en) * | 2020-10-14 | 2020-12-08 | | Signal noise reduction processing method, device, equipment and storage medium |
| CN112053421B (en) * | 2020-10-14 | 2023-06-23 | | Signal noise reduction processing method, device, equipment and storage medium |
| CN112434669A (en) * | 2020-12-14 | 2021-03-02 | | Multi-information fusion human behavior detection method and system |
| CN112434669B (en) * | 2020-12-14 | 2023-09-26 | | Human body behavior detection method and system based on multi-information fusion |
| CN112603758A (en) * | 2020-12-21 | 2021-04-06 | | Gesture recognition method based on sEMG and IMU information fusion |
| CN113505822A (en) * | 2021-06-30 | 2021-10-15 | | Multi-scale information fusion upper limb action classification method based on surface electromyographic signals |
| CN113505822B (en) * | 2021-06-30 | 2022-02-15 | | Multi-scale information fusion upper limb action classification method based on surface electromyographic signals |
| CN113505711A (en) * | 2021-07-16 | 2021-10-15 | | Real-time gesture recognition system based on electromyographic signals and Leap Motion |
| CN115170773A (en) * | 2022-05-24 | 2022-10-11 | | Virtual classroom action interaction system and method based on metauniverse |
| CN116301388A (en) * | 2023-05-11 | 2023-06-23 | | Man-machine interaction scene system for intelligent multi-mode combined application |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-09-29 |