CN111722713A - Multi-mode fused gesture keyboard input method, device, system and storage medium - Google Patents
Info
- Publication number: CN111722713A
- Application number: CN202010534338.0A
- Authority: CN (China)
- Prior art keywords: data, gesture, module, information, key
- Prior art date: 2020-06-12
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F3/014: Hand-worn input/output arrangements, e.g. data gloves
- G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F3/015: Input arrangements based on nervous system activity detection, e.g. brain waves [EEG] detection, electromyograms [EMG] detection, electrodermal response detection
- G06F3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
- G06F2218/04: Denoising
- G06F2218/12: Classification; Matching
Abstract
The invention discloses a multi-modal fused gesture keyboard input method, device, system and storage medium. The method comprises: obtaining IMU sensor data, electromyography (EMG) sensor data and bending sensor data for a user's keystrokes; photographing the user's hand region to obtain hand image data; and feeding the four kinds of preprocessed data into corresponding classifiers for feature extraction, then performing decision fusion to recognize them as the corresponding gesture key input signal. The device comprises a data glove or a head-mounted device; the system comprises a data glove and a head-mounted device; the storage medium comprises a processor and a memory. The invention realizes interaction between gestures and virtual keyboard characters in a head-mounted device, displays the character image in real time, and presents keyboard input with a more three-dimensional, information-rich, natural and intimate interactive interface.
Description
Technical Field
The invention relates to the fields of gesture recognition, computer vision and human-computer interaction, and in particular to a multi-modal fused gesture keyboard input method, device, system and storage medium.
Background
Since their inception, computers have inevitably needed to take in various kinds of information, such as operating commands and data. The most basic function of an input device is to convert information in its various forms into a form suitable for computer processing. The keyboard is a commonly used input device. It consists of a matrix of switches, including number keys, letter keys, symbol keys, function keys, control keys and so on, and each key has its unique code within the computer. When a key is pressed, the keyboard interface sends the key's binary code to the host computer and the key's character is shown on the display. Keyboard interface circuits mostly use a single-chip microprocessor that controls the operation of the whole keyboard, such as self-checking at power-on, keyboard scanning, generation and transmission of key codes, and communication with the host.
Although the physical keyboard is the most common input mode, some complex environments impose limitations on it. For example, when working outdoors, an external keyboard must be carried along and there is often no fixed supporting surface, so the keyboard is troublesome to use; moreover, it occupies considerable space and is inconvenient to carry.
Disclosure of Invention
The invention provides a multi-modal fused gesture keyboard input method, device, system and storage medium, which realize interaction between gestures and virtual keyboard characters in a head-mounted device, display the character image in real time, and present keyboard input with a more three-dimensional, information-rich, natural and intimate interactive interface, as described in detail below:
A multi-modal fused gesture keyboard input method, the method comprising:
obtaining IMU sensor data, electromyography (EMG) sensor data and bending sensor data for a user's keystrokes;
photographing the user's hand region to obtain hand image data;
inputting the four kinds of preprocessed data into their corresponding classifiers for feature extraction, and then performing decision fusion to recognize them as the corresponding gesture key input signal.
A multi-modal fused gesture keyboard input device, the device comprising a data glove, the data glove comprising:
an IMU sensor module for recording the gesture while both hands move and the motion information when a key is pressed;
an electromyography (EMG) sensor module formed by muscle-pulse detection modules connected in a ring, whose inner side carries metal contacts that press against the arm to detect muscle pulses, worn on the forearm and connected by a data line to the micro control unit on the back of the hand;
a bending sensor module whose top layer consists of a first flexible film with a pressure-sensitive layer laminated onto it, and whose bottom layer consists of a second flexible film with a conductive circuit laminated onto it, used to collect the deformation signal generated when a finger bends;
and a first preprocessing module for filtering and denoising the collected inertial measurement unit motion data, EMG motion data and bending sensor deformation data.
A multi-modal fused gesture keyboard input device, the device comprising a head-mounted device, the head-mounted device comprising:
a binocular camera module for acquiring gesture images of the user's finger movements and recording the image information of both hands' keystrokes over multiple frames;
a second preprocessing module for denoising the acquired gesture images;
a feature extraction module for performing neural network classification training on the acquired inertial data, EMG data, bending deformation data and gesture image data to obtain a gesture keystroke recognition prediction result for each;
and a decision fusion module for averaging and weighting the gesture keystroke recognition prediction results to obtain the final fused prediction result, with a built-in automatic error-correction function based on fuzzy intent inference.
A multi-modal fused gesture keyboard input system, the system comprising a data glove and a head-mounted device,
the data glove being used to acquire inertial, EMG and deformation information of the user's finger keystroke movements, record the user's gesture keystroke motion information, generate preprocessed finger motion information through denoising and filtering, and then send it to the head-mounted device;
the head-mounted device being used to acquire image information of the user's finger keystroke movements and filter and denoise the images; to receive the three kinds of data from the data glove and then perform neural network classification training on them and on the gesture image data to obtain a gesture keystroke prediction result for each; and to average and weight the gesture predictions of the four models to obtain the final fused prediction result and then display the predicted character.
A readable storage medium for multi-modal fused gesture keyboard input, on which a computer program is stored which, when executed by a processor, carries out the above method steps.
The technical solution provided by the invention has the following beneficial effects:
The invention aims to build a gesture keyboard input system as a virtual reality input tool. Gesture capture has applications in many situations, such as virtual reality video games, surgical training systems and sign language input, and many methods have been developed for it. In the present method, four sensing modalities are combined to obtain a better sampling result: computer vision, an inertial measurement unit (IMU), electromyography (EMG) sensors and bending sensors. It is well known that computer vision sensors can determine the pose of an object through linear or non-linear pose estimation, but the sampling rate is typically limited by a camera frame rate of 30 to 60 frames per second. In addition, pose estimation requires feature extraction and pose search, which makes the system complex and slow, and a low-cost embedded processor generally does not handle such computation well. IMU and EMG sensors can obtain comparable measurements at much higher sampling rates, such as 1000 Hz or more. Because their outputs take different forms, such as velocity or acceleration in translation and rotation, signal integration is used to obtain absolute measurements of translational position and rotation angle, so drift and noise can become problematic. Combining these four types of sensing, namely vision, IMU, EMG and deformation, is therefore a promising solution.
In summary, the method measures hand gestures with vision-, IMU-, EMG- and deformation-based sensing. For the vision-based part, a binocular camera attached to the head-mounted device is used; this scheme is robust and easy to implement. For the data glove, self-contained hardware is built from off-the-shelf electronic modules, which allows better control over the algorithms of the sensor fusion scheme. Finger tracking is also important for sensing hand gestures, so the invention builds and tests the system with flexible sensors and obtains satisfactory results. This makes the system suitable for a variety of real application scenarios, including virtual reality applications.
Drawings
FIG. 1 is a schematic flow diagram of a gesture keyboard input method for a data glove;
FIG. 2 is a flow chart of a gesture keyboard input method for a head-mounted device;
FIG. 3 is a schematic diagram of a data glove hardware framework;
FIG. 4 is a circuit diagram;
FIG. 5 is a hardware schematic diagram of a head mounted device;
FIG. 6 is a process level software framework diagram.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The invention provides a multi-modal fused gesture keyboard input method, device, system and storage medium. The hand IMU sensor data, EMG sensor data and hand-bending data measured by the bending sensors are obtained from an integrated data glove, and hand image data are captured by the head-mounted camera; these four kinds of data are processed jointly, so that the gesture keyboard input system is greatly improved in environmental adaptability and command recognition accuracy. The device is simple to wear and easy to operate.
Example 1
The embodiment of the invention provides a multi-modal fused gesture keyboard input method, shown in fig. 1 and fig. 2, which comprises the following steps:
IMU sensor data, EMG sensor data and bending sensor data of the user's keystrokes are acquired by wearing the data glove.
The IMU sensor data, EMG sensor data and bending sensor data are filtered and denoised by the preprocessing module and then sent via the Bluetooth module to the head-mounted device for processing. The user's hand region is photographed by the miniature camera on the head-mounted device to obtain the user's hand image data. The preprocessed IMU sensor data, EMG sensor data, deformation data (i.e. bending sensor data) and hand image data are then fed into their corresponding classifiers for feature extraction and decision fusion, and recognized as the corresponding gesture key input signal.
Example 2
A multi-modal fused gesture keyboard input device comprises a data glove.
As shown in fig. 3, the data glove comprises: an IMU sensor module, an EMG sensor module, a bending sensor module, a first preprocessing module, a first Bluetooth module and a wireless charging module.
The IMU sensor module is a six-axis inertial measurement unit motion sensor used to record the gesture while both hands move and the motion information when a key is pressed. It comprises a three-axis accelerometer recording acceleration information and a three-axis gyroscope (x, y and z axes) recording angular velocity information. Five such sensors are used, one on the back of each of the five fingers, connected to the micro control unit on the back of the hand through a flexible circuit board.
The EMG sensor module is formed by six muscle-pulse detection modules connected in a ring; its inner side carries metal contacts that press against the arm to detect muscle pulses, and it is worn on the forearm and connected by a data line to the micro control unit on the back of the hand.
Flexible circuit boards are known to those skilled in the art, and the connection of the sensors on the glove device is realized via the circuit boards.
Referring to fig. 3, the ring-shaped object around the arm is the EMG sensor module, and the elongated strip-shaped objects attached to the hand are the bending sensors.
The top layer of the bending sensor module consists of a first flexible film with a pressure-sensitive layer laminated onto it, and the bottom layer consists of a second flexible film with a conductive circuit laminated onto it. The module collects the deformation signal generated when a finger bends, so the degree of bending can be measured and keystroke gestures distinguished. One bending sensor module is attached to each finger and connected to the micro control unit through the conductive circuit.
In a specific implementation, when the bending sensor module bends, the pressure-sensitive layer on the top closes the otherwise open circuit on the bottom layer, and the sensor's resistance changes with the degree of bending, so the sensor behaves like a variable resistor. Referring to fig. 4, to measure the resistance value R_1 of the bending sensor, the sensor is connected in series with a constant resistance R_2 and the voltage V_2 across R_2 is read. From the voltage-divider relation

V_2 = V_1 · R_2 / (R_1 + R_2),  i.e.  R_1 = R_2 · (V_1 - V_2) / V_2,

where V_1 is the supply voltage, the resistance value of the bending sensor can be obtained. For the bending sensor used in the invention, the higher the degree of bending, the smaller its resistance; combined with the formula above, the degree of bending is therefore directly proportional to the voltage V_2 across the constant resistance R_2: the more the sensor bends, the lower its resistance and the larger V_2 becomes, and vice versa. The value of V_2 thus represents the gesture deformation signal.
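A minimal sketch of this voltage-divider readout is shown below; the supply voltage and series resistance values are illustrative assumptions, not values from the patent.

```python
# Voltage-divider readout for the bending sensor (sketch): the sensor R1 is in
# series with a known resistor R2, the ADC reads V2 across R2, and
# R1 = R2 * (V1 - V2) / V2.  V1 and R2 below are assumed example values.

V1 = 3.3        # supply voltage in volts (assumed)
R2 = 10_000.0   # known series resistance in ohms (assumed)

def bend_resistance(v2: float) -> float:
    """Recover the bending-sensor resistance R1 from the measured voltage V2."""
    return R2 * (V1 - v2) / v2

# A larger V2 corresponds to a smaller R1, i.e. a more strongly bent finger.
for v2 in (0.8, 1.65, 2.5):
    print(f"V2 = {v2:.2f} V  ->  R1 = {bend_resistance(v2):.0f} ohm")
```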
In a specific implementation, the structure of the pressure-sensitive layer is well known to those skilled in the art and is not described in detail in the embodiments of the present invention.
The first preprocessing module mainly filters and denoises the collected inertial measurement unit motion data, EMG motion data and bending sensor deformation data, thereby guaranteeing the validity of the data.
The first Bluetooth module mainly sends the data glove's inertial measurement unit, EMG unit and bending sensor data to the mixed reality glasses for processing. In practical applications, mixed reality glasses are a known technology in the field: a head-mounted device that presents mixed reality scenes, here augmented with a binocular camera module for photographing the hands and with other functional modules.
The wireless charging module is mainly used to keep the data glove charged; wireless charging also improves the convenience of the keyboard system.
Example 3
A multi-modal fused gesture keyboard input device comprises a head-mounted device.
as shown in fig. 5, the head-mounted device includes: the system comprises a binocular camera module, a second preprocessing module, a feature extraction module, a decision fusion module, a second Bluetooth module, a display module and a power supply module;
The binocular camera module is located at the bottom of the head-mounted device and acquires gesture images of the finger movements of the user wearing the data glove; a 50-frame-per-second binocular camera is used to record the image information of both hands' keystrokes over multiple frames.
The second preprocessing module denoises the acquired gesture images.
The feature extraction module performs neural network classification training on the inertial, EMG and bending deformation data acquired by the data glove and on the gesture image data acquired by the binocular camera module of the head-mounted device, obtaining a gesture keystroke recognition prediction result for each.
The decision fusion module averages and weights the gesture prediction results of the four models obtained by the feature extraction module to obtain the final fused prediction result, and it has an automatic error-correction function based on fuzzy intent inference. The second Bluetooth module receives the inertial, EMG and deformation data sent by the data glove.
The display module displays the prediction result obtained by the decision fusion module in the head-mounted device.
The power module supplies power to the head-mounted device.
Example 4
A multi-modal fused gesture keyboard input system, comprising: data gloves and head-mounted devices;
As shown in fig. 1, the data glove acquires inertial, EMG and deformation motion information of the user's finger keystroke movements, records the user's gesture keystroke motion information, performs denoising and filtering through its first preprocessing module to generate the preprocessed finger motion information, and sends it to the head-mounted device through its first Bluetooth module.
As shown in fig. 2, the head-mounted device acquires image information of the user's finger keystroke movements through its internal binocular camera module and filters and denoises the images through its second preprocessing module. Its second Bluetooth module receives the three kinds of preprocessed data from the data glove, which are then input, together with the gesture image data from the binocular camera module, into the feature extraction module, where neural network classification training is performed on each to obtain its gesture keystroke prediction result. The results for the four kinds of data are then input into the decision fusion module, the gesture predictions of the four models are averaged and weighted to obtain the final fused prediction result, and the predicted character is displayed through the display module.
Example 5
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the method steps of embodiment 1 described above.
Example 6
The data processing procedure in embodiments 1-4 above is further described below with reference to specific calculation formulas, as follows:
first part data preprocessing
In the data processing stage, after the multi-modal information at the input layer has been obtained, the IMU motion information is denoised and filtered with a Butterworth filter: 9-300 Hz band-pass filtering plus a 50 Hz notch filter. Feature extraction is carried out after filtering, and because a high sampling frequency of 1000 Hz is used, feature dimensionality reduction is also applied to the IMU motion sensor data.
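A minimal sketch of this IMU filtering stage, assuming SciPy, a 1000 Hz sampling rate and a synthetic test signal, is shown below.

```python
# IMU preprocessing sketch: Butterworth 9-300 Hz band-pass plus a 50 Hz notch.
import numpy as np
from scipy import signal

fs = 1000.0                                # IMU sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)  # motion + mains hum

b_bp, a_bp = signal.butter(4, [9, 300], btype="bandpass", fs=fs)      # 9-300 Hz band-pass
bandpassed = signal.filtfilt(b_bp, a_bp, raw)

b_n, a_n = signal.iirnotch(w0=50, Q=30, fs=fs)                        # 50 Hz notch
clean = signal.filtfilt(b_n, a_n, bandpassed)
```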
For the EMG motion information, the EMG sensor only needs filtering because it already contains an amplifier circuit internally. The wavelet transform can effectively represent signal characteristics in both the time and frequency domains, so the EMG data of all channels are filtered using a db4-based wavelet transform; the dimensionality reduction method is the same as above.
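A db4 wavelet denoising step of this kind can be sketched with PyWavelets; the synthetic one-channel EMG trace and the thresholding rule are assumptions for illustration.

```python
# db4 wavelet denoising of one EMG channel: decompose, soft-threshold the
# detail coefficients, reconstruct.
import numpy as np
import pywt

rng = np.random.default_rng(0)
emg = np.sin(2 * np.pi * 80 * np.linspace(0, 1, 1000)) + 0.3 * rng.standard_normal(1000)

coeffs = pywt.wavedec(emg, "db4", level=4)
sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise estimate from finest detail level
thresh = sigma * np.sqrt(2 * np.log(len(emg)))            # universal threshold
coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(coeffs, "db4")
```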
For the bending sensor data, the physiological structure of the human hand is the same from person to person, so the degrees of bending when executing a gesture are similar and the influence of individual differences is small. To handle jump points in the deformation signal and the electrical noise of the component, a Gaussian filter is selected and its window size is set to 5.
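Such a smoothing step might look as follows, assuming SciPy; the sample trace is illustrative.

```python
# Gaussian smoothing of the bending-sensor trace to suppress jump points and
# component noise; sigma=1.0 with truncate=2.0 gives a radius of 2 samples,
# i.e. an effective 5-sample window.
import numpy as np
from scipy.ndimage import gaussian_filter1d

bend = np.array([0.10, 0.12, 0.55, 0.13, 0.14, 0.35, 0.80, 0.82, 0.81, 0.79])
smoothed = gaussian_filter1d(bend, sigma=1.0, truncate=2.0)
print(smoothed.round(3))
```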
The image information acquired by the binocular camera is also denoised and filtered, using a Chebyshev filter with 30-150 Hz band-pass filtering and a 50 Hz notch filter.
When the user types with the standard touch-typing method (the thumbs have no corresponding characters), each keystroke involves one or two corresponding fingers, so a pre-classification step can be applied: the fingers of both hands are divided into four categories (little finger, ring finger, middle finger and index finger), and whenever a keystroke is detected it is first assigned to one of these four categories according to the gesture signal, as in the sketch below.
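The pre-classification can be sketched as a simple lookup; the key-to-finger map below follows standard touch typing and is an assumption for illustration, not taken from the patent.

```python
# Pre-classification of a detected key press into one of four finger classes.
FINGER_OF_KEY = {
    "little": set("qazp"),          # little fingers of both hands
    "ring":   set("wsxol"),         # ring fingers
    "middle": set("edcik"),         # middle fingers
    "index":  set("rfvtgbyhnujm"),  # index fingers cover two columns each
}

def finger_class(key: str) -> str:
    for finger, keys in FINGER_OF_KEY.items():
        if key.lower() in keys:
            return finger
    raise ValueError(f"no finger class for key {key!r}")

print(finger_class("j"))   # -> index
```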
As shown in fig. 6, the feature extraction part mainly includes two types: the first type applies an LSTM model to the IMU and EMG information from the data glove, and the second type applies a CNN model to the deformation information from the data glove and the image information from the binocular camera.
For the first type of information processing, the LSTM is a special RNN that learns long-term dependencies and is suited to processing long sequences. Its internal structure has an input gate, a forget gate and an output gate, and it operates as follows. The first step decides what information may pass through the memory cell. This decision is made by the forget gate layer through an activation function, which produces a value between 0 and 1 from the output of the previous time step and the current input, determining whether the information learned at the previous time step is passed fully, partially or not at all:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

where σ is the activation function, h_{t-1} is the output at the previous time step, x_t is the current input, b_f is a bias, f_t is the forget gate and W_f is its weight matrix.
The second step generates the new information to be stored. It has two parts: an input gate layer that uses a sigmoid activation to decide which values will be updated, and a tanh layer that produces a vector of new candidate values C̃_t which may be added to the memory cell as the candidates generated by the current layer. The values produced by the two parts are combined for the update:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (2)
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)    (3)

where σ is the activation function, h_{t-1} is the output at the previous time step, x_t is the current input, b_i and b_c are biases, tanh is an activation function, i_t is the input gate, and W_i and W_c are weight matrices.
The third step updates the old memory cell: the old cell state is first multiplied by f_t to forget the unwanted information, and then i_t · C̃_t is added to obtain the new cell state:

C_t = f_t · C_{t-1} + i_t · C̃_t    (4)

where f_t is the output of the forget gate, C̃_t is the candidate value for the new memory cell, C_{t-1} is the old cell state and C_t is the new cell state.
The final step produces the output of the LSTM model. An initial output is first obtained through a sigmoid layer; C_t is then scaled to between -1 and 1 using tanh and multiplied element-wise with the sigmoid output to give the model output:

O_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (5)
h_t = O_t · tanh(C_t)    (6)

where O_t is the output of the output gate, W_o is its weight matrix, h_{t-1} is the output at the previous time step, x_t is the current input, b_o is a bias and C_t is the new memory cell.
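The single-cell computation in equations (1)-(6) can be sketched directly in NumPy; the dimensions and random weights below are assumptions for illustration only, not parameters from the patent.

```python
# One LSTM cell step mirroring equations (1)-(6); weights are random placeholders.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_hid = 6, 8                       # e.g. one 6-axis IMU sample, 8 hidden units (assumed)
Wf, Wi, Wc, Wo = (rng.standard_normal((d_hid, d_in + d_hid)) * 0.1 for _ in range(4))
bf = bi = bc = bo = np.zeros(d_hid)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(Wf @ z + bf)                   # forget gate, eq. (1)
    i_t = sigmoid(Wi @ z + bi)                   # input gate,  eq. (2)
    c_tilde = np.tanh(Wc @ z + bc)               # candidate,   eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde           # cell update, eq. (4)
    o_t = sigmoid(Wo @ z + bo)                   # output gate, eq. (5)
    h_t = o_t * np.tanh(c_t)                     # hidden out,  eq. (6)
    return h_t, c_t

h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(rng.standard_normal(d_in), h, c)   # process one time step
```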
Second part neural network prediction
The second type of information processing employs a CNN (convolutional neural network), a feed-forward neural network that contains convolution operations and has a deep structure; its overall layout is input - convolutional layer - pooling layer - fully connected layer - output. The convolutional layer extracts features from the input data. It contains multiple convolution kernels, and each element of a kernel corresponds to a weight coefficient and a bias, analogous to a neuron of a feed-forward network. After the convolutional layer has extracted features, the output feature map is passed to the pooling layer for feature selection and information filtering. The pooling layer applies a preset pooling function that replaces the value of a single point in the feature map with a statistic of the neighbouring region. The model adopts Lp pooling, a class of pooling models inspired by the hierarchical structure within the visual cortex, generally expressed in the form:

A^l_k(i, j) = [ Σ_{x=1}^{f} Σ_{y=1}^{f} ( A^{l-1}_k(s_0·i + x, s_0·j + y) )^p ]^{1/p}

In the formula, A^l_k(i, j) is the output of the pooling model, s_0 is the stride, (i, j) are the pixel coordinates, f is the size of the pooling region and p is a preset parameter. In this method p is described with the value 1 as an example, i.e. the mean value within the pooling region is taken, which is also called average pooling. The pooling layer is followed by a fully connected layer, which in a convolutional neural network is equivalent to the hidden layer of a conventional feed-forward neural network. The fully connected layer sits at the last part of the hidden portion of the convolutional neural network and only passes signals on to other fully connected layers. In the fully connected layer the feature map loses its spatial topology: it is flattened into a vector and mapped to the output through an activation function. The activation function introduces a non-linear factor, so the neural network can approximate any non-linear function arbitrarily well and can therefore be applied to many non-linear models, fitting the gesture data better.
The CNN and LSTM models are pre-trained in advance, and the pre-training data come from users typing in mid-air with the standard touch-typing method. Both kinds of model output a probability matrix: let the output of a CNN model be a probability matrix x and the output of an LSTM model be a probability matrix y. The corresponding elements of the four matrices from the four kinds of data are averaged with equal weights to obtain the final prediction matrix z of the neural network, and the character corresponding to the maximum probability in z is selected as the output. The first type (IMU and EMG information from the data glove) uses the LSTM model and the second type (deformation information from the glove and image information from the binocular camera) uses the CNN model, so each kind of data corresponds to one matrix: the IMU data pass through the LSTM model to give probability matrix y1, the EMG data give probability matrix y2, the deformation data pass through the CNN model to give probability matrix x1, and the image data give probability matrix x2. Averaging the corresponding elements of the four matrices gives the final prediction matrix of the neural network:

z = a·x1 + b·x2 + c·y1 + d·y2,  where a = b = c = d = 1/4 are the averaging weights.
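The fusion step can be sketched as follows; the 26-letter character set and the softmax outputs standing in for the four model predictions are assumptions for illustration.

```python
# Decision fusion: average the four per-model probability matrices element-wise
# and take the most probable character.
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# stand-ins for the model outputs: y1 = IMU/LSTM, y2 = EMG/LSTM, x1 = bend/CNN, x2 = image/CNN
y1, y2, x1, x2 = (softmax(rng.standard_normal(26)) for _ in range(4))

a = b = c = d = 0.25                       # equal averaging weights
z = a * x1 + b * x2 + c * y1 + d * y2      # fused prediction matrix
predicted_char = chr(ord("a") + int(np.argmax(z)))
print(predicted_char)
```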
Third part fuzzy intent reasoning
The main principle of this part is to exploit the prior knowledge and regularity of Hanyu Pinyin: in the pinyin of Chinese characters, the letters that may follow a given letter are limited, and the probability of all other letters occurring is 0.
For example, in a syllable beginning with 'a', the next letter can only be 'i', 'o' or 'n', so the probability of all other characters is set to zero.
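A minimal sketch of this correction step is shown below; the successor table is an illustrative, incomplete assumption rather than a full pinyin model.

```python
# Fuzzy-intent error correction: after a known previous letter, zero out the
# fused probabilities of letters that cannot follow it in Hanyu Pinyin, then
# re-normalise.  ALLOWED_AFTER is a small illustrative table, not exhaustive.
import numpy as np

ALLOWED_AFTER = {
    "a": set("ion"),        # example from the text: a can be followed by i, o, n
}

def correct(z: np.ndarray, prev_letter: str) -> np.ndarray:
    allowed = ALLOWED_AFTER.get(prev_letter)
    if not allowed:
        return z                                    # no prior knowledge, keep as is
    mask = np.array([chr(ord("a") + k) in allowed for k in range(26)], dtype=float)
    masked = z * mask
    return masked / masked.sum() if masked.sum() > 0 else z

z = np.full(26, 1 / 26)                             # uniform fused probabilities
print(correct(z, "a").round(3))                     # only i, o and n keep probability
```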
In conclusion, the invention simplifies the control mode of keyboard input and realizes a multi-modal gesture keyboard input technology. Through virtual reality technology, interaction between gestures and virtual keyboard characters is realized in the head-mounted device, character images are displayed in real time, and keyboard input is presented with a more three-dimensional, information-rich, natural and intimate interactive interface. Compared with a single sensor or a purely image-based control mode, the invention offers low classification risk, a diverse set of classifiable modalities, strong environmental adaptability and simple operation; hands, eyes and keyboard cooperate more naturally during input, fully realizing the combined and dynamic advantages of the system. The gesture recognition mode based on the IMU, EMG, bending sensors and images has high reliability and recognition accuracy. Compared with gesture control based on image recognition alone, it is not affected by ambient light or background colour, the acquired data are stable, and the signal processing is simple. In complex environments the invention is not easily disturbed by sudden weather such as haze, overcast and rainy conditions or thunderstorms, and it still works when an object accidentally blocks the line of sight between the hand and the camera. The virtual reality device can display important information such as the input interface on a virtual screen in front of the user, so the state of keyboard input can be conveniently monitored in real time. Touch typing can thus be realized in an immersive way, giving people the experience of being present in the scene.
In the embodiments of the present invention, except where the model of a device is specifically described, the models of the other devices are not limited, as long as the devices can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and that the above-described embodiments of the present invention are provided for description only and do not indicate the relative merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. A multi-modal fused gesture keyboard input method, the method comprising:
obtaining IMU sensor data, electromyography sensor data and bending sensor data for a user's keystrokes;
photographing the user's hand region to obtain hand image data;
inputting the four kinds of preprocessed data into their corresponding classifiers for feature extraction, and then performing decision fusion to recognize them as the corresponding gesture key input signal.
2. The method according to claim 1, wherein the feature extraction specifically comprises:
extracting features from the IMU sensor data and the electromyography sensor data using an LSTM model; and extracting features from the bending sensor data and the hand image data using a CNN (convolutional neural network) model.
3. The multi-modal fused gesture keyboard input method of claim 2, wherein
the output of the CNN model is one probability matrix and the output of the LSTM model is another probability matrix; the corresponding elements of the four matrices of the four kinds of data are averaged with equal weights to obtain the final prediction matrix of the neural network, and the character corresponding to the maximum probability in the prediction matrix is selected as the output.
4. A multi-modal fused gesture keyboard input device, the device comprising a data glove, the data glove comprising:
an IMU sensor module for recording the gesture while both hands move and the motion information when a key is pressed;
an electromyography sensor module formed by muscle-pulse detection modules connected in a ring, whose inner side carries metal contacts that press against the arm to detect muscle pulses, worn on the forearm and connected by a data line to the micro control unit on the back of the hand;
a bending sensor module whose top layer consists of a first flexible film with a pressure-sensitive layer laminated onto it, and whose bottom layer consists of a second flexible film with a conductive circuit laminated onto it, used to collect the deformation signal generated when a finger bends;
and a first preprocessing module for filtering and denoising the collected inertial measurement unit motion data, electromyography motion data and bending sensor deformation data.
5. A multi-modal fused gesture keyboard input device, the device comprising a head-mounted device, the head-mounted device comprising:
a binocular camera module for acquiring gesture images of the user's finger movements and recording the image information of both hands' keystrokes over multiple frames;
a second preprocessing module for denoising the acquired gesture images;
a feature extraction module for performing neural network classification training on the acquired inertial data, electromyography data, bending deformation data and gesture image data to obtain a gesture keystroke recognition prediction result for each;
and a decision fusion module for averaging and weighting the gesture keystroke recognition prediction results to obtain the final fused prediction result, with a built-in automatic error-correction function based on fuzzy intent inference.
6. The multi-modal fused gesture keyboard input device of claim 5, wherein the fuzzy intent inference specifically comprises: using the prior knowledge and regularity of Hanyu Pinyin, when a key is input, giving an error correction matrix according to the known previous letter.
7. A multi-modal fused gesture keyboard input system, the system comprising a data glove and a head-mounted device,
the data glove being used to acquire inertial, electromyography and deformation information of the user's finger keystroke movements, record the user's gesture keystroke motion information, generate preprocessed finger motion information through denoising and filtering, and then send it to the head-mounted device;
the head-mounted device being used to acquire image information of the user's finger keystroke movements and filter and denoise the images; to receive the three kinds of data from the data glove and then perform neural network classification training on them and on the gesture image data to obtain a gesture keystroke prediction result for each; and to average and weight the gesture predictions of the four models to obtain the final fused prediction result and then display the predicted character.
8. A readable storage medium for multi-modal fused gesture keyboard input, on which a computer program is stored which, when executed by a processor, performs the method steps of claim 1.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010534338.0A CN111722713A (en) | 2020-06-12 | 2020-06-12 | Multi-mode fused gesture keyboard input method, device, system and storage medium |

Applications Claiming Priority (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010534338.0A CN111722713A (en) | 2020-06-12 | 2020-06-12 | Multi-mode fused gesture keyboard input method, device, system and storage medium |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN111722713A (en) | 2020-09-29 |

Family

ID=72568061

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010534338.0A (Pending) | CN111722713A (en) | | |

Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN111722713A (en) |
Patent Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106527738A (en) * | 2016-12-08 | 2017-03-22 | | Multi-information somatosensory interaction glove system and method for virtual reality system |
| CN110865704A (en) * | 2019-10-21 | 2020-03-06 | | Gesture interaction device and method for 360-degree suspended light field three-dimensional display system |

Cited By (11)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111158476A (en) * | 2019-12-25 | 2020-05-15 | | Key identification method, system, equipment and storage medium of virtual keyboard |
| CN112053421A (en) * | 2020-10-14 | 2020-12-08 | | Signal noise reduction processing method, device, equipment and storage medium |
| CN112053421B (en) * | 2020-10-14 | 2023-06-23 | | Signal noise reduction processing method, device, equipment and storage medium |
| CN112434669A (en) * | 2020-12-14 | 2021-03-02 | | Multi-information fusion human behavior detection method and system |
| CN112434669B (en) * | 2020-12-14 | 2023-09-26 | | Human body behavior detection method and system based on multi-information fusion |
| CN112603758A (en) * | 2020-12-21 | 2021-04-06 | | Gesture recognition method based on sEMG and IMU information fusion |
| CN113505822A (en) * | 2021-06-30 | 2021-10-15 | | Multi-scale information fusion upper limb action classification method based on surface electromyographic signals |
| CN113505822B (en) * | 2021-06-30 | 2022-02-15 | | Multi-scale information fusion upper limb action classification method based on surface electromyographic signals |
| CN113505711A (en) * | 2021-07-16 | 2021-10-15 | | Real-time gesture recognition system based on electromyographic signals and Leap Motion |
| CN115170773A (en) * | 2022-05-24 | 2022-10-11 | | Virtual classroom action interaction system and method based on metauniverse |
| CN116301388A (en) * | 2023-05-11 | 2023-06-23 | | Man-machine interaction scene system for intelligent multi-mode combined application |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-09-29 |