TWI717141B - Gesture recognition method and mobile device
- Publication number: TWI717141B
- Application number: TW108144983A
- Authority: TW (Taiwan)
- Prior art keywords: gesture, light intensity, gesture recognition, image, mobile device
- Prior art date: 2019-12-09
- Classification: User Interface Of Digital Computer (AREA)
Abstract
Description
The present invention relates to image recognition technology, and particularly to a gesture recognition method and a mobile device.
With the development of technology, many mechanisms now allow users to control electronic devices by inputting gestures. However, most existing gesture recognition mechanisms do not switch among different recognition models in response to changes in ambient light intensity, so their recognition performance is poor.
In view of this, the present invention provides a gesture recognition method and a mobile device that can solve the above technical problem.
The present invention provides a gesture recognition method suitable for a mobile device, including: obtaining a first image and determining a first ambient light intensity based on the first image; finding, among a plurality of preset brightness levels, a first brightness level to which the first ambient light intensity belongs, wherein the preset brightness levels respectively correspond to a plurality of gesture recognition models; selecting, from the gesture recognition models, a first gesture recognition model corresponding to the first brightness level; and recognizing, with the first gesture recognition model, a first gesture appearing in the first image.
The present invention provides a mobile device including an image capturing circuit and a processor. The processor is coupled to the image capturing circuit and is configured to: control the image capturing circuit to obtain a first image and determine a first ambient light intensity based on the first image; find, among a plurality of preset brightness levels, a first brightness level to which the first ambient light intensity belongs, wherein the preset brightness levels respectively correspond to a plurality of gesture recognition models; select, from the gesture recognition models, a first gesture recognition model corresponding to the first brightness level; and recognize, with the first gesture recognition model, a first gesture appearing in the first image.
To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below in conjunction with the accompanying drawings.
In summary, the mobile device of the present invention can store multiple gesture recognition models corresponding to different preset brightness levels, so that after the ambient light intensity of an image is determined, the corresponding gesture recognition model is selected to perform gesture recognition, thereby improving recognition accuracy. The details are as follows.
Please refer to FIG. 1, a schematic diagram of a mobile device according to an embodiment of the present invention. In embodiments of the present invention, the mobile device 100 is, for example, smart glasses, a smartphone, a tablet computer, or another similar smart device, but is not limited thereto. As shown in FIG. 1, the mobile device 100 includes an image capturing circuit 102 and a processor 104. The image capturing circuit 102 is, for example, a charge coupled device (CCD) lens, a complementary metal oxide semiconductor (CMOS) lens, or any other circuit capable of capturing images, but is not limited thereto.
The processor 104 is coupled to the image capturing circuit 102 and may be a general-purpose processor, a special-purpose processor, a conventional processor, a digital signal processor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, a controller, a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), any other kind of integrated circuit, a state machine, an Advanced RISC Machine (ARM)-based processor, or the like.
In embodiments of the present invention, the processor 104 implements the proposed gesture recognition method by accessing and executing specific modules and program code, detailed as follows.
Please refer to FIG. 2, a flowchart of a gesture recognition method according to an embodiment of the present invention. The method of this embodiment can be executed by the mobile device 100 of FIG. 1; the details of each step of FIG. 2 are described below with reference to the components shown in FIG. 1.
First, in step S210, the processor 104 controls the image capturing circuit 102 to obtain a first image and determines a first ambient light intensity based on the first image. In one embodiment, the processor 104 may call a suitable light-intensity application programming interface (API), for example the ARCore LightEstimate getPixelIntensity() API on an Android system, to determine the first ambient light intensity corresponding to the first image, but is not limited thereto.
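For illustration, getPixelIntensity() returns a normalized pixel intensity in [0.0, 1.0]. Outside of ARCore, a rough stand-in is the mean grayscale value of the frame; the following Python sketch is such an approximation (the simple averaging scheme is an assumption, not ARCore's actual estimator):

```python
import cv2
import numpy as np

def estimate_pixel_intensity(frame_bgr: np.ndarray) -> float:
    """Rough stand-in for ARCore's LightEstimate.getPixelIntensity():
    average brightness of the frame, normalized to [0.0, 1.0]."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return float(gray.mean()) / 255.0
```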
Thereafter, in step S220, the processor 104 finds, among multiple preset brightness levels, the first brightness level to which the first ambient light intensity belongs, where the preset brightness levels respectively correspond to multiple gesture recognition models.
In one embodiment, the gesture recognition models can be generated by another device (for example, a computer device) through a training process and then provided to the mobile device 100. For ease of understanding, the following assumes that the preset brightness levels considered in the present invention include a bright level, a normal level, and a dim level, each corresponding to an intensity range; this is not intended to limit the possible implementations of the present invention.
Specifically, in the training process, the computer device first obtains a batch of training image data, which may include gesture images of various kinds captured under different ambient light intensities. Then, for each training image, the computer device calls a suitable light-intensity API (for example, the LightEstimate getPixelIntensity() API mentioned above) to obtain the ambient light intensity corresponding to that training image, and accordingly divides the training images into multiple training data groups corresponding to the preset brightness levels. For example, the computer device classifies training images whose intensity is higher than 0.7 into a first training data group corresponding to the bright level, and training images whose intensity is between 0.4 and 0.7 into a second training data group corresponding to the normal level. Training images whose intensity is between 0.15 and 0.4 are classified into a third training data group corresponding to the dim level, but the classification is not limited thereto.
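A minimal sketch of this grouping step follows. The thresholds (0.7, 0.4, 0.15) come from the text; the handling of boundary values and the layout of `training_samples` are assumptions:

```python
def intensity_to_level(p: float) -> str:
    """Map a measured pixel intensity p to one of the preset brightness
    levels; the thresholds are the ones given in the text."""
    if p > 0.7:
        return "bright"
    if p > 0.4:
        return "normal"
    if p > 0.15:
        return "dim"
    return "too_dark"   # below 0.15: treated as unusable

# Split the collected training set into one group per brightness level.
groups = {"bright": [], "normal": [], "dim": []}
for image, label, intensity in training_samples:   # training_samples assumed
    level = intensity_to_level(intensity)
    if level in groups:
        groups[level].append((image, label))
```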
After obtaining the first, second, and third training data groups corresponding to the bright, normal, and dim preset brightness levels, the computer device trains a corresponding gesture recognition model on each training data group. For example, the computer device uses the first training data group to train a first convolutional network model, producing the gesture recognition model corresponding to the bright level. Likewise, the computer device uses the second training data group to train a second convolutional network model, producing the gesture recognition model corresponding to the normal level, and uses the third training data group to train a third convolutional network model, producing the gesture recognition model corresponding to the dim level.
In one embodiment, for each training data group, the computer device uses TensorFlow Lite to build a lightweight multilayer convolutional network model for static gestures, which includes, for example, N convolutional layers as feature extraction layers; each convolutional layer performs one convolution (Conv2D), one batch normalization, and one rectified linear unit (ReLU) activation to add nonlinearity. In one embodiment, the output layer following the convolutional layers regresses 176 values, which correspond to eight preset anchor boxes, whose width and height in pixels are (2.26, 3.78), (2.89, 5.77), (3.25, 4.01), (3.63, 5.38), (3.85, 6.79), (4.56, 4.38), (4.81, 7.65), and (5.29, 5.36), together with their corresponding coordinate values, object confidence values, and per-class confidence values, giving the model the ability to distinguish multiple gestures (for example, the 17 gestures illustrated in FIG. 3, i.e., gestures 0 to 16).
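The 176 output values are consistent with 8 anchor boxes times (4 box coordinates + 1 object confidence + 17 class confidences), i.e. 8 x 22 = 176. Since Table 1 with the exact configuration is not reproduced here, the filter counts, strides, and input resolution in the following Keras sketch are assumptions; only the Conv2D -> BatchNorm -> ReLU block structure, the N = 10 feature-extraction layers, and the 176-value output follow the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_ANCHORS = 8    # the eight preset anchor boxes
NUM_CLASSES = 17   # gestures 0~16 of FIG. 3
NUM_OUTPUTS = NUM_ANCHORS * (4 + 1 + NUM_CLASSES)  # = 176

def conv_block(x, filters, stride):
    """One feature-extraction layer: Conv2D -> BatchNorm -> ReLU."""
    x = layers.Conv2D(filters, 3, strides=stride, padding="same",
                      use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def build_gesture_model(n_layers=10, input_size=128):
    inputs = tf.keras.Input((input_size, input_size, 3))
    x, filters = inputs, 16
    for i in range(n_layers):
        x = conv_block(x, filters, stride=2 if i < 5 else 1)
        filters = min(filters * 2, 256)   # filter schedule is an assumption
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(NUM_OUTPUTS)(x)   # the 176 regressed values
    return tf.keras.Model(inputs, outputs)
```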
In one embodiment, the computer device can also use the imgaug package together with automation scripts to process each collected training image; the processing items include, for example, horizontal flipping (Fliplr), translation (translate), scaling (scale), blurring (GaussianBlur, AverageBlur, MedianBlur), noise (AdditiveGaussianNoise), pixel dropout (Dropout), brightness adjustment (Add, Multiply), and contrast enhancement (ContrastNormalization), but are not limited thereto.
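A possible imgaug pipeline covering the processing items listed above is sketched below; the probabilities and parameter ranges are assumptions, and ContrastNormalization survives in current imgaug as the alias LinearContrast:

```python
import imgaug.augmenters as iaa

# Apply one to three randomly chosen processing items per image.
augmenter = iaa.SomeOf((1, 3), [
    iaa.Fliplr(0.5),                                    # horizontal flip
    iaa.Affine(translate_percent={"x": (-0.1, 0.1),
                                  "y": (-0.1, 0.1)}),   # translate
    iaa.Affine(scale=(0.8, 1.2)),                       # scale
    iaa.OneOf([iaa.GaussianBlur(sigma=(0.0, 1.5)),      # blur variants
               iaa.AverageBlur(k=(2, 5)),
               iaa.MedianBlur(k=3)]),
    iaa.AdditiveGaussianNoise(scale=(0, 0.05 * 255)),   # noise
    iaa.Dropout(p=(0.0, 0.05)),                         # knock out pixels
    iaa.Add((-30, 30)),                                 # brightness shift
    iaa.Multiply((0.7, 1.3)),                           # brightness scale
    iaa.LinearContrast((0.75, 1.25)),                   # contrast
])

augmented_images = augmenter(images=training_images)    # training_images assumed
```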
In other words, the gesture recognition models corresponding to the preset brightness levels can all recognize the same set of gestures (for example, the 17 gestures illustrated in FIG. 3); they differ only in the ambient light intensity they target. After the training of the gesture recognition models is completed, the computer device compresses each model with the TensorFlow Lite converter and provides the compressed models to the mobile device 100, which then uses them for gesture recognition, but is not limited thereto.
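Conversion with the TensorFlow Lite converter might look like the sketch below; whether the authors enabled quantization is not stated, so the Optimize.DEFAULT flag is an assumption:

```python
import tensorflow as tf

def compress_model(keras_model, out_path):
    """Convert a trained Keras model to a compressed .tflite file."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight quantization
    with open(out_path, "wb") as f:
        f.write(converter.convert())

# e.g. one compressed model per preset brightness level:
# compress_model(bright_model, "gesture_bright.tflite")
```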
In one embodiment, each of the gesture recognition models may include N convolutional layers and one output layer, where N is, for example, a positive integer less than or equal to 15. In a preferred embodiment, N may be less than or equal to 12. In one embodiment, N may equal 10, with a corresponding configuration as shown in Table 1.
Therefore, after obtaining the first ambient light intensity of the first image, the processor 104 determines which preset brightness level's intensity range it falls in, thereby determining the first brightness level corresponding to the first ambient light intensity. For example, if the first ambient light intensity is higher than 0.7, the corresponding first brightness level is the bright level. If the first ambient light intensity is between 0.4 and 0.7, the corresponding first brightness level is the normal level. If the first ambient light intensity is between 0.15 and 0.4, the corresponding first brightness level is the dim level, but it is not limited thereto.
Next, in step S230, the processor 104 selects, from the aforementioned gesture recognition models, the first gesture recognition model corresponding to the first brightness level. That is, from the models corresponding to the bright, normal, and dim levels, the processor 104 takes the one corresponding to the first brightness level as the first gesture recognition model.
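Steps S220 and S230 amount to a table lookup. A sketch with TFLite interpreters follows; the file names are assumptions, and intensity_to_level() is the helper sketched earlier:

```python
import tensorflow as tf

# One compressed model per preset brightness level.
INTERPRETERS = {
    level: tf.lite.Interpreter(model_path=f"gesture_{level}.tflite")
    for level in ("bright", "normal", "dim")
}
for interp in INTERPRETERS.values():
    interp.allocate_tensors()

def select_model(intensity: float) -> tf.lite.Interpreter:
    """S220/S230: map the measured intensity to its brightness level and
    return the matching gesture recognition model."""
    return INTERPRETERS[intensity_to_level(intensity)]
```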
Thereafter, in step S240, the processor 104 uses the first gesture recognition model to recognize the first gesture appearing in the first image.
As can be seen from the above, the mobile device 100 of the present invention selects the gesture recognition model matching the ambient light intensity of the image, thereby achieving a better gesture recognition effect.
In addition, in some embodiments, since the ambient light intensity around the mobile device 100 may change over time, the present invention further proposes a mechanism for replacing the gesture recognition model in use, so as to adaptively select a model suited to the current ambient light intensity. The details are described with reference to FIG. 4.
Please refer to FIG. 4, a flowchart of adaptively replacing the gesture recognition model according to an embodiment of the present invention. In this embodiment, after the first gesture recognition model has been selected for gesture recognition, the processor 104 can, in step S401, control the image capturing circuit 102 to obtain multiple second images and determine the multiple second ambient light intensities corresponding to the second images. Then, in step S402, the processor 104 uses the first gesture recognition model to recognize the multiple second gestures appearing in the second images, where each second gesture corresponds to a recognition confidence value (i.e., the degree of confidence of the first gesture recognition model in recognizing that second gesture).
Next, in step S403, the processor 104 performs a first K-means algorithm on the recognition confidence values to divide them into multiple confidence value groups. In one embodiment, the processor 104 may plot the recognition confidence values against their corresponding times, with the horizontal axis being time and the vertical axis being confidence value. The processor 104 then performs the first K-means algorithm on this plot to divide the confidence values at different times into multiple (for example, K1) confidence value groups.
Next, in step S404, the processor 104 obtains the confidence reference value of each confidence value group (for example, the centroid of the group, but not limited thereto) and accordingly estimates the confidence average of the confidence value groups. That is, the processor 104 computes the average of the K1 confidence reference values as the confidence average.
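Steps S403/S404 (and the analogous steps S407/S408 below) reduce a noisy per-frame series to one robust number: cluster with K-means, then average the centroids. A sketch with scikit-learn, clustering the values alone (the patent plots them against time first; whether time enters the distance metric is not specified, and the default k is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_mean(values, k=3):
    """Cluster a per-frame value series with K-means and average the k
    cluster centroids, so a few outlier frames cannot dominate."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=k, n_init=10).fit(x)
    return float(km.cluster_centers_.mean())
```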
Thereafter, in step S405, the processor 104 determines whether this confidence average is higher than a confidence threshold (for example, 0.65). In one embodiment, in response to determining that the confidence average is higher than the confidence threshold, which indicates that the recognition performance of the first gesture recognition model is still adequate, the processor 104 executes step S406 and determines that the first gesture recognition model does not need to be replaced. In another embodiment, if the processor 104 determines that the confidence average is not higher than the confidence threshold, which indicates that the recognition performance of the first gesture recognition model has been degrading, the processor 104 further performs the following operations to decide whether to replace the first gesture recognition model.
Specifically, in step S407, the processor 104 performs a second K-means algorithm on the second ambient light intensities of the second images to divide them into multiple light intensity groups. In one embodiment, the processor 104 may plot the second ambient light intensities against their corresponding times, with the horizontal axis being time and the vertical axis being ambient light intensity. The processor 104 then performs the second K-means algorithm on this plot to divide the second ambient light intensities at different times into multiple (for example, K2) light intensity groups.
Thereafter, in step S408, the processor 104 obtains the light intensity reference value of each light intensity group (for example, the centroid of the group, but not limited thereto) and accordingly estimates the light intensity average of the light intensity groups. That is, the processor 104 computes the average of the K2 light intensity reference values as the light intensity average.
Next, in step S409, the processor 104 determines whether this light intensity average is lower than a light intensity threshold (for example, 0.15). If so, the current ambient light intensity may already be too low for subsequent gesture recognition, so in step S410 the processor 104 determines to stop performing the gesture recognition operation.
If the processor 104 determines that the light intensity average is not lower than the light intensity threshold, then in step S411 the processor 104 determines whether the light intensity average belongs to the first brightness level. If so, the first gesture recognition model is still suitable for the current environment, so the processor 104 executes step S406 and determines that the first gesture recognition model does not need to be replaced. In one embodiment, the processor 104 may further capture the second images under consideration and send them to a back-end image repository server to analyze why the recognition confidence values are low, but is not limited thereto.
On the other hand, if the processor 104 determines that the light intensity average does not belong to the first brightness level, the first gesture recognition model is no longer suitable for the current environment. Therefore, in step S412 the processor 104 finds, among the preset brightness levels, the second brightness level to which the light intensity average belongs, and selects, from the gesture recognition models, the second gesture recognition model corresponding to the second brightness level; this step works like step S220 and is not repeated here. Thereafter, in step S413, the processor 104 replaces the first gesture recognition model with the second gesture recognition model and continues the gesture recognition operation.
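Putting steps S403 to S413 together, the decision flow of FIG. 4 can be sketched as follows; the thresholds 0.65 and 0.15 come from the text, while K1, K2, and the return conventions are assumptions (cluster_mean() and intensity_to_level() are the helpers sketched earlier):

```python
CONF_THRESHOLD = 0.65    # confidence threshold of step S405
LIGHT_THRESHOLD = 0.15   # light intensity threshold of step S409

def review_model(confidences, intensities, current_level, k1=3, k2=3):
    """Return the brightness level whose model should be used next,
    or None to stop gesture recognition (step S410)."""
    if cluster_mean(confidences, k1) > CONF_THRESHOLD:
        return current_level                   # S406: keep the current model
    light_avg = cluster_mean(intensities, k2)  # S407/S408
    if light_avg < LIGHT_THRESHOLD:
        return None                            # S410: too dark, stop
    new_level = intensity_to_level(light_avg)  # S411/S412
    return new_level  # == current_level keeps the model (S406), else S413
```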
As can be seen from the above, the present invention can also adaptively change the selected gesture recognition model in response to current ambient light changes, achieving a better recognition effect in all kinds of environments.
Please refer to FIG. 5, an application scenario diagram according to an embodiment of the present invention. In this embodiment, a visitor wearing the mobile device 100 of the present invention (for example, AR glasses) moves through an exhibition hall. When the visitor moves into a dim environment, the mobile device 100 automatically selects the corresponding gesture recognition model (for example, the model corresponding to the dim level) for gesture recognition. Later, when the visitor moves from the dim environment into a bright environment, the mobile device 100 automatically selects another gesture recognition model (for example, the model corresponding to the bright level). In this way, the present invention achieves a better recognition effect in environments of various brightness.
In other embodiments, the present invention can also adjust the way the mobile device 100 is controlled according to the detected gestures, as further explained below.
In outline, the gesture recognition operation of the mobile device 100 of the present invention has roughly three modes: a static gesture mode, a dynamic gesture mode, and a mouse gesture mode. In the static gesture mode, the mobile device 100 performs a single specific operation corresponding to the detected gesture. In one embodiment, if in the static gesture mode the mobile device 100 detects that the user inputs a first preset gesture (for example, gesture 1 of FIG. 3) and moves it in a specific manner (for example, a large lateral movement), the mobile device 100 switches from the static gesture mode to the mouse gesture mode.
In the mouse gesture mode, the mobile device 100 captures the movement trajectory of a fingertip in the user's gesture and displays this trajectory on the display for the user's reference. In one embodiment, if in the mouse gesture mode the mobile device 100 detects that the user inputs a second preset gesture (for example, gesture 7 of FIG. 3), the mobile device 100 switches from the mouse gesture mode back to the static gesture mode.
In addition, if in the static gesture mode the mobile device 100 detects that the user inputs a third preset gesture without large movement (for example, gesture 5 of FIG. 3), the mobile device 100 switches from the static gesture mode to the dynamic gesture mode. In the dynamic gesture mode, the mobile device 100 switches the page it displays according to the swipe direction of the user's gesture, but is not limited thereto.
Details of the gesture modes are further explained below. For ease of description, the mobile device 100 is assumed to be in the static gesture mode by default, but the present invention is not limited thereto. In the static gesture mode, the processor 104 controls the image capturing circuit 102 to obtain multiple third images and recognizes the multiple third gestures appearing in the third images as well as the hand position of each third gesture in its corresponding third image.
In one embodiment, in response to determining that the third gestures all correspond to the first preset gesture (for example, gesture 1 of FIG. 3) and that the movement of the hand positions (for example, fingertip positions) across the third images satisfies a first preset condition, the processor 104 switches the mobile device 100 from the static gesture mode to the mouse gesture mode. In one embodiment, the processor 104 obtains the fingertip positions of the first preset gesture in 10 consecutive third images and computes the displacement between each pair of successive positions (9 values in total). If 5 of these displacements exceed a threshold, the processor 104 determines that the movement of the hand positions satisfies the first preset condition and switches the mobile device 100 from the static gesture mode to the mouse gesture mode.
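A sketch of the first preset condition follows; the pixel threshold for a "large" displacement is an assumption:

```python
import numpy as np

def first_preset_condition(fingertips, move_threshold, window=10, needed=5):
    """Over `window` consecutive frames of the first preset gesture,
    compute the 9 displacements between successive fingertip positions
    and require at least `needed` of them to exceed the threshold."""
    pts = np.asarray(fingertips[-window:], dtype=float)  # one (x, y) per frame
    if len(pts) < window:
        return False
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return int((steps > move_threshold).sum()) >= needed
```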
In the mouse gesture mode, the processor 104 controls the image capturing circuit 102 to obtain multiple fourth images and recognizes the multiple fourth gestures appearing in the fourth images as well as the hand position of each fourth gesture in its corresponding fourth image. The processor 104 then renders a movement trajectory on the mobile device according to the hand position (for example, the fingertip position) of each fourth gesture in its corresponding fourth image.
In addition, in the mouse gesture mode, if the processor 104 detects a fourth gesture corresponding to the second preset gesture (for example, gesture 7 of FIG. 3), the processor 104 switches the mobile device 100 from the mouse gesture mode back to the static gesture mode.
In one embodiment, in the static gesture mode, in response to determining that multiple consecutive gestures among the third gestures all correspond to a third preset gesture (for example, gesture 5 of FIG. 3) and that the movement of the corresponding hand positions satisfies a second preset condition, the processor 104 switches the mobile device 100 from the static gesture mode to the dynamic gesture mode. In one embodiment, if the processor 104 detects 10 consecutive images corresponding to gesture 5 and the position of gesture 5 shows no large change across these 10 images, the processor 104 determines that the movement of the hand positions satisfies the second preset condition and switches the mobile device 100 from the static gesture mode to the dynamic gesture mode.
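The second preset condition is the mirror image of the first: ten consecutive frames of gesture 5 with no large positional change. A sketch, with the stillness threshold as an assumption:

```python
import numpy as np

def second_preset_condition(fingertips, still_threshold, window=10):
    """Ten consecutive frames of the third preset gesture whose position
    shows no large change trigger the dynamic gesture mode."""
    pts = np.asarray(fingertips[-window:], dtype=float)
    if len(pts) < window:
        return False
    steps = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return bool((steps < still_threshold).all())
```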
In the dynamic gesture mode, the processor 104 controls the image capturing circuit 102 to obtain multiple fifth images and recognizes the multiple fifth gestures appearing in the fifth images as well as the hand position of each fifth gesture in its corresponding fifth image. The processor 104 then switches the display page of the mobile device according to the movement direction of the hand positions across the fifth images.
In one embodiment, in the dynamic gesture mode, after determining the swipe direction of the user's gesture, the processor 104 returns one of four direction values: down, up, left, or right. When the gesture moves a large distance across multiple consecutive fifth images, the processor 104 switches the page displayed by the mobile device 100 accordingly. In one embodiment, a gesture is considered to have moved a large distance when its movement reaches one third of the screen's height or width, but this is not a limitation. In this case, for a screen of height 300 and width 600, a horizontal movement of more than 200 triggers a left or right swipe, while a vertical movement of more than 100 triggers an up or down swipe.
In addition, to allow, for example, repeated rightward swipes rather than an alternation of left and right swipes, detection is stopped for M frames every time a swipe is triggered, enabling continuous same-direction swipes like flipping through a book.
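A sketch of the swipe trigger and the M-frame cooldown follows; the value of M and the coordinate conventions are assumptions:

```python
def detect_swipe(start, end, screen_w, screen_h):
    """A swipe fires once the hand has moved at least one third of the
    screen along an axis, e.g. 200 of width 600 horizontally or 100 of
    height 300 vertically."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    if abs(dx) >= screen_w / 3:
        return "right" if dx > 0 else "left"
    if abs(dy) >= screen_h / 3:
        return "down" if dy > 0 else "up"
    return None

COOLDOWN_FRAMES = 15   # the text's M; the exact value is an assumption
_cooldown = 0

def on_frame(start, end, screen_w, screen_h):
    """Suppress detection for M frames after each triggered swipe, so
    repeated same-direction swipes read as page turns, not jitter."""
    global _cooldown
    if _cooldown > 0:
        _cooldown -= 1
        return None
    swipe = detect_swipe(start, end, screen_w, screen_h)
    if swipe is not None:
        _cooldown = COOLDOWN_FRAMES
    return swipe
```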
Please refer to FIG. 6, a schematic diagram of an application scenario of the dynamic gesture mode according to an embodiment of the present invention. In this embodiment, images 611 to 616 are detected at time points T1 to T6, respectively. As shown in FIG. 6, at time point T4, from the gesture position in image 614 (i.e., coordinates (100, 150)) and the gesture position in image 611 (i.e., coordinates (100, 50)), the change corresponding to image 614 is (0, 100). Assuming the screen width is 300, the gesture at time point T4 has moved a large distance (to the right), so the processor 104 determines that the user has input a right-swipe gesture and can perform operations such as page turning accordingly, but is not limited thereto. Furthermore, at time points T5 and T6, the processor 104 stops recognizing gestures to achieve the continuous-swipe effect.
In addition, in one embodiment, if the processor 104 determines that one of the fifth gestures corresponds to the second preset gesture, the processor 104 switches the mobile device 100 from the dynamic gesture mode back to the static gesture mode, but is not limited thereto.
Please refer to FIG. 7, a schematic diagram of the static/dynamic gesture modes according to an embodiment of the present invention. In this embodiment, the user can perform a desired operation by inputting, for example, one of the three gestures in the lower-left corner of FIG. 7. For example, the user can input gesture 9 of FIG. 3 to execute a "select device" function, or input gesture 12 of FIG. 3 to execute a "call expert" function. In addition, the user can input gesture 5 (i.e., the third preset gesture) to make the processor 104 switch the mobile device 100 to the dynamic gesture mode. In that case, the user can switch the device page displayed by the mobile device 100 by swiping left or right.
Please refer to FIG. 8, a schematic diagram of the mouse gesture mode according to an embodiment of the present invention. As shown in FIG. 8, in the mouse gesture mode, the mobile device 100 displays the corresponding movement trajectory 810 on the displayed screen 800 according to the movement of the user's gesture.
In summary, the present invention uses multilayer perceptron neural networks together with light-source intensity detection to develop a method of static and dynamic gesture control that adapts to various lighting environments, improving the accuracy and success rate of gesture control. By using hand and finger postures and by tracking finger positions to interact with a virtual device, it replaces computer mouse control, making human-computer interaction more intuitive and convenient and replacing traditional control devices such as physical mice and keyboards.
Moreover, the present invention has at least the following features: (1) it allows users to switch between and interact with different devices by gestures, without a physical input device; (2) it uses a multilayer perceptron neural network, with a deep convolutional neural model that automatically learns hand image features, detects the hand and hand posture in an image, recognizes and tracks its position and displacement direction, and interacts with a virtual device by tracking fingers and recognizing gestures; (3) by detecting the ambient light intensity and loading a suitable gesture detection model, it effectively improves the gesture recognition accuracy and the gesture control success rate.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make some changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.
100: mobile device
102: image capturing circuit
104: processor
611~616: images
800: screen
810: movement trajectory
S210~S240, S411~S413: steps
T1~T6: time points
FIG. 1 is a schematic diagram of a mobile device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a gesture recognition method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of various gestures according to an embodiment of the present invention.
FIG. 4 is a flowchart of adaptively replacing the gesture recognition model according to an embodiment of the present invention.
FIG. 5 is an application scenario diagram according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an application scenario of the dynamic gesture mode according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of the static/dynamic gesture modes according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of the mouse gesture mode according to an embodiment of the present invention.
S210~S240: steps
Claims (15)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW108144983A | 2019-12-09 | 2019-12-09 | Gesture recognition method and mobile device |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| TWI717141B | 2021-01-21 |
| TW202123070A | 2021-06-16 |
Family ID: 75237593
Patent Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160133024A1 * | 2012-10-08 | 2016-05-12 | Pixart Imaging Inc. | Method and system for gesture identification based on object tracing |
| TW201510772A * | 2013-09-09 | 2015-03-16 | Novatek Microelectronics Corp. | Gesture determination method and electronic device |
| CN107239727A * | 2016-12-07 | 2017-10-10 | 北京深鉴智能科技有限公司 | Gesture identification method and system |
| CN108197580A * | 2018-01-09 | 2018-06-22 | 吉林大学 | A gesture recognition method based on 3D convolutional neural networks |