WO2021169642A1 - Video-based eyeball turning determination method and system - Google Patents

Video-based eyeball turning determination method and system

Info

Publication number
WO2021169642A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
eyeball
video
matrix
turning
Prior art date
Application number
PCT/CN2021/071261
Other languages
French (fr)
Chinese (zh)
Inventor
卢宁
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021169642A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements

Definitions

  • the embodiments of the present application relate to the field of computer vision technology, and in particular to a method and system for determining eyeball rotation based on video.
  • PCCR: pupil center corneal reflection
  • the inventor realizes that current vision applications in artificial intelligence are still mainly image processing: video is decomposed into pictures frame by frame, so the application is essentially based on a single frame of image. This does not correlate the frames of the video with one another, and cannot reflect the relevance and continuity between pictures. For eye tracking, the resulting accuracy is insufficient.
  • the purpose of the embodiments of the present application is to provide a video-based method and system for determining eyeball turning, which improves the accuracy of eyeball tracking.
  • an embodiment of the present application provides a video-based method for determining eyeball rotation, including:
  • Target video is a video of the target user watching the target product
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning action recognition layer;
  • the eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target turning angle of the target user based on the eyeball turning feature queue matrix.
  • an embodiment of the present application also provides a video-based eyeball turning determination system, including:
  • An obtaining module configured to obtain a target video, where the target video is a video of a target user watching a target product;
  • An annotation module configured to annotate the eyeball features of the target video to obtain the annotated video;
  • An input module configured to input the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • a conversion module configured to convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of the image to the frame relationship processing layer;
  • a feature sorting module configured to sort, through the frame relationship processing layer, the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and to input the feature queue to the eyeball turning action recognition layer;
  • a feature fusion and output module configured to perform, through the eyeball turning action recognition layer, feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and to determine the target turning angle of the target user based on the eyeball turning feature queue matrix.
  • an embodiment of the present application also provides a computer device, the computer device includes a memory and a processor, the memory stores a computer program that can run on the processor, and when the computer program is executed by the processor, the video-based eyeball turning determination method described above is implemented, the method including the following steps:
  • Target video is a video of the target user watching the target product
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning action recognition layer;
  • the eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target turning angle of the target user based on the eyeball turning feature queue matrix.
  • an embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored, and the computer program can be executed by at least one processor to cause the at least one processor to execute the video-based eyeball turning determination method described above, the method including the following steps:
  • Target video is a video of the target user watching the target product
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
  • the frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning action recognition layer;
  • the eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target turning angle of the target user based on the eyeball turning feature queue matrix.
  • the embodiment of the application annotates the target video to obtain the annotated video, inputs the annotated video into the eyeball turning feature recognition model to obtain the eyeball turning feature queue matrix, and then obtains the target steering angle of the corresponding target user based on the eyeball turning feature queue matrix, thereby improving the accuracy of eye tracking.
  • FIG. 1 is a flowchart of Embodiment 1 of a method for determining eyeball rotation based on video in this application.
  • FIG. 2 is a flowchart of step S102 in FIG. 1 according to an embodiment of the application.
  • FIG. 3 is a flowchart of step S106 in FIG. 1 according to an embodiment of the application.
  • FIG. 4 is a flowchart of step S110 in FIG. 1 according to an embodiment of the application.
  • FIG. 5 is a flowchart of step S110A in FIG. 4 according to an embodiment of the application.
  • FIG. 6 is a flowchart of another embodiment of step S110 in FIG. 1 according to an embodiment of the application.
  • FIG. 7 is a flowchart of step S111 and step S112 in the first embodiment of this application.
  • FIG. 8 is a schematic diagram of the program modules of the second embodiment of the video-based eyeball turning determination system of this application.
  • FIG. 9 is a schematic diagram of the hardware structure of the third embodiment of the computer equipment of this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, smart city and/or blockchain technology.
  • the data involved in this application, such as video, feature queue matrix, and/or steering angle, can be stored in a database, or can be stored in a blockchain, for example through distributed blockchain storage, which is not limited in this application.
  • Referring to FIG. 1, a flowchart of the method for determining eyeball rotation based on video in the first embodiment of the present application is shown. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps.
  • the following description takes the computer device 2 as the execution subject by way of example. The details are as follows.
  • Step S100 Obtain a target video, where the target video is a video of a target user watching a target product.
  • the process of the target user watching the target product is captured by the camera to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
  • Step S102 Perform eyeball feature annotation on the target video to obtain an annotated video.
  • each frame of the target video is processed by image segmentation, object detection, image annotation, etc., to obtain the annotated video.
  • step S102 further includes:
  • Step S102A Identify the eyeball feature of each frame of image in the target video.
  • the eyeball feature of each frame of image in the target video is identified.
  • Step S102B Select the area where the eyeball feature is located with a marking frame to obtain the annotated video.
  • the area corresponding to the eyeball key points of each frame of video is selected with the marking frame to obtain the annotated video, and the eyeball direction is marked to obtain the eyeball turning movement area in the target video.
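  • As a minimal sketch of the marking-frame selection in step S102B, assuming the eyeball key points of a frame are already available as (x, y) pixel coordinates, the frame can be taken as the smallest axis-aligned rectangle enclosing them. The function name `marking_frame`, the `margin` parameter, and the sample coordinates are hypothetical and do not come from the application.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def marking_frame(key_points: List[Point], margin: int = 0) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) of a rectangle enclosing all key points.

    The optional margin (an assumption, not from the source) widens the
    rectangle by the same number of pixels on every side.
    """
    if not key_points:
        raise ValueError("at least one eyeball key point is required")
    xs = [x for x, _ in key_points]
    ys = [y for _, y in key_points]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# Hypothetical eyeball key points detected in one frame.
frame_key_points = [(120, 80), (135, 76), (150, 82), (134, 90)]
box = marking_frame(frame_key_points, margin=2)
print(box)
```

  • Applying the same function to every frame would give one marking frame per frame; how the key points themselves are detected is left open here, as it is in the text above.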
  • Step S104 Input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • the eyeball turning feature recognition model is a pre-trained model, which is used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The eyeball turning feature recognition model is pre-trained based on a deep learning network model:
  • Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eye turn feature extraction algorithms based on geometric features.
  • the eyeball feature extraction layer is used to extract the eyeball feature of the target user from each frame of the target video, and convert the eyeball feature into a feature matrix;
  • the frame relationship processing layer is configured to determine the frame relationship between images with eyeball features in each frame according to the video time point of each frame of the target video;
  • the eye turning action recognition layer is used to determine the eye turning feature queue matrix of the target user according to the frame relationship and the feature matrix.
  • Step S106 Convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of image to the frame relationship processing layer.
  • the eyeball feature extraction layer splits the target video into individual frames, extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple key points, and the feature matrix can be composed of 128 or 512 key points.
  • step S106 further includes:
  • Step S106A Determine the eyeball key points of each frame of the annotated video, where the eyeball key points include 128 key points or 256 key points.
  • the eyeball feature extraction layer splits the annotated video into each frame of image, and extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple eyeball key points, which can be 128 or 512 key points.
  • Step S106B Obtain the pixel point coordinates of the key eyeball points of each frame of image.
  • each key point of the eyeball is obtained: each frame of image is first converted to a two-dimensional grayscale image, and the key points are then expressed as two-dimensional pixel coordinates.
  • Step S106C Establish a feature matrix according to the eyeball key points of each frame of image, where the feature matrix includes 128 or 256 pixel coordinates.
  • the pixel coordinates are sorted to obtain a feature matrix in the form of 128 rows or 256 rows and 2 columns.
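  • The construction of the feature matrix in steps S106A to S106C can be sketched as follows. The text does not state the sort key for the pixel coordinates, so this sketch assumes ordering by x and then by y. `build_feature_matrix` and `expected_count` are hypothetical names, and a four-point example is used only to keep the output short.

```python
from typing import List, Sequence, Tuple


def build_feature_matrix(
    key_points: Sequence[Tuple[int, int]], expected_count: int = 128
) -> List[List[int]]:
    """Turn eyeball key points into a matrix with one [x, y] row per key point.

    Rows are sorted by x, then y (an assumed ordering; the source only says
    that the pixel coordinates are sorted).
    """
    if len(key_points) != expected_count:
        raise ValueError(
            f"expected {expected_count} key points, got {len(key_points)}"
        )
    return [[x, y] for x, y in sorted(key_points)]


# Small illustrative input; the text describes 128 or 256 key points per frame.
matrix = build_feature_matrix([(3, 1), (1, 2), (2, 5), (1, 0)], expected_count=4)
print(matrix)
```

  • With 128 key points the result has 128 rows and 2 columns, matching the shape described above.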
  • Step S108 The frame relationship processing layer sorts the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and inputs the feature queue to the eyeball turning action recognition layer.
  • the frame relationship processing layer performs a calculation on the feature matrices corresponding to adjacent video time points to determine whether to process the frame image.
  • the frame relationship processing layer performs a difference operation on two adjacent frames of images to obtain difference image features, and obtains the movement route of the eyeball turning by analyzing the difference image features: when the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball has started its turning movement, and the feature queue at that moment is obtained.
  • the feature matrix of each frame of image is arranged in the order of the video time point to obtain a feature queue, which is convenient for subsequent calculations.
  • the feature queue is regarded as the frame relationship between the corresponding features of each frame of image.
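  • A minimal sketch of the ordering performed in step S108, assuming each frame is represented as a pair of a video time point in seconds and its feature matrix. The name `build_feature_queue` and the sample data are hypothetical.

```python
from typing import List, Tuple

FeatureMatrix = List[List[int]]


def build_feature_queue(
    frames: List[Tuple[float, FeatureMatrix]]
) -> List[FeatureMatrix]:
    """Order feature matrices by their video time point, earliest first."""
    ordered = sorted(frames, key=lambda item: item[0])
    return [matrix for _, matrix in ordered]


# Hypothetical frames arriving out of order: (time point in seconds, feature matrix).
frames = [
    (0.08, [[3, 3]]),
    (0.00, [[1, 1]]),
    (0.04, [[2, 2]]),
]
queue = build_feature_queue(frames)
print(queue)
```

  • Python's `sorted` is stable, so frames that share a time point keep their input order.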
  • Step S110 The eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target steering angle of the target user based on the eyeball turning feature queue matrix.
  • the eyeball turning action recognition layer performs duplicate checking on the feature queue, deletes identical features from the queue to obtain a target feature queue whose eyeball features all differ, and combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
  • step S110 further includes:
  • Step S110A Calculate the difference image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same.
  • the difference between the feature matrices of adjacent frame images is calculated through a difference operation to obtain the feature of the difference image.
  • when the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball has started its turning movement.
  • Step S110B If they are the same, retain the feature matrix corresponding to one of the frames and delete the other identical feature matrix from the feature queue, until the feature matrices in the feature queue are all different, to obtain the target feature queue.
  • if the eyeball features are the same, the feature matrix corresponding to the eyeball feature of one frame is retained, and the retained feature can be the latter or the former. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if the retained eyeball feature is not the last of the identical features, the feature queue does not include the turning time.
  • if the feature matrices in the feature queue are all different, that is, the eyeball turning differs in every frame, the multiple frames corresponding to the feature queue form the eye movement area.
  • Step S110C Combine the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
  • the target feature queue includes a target steering angle, and the target steering angle corresponds to the feature queue.
  • the feature matrix of the feature queue is combined in chronological order to obtain the eye-turning feature queue matrix.
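  • Steps S110B and S110C can be sketched as follows, under three labeled assumptions: two feature matrices count as identical when they are exactly equal, the earlier of two identical adjacent matrices is the one retained (the text allows either), and "combining" means stacking the rows of the remaining matrices in chronological order. The function names are hypothetical.

```python
from typing import List

FeatureMatrix = List[List[int]]


def deduplicate_adjacent(queue: List[FeatureMatrix]) -> List[FeatureMatrix]:
    """Drop a matrix when it equals the one retained immediately before it."""
    target: List[FeatureMatrix] = []
    for matrix in queue:
        if not target or matrix != target[-1]:
            target.append(matrix)
    return target


def combine(target_queue: List[FeatureMatrix]) -> List[List[int]]:
    """Stack the rows of every matrix, preserving chronological order (assumed fusion)."""
    return [row for matrix in target_queue for row in matrix]


# Hypothetical feature queue, already ordered by video time point.
feature_queue = [[[1, 1]], [[1, 1]], [[2, 1]], [[3, 2]], [[3, 2]]]
target_queue = deduplicate_adjacent(feature_queue)
queue_matrix = combine(target_queue)
print(target_queue)
print(queue_matrix)
```

  • Because only adjacent frames are compared, as in step S110A, a matrix that reappears later in the queue after a different one is kept by this sketch.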
  • step S110A further includes:
  • Step S110A1 Obtain the pixel coordinates of the adjacent frame images.
  • the current frame (the first frame of image) is denoted F_k(x, y)
  • the previous frame (the second frame of image) is denoted F_(k-1)(x, y)
  • (x, y) is the pixel point coordinates in each frame of image.
  • Step S110A2 Perform a difference operation on the pixel coordinates of the adjacent frame images to obtain a difference image feature.
  • D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|
  • Step S110A3 Compare the difference image feature with a preset binarization threshold to determine whether the eyeball features corresponding to the target image in adjacent frames are the same.
  • the difference image feature is compared with a preset binarization threshold by the formula
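  • The comparison in steps S110A1 to S110A3 can be illustrated with a small sketch. It rests on assumptions that the text does not confirm: F_k(x, y) is read as the grayscale value at pixel (x, y), the difference is taken as an absolute value, a pixel counts as changed when its difference is strictly greater than the threshold, and two frames carry the same eyeball feature when no pixel is changed. All function names and values are hypothetical.

```python
from typing import List

Image = List[List[int]]


def difference_image(current: Image, previous: Image) -> Image:
    """Per-pixel absolute difference between two equally sized grayscale frames."""
    return [
        [abs(c - p) for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(current, previous)
    ]


def binarize(diff: Image, threshold: int) -> Image:
    """Mark a pixel 1 when its difference exceeds the threshold, else 0 (assumed rule)."""
    return [[1 if value > threshold else 0 for value in row] for row in diff]


def same_eyeball_feature(current: Image, previous: Image, threshold: int) -> bool:
    """Treat the frames as identical when no pixel is marked as changed."""
    mask = binarize(difference_image(current, previous), threshold)
    return not any(any(row) for row in mask)


previous_frame = [[10, 10], [10, 10]]
moved_frame = [[12, 10], [10, 60]]
still_frame = [[12, 10], [10, 15]]
print(same_eyeball_feature(moved_frame, previous_frame, threshold=20))
print(same_eyeball_feature(still_frame, previous_frame, threshold=20))
```

  • The threshold lets small pixel-level noise pass as "unchanged" while a larger local change marks the start or continuation of a turning movement.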
  • step S110 further includes:
  • Step S1101 Taking the center position of the target user’s eyeball as the origin, mark the position of each product in the target video with coordinates.
  • each product is marked with coordinates that take the center position of the target user’s eyeball as the origin, so that the target steering angle calculated from the eyeball turning feature queue matrix can be mapped to the target product.
  • the angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the product.
  • Step S1102 Calculate the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
  • the matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target steering angle of the target user, and the target steering angle is matched against the coordinates to obtain the target product. If the position corresponding to the target steering angle deviates from the product positions, the product closest to the target steering angle is selected as the target product.
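  • The mapping from a target steering angle to a product can be sketched as below. The text does not specify how the angle is derived from the matrix value, so the sketch takes the angle as a given input in degrees. It assumes two-dimensional product coordinates with the eyeball center at the origin and angles measured counter-clockwise from the positive x axis; product names and positions are hypothetical.

```python
import math
from typing import Dict, Tuple


def product_angle(position: Tuple[float, float]) -> float:
    """Angle in degrees of a product position relative to the eyeball center (origin)."""
    x, y = position
    return math.degrees(math.atan2(y, x))


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def closest_product(target_angle: float, products: Dict[str, Tuple[float, float]]) -> str:
    """Pick the product whose angle is nearest to the target steering angle."""
    return min(
        products,
        key=lambda name: angular_gap(target_angle, product_angle(products[name])),
    )


# Hypothetical product layout around the eyeball center.
products = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (-1.0, 0.0)}
print(closest_product(70.0, products))
print(closest_product(-175.0, products))
```

  • Choosing the minimum angular gap covers both cases described above: an exact match has a gap of zero, and a deviating angle falls back to the nearest product.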
  • the method further includes:
  • Step S111 Obtain a video time point corresponding to the eye turn feature queue matrix.
  • the video time points of the multiple frames of images corresponding to the eyeball turning feature queue matrix are acquired. Since the frame relationship has already been obtained by the frame relationship processing layer, the time points may be acquired by timestamping.
  • Step S112 Calculate the interval between the video time point corresponding to the first feature matrix in the eyeball turning feature queue matrix and the video time point corresponding to the last feature matrix as the eyeball turning time.
  • the time of the first frame of image is subtracted from the time of the last frame of image to obtain the target turning time for the target product.
  • the degree of interest is measured through the reciprocal of the target turning time: the longer the turning time, the smaller the reciprocal and the greater the degree of interest.
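  • Steps S111 and S112 reduce to simple arithmetic on the video time points, sketched here under the assumption that the time points are given in seconds and are already in chronological order. `turning_time` and `reciprocal` are hypothetical names, and the sketch only computes the two quantities without interpreting them.

```python
from typing import Sequence


def turning_time(time_points: Sequence[float]) -> float:
    """Interval between the first and last video time points of the queue matrix."""
    if len(time_points) < 2:
        raise ValueError("at least two video time points are required")
    return time_points[-1] - time_points[0]


def reciprocal(duration: float) -> float:
    """Reciprocal of the turning time; undefined for a zero or negative duration."""
    if duration <= 0:
        raise ValueError("turning time must be positive")
    return 1.0 / duration


# Hypothetical time points (seconds) of the frames in the eyeball turning feature queue matrix.
points = [1.2, 1.6, 2.0, 2.4]
duration = turning_time(points)
print(duration, reciprocal(duration))
```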
  • FIG. 8 shows a schematic diagram of the program modules of the second embodiment of the video-based eyeball turning determination system of the present application.
  • the video-based eyeball turning determination system 20 may include or be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors.
  • the program module referred to in the embodiments of the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable than the program itself to describe the execution process of the video-based eyeball turning determination system 20 in the storage medium. The following description will specifically introduce the functions of each program module in this embodiment:
  • the first obtaining module 200 is configured to obtain a target video, where the target video is a video of a target user watching a target product.
  • the process of the target user watching the target product is captured by the camera to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
  • the annotation module 202 is used to annotate the target video with eyeball features to obtain the annotated video.
  • the annotation module 202 is further used for:
  • the eyeball feature of each frame of image in the target video is identified.
  • the area where the eyeball feature is located is selected with the marking frame to obtain the annotated video.
  • the area corresponding to the eyeball key points of each frame of video is selected with the marking frame to obtain the annotated video, and the eyeball direction is marked to obtain the eyeball turning movement area in the target video.
  • the input module 204 is configured to input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
  • the eyeball turning feature recognition model is a pre-trained model, which is used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The eyeball turning feature recognition model is pre-trained based on a deep learning network model:
  • Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eye turn feature extraction algorithms based on geometric features.
  • the eyeball feature extraction layer is used to extract the eyeball feature of the target user from each frame of the target video, and convert the eyeball feature into a feature matrix;
  • the frame relationship processing layer is configured to determine the frame relationship between images with eyeball features in each frame according to the video time point of each frame of the target video;
  • the eye turning action recognition layer is used to determine the eye turning feature queue matrix of the target user according to the frame relationship and the feature matrix.
  • the conversion module 206 is configured to convert each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame of image to the frame relationship processing layer.
  • the eyeball feature extraction layer splits the target video into individual frames, extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple key points, and the feature matrix can be composed of 128 or 512 key points.
  • the conversion module 206 is also used for:
  • the eyeball key points of each frame of image of the annotated video are determined, where the eyeball key points include 128 key points or 256 key points.
  • the eyeball feature extraction layer splits the annotated video into each frame of image, and extracts the eyeball turning feature from each frame of image, and obtains the feature matrix corresponding to each frame of image.
  • the eyeball feature is composed of multiple eyeball key points, which can be 128 or 512 key points.
  • each key point of the eyeball is obtained: each frame of image is first converted to a two-dimensional grayscale image, and the key points are then expressed as two-dimensional pixel coordinates.
  • a feature matrix is established according to the eyeball key points of each frame of image, and the feature matrix includes 128 or 256 pixel coordinates.
  • the pixel coordinates are sorted to obtain a feature matrix in the form of 128 rows or 256 rows and 2 columns.
  • the feature sorting module 208 is used for the frame relationship processing layer to sort the feature matrix of each frame of image according to the video time point corresponding to the feature matrix to obtain a feature queue, and to input the feature queue to the eyeball turning action recognition layer.
  • the frame relationship processing layer performs a calculation on the feature matrices corresponding to adjacent video time points to determine whether to process the frame image.
  • the frame relationship processing layer performs a difference operation on two adjacent frames of images to obtain difference image features, and obtains the movement route of the eyeball turning by analyzing the difference image features: when the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball has started its turning movement, and the feature queue at that moment is obtained.
  • the feature matrix of each frame of image is arranged in the order of the video time point to obtain a feature queue, which is convenient for subsequent calculations.
  • the feature queue is regarded as the frame relationship between the corresponding features of each frame of image.
  • the feature fusion and output module 210 is used for the eyeball turning action recognition layer to perform feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and to determine the target turning angle of the target user based on the eyeball turning feature queue matrix.
  • the eyeball turning action recognition layer performs duplicate checking on the feature queue, deletes identical features from the queue to obtain a target feature queue whose eyeball features all differ, and combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
  • the feature fusion and output module 210 is also used for:
  • the difference between the feature matrices of adjacent frame images is calculated through a difference operation to obtain the feature of the difference image.
  • when the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball has started its turning movement.
  • if they are the same, the feature matrix corresponding to one frame of the image is retained, and the other identical feature matrix is deleted from the feature queue until the feature matrices in the feature queue are all different, and the target feature queue is obtained.
  • one eyeball feature is retained, and the retained feature can be the latter or the former. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if the retained eyeball feature is not the last of the identical features, the feature queue does not include the turning time.
  • if the feature matrices in the target feature queue are all different, that is, the eyeball turning differs in every frame, the multiple frames of images corresponding to the target feature queue form the eye movement area.
  • the target feature queue includes a target steering angle, and the target steering angle corresponds to the feature queue.
  • the feature matrix of the feature queue is combined in chronological order to obtain the eye-turning feature queue matrix.
  • the feature fusion and output module 210 is further configured to:
  • the current frame (the first frame of image) is denoted F_k(x, y)
  • the previous frame (the second frame of image) is denoted F_(k-1)(x, y)
  • (x, y) is the pixel point coordinates in each frame of image.
  • D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|
  • the difference image feature is compared with a preset binarization threshold to determine whether the eyeball features corresponding to the target image in adjacent frames are the same.
  • the difference image feature is compared with a preset binarization threshold by the formula
  • the feature fusion and output module 210 is also used for:
  • the position of each product in the target video is marked with coordinates, taking the center position of the target user’s eyeball as the origin.
  • each product is marked with coordinates that take the center position of the target user’s eyeball as the origin, so that the target steering angle calculated from the eyeball turning feature queue matrix can be mapped to the target product.
  • the angle of each product relative to the center position of the eyeball can also be calculated according to the coordinates of the product.
  • the matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target steering angle of the target user, and the target steering angle is matched against the coordinates to obtain the target product. If the position corresponding to the target steering angle deviates from the product positions, the product closest to the target steering angle is selected as the target product.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers).
  • the computer device 2 at least includes, but is not limited to, a memory and a processor.
  • the computer device 2 may also include a network interface and/or a video-based eyeball turning determination system.
  • the computer device 2 may include a memory 21, a processor 22, a network interface 23, and a video-based eyeball turning determination system 20.
  • the memory 21, the processor 22, the network interface 23, and the video-based eyeball turning determination system 20 can communicate with each other through a system bus, wherein:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or a memory of the computer device 2.
  • the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the computer device 2, such as the program code of the video-based eyeball turning determination system 20 in the second embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is generally used to control the overall operation of the computer device 2.
  • the processor 22 is configured to run the program code or process data stored in the memory 21, for example, to run the video-based eye-turn determination system 20, so as to implement the video-based eye-turn determination method of the first embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the server 2 and other electronic devices.
  • the network interface 23 is used to connect the server 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the server 2 and the external terminal.
  • the network may be an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 9 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • the video-based eyeball turning determination system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the application.
  • FIG. 8 shows a schematic diagram of a program module implementing the second embodiment of the video-based eye-turn determination system 20.
  • the video-based eye-turn determination system 20 can be divided into an acquisition module 200, an annotation module 202, an input module 204, a transformation module 206, a feature ranking module 208, and a feature fusion and output module 210.
  • the program module referred to in the present application refers to a series of computer program instruction segments capable of completing specific functions, and is more suitable than a program to describe the execution process of the video-based eyeball turning determination system 20 in the computer device 2.
  • the specific functions of the program modules 200-210 have been described in detail in the second embodiment, and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, etc., on which a computer program is stored; the corresponding function is realized when the program is executed by a processor.
  • the computer-readable storage medium of this embodiment is used to store the video-based eyeball turning determination system 20, and when executed by a processor, realizes the video-based eyeball turning determination method of the first embodiment.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Ophthalmology & Optometry (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A video-based eyeball turning determination method and system. The method comprises: acquiring a target video, wherein the target video is a video of a target user viewing a target product (S100); inputting the target video into an eyeball turning feature recognition model to obtain an eyeball turning feature queue matrix; and on the basis of the eyeball turning feature queue matrix, determining a target turning angle of the target user. According to the method, the eyeball turning feature queue matrix is acquired by means of inputting the target video into the eyeball turning feature recognition model, and the target turning angle and target turning time of a corresponding target product are then acquired by means of the eyeball turning feature queue matrix, thereby improving the precision of eyeball tracking.

Description

Video-based method and system for determining eyeball turning
This application claims priority to Chinese patent application No. 202010128432.6, entitled "Method and System of Interest Degree Based on Eyeball Turning", filed with the Chinese Patent Office on February 28, 2020, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of computer vision technology, and in particular to a video-based method and system for determining eyeball turning.
Background
Eye tracking has long been used to study the visual attention of individuals. The most commonly used eye-tracking technique is pupil centre corneal reflection (PCCR). In PCCR, a light source illuminates the pupil to produce highly visible reflections, and the camera of a physical tracking device captures images of them. These images are used to determine how the light source is reflected in the cornea and the pupil. Finally, the direction of gaze is calculated from the angle of the vector formed by the corneal and pupil reflections together with other geometric features. However, this approach depends heavily on the light source, is subject to many interfering factors, and gives inaccurate recognition.
The inventor realized that current vision applications in artificial intelligence are still mainly based on image processing, or on decomposing a video into individual frames, so they are in essence applications based on single-frame images. They do not associate the frames of a video with one another and therefore cannot reflect the correlation and continuity between images. As a result, their accuracy is insufficient for eye tracking.
Summary
In view of this, the purpose of the embodiments of this application is to provide a video-based method and system for determining eyeball turning that improve the accuracy of eye tracking.
To achieve the above purpose, an embodiment of this application provides a video-based method for determining eyeball turning, including:
acquiring a target video, where the target video is a video of a target user viewing a target product;
performing eyeball feature annotation on the target video to obtain an annotated video;
inputting the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
converting each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame into the frame relationship processing layer;
sorting, by the frame relationship processing layer, the feature matrices of the frames according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer; and
performing, by the eyeball turning action recognition layer, feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining the target turning angle of the target user based on the eyeball turning feature queue matrix.
To achieve the above purpose, an embodiment of this application also provides a video-based eyeball turning determination system, including:
an acquisition module, configured to acquire a target video, where the target video is a video of a target user viewing a target product;
an annotation module, configured to perform eyeball feature annotation on the target video to obtain an annotated video;
an input module, configured to input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
a conversion module, configured to convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and to input the feature matrix corresponding to each frame into the frame relationship processing layer;
a feature sorting module, configured to have the frame relationship processing layer sort the feature matrices of the frames according to the video time points corresponding to the feature matrices to obtain a feature queue, and to input the feature queue into the eyeball turning action recognition layer; and
a feature fusion and output module, configured to have the eyeball turning action recognition layer perform feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and to determine the target turning angle of the target user based on the eyeball turning feature queue matrix.
To achieve the above purpose, an embodiment of this application also provides a computer device. The computer device includes a memory and a processor, and the memory stores a computer program that can run on the processor. When executed by the processor, the computer program implements the video-based method for determining eyeball turning described above, which includes the following steps:
acquiring a target video, where the target video is a video of a target user viewing a target product;
performing eyeball feature annotation on the target video to obtain an annotated video;
inputting the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
converting each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame into the frame relationship processing layer;
sorting, by the frame relationship processing layer, the feature matrices of the frames according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer; and
performing, by the eyeball turning action recognition layer, feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining the target turning angle of the target user based on the eyeball turning feature queue matrix.
To achieve the above purpose, an embodiment of this application also provides a computer-readable storage medium in which a computer program is stored. The computer program can be executed by at least one processor, so that the at least one processor executes the video-based method for determining eyeball turning described above, which includes the following steps:
acquiring a target video, where the target video is a video of a target user viewing a target product;
performing eyeball feature annotation on the target video to obtain an annotated video;
inputting the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
converting each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame into the frame relationship processing layer;
sorting, by the frame relationship processing layer, the feature matrices of the frames according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer; and
performing, by the eyeball turning action recognition layer, feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining the target turning angle of the target user based on the eyeball turning feature queue matrix.
In the embodiments of this application, the target video is annotated to obtain an annotated video, the annotated video is input into the eyeball turning feature recognition model to obtain the eyeball turning feature queue matrix, and the target turning angle of the corresponding target user is then obtained from that matrix, which improves the accuracy of eye tracking.
Description of the Drawings
FIG. 1 is a flowchart of Embodiment 1 of the video-based method for determining eyeball turning of this application.
FIG. 2 is a flowchart of step S102 in FIG. 1 according to an embodiment of this application.
FIG. 3 is a flowchart of step S106 in FIG. 1 according to an embodiment of this application.
FIG. 4 is a flowchart of step S110 in FIG. 1 according to an embodiment of this application.
FIG. 5 is a flowchart of step S110A in FIG. 4 according to an embodiment of this application.
FIG. 6 is a flowchart of another embodiment of step S110 in FIG. 1 according to an embodiment of this application.
FIG. 7 is a flowchart of step S111 and step S112 in Embodiment 1 of this application.
FIG. 8 is a schematic diagram of the program modules of Embodiment 2 of the video-based eyeball turning determination system of this application.
FIG. 9 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of this application.
Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in this application without creative effort fall within the scope of protection of this application.
The technical solution of this application can be applied to the fields of artificial intelligence, smart cities, and/or blockchain technology. Optionally, the data involved in this application, such as videos, feature queue matrices, and/or turning angles, can be stored in a database or in a blockchain, for example through distributed blockchain storage; this application places no limitation on this.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of the video-based method for determining eyeball turning according to Embodiment 1 of this application is shown. It should be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. The following description takes the computer device 2 as the executing entity by way of example. The details are as follows.
Step S100: acquire a target video, where the target video is a video of a target user viewing a target product.
Specifically, a camera records the target user viewing the target product to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
Step S102: perform eyeball feature annotation on the target video to obtain an annotated video.
Specifically, each frame of the target video undergoes processing such as image segmentation, object detection, and image annotation to obtain the annotated video.
Exemplarily, referring to FIG. 2, step S102 further includes:
Step S102A: identify the eyeball features of each frame of the target video.
Specifically, the eyeball features in each frame of the target video are identified through eyeball key point detection.
Step S102B: select the region where the eyeball features are located with an annotation box to obtain the annotated video.
Specifically, an annotation box is used to select the region corresponding to the eyeball key points in each frame, yielding the annotated video. The eyeball orientation is also annotated so as to obtain the eyeball turning movement region in the target video.
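The box selection in step S102B can be illustrated with a minimal sketch. It assumes the key points have already been detected and are given as pixel coordinates; the function name `bounding_box` and the `margin` parameter are hypothetical and do not come from the application.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of one eyeball key point


def bounding_box(keypoints: List[Point], margin: int = 0) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) enclosing all key points.

    `margin` (an assumed parameter) pads the box on every side.
    """
    if not keypoints:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# Three illustrative key points, padded by 2 pixels.
box = bounding_box([(10, 20), (14, 18), (12, 25)], margin=2)
```

An axis-aligned box is only one possible annotation shape; the application does not specify the geometry of the annotation box.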
Step S104: input the annotated video into an eyeball turning feature recognition model, where the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
Specifically, the eyeball turning feature recognition model is a pre-trained model used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The model is pre-trained on the basis of a deep learning network model as follows:
A large amount of sample video data is acquired, and the sample eyeball feature region in each frame of each sample video is identified to obtain sample images; the sample images are annotated in time order to obtain annotated sample images; the annotated sample images are input into a deep neural network, whose CNN convolutional layers extract sample feature vectors from them; the differences between the annotated sample images of adjacent frames are calculated from the sample feature vectors through pixel processing to obtain difference values; identical sample images are deleted according to the difference values to obtain a feature queue; and the eyeball turning feature queue matrix derived from the feature queue is output through a fully connected output layer. Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eyeball turning feature extraction algorithms based on geometric features.
Exemplarily, the eyeball feature extraction layer is used to extract the eyeball features of the target user from each frame of the target video and to convert the eyeball features into a feature matrix;
the frame relationship processing layer is used to determine the frame relationship between the frames that contain eyeball features according to the video time point of each frame of the target video; and
the eyeball turning action recognition layer is used to determine the eyeball turning feature queue matrix of the target user according to the frame relationship and the feature matrices.
Step S106: convert each frame of the annotated video into a feature matrix through the eyeball feature extraction layer, and input the feature matrix corresponding to each frame into the frame relationship processing layer.
Specifically, the eyeball feature extraction layer splits the target video into individual frames and extracts the eyeball turning from each frame to obtain the features corresponding to each frame. The eyeball feature consists of multiple key points and may be a feature matrix composed of 128 or 512 key points.
Exemplarily, referring to FIG. 3, step S106 further includes:
Step S106A: determine the eyeball key points of each frame of the annotated video, where the eyeball key points include 128 key points or 256 key points.
Specifically, the eyeball feature extraction layer splits the annotated video into individual frames and extracts the eyeball turning feature from each frame to obtain the feature matrix corresponding to each frame. The eyeball feature consists of multiple eyeball key points, which may number 128 or 512.
Step S106B: obtain the pixel coordinates of the eyeball key points of each frame.
Specifically, to obtain the pixel coordinates of each eyeball key point, each frame is first converted to grayscale to obtain a two-dimensional grayscale image, which is then converted into two-dimensional coordinates.
Step S106C: establish a feature matrix according to the eyeball key points of each frame, where the feature matrix includes 128 or 256 pixel coordinates.
Specifically, the pixel coordinates are sorted to obtain a feature matrix of 128 or 256 rows and 2 columns.
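As a rough illustration of step S106C, the sketch below turns a list of key point pixel coordinates into an N x 2 feature matrix. The application does not state the sort key, so lexicographic ordering by (x, y) is an assumption, and `build_feature_matrix` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of one eyeball key point


def build_feature_matrix(keypoints: List[Point]) -> List[List[int]]:
    """Sort the pixel coordinates and return an N x 2 matrix of [x, y] rows.

    Assumption: coordinates are sorted lexicographically by (x, y).
    """
    return [[x, y] for x, y in sorted(keypoints)]


# 128 synthetic key points yield a matrix of 128 rows and 2 columns.
synthetic = [(i % 16, i // 16) for i in range(128)]
matrix = build_feature_matrix(synthetic)
rows, cols = len(matrix), len(matrix[0])
```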
Step S108: the frame relationship processing layer sorts the feature matrices of the frames according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputs the feature queue into the eyeball turning action recognition layer.
Specifically, the frame relationship processing layer performs a calculation on the feature matrices of adjacent video time points to decide whether a given frame should be processed. It applies a difference operation to two adjacent frames to obtain difference image features, and analyzes these features to obtain the movement path of the eyeball turning. That is, when the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball is starting a turning movement, and the feature queue at that moment is obtained. The feature matrices of the frames are arranged in order of video time point to obtain the feature queue, which facilitates subsequent calculation. The feature queue serves as the frame relationship between the features corresponding to the frames.
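The ordering part of step S108 can be sketched as follows, assuming each frame is represented as a pair of a video time point (in seconds) and its feature matrix. The pair representation and the name `build_feature_queue` are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[int]]            # feature matrix of one frame
TimedMatrix = Tuple[float, Matrix]  # (video time point in seconds, matrix)


def build_feature_queue(frames: List[TimedMatrix]) -> List[Matrix]:
    """Arrange the feature matrices in ascending order of video time point."""
    ordered = sorted(frames, key=lambda item: item[0])
    return [matrix for _, matrix in ordered]


# Frames arriving out of order are placed back into chronological order.
queue = build_feature_queue([
    (0.08, [[3, 3]]),
    (0.00, [[1, 1]]),
    (0.04, [[2, 2]]),
])
```

Because Python's `sorted` is stable, frames that share a time point keep their input order.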
Step S110: the eyeball turning action recognition layer performs feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determines the target turning angle of the target user based on the eyeball turning feature queue matrix.
Specifically, the eyeball turning action recognition layer checks the feature queue for duplicates and deletes identical features from the queue to obtain a target feature queue of distinct eyeball features, and then combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
Exemplarily, referring to FIG. 4, step S110 further includes:
Step S110A: calculate the difference image features of adjacent frames to determine whether the eyeball features corresponding to the adjacent frames are the same.
Specifically, the difference between the feature matrices of adjacent frames is calculated through a difference operation to obtain the difference image features. When the difference image features of two adjacent frames go from changing to unchanged, the eyeball has completed its turning movement; when they go from unchanged to changing, the eyeball is starting a turning movement.
Step S110B: if they are the same, retain the feature matrix corresponding to one of the frames and delete the other identical feature matrix from the feature queue, until all feature matrices in the feature queue are different, to obtain the target feature queue.
Specifically, if they are the same, the feature matrix corresponding to the eyeball feature of one frame is retained; the retained feature may be either the later or the earlier one. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if it is not the last, the feature queue does not include the turning time. When all feature matrices in the feature queue are different, that is, all eyeball orientations are different, the frames corresponding to the feature queue constitute the eyeball movement region.
Step S110C: combine the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
Specifically, the target feature queue contains the target turning angle, which corresponds to the feature queue. The feature matrices of the feature queue are combined in chronological order to obtain the eyeball turning feature queue matrix.
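Steps S110B and S110C can be sketched under two stated assumptions: "the same" is taken to mean exact equality between adjacent feature matrices, and "combining" is taken to mean stacking the rows of the remaining matrices in chronological order. Neither detail is specified in the application, and both function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # feature matrix of one frame


def deduplicate_queue(queue: List[Matrix]) -> List[Matrix]:
    """Keep one matrix from each run of identical adjacent matrices."""
    target: List[Matrix] = []
    for matrix in queue:
        if target and matrix == target[-1]:
            continue  # same eyeball feature as the previous frame
        target.append(matrix)
    return target


def combine_queue(target_queue: List[Matrix]) -> Matrix:
    """Stack the matrices row-wise in queue (chronological) order."""
    combined: Matrix = []
    for matrix in target_queue:
        combined.extend(list(row) for row in matrix)
    return combined


feature_queue = [[[1, 1]], [[1, 1]], [[2, 2]], [[2, 2]], [[3, 3]]]
target_queue = deduplicate_queue(feature_queue)
queue_matrix = combine_queue(target_queue)
```

In practice, a threshold on the difference image feature would likely replace exact equality, since pixel noise rarely leaves two frames identical.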
Exemplarily, referring to FIG. 5, step S110A further includes:
Step S110A1: obtain the pixel coordinates of the adjacent frame images.
Specifically, let the current (first) frame image be F_k(x, y) and the second frame image be F_(k-1)(x, y), where (x, y) are the pixel coordinates in each frame image.
Step S110A2: perform a difference operation on the pixel coordinates of the adjacent frame images to obtain the difference image feature.
Specifically, the calculation uses the difference formula D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|, where D_k(x, y) is the difference image feature.
Step S110A3: compare the difference image feature with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of adjacent frames are the same.
Specifically, the difference image feature is compared with the preset binarization threshold T using the test |D_k(x, y)| > T: if the value is greater than T, the features are different; if it is less than T, they are the same.
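The frame difference D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)| and the threshold test can be illustrated with a short sketch on grayscale frames stored as nested lists. The text does not say how the per-pixel comparisons are aggregated into a single same/different decision, so the `min_changed` parameter below (the number of pixels that must exceed T) is an assumption, as are the function names.

```python
from typing import List

Image = List[List[int]]  # grayscale frame, indexed as image[y][x]


def difference_image(curr: Image, prev: Image) -> Image:
    """D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)| for every pixel."""
    return [[abs(c - p) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(curr, prev)]


def frames_differ(curr: Image, prev: Image, threshold: int,
                  min_changed: int = 1) -> bool:
    """True when at least min_changed pixels satisfy D_k(x, y) > threshold."""
    diff = difference_image(curr, prev)
    changed = sum(1 for row in diff for d in row if d > threshold)
    return changed >= min_changed
```

For example, with `prev = [[10, 10], [10, 10]]` and `curr = [[10, 12], [10, 60]]`, the difference image is `[[0, 2], [0, 50]]`, so the frames count as different for T = 25 but as the same for T = 50.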
Exemplarily, referring to FIG. 6, step S110 further includes:
Step S1101: taking the center of the target user's eyeball as the origin, annotate the positions of the products in the target video with coordinates.
Specifically, the target video contains multiple products. The position of each product is annotated with coordinates, taking the center of the target user's eyeball as the origin, so that the target turning angle calculated from the eyeball turning feature queue matrix can be mapped to the target product. The angle of each product relative to the center of the eyeball can also be calculated from the product's coordinates.
Step S1102: calculate the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
Specifically, the matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target turning angle of the target user, and the target turning angle is mapped onto the coordinates to obtain the target product. If the position corresponding to the target turning angle deviates from a product position, the product closest to the target turning angle is selected as the target product.
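The mapping from a target turning angle to the nearest product can be sketched as below. The sketch takes the turning angle as a given input in degrees, because the text does not state how the "matrix value" is converted into an angle. The planar (x, y) product coordinates, the counter-clockwise angle convention, and all names are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple


def product_angle(position: Tuple[float, float]) -> float:
    """Angle in degrees, within [0, 360), of a product seen from the eyeball-center origin."""
    x, y = position
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_product(target_angle: float,
                    products: Dict[str, Tuple[float, float]]) -> str:
    """Name of the product whose direction is closest to the target turning angle."""
    return min(
        products,
        key=lambda name: angular_distance(target_angle, product_angle(products[name])),
    )
```

With hypothetical products at (1, 0), (0, 1) and (-1, 0), a turning angle of 80 degrees selects the product at (0, 1), whose direction is 90 degrees.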
Exemplarily, referring to FIG. 7, the method further includes:
Step S111: obtain the video time points corresponding to the eyeball turning feature queue matrix.
Specifically, the video time points of the multiple frame images corresponding to the eyeball turning feature queue matrix are obtained. Because these time points are already available in the frame relationship processing layer, they can be time-stamped there for later retrieval.
Step S112: calculate the distance between the video time point corresponding to the first feature matrix in the eyeball turning feature queue matrix and the video time point corresponding to the last feature matrix, as the eyeball turning time.
Specifically, the time of the first frame image is subtracted from the time of the last frame image to obtain the target turning time for the target product. The reciprocal of the target turning time is taken as the degree of interest: the longer the turning time, the smaller the reciprocal, indicating a greater degree of interest.
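The arithmetic of step S112 amounts to one subtraction and one reciprocal. The following sketch assumes the video time points are given in seconds and already sorted in ascending order; the function names are hypothetical.

```python
from typing import Sequence


def turning_time(time_points: Sequence[float]) -> float:
    """Last video time point minus the first one, for time points sorted ascending."""
    if len(time_points) < 2:
        raise ValueError("at least two time points are needed")
    return time_points[-1] - time_points[0]


def reciprocal_of_turning_time(duration: float) -> float:
    """Reciprocal of the turning time, the quantity the text uses for the degree of interest."""
    if duration <= 0:
        raise ValueError("turning time must be positive")
    return 1.0 / duration
```

For time points 1.0, 1.5 and 3.0 the turning time is 2.0, and its reciprocal is 0.5.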
Embodiment 2
Referring further to FIG. 8, a schematic diagram of the program modules of Embodiment 2 of the video-based eyeball turning determination system of the present application is shown. In this embodiment, the video-based eyeball turning determination system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present application and to implement the video-based eyeball turning determination method described above. A program module, as referred to in the embodiments of the present application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing how the video-based eyeball turning determination system 20 executes in the storage medium. The following description details the functions of the program modules of this embodiment:
A first acquisition module 200, configured to acquire a target video, the target video being a video of a target user viewing a target product.
Specifically, a camera records the target user viewing the target product to obtain the target video, and the target video is transmitted to the computer device 2 for processing.
An annotation module 202, configured to annotate the target video with eyeball features to obtain an annotated video.
Exemplarily, the annotation module 202 is further configured to:
Identify the eyeball features in each frame image of the target video.
Specifically, the eyeball features in each frame image of the target video are identified through eyeball key point detection.
Box-select the region where the eyeball features are located with an annotation box to obtain the annotated video.
Specifically, the region corresponding to the eyeball key points in each video frame is selected with an annotation box to obtain the annotated video. The eyeball orientation is also annotated, so as to obtain the eyeball turning movement region in the target video.
An input module 204, configured to input the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer.
Specifically, the eyeball turning feature recognition model is a pre-trained model used to analyze the annotated video and obtain the eyeball turning feature queue matrix. The eyeball turning feature recognition model is pre-trained on the basis of a deep learning network model as follows:
A large amount of sample video data is acquired, and the sample eyeball feature region in each frame of each sample video is identified to obtain sample images. The sample images are annotated in time order to obtain annotated sample images. The annotated sample images are input into a deep neural network, whose CNN convolutional layers extract the sample feature vectors of the annotated sample images. The differences between annotated sample images of adjacent frames are computed from the sample feature vectors through pixel processing to obtain difference values. Identical sample images are deleted according to the difference values to obtain a feature queue. The eyeball turning feature queue matrix obtained from the feature queue is output through a fully connected output layer. Feature extraction methods include, but are not limited to, facial feature extraction algorithms based on deep neural networks and eyeball turning feature extraction algorithms based on geometric features.
Exemplarily, the eyeball feature extraction layer is configured to extract the eyeball features of the target user from each frame image of the target video and to convert the eyeball features into a feature matrix;
the frame relationship processing layer is configured to determine, according to the video time point of each frame image of the target video, the frame relationship between the frame images containing eyeball features; and
the eyeball turning action recognition layer is configured to determine the eyeball turning feature queue matrix of the target user according to the frame relationship and the feature matrix.
A conversion module 206, configured to convert each frame image of the annotated video into a feature matrix through the eyeball feature extraction layer, and to input the feature matrix corresponding to each frame image into the frame relationship processing layer.
Specifically, the eyeball feature extraction layer splits the target video into individual frame images and extracts the eyeball orientation from each frame image to obtain the feature corresponding to each frame image. An eyeball feature consists of multiple key points and may be a feature matrix composed of 128 or 512 key points.
Exemplarily, the conversion module 206 is further configured to:
Determine the eyeball key points of each frame image of the annotated video, the eyeball key points comprising 128 key points or 256 key points.
Specifically, the eyeball feature extraction layer splits the annotated video into individual frame images and extracts the eyeball turning feature from each frame image to obtain the feature matrix corresponding to each frame image. An eyeball feature consists of multiple eyeball key points, which may number 128 or 512.
Obtain the pixel coordinates of the eyeball key points of each frame image.
Specifically, to obtain the pixel coordinates of each eyeball key point, each frame image is first converted to grayscale to obtain a two-dimensional grayscale image, which is then converted into two-dimensional coordinates.
Establish a feature matrix from the eyeball key points of each frame image, the feature matrix comprising 128 or 256 pixel coordinates.
Specifically, the pixel coordinates are sorted to obtain a feature matrix with 128 or 256 rows and 2 columns.
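Building the N-row, 2-column feature matrix from key point coordinates can be sketched as follows. The text does not specify the sort order, so sorting by x and then by y is an assumption, as are the function name and the `expected` parameter used to check the number of key points.

```python
from typing import List, Sequence, Tuple


def build_feature_matrix(keypoints: Sequence[Tuple[int, int]],
                         expected: int = 128) -> List[List[int]]:
    """Return an expected-by-2 matrix of (x, y) pixel coordinates, sorted by x then y."""
    if len(keypoints) != expected:
        raise ValueError(f"expected {expected} key points, got {len(keypoints)}")
    return [[x, y] for x, y in sorted(keypoints)]
```

Called with `expected=3` on the hypothetical key points (5, 1), (2, 9) and (2, 3), the function returns the rows [2, 3], [2, 9], [5, 1].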
A feature sorting module 208, configured to have the frame relationship processing layer sort the feature matrices of the frame images according to the video time points corresponding to the feature matrices to obtain a feature queue, and to input the feature queue into the eyeball turning action recognition layer.
Specifically, the frame relationship processing layer operates on the feature matrices of adjacent video time points to decide whether a frame image should be processed. It performs a difference operation on each pair of adjacent frame images to obtain difference image features, and derives the movement path of the eyeball turn by analyzing them: when the difference image feature of two adjacent frames goes from changing to unchanged, the eyeball has completed its turning movement; when it goes from unchanged to changing, the eyeball is beginning a turning movement, and the feature queue at that moment is obtained. The feature matrices of the frame images are arranged in order of video time point to obtain the feature queue, which facilitates subsequent calculation. The feature queue serves as the frame relationship between the features corresponding to the frame images.
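The start and end transitions described above (unchanged to changing marks the start of a turn, changing to unchanged marks its end) can be sketched as a scan over per-pair change flags. The representation is an assumption: `changed[i]` is taken to be True when frame i+1 differs from frame i, and each detected turn is reported as a pair of frame indices.

```python
from typing import List, Sequence, Tuple


def movement_segments(changed: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) for each run of consecutive changing frame pairs."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(changed):
        if flag and start is None:
            start = i                      # unchanged -> changing: turn begins at frame i
        elif not flag and start is not None:
            segments.append((start, i))    # changing -> unchanged: turn ended at frame i
            start = None
    if start is not None:
        segments.append((start, len(changed)))  # still turning at the last frame
    return segments
```

For the flags `[False, True, True, False, False, True]` the scan reports two turns, covering frames 1 to 3 and frames 5 to 6.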
A feature fusion and output module 210, configured to have the eyeball turning action recognition layer perform feature fusion on the feature queue to obtain the eyeball turning feature queue matrix, and to determine the target turning angle of the target user based on the eyeball turning feature queue matrix.
Specifically, the eyeball turning feature layer deduplicates the feature queue, deleting identical features from the queue to obtain a target feature queue of distinct eyeball features, and combines the arrays of the target feature queue in chronological order to obtain the eyeball turning feature queue matrix.
Exemplarily, the feature fusion and output module 210 is further configured to:
Calculate the difference image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same.
Specifically, the difference between the feature matrices of adjacent frame images is calculated through a difference operation to obtain the difference image feature. When the difference image feature of two adjacent frames goes from changing to unchanged, the eyeball has completed its turning movement; when it goes from unchanged to changing, the eyeball is beginning a turning movement.
If they are the same, retain the feature matrix corresponding to one of the frame images and delete the other, identical feature matrix from the feature queue, until all feature matrices in the feature queue are different, to obtain the target feature queue.
Specifically, if they are the same, one eyeball feature is retained; the retained feature may be either the later one or the earlier one. If the retained eyeball feature is the last of the identical eyeball features, the feature queue includes the turning time; if the retained eyeball feature is not the last of the identical features, the feature queue does not include the turning time. When the feature matrices in the target feature queue are all different, that is, the eyeball orientations are all different, the multiple frame images corresponding to the target feature queue constitute the eyeball movement region.
Combine the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
Specifically, the target feature queue contains the target turning angle, and the target turning angle corresponds to the feature queue. The feature matrices of the feature queue are combined in chronological order to obtain the eyeball turning feature queue matrix.
Exemplarily, the feature fusion and output module 210 is further configured to:
Obtain the pixel coordinates of the adjacent frame images.
Specifically, let the current (first) frame image be F_k(x, y) and the second frame image be F_(k-1)(x, y), where (x, y) are the pixel coordinates in each frame image.
Perform a difference operation on the pixel coordinates of the adjacent frame images to obtain the difference image feature.
Specifically, the calculation uses the difference formula D_k(x, y) = |F_k(x, y) - F_(k-1)(x, y)|, where D_k(x, y) is the difference image feature.
Compare the difference image feature with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of adjacent frames are the same.
Specifically, the difference image feature is compared with the preset binarization threshold T using the test |D_k(x, y)| > T: if the value is greater than T, the features are different; if it is less than T, they are the same.
Exemplarily, the feature fusion and output module 210 is further configured to:
Taking the center of the target user's eyeball as the origin, annotate the positions of the products in the target video with coordinates.
Specifically, the target video contains multiple products. The position of each product is annotated with coordinates, taking the center of the target user's eyeball as the origin, so that the target turning angle calculated from the eyeball turning feature queue matrix can be mapped to the target product. The angle of each product relative to the center of the eyeball can also be calculated from the product's coordinates.
Calculate the matrix value of the target feature queue to obtain the target turning angle.
Specifically, the matrix value of the eyeball turning feature queue matrix corresponding to the target feature queue is calculated to obtain the target turning angle of the target user, and the target turning angle is mapped onto the coordinates to obtain the target product. If the position corresponding to the target turning angle deviates from a product position, the product closest to the target turning angle is selected as the target product.
Embodiment 3
Referring to FIG. 9, a schematic diagram of the hardware architecture of the computer device of Embodiment 3 of the present application is shown. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including a standalone server or a server cluster composed of multiple servers), among others. As shown in FIG. 9, the computer device 2 includes at least, but is not limited to, a memory and a processor. Optionally, the computer device 2 may also include a network interface and/or the video-based eyeball turning determination system. For example, the computer device 2 may include a memory 21, a processor 22, a network interface 23, and the video-based eyeball turning determination system 20, which may be communicatively connected to one another through a system bus. Specifically:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium. The readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, for example the hard disk or main memory of the computer device 2. In other embodiments, the memory 21 may be an external storage device of the computer device 2, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 2. The memory 21 may, of course, include both an internal storage unit of the computer device 2 and an external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the video-based eyeball turning determination system 20 of Embodiment 2. The memory 21 may also be used to temporarily store various data that have been output or are to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example to run the video-based eyeball turning determination system 20 so as to implement the video-based eyeball turning determination method of Embodiment 1.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the server 2 and other electronic devices. For example, the network interface 23 is used to connect the server 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the server 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be noted that FIG. 9 shows only the computer device 2 with components 20-23; it should be understood, however, that not all of the illustrated components need be implemented, and more or fewer components may be implemented instead.
In this embodiment, the video-based eyeball turning determination system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to carry out the present application.
For example, FIG. 8 shows a schematic diagram of the program modules of Embodiment 2, which implements the video-based eyeball turning determination system 20. In that embodiment, the video-based eyeball turning determination system 20 may be divided into an acquisition module 200, an annotation module 202, an input module 204, a conversion module 206, a feature sorting module 208, and a feature fusion and output module 210. A program module, as referred to in the present application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than a program to describing how the video-based eyeball turning determination system 20 executes in the computer device 2. The specific functions of the program modules 200-210 have been described in detail in Embodiment 2 and are not repeated here.
Embodiment 4
This embodiment further provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an app store, and the like, on which a computer program is stored that implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used to store the video-based eyeball turning determination system 20, which, when executed by a processor, implements the video-based eyeball turning determination method of Embodiment 1.
Optionally, the storage medium involved in the present application, such as the computer-readable storage medium, may be non-volatile or volatile.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A video-based eyeball turning determination method, comprising:
    acquiring a target video, the target video being a video of a target user viewing a target product;
    annotating the target video with eyeball features to obtain an annotated video;
    inputting the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model includes an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
    converting each frame image of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame image into the frame relationship processing layer;
    sorting, by the frame relationship processing layer, the feature matrices of the frame images according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer;
    performing, by the eyeball turning action recognition layer, feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining a target turning angle of the target user based on the eyeball turning feature queue matrix.
  2. The method according to claim 1, further comprising, after determining the target turning angle of the target user based on the eyeball turning feature queue matrix:
    obtaining the video time points corresponding to the eyeball turning feature queue matrix;
    calculating the distance between the video time point corresponding to the first feature matrix in the eyeball turning feature queue matrix and the video time point corresponding to the last feature matrix, as the eyeball turning time.
  3. The method according to claim 1, wherein annotating the target video with eyeball features to obtain the annotated video comprises:
    identifying the eyeball features in each frame image of the target video;
    box-selecting the region where the eyeball features are located with an annotation box to obtain the annotated video.
  4. The method according to claim 1, wherein converting each frame image of the annotated video into a feature matrix through the eyeball feature extraction layer comprises:
    determining the eyeball key points of each frame image of the annotated video, the eyeball key points comprising 128 key points or 256 key points;
    obtaining the pixel coordinates of the eyeball key points of each frame image;
    establishing a feature matrix from the eyeball key points of each frame image, the feature matrix comprising 128 or 256 pixel coordinates.
  5. The method according to claim 4, wherein the eyeball turning action recognition layer performing feature fusion on the feature queue to obtain the eyeball turning feature queue matrix comprises:
    calculating differential image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same;
    if they are the same, retaining the feature matrix corresponding to one of the frame images and deleting the other, identical feature matrix from the feature queue, until all feature matrices in the feature queue are different, to obtain a target feature queue; and
    combining the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
  6. The method according to claim 5, wherein calculating the differential image features of the target images of adjacent frames to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same comprises:
    acquiring the pixel coordinates of the adjacent frame images;
    performing a difference operation on the pixel coordinates of the adjacent frame images to obtain the differential image features; and
    comparing the differential image features with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same.
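The differencing and de-duplication steps of claims 5 and 6 could look roughly like the sketch below. Several details are assumptions rather than part of the claims: frames are treated as "the same" when every coordinate difference is at or below the threshold, each frame is compared against the most recently retained frame, and the names `frames_match` and `deduplicate_queue` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
FeatureMatrix = Sequence[Point]


def frames_match(a: FeatureMatrix, b: FeatureMatrix, threshold: int) -> bool:
    """Difference the pixel coordinates and compare against the threshold."""
    if len(a) != len(b):
        raise ValueError("feature matrices must have the same number of key points")
    # Assumed rule: the frames are the same if no coordinate moved by more
    # than `threshold` pixels.
    return all(
        abs(xa - xb) <= threshold and abs(ya - yb) <= threshold
        for (xa, ya), (xb, yb) in zip(a, b)
    )


def deduplicate_queue(queue: Sequence[FeatureMatrix], threshold: int) -> List[FeatureMatrix]:
    """Keep one feature matrix from each run of matching adjacent frames."""
    kept: List[FeatureMatrix] = []
    for matrix in queue:
        if kept and frames_match(kept[-1], matrix, threshold):
            continue  # same eyeball features as the retained frame: drop it
        kept.append(matrix)
    return kept
```

The retained matrices, in order, correspond to the target feature queue; combining them (for example by stacking) would give the eyeball turning feature queue matrix.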
  7. The method according to claim 5, wherein determining the target turning angle of the target user based on the eyeball turning feature queue matrix comprises:
    taking the center position of the target user's eyeball as the origin, annotating the position of the product in the target video with coordinates; and
    calculating the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
  8. A video-based eyeball turning determination system, comprising:
    an acquisition module, configured to acquire a target video, the target video being a video of a target user viewing a target product;
    an annotation module, configured to perform eyeball feature annotation on the target video to obtain an annotated video;
    an input module, configured to input the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model comprises an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
    a conversion module, configured to convert each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and to input the feature matrix corresponding to each frame of image into the frame relationship processing layer;
    a feature sorting module, configured to cause the frame relationship processing layer to sort the feature matrices of the frames of image according to the video time points corresponding to the feature matrices to obtain a feature queue, and to input the feature queue into the eyeball turning action recognition layer; and
    a feature fusion and output module, configured to cause the eyeball turning action recognition layer to perform feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and to determine a target turning angle of the target user based on the eyeball turning feature queue matrix.
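The role of the frame relationship processing layer, ordering the per-frame feature matrices by video time point, can be illustrated with a short sketch. The pairing of each matrix with a time stamp in seconds and the name `build_feature_queue` are assumptions made for the example, not details from the claims.

```python
from typing import List, Sequence, Tuple

FeatureMatrix = List[List[int]]


def build_feature_queue(
    timed_matrices: Sequence[Tuple[float, FeatureMatrix]],
) -> List[FeatureMatrix]:
    """Sort (time point, feature matrix) pairs by time and return the matrices."""
    # sorted() is stable, so frames sharing a time point keep their input order.
    ordered = sorted(timed_matrices, key=lambda pair: pair[0])
    return [matrix for _, matrix in ordered]
```

The resulting list plays the part of the feature queue that is passed on to the eyeball turning action recognition layer.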
  9. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements a video-based eyeball turning determination method comprising the following steps:
    acquiring a target video, the target video being a video of a target user viewing a target product;
    performing eyeball feature annotation on the target video to obtain an annotated video;
    inputting the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model comprises an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
    converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame of image into the frame relationship processing layer;
    the frame relationship processing layer sorting the feature matrices of the frames of image according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer; and
    the eyeball turning action recognition layer performing feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining a target turning angle of the target user based on the eyeball turning feature queue matrix.
  10. The computer device according to claim 9, wherein, after determining the target turning angle of the target user based on the eyeball turning feature queue matrix, the method further comprises:
    acquiring the video time points corresponding to the eyeball turning feature queue matrix; and
    calculating the time interval between the video time point corresponding to the first feature matrix and the video time point corresponding to the last feature matrix in the eyeball turning feature queue matrix, as the eyeball turning time.
  11. The computer device according to claim 9, wherein converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer comprises:
    determining the eyeball key points of each frame of image of the annotated video, the eyeball key points comprising 128 key points or 256 key points;
    acquiring the pixel coordinates of the eyeball key points of each frame of image; and
    establishing a feature matrix from the eyeball key points of each frame of image, the feature matrix comprising 128 or 256 pixel coordinates.
  12. The computer device according to claim 11, wherein the eyeball turning action recognition layer performing feature fusion on the feature queue to obtain the eyeball turning feature queue matrix comprises:
    calculating differential image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same;
    if they are the same, retaining the feature matrix corresponding to one of the frame images and deleting the other, identical feature matrix from the feature queue, until all feature matrices in the feature queue are different, to obtain a target feature queue; and
    combining the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
  13. The computer device according to claim 12, wherein calculating the differential image features of the target images of adjacent frames to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same comprises:
    acquiring the pixel coordinates of the adjacent frame images;
    performing a difference operation on the pixel coordinates of the adjacent frame images to obtain the differential image features; and
    comparing the differential image features with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same.
  14. The computer device according to claim 12, wherein determining the target turning angle of the target user based on the eyeball turning feature queue matrix comprises:
    taking the center position of the target user's eyeball as the origin, annotating the position of the product in the target video with coordinates; and
    calculating the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
  15. A computer-readable storage medium storing a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform a video-based eyeball turning determination method comprising the following steps:
    acquiring a target video, the target video being a video of a target user viewing a target product;
    performing eyeball feature annotation on the target video to obtain an annotated video;
    inputting the annotated video into an eyeball turning feature recognition model, wherein the eyeball turning feature recognition model comprises an eyeball feature extraction layer, a frame relationship processing layer, and an eyeball turning action recognition layer;
    converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer, and inputting the feature matrix corresponding to each frame of image into the frame relationship processing layer;
    the frame relationship processing layer sorting the feature matrices of the frames of image according to the video time points corresponding to the feature matrices to obtain a feature queue, and inputting the feature queue into the eyeball turning action recognition layer; and
    the eyeball turning action recognition layer performing feature fusion on the feature queue to obtain an eyeball turning feature queue matrix, and determining a target turning angle of the target user based on the eyeball turning feature queue matrix.
  16. The computer-readable storage medium according to claim 15, wherein, after determining the target turning angle of the target user based on the eyeball turning feature queue matrix, the method further comprises:
    acquiring the video time points corresponding to the eyeball turning feature queue matrix; and
    calculating the time interval between the video time point corresponding to the first feature matrix and the video time point corresponding to the last feature matrix in the eyeball turning feature queue matrix, as the eyeball turning time.
  17. The computer-readable storage medium according to claim 15, wherein converting each frame of image of the annotated video into a feature matrix through the eyeball feature extraction layer comprises:
    determining the eyeball key points of each frame of image of the annotated video, the eyeball key points comprising 128 key points or 256 key points;
    acquiring the pixel coordinates of the eyeball key points of each frame of image; and
    establishing a feature matrix from the eyeball key points of each frame of image, the feature matrix comprising 128 or 256 pixel coordinates.
  18. The computer-readable storage medium according to claim 17, wherein the eyeball turning action recognition layer performing feature fusion on the feature queue to obtain the eyeball turning feature queue matrix comprises:
    calculating differential image features of adjacent frame images to determine whether the eyeball features corresponding to the adjacent frame images are the same;
    if they are the same, retaining the feature matrix corresponding to one of the frame images and deleting the other, identical feature matrix from the feature queue, until all feature matrices in the feature queue are different, to obtain a target feature queue; and
    combining the feature matrices in the target feature queue to obtain the eyeball turning feature queue matrix.
  19. The computer-readable storage medium according to claim 18, wherein calculating the differential image features of the target images of adjacent frames to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same comprises:
    acquiring the pixel coordinates of the adjacent frame images;
    performing a difference operation on the pixel coordinates of the adjacent frame images to obtain the differential image features; and
    comparing the differential image features with a preset binarization threshold to determine whether the eyeball features corresponding to the target images of the adjacent frames are the same.
  20. The computer-readable storage medium according to claim 18, wherein determining the target turning angle of the target user based on the eyeball turning feature queue matrix comprises:
    taking the center position of the target user's eyeball as the origin, annotating the position of the product in the target video with coordinates; and
    calculating the matrix value of the eyeball turning feature queue matrix to obtain the target turning angle.
PCT/CN2021/071261 2020-02-28 2021-01-12 Video-based eyeball turning determination method and system WO2021169642A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010128432.6A CN111353429A (en) 2020-02-28 2020-02-28 Interest degree method and system based on eyeball turning
CN202010128432.6 2020-02-28

Publications (1)

Publication Number Publication Date
WO2021169642A1 (en) 2021-09-02

Family

ID=71195806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071261 WO2021169642A1 (en) 2020-02-28 2021-01-12 Video-based eyeball turning determination method and system

Country Status (2)

Country Link
CN (1) CN111353429A (en)
WO (1) WO2021169642A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353429A (en) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 Interest degree method and system based on eyeball turning
CN112053600B (en) * 2020-08-31 2022-05-03 上海交通大学医学院附属第九人民医院 Orbit endoscope navigation surgery training method, device, equipment and system
CN115544473B (en) * 2022-09-09 2023-11-21 苏州吉弘能源科技有限公司 Photovoltaic power station operation and maintenance terminal login control system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677024A (en) * 2015-12-31 2016-06-15 北京元心科技有限公司 Eye movement detection tracking method and device, and application of eye movement detection tracking method
CN107679448A (en) * 2017-08-17 2018-02-09 平安科技(深圳)有限公司 Eyeball action-analysing method, device and storage medium
US20190294240A1 (en) * 2018-03-23 2019-09-26 Aisin Seiki Kabushiki Kaisha Sight line direction estimation device, sight line direction estimation method, and sight line direction estimation program
CN109359512A (en) * 2018-08-28 2019-02-19 深圳壹账通智能科技有限公司 Eyeball position method for tracing, device, terminal and computer readable storage medium
CN110555426A (en) * 2019-09-11 2019-12-10 北京儒博科技有限公司 Sight line detection method, device, equipment and storage medium
CN111353429A (en) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 Interest degree method and system based on eyeball turning

Also Published As

Publication number Publication date
CN111353429A (en) 2020-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21761730

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21761730

Country of ref document: EP

Kind code of ref document: A1