CN118097779A - Pet behavior recognition method and electronic equipment - Google Patents
Pet behavior recognition method and electronic equipment
- Publication number
- CN118097779A (application CN202410216799.1A)
- Authority
- CN
- China
- Prior art keywords
- pet
- module
- behavior recognition
- data
- behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/0499—Feedforward networks
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/09—Supervised learning
- G06V10/40—Extraction of image or video features
- G06V10/764—Recognition using classification, e.g. of video objects
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82—Recognition using neural networks
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The application relates to the technical field of computer vision and provides a pet behavior recognition method and electronic equipment. The method inputs pet data into a pre-trained pet behavior recognition model; performs feature extraction on the pet data with a feature processing module to obtain the pet features corresponding to the pet data; performs species classification on the pet features with a species classification module to obtain the pet category corresponding to the pet data; based on the pet category, performs emotion recognition on the pet features with a face detection and emotion recognition module to obtain the emotion category corresponding to the pet data; performs pose estimation on the pet features with a pose estimation module to obtain the coordinate information of each feature point corresponding to the pet data; and, based on the pet category, the emotion category and the coordinate information of each feature point, determines the behavior category corresponding to the pet data with a comprehensive behavior analysis module as the pet behavior recognition result. The method can improve the accuracy of pet behavior recognition.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to a pet behavior recognition method and electronic equipment.
Background
Pet behavior recognition helps owners better understand their pets' needs and emotional states, so that the pets can be better cared for and their quality of life improved, while also helping to keep the household safe and harmonious. Related-art methods that recognize pet behavior from pet image data usually perform only a simple classification based on the pet's body pose, and the pet behavior recognition models they use are structurally simple, so the accuracy of pet behavior recognition is low.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a pet behavior recognition method and an electronic device that solve the problem of low pet behavior recognition accuracy caused by the limited classification basis and simple model structures used in the related art.
An embodiment of the application provides a pet behavior recognition method, which includes the following steps: inputting pet data into a pre-trained pet behavior recognition model; performing feature extraction on the pet data by using a feature processing module of the pet behavior recognition model to obtain pet features corresponding to the pet data; performing species classification on the pet features by using a species classification module of the pet behavior recognition model to obtain the pet category corresponding to the pet data; based on the pet category, performing emotion recognition on the pet features by using a face detection and emotion recognition module of the pet behavior recognition model to obtain the emotion category corresponding to the pet data; performing pose estimation on the pet features by using a pose estimation module of the pet behavior recognition model to obtain the coordinate information of each feature point corresponding to the pet data; and, based on the pet category, the emotion category and the coordinate information of each feature point, determining the behavior category corresponding to the pet data by using a comprehensive behavior analysis module of the pet behavior recognition model, and taking the behavior category as the pet behavior recognition result.
In one embodiment, the pet data includes pet images and/or pet videos.
In one embodiment, before the pet data is input into the pre-trained pet behavior recognition model, the method further comprises preprocessing the pet data, the preprocessing comprising: if the pet data is a pet video, extracting video frames from the pet video and taking the extracted video frames as pet images; and subjecting the pet images to one or more of the following: denoising, brightness enhancement, contrast enhancement, rotation correction, frequency-domain conversion, and high-frequency information enhancement.
In one embodiment, the training method of the pet behavior recognition model comprises a supervised pre-training stage and a supervised fine-tuning stage.
In one embodiment, the feature processing module comprises at least one group of sub-modules, each group comprising a downsampling sub-module and a joint processing sub-module, the joint processing sub-module comprising a position coding unit and a multi-head processing unit, wherein: the position coding unit performs dynamic position coding on each feature point in the input data to obtain position coding information of each feature point, the input data comprising the pet data; and the multi-head processing unit performs global relative position coding on all feature points through a feedforward neural network based on the position coding information of each feature point.
In one embodiment, the species classification module includes a multi-layer perceptron.
In one embodiment, the face detection and emotion recognition module comprises a target detection network, parallel convolution networks, a fusion module and a classification function, wherein: the target detection network performs face detection on the pet features and determines the face region in the pet features; based on channel attention, spatial attention and the DropBlock algorithm, the parallel convolution networks perform convolution processing on the face region to obtain two one-dimensional feature vectors; the fusion module fuses the two one-dimensional feature vectors using a bilinear difference algorithm to obtain a fused vector; and the classification function performs emotion classification on the fused vector to obtain the emotion category.
In one embodiment, the pose estimation module decouples the pet features in two preset directions to obtain a set of vectors corresponding to each feature point in the pet features; and a multi-head heatmap sub-module in the pose estimation module determines the coordinate information of each feature point from the set of vectors corresponding to that feature point.
In one embodiment, the method further comprises displaying the pet behavior recognition result, including: if the pet data is a pet image, displaying the pet behavior recognition result and the feature points of the pet in the pet features on the pet image; or, if the pet data is a historical pet video, displaying the pet behavior recognition result and the feature points of the pet on each video frame of the historical pet video; or, if the pet data is a real-time pet video, displaying the pet behavior recognition result and the feature points of the pet on the currently displayed video frame.
An embodiment of the present application provides a pet behavior recognition apparatus, the apparatus comprising: an input module for inputting pet data into a pre-trained pet behavior recognition model; a feature processing module for performing feature extraction on the pet data by using the feature processing module of the pet behavior recognition model to obtain pet features corresponding to the pet data; a species classification module for performing species classification on the pet features by using the species classification module of the pet behavior recognition model to obtain the pet category corresponding to the pet data; an emotion recognition module for performing emotion recognition on the pet features by using the face detection and emotion recognition module of the pet behavior recognition model based on the pet category, to obtain the emotion category corresponding to the pet data; a pose estimation module for performing pose estimation on the pet features by using the pose estimation module of the pet behavior recognition model to obtain the coordinate information of each feature point corresponding to the pet data; and a comprehensive behavior analysis module for determining, based on the pet category, the emotion category and the coordinate information of each feature point, the behavior category corresponding to the pet data by using the comprehensive behavior analysis module of the pet behavior recognition model, and taking the behavior category as the pet behavior recognition result.
An embodiment of the present application provides an electronic device including a processor and a memory, where the processor is configured to implement the pet behavior recognition method when executing a computer program stored in the memory.
Embodiments of the present application provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the pet behavior recognition method.
In summary, the pet behavior recognition method combines the advantages of convolutional and Transformer structures, so that both the global and local features of an image can be taken into account, the accuracy of image-based animal behavior recognition is improved, and the computational cost of network inference is reduced. In addition, by adding facial emotion recognition of the pet, pet behavior can be analyzed better in combination with the pet's emotional characteristics. Furthermore, the pet does not need to wear any sensor, no additional accessories need to be purchased, and the pet's living habits are not affected.
Drawings
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a flowchart of a pet behavior recognition method according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a feature processing module according to an embodiment of the present application.
Fig. 4 is a schematic diagram of an overall module structure of a pet behavior recognition method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a face detection and emotion recognition module according to an embodiment of the present application.
Fig. 6 is a block diagram of a pet behavior recognition device according to an embodiment of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the application is for the purpose of describing one embodiment only and is not intended to be limiting of the application.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may each be singular or plural. The terms "first", "second", "third", "fourth" and the like in the description, claims and drawings, if any, are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion. The following embodiments and features of the embodiments may be combined with each other without conflict.
In one embodiment, pet behavior recognition helps the owner better understand the needs and emotional states of the pet, so that the pet can be better cared for, its quality of life improved, and the safety and harmony of the household ensured. Pet behavior recognition methods in the related art can be roughly divided into two types: one recognizes pet behavior from sensor data collected by a sensor worn by the pet, and the other recognizes pet behavior from pet image data. Wearing a sensor affects the animal's living habits, while methods based on pet image data usually perform only a simple classification based on the pet's body pose and do not combine animal emotion recognition with behavior recognition; moreover, many current behavior recognition network models use only convolutional networks or only Transformer networks, so the model structure is relatively simple and the accuracy of pet behavior recognition is low.
To solve these problems, an embodiment of the application provides a pet behavior recognition method that combines the advantages of convolutional and Transformer structures, can take into account both the global and local features of an image, improves the accuracy of image-based animal behavior recognition, and reduces the computational cost of network inference. In addition, by adding facial emotion recognition of the pet, pet behavior can be analyzed better in combination with the pet's emotional characteristics. Furthermore, the pet does not need to wear any sensor, no additional accessories need to be purchased, and the pet's living habits are not affected.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 10 may be an electronic device such as a computer, a server, a mobile phone, a tablet computer, a notebook computer, etc., and the embodiment of the present application does not limit the specific type of the electronic device.
As shown in fig. 1, the electronic device 10 may include a communication module 101, a memory 102, a processor 103, an Input/Output (I/O) interface 104, and a bus 105. The processor 103 is coupled to the communication module 101, the memory 102, and the I/O interface 104, respectively, by a bus 105.
The communication module 101 may include a wired communication module and/or a wireless communication module. The wired communication module may provide one or more wired communication solutions such as universal serial bus (USB) or controller area network (CAN) bus. The wireless communication module may provide one or more wireless communication solutions such as wireless fidelity (Wi-Fi), Bluetooth (BT), mobile communication networks, frequency modulation (FM), near field communication (NFC) and infrared (IR).
Memory 102 may include one or more random access memories (RAM) and one or more non-volatile memories (NVM). Random access memory can be read and written directly by the processor 103 and may be used to store executable programs (e.g., machine instructions) of an operating system or other running programs, as well as data of users and applications. The random access memory may include static random-access memory (SRAM), dynamic random-access memory (DRAM), synchronous dynamic random-access memory (SDRAM), double data rate synchronous dynamic random-access memory (DDR SDRAM), and the like.
The nonvolatile memory may store executable programs, store data of users and applications, and the like, and may be loaded into the random access memory in advance for the processor 103 to directly read and write. The nonvolatile memory may include a disk storage device, a flash memory (flash memory).
The memory 102 is used to store one or more computer programs. One or more computer programs are configured to be executed by the processor 103. The one or more computer programs include a plurality of instructions that when executed by the processor 103, implement the pet behavior identification method executing on the electronic device 10.
In other embodiments, the electronic device 10 further includes an external memory interface for connecting to an external memory to enable expansion of the memory capabilities of the electronic device 10.
The processor 103 may include one or more processing units. For example, the processor 103 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors.
The processor 103 provides computing and control capabilities, for example, the processor 103 is configured to execute computer programs stored in the memory 102 to implement the pet behavior recognition methods described above.
The I/O interface 104 is used to provide a channel for user input or output, e.g., the I/O interface 104 may be used to connect various input/output devices, e.g., a mouse, keyboard, touch device, display screen, etc., so that a user may enter information, or visualize information.
The bus 105 is used at least to provide a pathway for communication between the communication module 101, the memory 102, the processor 103, and the I/O interface 104 in the electronic device 10.
It should be understood that the illustrated construction of the embodiments of the present application does not constitute a particular limitation of the electronic device 10. In other embodiments of the application, the electronic device 10 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 2 is a flowchart of a pet behavior recognition method according to an embodiment of the present application. The pet behavior recognition method is applied to an electronic device, such as the electronic device 10 in fig. 1, and includes the following steps. The order of the steps in the flowchart may be changed and some steps may be omitted according to different requirements.
Step S21, inputting the pet data into a pre-trained pet behavior recognition model.
In one embodiment, the electronic device may receive pet data entered by a user, the pet data including pet images and/or pet video, wherein the pet video may include historical pet video and real-time pet video. Specifically, the historical pet video represents a pet video obtained by historical shooting, and the real-time pet video may be a video stream being shot by the shooting device.
In one embodiment, prior to entering the pet data into the pre-trained pet behavior recognition model, the method further comprises preprocessing the pet data, the preprocessing comprising: if the pet data is the pet video, extracting video frames of the pet video, and taking the extracted video frames as pet images. The video frame extraction is carried out on the pet video, so that video data can be converted into image data, and the pet image can be conveniently processed and analyzed by using the pet behavior recognition model in the subsequent process.
In one embodiment, the preprocessing further includes performing one or more of the following on the pet image: denoising, brightness enhancement, contrast enhancement, rotation correction, frequency-domain conversion, and high-frequency information enhancement. Such preprocessing improves the quality of the pet image and thus the accuracy of pet behavior recognition based on it.
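As an illustration only, the sketch below chains a few such operations with OpenCV; the specific operators, parameter values and frame-sampling step are assumptions of the sketch and can be replaced by any equivalent preprocessing.

```python
import cv2
import numpy as np

def preprocess_frame(img_bgr: np.ndarray) -> np.ndarray:
    """Illustrative preprocessing chain; the operators and parameters are assumptions."""
    # Denoising
    img = cv2.fastNlMeansDenoisingColored(img_bgr, None, 10, 10, 7, 21)
    # Brightness / contrast enhancement (simple linear gain and bias)
    img = cv2.convertScaleAbs(img, alpha=1.2, beta=15)
    # High-frequency information enhancement via unsharp masking
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=3)
    img = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
    return img

def extract_frames(video_path: str, step: int = 10) -> list:
    """Sample every `step`-th frame of a pet video and preprocess it as a pet image."""
    frames, cap, idx = [], cv2.VideoCapture(video_path), 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(preprocess_frame(frame))
        idx += 1
    cap.release()
    return frames
```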
In one embodiment, the training method of the pet behavior recognition model includes supervised pre-training and supervised fine-tuning. Specifically, the training method of the pet behavior recognition model comprises: constructing an initial model, the initial model comprising a feature processing module, a species classification module, a face detection and emotion recognition module, a pose estimation module and a comprehensive behavior analysis module; pre-training the initial model on the AP-10K dataset until an initial model meeting preset requirements is obtained, where the AP-10K dataset comprises animal images labeled with animal species and animal feature points, and the preset requirements include, but are not limited to, convergence of a preset loss function, the number of training iterations exceeding a preset number, and the like; and fine-tuning the initial model that meets the preset requirements on the Oxford-IIIT Pet dataset, which comprises pet images labeled with pet behavior categories.
Through supervised pre-training and fine-tuning, the initial model can first be trained on the large AP-10K dataset and then fine-tuned for the pet behavior recognition task on the smaller Oxford-IIIT Pet dataset, which reduces the cost of collecting training samples and improves training efficiency. The specific network structure of the initial model is explained below in combination with the model's inference process.
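The two-stage procedure can be sketched as a single supervised loop applied twice, as below; the data loaders, loss functions, epoch counts and learning rates are hypothetical placeholders rather than values given by the patent.

```python
import torch

def train_stage(model, loader, loss_fn, epochs, lr):
    """Generic supervised training loop reused for both stages."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()   # supervised learning via backpropagation
            opt.step()
    return model

# Stage 1: supervised pre-training on AP-10K-style species/keypoint labels.
# Stage 2: supervised fine-tuning on Oxford-IIIT Pet-style behavior labels,
# typically with a smaller learning rate. The loaders and loss functions
# below are hypothetical placeholders, not names defined by the patent.
# model = train_stage(model, ap10k_loader, pretrain_loss, epochs=100, lr=1e-4)
# model = train_stage(model, oxford_loader, finetune_loss, epochs=20, lr=1e-5)
```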
Step S22, feature extraction is carried out on the pet data by utilizing a feature processing module of the pet behavior recognition model, and pet features corresponding to the pet data are obtained.
In one embodiment, the feature processing module comprises at least one group of sub-modules, each group comprising a downsampling sub-module and a joint processing sub-module, the joint processing sub-module comprising a position coding unit and a multi-head processing unit, wherein: the position coding unit performs dynamic position coding on each feature point (also called a key point, e.g., a skeletal key point) in the input data to obtain position coding information of each feature point, the input data comprising the pet data; and the multi-head processing unit performs global relative position coding on all feature points through a feedforward neural network based on the position coding information of each feature point.
In one embodiment, the downsampling sub-module can be implemented through a pooling operation (such as max pooling or average pooling). It helps the network extract important features from the input data and filter out redundant or noisy information, so that the model focuses more on features useful for the task, improving its generalization ability. The downsampling sub-module also enlarges the receptive field of the network, so that subsequent convolution kernels can learn more global information, helping the model capture broader contextual information of the input data and improving its representation capacity. By reducing the number of parameters and the complexity of the model, downsampling also helps prevent overfitting on the training data.
In one embodiment, the joint processing sub-module corresponds to a Transformer module, and the position coding information obtained by the dynamic position coding (or embedding) method it uses is a tensor created as a trainable variable. The multi-head processing unit of the joint processing sub-module uses a multi-head relation aggregation method: a learnable relation matrix can be designed so that, in the shallow layers, a global dependency is built between each feature point (i.e., token) and the other feature points in a fixed local neighborhood, while a self-attention mechanism is used in the deep layers; based on the position coding information of each feature point, global relative position coding can then be performed on all feature points of the input data through a feedforward neural network (FNN), whose specific structure is not limited by the application.
By using convolution to realize relative position coding, compared with absolute position coding, the resolution of the input picture can be adapted dynamically: the input image does not need to be compressed or cropped to a fixed size, deformation of the picture is avoided, and the accuracy of the pet behavior recognition result is improved. In addition, identifying animal skeletal key points through the combined convolution-and-Transformer method effectively reduces the cost of Transformer shallow-layer feature encoding, reduces the overall computation of the network and makes the network lighter, which in turn enables real-time detection.
In one embodiment, as shown in fig. 3, a schematic structural diagram of a feature processing module according to an embodiment of the present application is provided. The feature processing module comprises 3 groups of sub-modules, where H denotes the height of the pet image, W its width, and C its number of channels. The size of the input image is not limited and can be arbitrary; the input is recommended to be an RGB image, so C is 3 at the input and takes values such as 64, 128 and 256 inside the module. The first downsampling sub-module uses a 4×4 convolution kernel with 64 output channels, the second a 2×2 kernel with 128 output channels, and the third a 2×2 kernel with 256 output channels. For example, if the dimension of the input pet image is H×W×C = 256×256×3, the dimension of the pet features obtained by the feature processing module is 16×16×256.
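As a concrete illustration of the structure in fig. 3, the PyTorch sketch below stacks three groups of a strided-convolution downsampling sub-module followed by a Transformer-style joint processing sub-module, reproducing the 256×256×3 to 16×16×256 shape change described above. The dynamic position coding and multi-head relation aggregation are simplified here to standard multi-head self-attention plus a feed-forward network; normalization choices and head counts are assumptions of the sketch, not the patent's design.

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """Strided convolution standing in for the downsampling sub-module."""
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=k, stride=k)
        self.norm = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return torch.relu(self.norm(self.conv(x)))

class JointBlock(nn.Module):
    """Transformer-style joint processing sub-module: multi-head
    self-attention plus a feed-forward network over feature tokens."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) tokens
        q = self.n1(tokens)
        tokens = tokens + self.attn(q, q, q)[0]
        tokens = tokens + self.ffn(self.n2(tokens))
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class FeatureProcessor(nn.Module):
    """Three groups of downsampling + joint processing sub-modules."""
    def __init__(self):
        super().__init__()
        self.stages = nn.Sequential(
            DownsampleBlock(3, 64, 4), JointBlock(64),
            DownsampleBlock(64, 128, 2), JointBlock(128),
            DownsampleBlock(128, 256, 2), JointBlock(256),
        )

    def forward(self, x):                        # (B, 3, 256, 256) -> (B, 256, 16, 16)
        return self.stages(x)
```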
For example, fig. 4 is a schematic diagram of an overall module structure of a pet behavior recognition method according to an embodiment of the present application. The feature processing module is used for carrying out feature processing on the pet data, so that the pet feature X of the pet data can be obtained, and the pet feature X is analyzed in multiple aspects in the subsequent process, so that a final pet behavior recognition result is obtained.
Step S23, the species classification module of the pet behavior recognition model is utilized to classify the species of the pet features, and the pet category corresponding to the pet data is obtained.
In one embodiment, the species classification module includes a Multi-Layer Perceptron (MLP). The MLP is a classifier based on an artificial neural network, and the MLP may include an input layer, one or more hidden layers, and an output layer, where each layer includes a plurality of neurons connected by weights, each neuron receives an input signal from a previous layer of neurons, generates an output signal by an activation function (e.g., a sigmoid function, a tanh function, a ReLU function, etc.), and transmits the output signal to a next layer of neurons, thereby obtaining a pet class corresponding to pet data.
The MLP adds complexity to the model by introducing hidden layers, thereby being capable of fitting more complex functional relationships, wherein the number of hidden layers and the number of neurons per layer can be adjusted according to the requirements of specific tasks, and the application is not particularly limited to this. During the training of the MLP, a back propagation algorithm may be used to update the connection weights between neurons in the MLP to optimize the performance of the model by minimizing the loss function.
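A minimal sketch of such a species classification head is shown below: globally pooled pet features pass through a small MLP whose logits are trained with cross-entropy and backpropagation. The hidden width and the number of species classes are illustrative assumptions, not values fixed by the patent.

```python
import torch.nn as nn

class SpeciesClassifier(nn.Module):
    """MLP over globally pooled pet features; sizes are illustrative assumptions."""
    def __init__(self, in_dim=256, hidden=128, num_species=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # (B, C, H, W) -> (B, C, 1, 1)
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, hidden), nn.ReLU(),    # hidden layer with ReLU activation
            nn.Linear(hidden, num_species),          # species logits
        )

    def forward(self, feats):
        return self.mlp(self.pool(feats))            # softmax/argmax applied downstream
```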
By determining the pet category, the result of animal species identification can be used as priori information, and the corresponding species can be matched in the subsequent pet behavior identification process, so that the pet behavior identification result is more accurate.
Step S24, based on the pet category, carrying out emotion recognition on the pet characteristics by utilizing a face detection and emotion recognition module of the pet behavior recognition model to obtain an emotion category corresponding to the pet data.
In one embodiment, the face detection and emotion recognition module comprises a target detection network, parallel convolution networks, a fusion module and a classification function (e.g., a softmax or sigmoid function), wherein: the target detection network performs face detection on the pet features and determines the face region in the pet features; based on channel attention, spatial attention and the DropBlock algorithm, the parallel convolution networks perform convolution processing on the face region to obtain two one-dimensional feature vectors; the fusion module fuses the two one-dimensional feature vectors using a bilinear difference algorithm to obtain a fused vector; and the classification function performs emotion classification on the fused vector to obtain the emotion category (e.g., anxiety, calm, comfort, anger, pain, happiness, hunger, etc.).
In one embodiment, a channel attention mechanism is used to determine the key channels of the core region; a spatial attention mechanism is used to determine the core regions of the face region (the core regions of animal facial expressions, e.g., the eyes and mouth) and the non-core regions outside them; the parallel convolution network further comprises a localized attention mechanism for increasing the attention weight of the core regions; and the DropBlock algorithm is used to reduce the attention weight of the non-core regions. Assigning different attention weights to the core and non-core regions through these attention mechanisms focuses the model's capacity on emotion detection in the core regions and improves how efficiently that capacity is used.
In one embodiment, as shown in fig. 5, a schematic structural diagram of a face detection and emotion recognition module according to an embodiment of the present application is provided. The first of the two parallel convolution networks uses channel attention and spatial attention to strengthen the core regions and applies the DropBlock algorithm to the non-core regions. The second of the two parallel convolution networks adopts a plain ResNet: because a deep network loses information as features propagate through it and complete features are difficult to obtain accurately, gradients can easily vanish or explode; the ResNet with shortcut connections is therefore added to compensate for the feature information lost by the first convolution network during feature learning, playing a feature-supplementing role.
In one embodiment, the classification function (e.g., a softmax or sigmoid function) obtains an emotion prediction result (e.g., a result represented by a one-dimensional vector) from the fused vector. One-hot encoding can be used, with the category to which the pet data belongs encoded as 1 and all other categories as 0, so that the emotion category corresponding to the pet data is given by the category encoded as 1.
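For illustration, the sketch below mirrors this two-branch design in PyTorch: face-region features (assumed already localized by the target detection network) pass through an attention-enhanced branch and a plain convolutional branch, the two resulting one-dimensional vectors are fused, and a softmax yields emotion probabilities. The CBAM-style attention block, the use of nn.Dropout2d in place of DropBlock, the learned nn.Bilinear fusion, and all channel counts and the number of emotion classes are assumptions of this sketch rather than the patent's exact design.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Lightweight channel + spatial attention; a stand-in for the mechanism above."""
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(c, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        x = x * self.channel(x)      # reweight channels
        return x * self.spatial(x)   # reweight spatial locations

class EmotionHead(nn.Module):
    def __init__(self, c=256, feat_dim=128, num_emotions=7):
        super().__init__()
        # Branch 1: attention-enhanced convolutions; Dropout2d approximates DropBlock here.
        self.branch1 = nn.Sequential(
            nn.Conv2d(c, feat_dim, 3, padding=1), nn.ReLU(),
            ChannelSpatialAttention(feat_dim), nn.Dropout2d(0.1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Branch 2: plain convolutions compensating for information lost in branch 1.
        self.branch2 = nn.Sequential(
            nn.Conv2d(c, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.Bilinear(feat_dim, feat_dim, feat_dim)  # learned bilinear fusion
        self.classifier = nn.Linear(feat_dim, num_emotions)

    def forward(self, face_feats):
        v1, v2 = self.branch1(face_feats), self.branch2(face_feats)
        fused = torch.relu(self.fuse(v1, v2))
        return self.classifier(fused).softmax(dim=-1)           # emotion probabilities
```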
In this way, animal emotion recognition is incorporated into animal behavior recognition: emotion is recognized from the animal's face, and combining the animal's emotional characteristics allows its behavior patterns to be analyzed more accurately.
Step S25, performing pose estimation on the pet features by using a pose estimation module of the pet behavior recognition model to obtain the coordinate information of each feature point corresponding to the pet data.
In one embodiment, the pose estimation module decouples the pet features in two preset directions to obtain a set of vectors corresponding to each feature point in the pet features (for example, feature points corresponding to the tip of the tongue, the claw tips, the tail end, the left ear, the right ear, and the like); and a multi-head heatmap sub-module in the pose estimation module determines the coordinate information of each feature point from the set of vectors corresponding to that feature point, where the coordinate information can be understood as the position coordinates of that feature point in the pet image.
In one embodiment, as shown in fig. 4, the pose estimation module may decouple the data feature X in two directions (the horizontal x direction and the vertical y direction) to obtain, for each of the n feature points, a set of vectors (an x vector and a y vector, such as the vectors corresponding to the first feature point in fig. 4). Decoupling the coordinates when predicting the key points simplifies the computation and effectively reduces quantization error.
In one embodiment, the loss function corresponding to the pose estimation module includes a first loss function comprising the Kullback-Leibler divergence (KLD) between all the vectors corresponding to all the feature points, and a second loss function comprising the mean squared error (MSE) between the coordinate information of each feature point and its ground-truth annotated coordinates.
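A minimal sketch of such a decoupled head and its two loss terms is given below; the number of feature points, the output resolutions, and the use of a soft-argmax to decode differentiable coordinates are assumptions of the sketch, not specifics of the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledPoseHead(nn.Module):
    """Predicts, for each of `num_kpts` feature points, a 1-D distribution
    along x and along y (coordinate decoupling); all sizes are assumptions."""
    def __init__(self, c=256, h=16, w=16, num_kpts=17, out_w=256, out_h=256):
        super().__init__()
        self.kpt_conv = nn.Conv2d(c, num_kpts, kernel_size=1)  # one map per feature point
        self.to_x = nn.Linear(h * w, out_w)                    # 1-D vector along x
        self.to_y = nn.Linear(h * w, out_h)                    # 1-D vector along y

    def forward(self, feats):
        maps = self.kpt_conv(feats).flatten(2)                 # (B, K, H*W)
        return self.to_x(maps), self.to_y(maps)                # (B, K, out_w), (B, K, out_h)

def pose_loss(pred_x, pred_y, tgt_x, tgt_y, tgt_xy):
    """First term: KL divergence between predicted and target 1-D distributions.
    Second term: MSE between decoded coordinates and annotated coordinates."""
    kl = F.kl_div(F.log_softmax(pred_x, -1), tgt_x, reduction="batchmean") + \
         F.kl_div(F.log_softmax(pred_y, -1), tgt_y, reduction="batchmean")
    # Soft-argmax decoding keeps the coordinate term differentiable.
    px, py = F.softmax(pred_x, -1), F.softmax(pred_y, -1)
    xs = (px * torch.arange(px.shape[-1], device=px.device)).sum(-1)
    ys = (py * torch.arange(py.shape[-1], device=py.device)).sum(-1)
    coords = torch.stack([xs, ys], dim=-1)                     # (B, K, 2)
    return kl + F.mse_loss(coords, tgt_xy)
```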
Step S26, based on the pet category, the emotion category and the coordinate information of each feature point, determining a behavior category corresponding to the pet data by using a comprehensive behavior analysis module of the pet behavior recognition model, and taking the behavior category as the pet behavior recognition result.
In one embodiment, the comprehensive behavior analysis module comprises a classification function, which classifies the behavior of pets of different categories based on the pet category, the emotion category and the coordinate information of each feature point, so as to obtain the pet behavior recognition result (e.g., arching the back, grooming, eating, sleeping, running, standing, lying down, etc.).
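The sketch below illustrates one way such a comprehensive analysis step could be wired up: the species probabilities, emotion probabilities and flattened key-point coordinates are concatenated and passed through a small classifier. All dimensions and the behavior taxonomy are illustrative assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class BehaviorHead(nn.Module):
    """Fuses species, emotion and key-point information into a behavior category."""
    def __init__(self, num_species=2, num_emotions=7, num_kpts=17, num_behaviors=8):
        super().__init__()
        in_dim = num_species + num_emotions + num_kpts * 2
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, num_behaviors),
        )

    def forward(self, species_prob, emotion_prob, keypoints_xy):
        # keypoints_xy: (B, K, 2) normalized coordinates, flattened into the input vector
        x = torch.cat([species_prob, emotion_prob, keypoints_xy.flatten(1)], dim=-1)
        return self.net(x).softmax(dim=-1)   # behavior category probabilities
```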
In one embodiment, the method further comprises: if the pet data is a pet image, displaying the pet behavior recognition result and the feature points of the pet (such as skeletal key points) on the pet image; or, if the pet data is a historical pet video, displaying the pet behavior recognition result and the feature points of the pet on each video frame of the historical pet video; or, if the pet data is a real-time pet video, displaying the pet behavior recognition result and the feature points of the pet on the currently displayed video frame. Specifically, the pet behavior recognition result can be displayed as a text description or as a generated cartoon expression of the pet, which is not specifically limited by the application.
In one embodiment, a plurality of consecutive video frames corresponding to the same pet behavior recognition result can be determined and synthesized into one video, so that the final pose estimation and recognition result is output and displayed in full.
The pet behavior recognition method provided by the embodiment of the application combines the advantages of convolutional and Transformer structures, so that both the global and local features of an image can be taken into account, the accuracy of image-based animal behavior recognition is improved, and the computational cost of network inference is reduced. In addition, by adding facial emotion recognition of the pet, pet behavior can be analyzed better in combination with the pet's emotional characteristics. Furthermore, the pet does not need to wear any sensor, no additional accessories need to be purchased, and the pet's living habits are not affected.
Fig. 6 is a block diagram of a pet behavior recognition device according to an embodiment of the present application.
In some embodiments, the pet behavior recognition device 40 may include a plurality of functional modules comprised of computer program segments. The computer program of the individual program segments in the pet behavior recognition device 40 may be stored in a memory of the electronic device and executed by at least one processor to perform the pet behavior recognition functions (described in detail with respect to fig. 2).
In this embodiment, the pet behavior recognition device 40 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an input module 401, a feature processing module 402, a species classification module 403, an emotion recognition module 404, a pose estimation module 405, and a comprehensive behavior analysis module 406. A module referred to in the present application is a series of computer program segments stored in a memory that can be executed by at least one processor and that perform a fixed function. In this embodiment, for the functional implementation of each module in the pet behavior recognition device 40, reference may be made to the above description of the pet behavior recognition method, which is not repeated here.
The input module 401 is configured to input pet data into a pre-trained pet behavior recognition model.
The feature processing module 402 is configured to perform feature extraction on the pet data by using the feature processing module of the pet behavior recognition model, so as to obtain pet features corresponding to the pet data.
The species classification module 403 is configured to classify the species of the pet feature by using the species classification module of the pet behavior recognition model, so as to obtain a pet category corresponding to the pet data.
The emotion recognition module 404 is configured to perform emotion recognition on the pet features by using the face detection and emotion recognition module of the pet behavior recognition model based on the pet category, so as to obtain an emotion category corresponding to the pet data.
The pose estimation module 405 is configured to perform pose estimation on the pet features by using the pose estimation module of the pet behavior recognition model, so as to obtain the coordinate information of each feature point corresponding to the pet data.
The comprehensive behavior analysis module 406 is configured to determine, based on the pet category, the emotion category, and the coordinate information of each feature point, a behavior category corresponding to the pet data by using the comprehensive behavior analysis module of the pet behavior recognition model, and use the behavior category as the pet behavior recognition result.
Embodiments of the present application also provide a computer readable storage medium, where a computer program is stored, where the computer program includes program instructions, where a method implemented by the program instructions when executed may refer to a method in the foregoing embodiments of the present application.
The computer readable storage medium may be an internal memory of the electronic device according to the above embodiment, for example, a hard disk or a memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device.
In some embodiments, the computer readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc.
In the foregoing embodiments, each embodiment has its own emphasis; for parts not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (10)
1. A method of pet behavior identification, the method comprising:
Inputting pet data into a pre-trained pet behavior recognition model;
extracting features of the pet data by utilizing a feature processing module of the pet behavior recognition model to obtain pet features corresponding to the pet data;
the species classification module of the pet behavior recognition model is utilized to classify the species of the pet characteristics, and the pet category corresponding to the pet data is obtained;
based on the pet category, carrying out emotion recognition on the pet characteristics by utilizing a face detection and emotion recognition module of the pet behavior recognition model to obtain an emotion category corresponding to the pet data;
carrying out pose estimation on the pet characteristics by using a pose estimation module of the pet behavior recognition model to obtain coordinate information of each characteristic point corresponding to the pet data;
and determining a behavior category corresponding to the pet data by utilizing a comprehensive behavior analysis module of the pet behavior recognition model based on the pet category, the emotion category and the coordinate information of each feature point, and taking the behavior category as the pet behavior recognition result.
2. The pet behavior recognition method of claim 1, wherein the pet data comprises a pet image and/or a pet video.
3. The pet behavior recognition method of claim 2, wherein prior to entering pet data into the pre-trained pet behavior recognition model, the method further comprises preprocessing the pet data, the preprocessing comprising:
if the pet data is the pet video, extracting video frames of the pet video, and taking the extracted video frames as pet images;
Subjecting the pet image to one or more of the following: denoising processing, brightness enhancement processing, contrast enhancement processing, rotation correction processing, frequency domain conversion and high-frequency information enhancement processing.
4. The pet behavior recognition method of claim 1, wherein the training method of the pet behavior recognition model comprises a supervised pre-training method and a supervised fine-tuning method.
5. The pet behavior recognition method of claim 1, wherein the feature processing module comprises at least one set of sub-modules, wherein each set of sub-modules comprises a downsampling sub-module and a joint processing sub-module, the joint processing sub-module comprising a position coding unit and a multi-head processing unit, wherein:
the position coding unit performs dynamic position coding on each feature point in input data to obtain position coding information of each feature point, wherein the input data comprises the pet data;
And the multi-head processing unit carries out global relative position coding on all the characteristic points through the feedforward neural network based on the position coding information of each characteristic point.
6. The pet behavior recognition method of claim 1, wherein the species classification module comprises a multi-layer perceptron.
7. The pet behavior recognition method of claim 1, wherein the face detection and emotion recognition module comprises a target detection network, a parallel convolution network, a fusion module, a classification function, wherein:
the target detection network performs face detection on the pet characteristics and determines face areas in the pet characteristics;
based on the channel attention, the spatial attention and DropBlock algorithm, the parallel convolution network carries out convolution processing on the face area to obtain two one-dimensional feature vectors;
the fusion module fuses the two one-dimensional feature vectors by using a bilinear difference algorithm to obtain fused vectors;
And the classification function carries out emotion classification on the fused vectors to obtain the emotion classification.
8. The pet behavior recognition method according to claim 1, wherein the pose estimation module performs decoupling in two preset directions on the pet features to obtain a set of vectors corresponding to each feature point in the pet features;
and a multi-head heatmap sub-module in the pose estimation module determines coordinate information of each feature point according to the set of vectors corresponding to that feature point.
9. The pet behavior recognition method of claim 2, further comprising displaying the pet behavior recognition result, comprising:
If the pet data is a pet image, displaying the pet behavior recognition result and feature points of the pet in the pet feature in the pet image; or alternatively
If the pet data is a historical pet video, displaying the pet behavior recognition result and the characteristic points of the pet in each video frame in the historical pet video; or alternatively
And if the pet data is a real-time pet video, displaying the pet behavior recognition result and the characteristic points of the pet in a currently displayed video frame.
10. An electronic device comprising a processor and a memory, wherein the processor is configured to implement the pet behavior recognition method of any one of claims 1 to 9 when executing a computer program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410216799.1A CN118097779A (en) | 2024-02-27 | 2024-02-27 | Pet behavior recognition method and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410216799.1A CN118097779A (en) | 2024-02-27 | 2024-02-27 | Pet behavior recognition method and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118097779A true CN118097779A (en) | 2024-05-28 |
Family
ID=91147281
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410216799.1A Pending CN118097779A (en) | 2024-02-27 | 2024-02-27 | Pet behavior recognition method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118097779A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |