CN114898466A - Video motion recognition method and system for smart factory - Google Patents
- Publication number
- CN114898466A (application number CN202210521070.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- factory
- position information
- worker
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention relates to the technical field of deep learning action recognition, and in particular to a video action recognition method and system for an intelligent factory. The recognition method comprises the following steps: S101, a factory video data segment generation step; S102, a factory worker operation action data set generation step; S103, a factory operation target detection data set generation step; S104, a factory worker action recognition model modeling step; S105, a factory worker position information coding network training step; S106, a factory worker behavior recognition algorithm building step; S107, a behavior recognition input step; and S108, a behavior recognition and output step. The system comprises: a model training program, a label file generation program, model training electronic equipment, a processing and computing center, a server side and a video monitoring terminal. Compared with traditional action recognition methods, which usually express a video with RGB features alone, the method largely eliminates the influence of other information when extracting video features, thereby improving the action recognition effect for factory workers.
Description
Technical Field
The invention relates to the technical field of deep learning action recognition, in particular to a video action recognition method and system for an intelligent factory.
Background
Work on factory worker action recognition mainly focuses on defining worker actions, producing data sets, and modeling action recognition models. The current mainstream methods are as follows: first, image recognition methods, which take a single picture as input and recognize the state of a worker at a given moment in order to judge the worker's action; second, video classification methods, which input a frame sequence into a network to recognize the action; and third, sensor-based methods, which extract action-related information from sensors and make the judgment in combination with a deep learning method.
The prior art has the following defects: (1) many actions cannot be determined from the state at a single moment and must be judged from their temporal relationships, so action recognition methods based on image recognition cannot handle such cases; (2) recognition of an action often relies heavily on the scene and on action-related objects rather than on the motion itself, so when no useful information can be obtained from the scene or the related objects, the recognition result is poor; (3) current deep learning methods model temporal information insufficiently well and cannot accurately model different temporal relationships; (4) most current action recognition research detects actions with supervised learning, which can reach high precision and recall on a data set, but is limited by the cost of labeling video data and cannot cover the massive range of dynamic behaviors in real scenes, so the practical deployment effect is poor.
Disclosure of Invention
In order to solve the above problems, the present invention provides a video motion recognition method and system for an intelligent factory.
A video motion recognition method for an intelligent factory specifically comprises the following steps:
s101, a factory video data segment generation step: processing the videos of factory worker operations by using image preprocessing technology, and converting all original videos into usable factory worker operation data segments;
s102, a factory worker operation action data set generation step: labeling the factory worker operation data segments for classification, and making them into data from which an action recognition model can learn;
s103, a factory operation target detection data set generation step: outputting the worker operation videos as frames, sampling the pictures, and frame-selecting and marking targets such as people, workbenches and operated workpieces;
s104, a factory worker action recognition model modeling step: after frame sampling, cropping and data enhancement, converting the data of the factory worker operation action data set into a standard data sequence acceptable to the model and inputting it into a 3D-ResNet deep neural network suitable for video understanding to train the model;
s105, a factory worker position information coding network training step: after scaling and normalization preprocessing and data enhancement such as flipping, random placement and mosaic, inputting the factory operation target detection data set into a target detection algorithm for training so that it can provide the position information of workers, the operation table and the operated workpiece, and then embedding the position information into a multi-channel matrix and inputting it into the position coding branch for training;
s106, a factory worker behavior recognition algorithm building step: splicing the depth features output at the tails of the trained action recognition model and the position information coding model, so that the action recognition network and the position information coding network respectively form an action recognition branch and a position information coding branch and yield depth features that contain position information codes and reflect worker behavior, inputting these into a fully connected layer, and freezing the previously trained network parameters to obtain a complete worker behavior recognition model;
s107, behavior recognition input step: inputting a video needing to identify the behavior of a worker into a factory worker behavior identification model;
s108, behavior recognition and output: and obtaining a behavior prediction probability vector based on the trained factory worker behavior recognition model, comparing the behavior category vectors to obtain a behavior recognition result, and sending the recognition result to a server in a socket communication mode.
The overall training framework of the algorithm is supervised learning. The core idea of supervised learning is to label the data and, during training, to optimize the model parameters according to the model's output for each input and its label, so that the algorithm finds the correspondence between inputs and labels, learns an optimal model from the data set, and can predict the corresponding label when presented with unlabeled data.
The processing of the videos of factory worker operations in step S101 specifically comprises: preprocessing, labeling and classifying the monitoring video stream data, and converting the monitoring video stream into a worker action recognition data set.
The factory video data segment generation in step S102 is specifically as follows: first, image cropping technology is used to crop the video frames to the worker's working area so as to eliminate the influence of other areas; then, video clipping technology is used to cut the factory worker operation video into segments according to action type, taking the action starting point as the start and the action ending point as the finish.
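A minimal sketch of this segment generation, assuming OpenCV is available and that the crop box and the action start/end frame indices are supplied from the labeling work (file names, coordinates and frame numbers below are hypothetical):

```python
import cv2

def clip_and_crop(video_path, out_path, start_frame, end_frame, crop_box):
    """Cut one action segment [start_frame, end_frame) from a source video
    and crop every frame to the worker's working area (x, y, w, h)."""
    x, y, w, h = crop_box
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_frame)
    for _ in range(start_frame, end_frame):
        ok, frame = cap.read()
        if not ok:
            break
        writer.write(frame[y:y + h, x:x + w])  # keep only the working area
    cap.release()
    writer.release()

# Hypothetical usage: one action segment, cropped to the workbench region.
# clip_and_crop("cam01_raw.mp4", "assemble_0001.mp4", 1200, 1290, (320, 180, 640, 480))
```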
The labeling specification of the workpiece target detection data in step S103 is as follows: outputting a worker operation video to form a frame, sampling pictures, selecting a workpiece operated by a person, not marking all workpieces in the picture, and only detecting the workpiece operated by the worker to avoid inputting noise information of irrelevant actions to a neural network.
The neural network for recognizing factory worker behaviors in step S104 is composed of two neural network branches. One is a classical deep learning action recognition algorithm based on 3D-ResNet, built from 3D convolution kernels, which can move along the time dimension, extract temporal features, and directly take a continuous frame sequence to recognize the action. The other is a depth position information coding network: the frame sequence position information extracted by the target detection algorithm is embedded into a four-dimensional matrix and then input into the depth position information coding branch. Finally, the action modeling depth features output by the action recognition branch and the depth position codes output by the position information coding branch are spliced and input into a fully connected layer for prediction.
The design of the frame sequence position information feature embedding matrix for the factory worker action recognition target detection features in step S105 is as follows: target detection is performed on the n frames sampled from the video clip to be recognized, and the detection information of each frame is embedded into a k-channel matrix; the number of channels k depends on the number of target types of interest for action recognition, each channel is a matrix of size 1×4 holding the information of a target detection box, and each channel represents the position information of one type of target.
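A sketch of one way to build such an embedding, assuming the detector returns (class_id, x1, y1, x2, y2) tuples per sampled frame; the tensor shape (n × k × max_boxes × 4) and the max_boxes cap are illustrative assumptions, not the patent's exact layout:

```python
import numpy as np

def embed_positions(detections_per_frame, n_frames, n_classes, max_boxes=4):
    """Pack per-frame detections into an (n, k, max_boxes, 4) position matrix.
    detections_per_frame: list of length n_frames; each entry is a list of
    (class_id, x1, y1, x2, y2) boxes, coordinates normalised to [0, 1]."""
    pos = np.zeros((n_frames, n_classes, max_boxes, 4), dtype=np.float32)
    for t, boxes in enumerate(detections_per_frame):
        used = [0] * n_classes                    # next free slot in each class channel
        for cls, x1, y1, x2, y2 in boxes:
            if used[cls] < max_boxes:
                pos[t, cls, used[cls]] = (x1, y1, x2, y2)
                used[cls] += 1
    return pos

# Hypothetical usage: 16 sampled frames, 3 target classes (worker, workbench, workpiece).
# matrix = embed_positions(yolo_outputs, n_frames=16, n_classes=3)
```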
A video motion recognition system for a smart factory comprises:
the model training program is used for inputting a data set file to the action recognition branch to obtain action information depth vectors and inputting the data set file to the YOLO target detection network to obtain a target position information matrix and then inputting the position information coding network to output position information depth codes;
a label file generation program constructs the detailed information of the data set in a dictionary file form so as to facilitate the training module to use at any time;
the model training electronic equipment is used for saving the model parameters obtained by the model training verification cycle as files and outputting training verification data as log files;
the video monitoring terminal is used for acquiring data;
the processing and calculating center is used for processing and identifying the transmitted video data and then transmitting the video data to the next terminal;
and the server is used for storing and using the identification result of the transmitted data.
The model training program specifically comprises:
the video sampling module samples frames from an input video either at equal intervals or at a random position within each equal-length interval (a minimal sampling sketch follows this module list);
the image preprocessing module is used for converting the format of a video frame, cutting a picture, scaling the size, labeling the category and the like, and converting the original video segment into a worker behavior data set for model training;
the action recognition network module is used for passing the input worker behavior video data set through the neural network to convert it into behavior depth feature vectors, providing the appearance and temporal information in the video for the next step of worker behavior recognition;
the target detection network module is used for detecting the types and positions of targets interested by the algorithm in the video frames and providing the target types and positions to the position information coding module of the next step;
the position information coding module is used for embedding the target type and the position information output by the target detection module into a position information matrix and converting the target type and the position information into a position information depth characteristic vector through a position information depth coding network;
and the joint network training module is used for freezing the parameters of designated modules and obtaining the worker action prediction probability from the joint depth information vector formed by splicing the behavior depth feature vector provided by the action recognition module with the position information depth feature vector provided by the position information coding module.
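A minimal sketch of the two sampling strategies mentioned for the video sampling module (equal-interval, or one random frame inside each equal-length interval); the clip length and sample count below are hypothetical:

```python
import random

def sample_frame_indices(total_frames, n_samples, mode="uniform"):
    """Pick n_samples frame indices from a clip of total_frames frames,
    either at equal intervals or at a random offset inside each interval."""
    interval = total_frames / n_samples
    if mode == "uniform":
        return [int(i * interval) for i in range(n_samples)]
    # "random": one random frame per equal-length interval
    return [int(i * interval + random.random() * interval) for i in range(n_samples)]

# Hypothetical usage: pick 16 of 240 frames to feed the 3D-ResNet branch.
# indices = sample_frame_indices(240, 16, mode="random")
```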
The model training electronic equipment comprises a storage for storing model parameters, a data set and a labeling file, a processor for training a model algorithm, a model training algorithm storage and a video recording terminal.
The processing and computing center comprises a behavior recognition network model file library, a computing center, a memory and a server; the identification network model file library stores the parameter data of the deep neural network to be used by a calculation center; the computing center carries out preprocessing, feature extraction and behavior prediction on video data transmitted by the video monitoring terminal, and prediction results are stored in a log file of the memory and are simultaneously transmitted to the server side for use.
The invention has the beneficial effects that: the video action recognition method and system for the intelligent factory address the domain specificity and the difficulty of temporal modeling in factory action recognition, and achieve a worker action recognition accuracy of about 95% on the verification set; compared with traditional action recognition methods, which usually express a video with RGB features alone, the method largely eliminates the influence of noise and redundant information when extracting video features, thereby improving the action recognition effect for factory workers.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a block diagram illustrating an overall schematic diagram of a video motion recognition method for an intelligent factory according to an embodiment of the present invention;
FIG. 2 is a block diagram of a method for identifying video motion of an intelligent factory according to an embodiment of the present invention;
FIG. 3 illustrates a video motion recognition network model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model training procedure according to an embodiment of the present invention;
FIG. 5 is a schematic structure of a markup dictionary file according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of model training electronics in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a plant worker action recognition model execution electronics in accordance with an embodiment of the present invention;
fig. 8 is a schematic diagram of detection information and a matrix according to an embodiment of the invention.
Detailed Description
In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further explained below.
As shown in fig. 1 to 8, a video motion recognition method for an intelligent factory specifically includes the following steps:
s101, a factory video data segment generation step: processing the videos of factory worker operations by using image preprocessing technology, and converting all original videos into usable factory worker operation data segments;
s102, a factory worker operation action data set generation step: labeling the factory worker operation data segments for classification, and making them into data from which an action recognition model can learn;
s103, a factory operation target detection data set generation step: outputting the worker operation videos as frames, sampling the pictures, and frame-selecting and marking targets such as people, workbenches and operated workpieces;
s104, a factory worker action recognition model modeling step: after frame sampling, cropping and data enhancement, converting the data of the factory worker operation action data set into a standard data sequence acceptable to the model and inputting it into a 3D-ResNet deep neural network suitable for video understanding to train the model;
s105, a factory worker position information coding network training step: after scaling and normalization preprocessing and data enhancement such as flipping, random placement and mosaic, inputting the factory operation target detection data set into a target detection algorithm for training so that it can provide the position information of workers, the operation table and the operated workpiece, and then embedding the position information into a multi-channel matrix and inputting it into the position coding branch for training;
s106, a factory worker behavior recognition algorithm building step: splicing the depth features output at the tails of the trained action recognition model and the position information coding model, so that the action recognition network and the position information coding network respectively form an action recognition branch and a position information coding branch and yield depth features that contain position information codes and reflect worker behavior, inputting these into a fully connected layer, and freezing the previously trained network parameters to obtain a complete worker behavior recognition model;
s107, behavior recognition input step: inputting a video needing to identify worker behaviors into a factory worker behavior identification model;
s108, behavior recognition and output: the method comprises the steps of obtaining a behavior prediction probability vector based on a trained factory worker behavior recognition model, comparing behavior category vectors to obtain a behavior recognition result, and meanwhile sending the recognition result to a server in a socket communication mode, so that the real-time performance is higher, and the method is convenient to observe and detect at any time.
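A minimal sketch of the socket hand-off described in S108, assuming the server side simply accepts a JSON payload over TCP; the host, port and message format are hypothetical, not specified by the invention:

```python
import json
import socket

def send_result(host, port, action_label, probabilities):
    """Send one behavior recognition result to the server over a plain TCP socket."""
    payload = json.dumps({"action": action_label, "probs": probabilities}).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)

# Hypothetical usage after one clip has been classified:
# send_result("192.168.1.10", 9000, "assemble_part", [0.02, 0.95, 0.03])
```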
The video action recognition method and system for the intelligent factory address the domain specificity and the difficulty of temporal modeling in factory action recognition, and achieve a worker action recognition accuracy of about 95% on the verification set; compared with traditional action recognition methods, which usually express a video with RGB features alone, the method largely eliminates the influence of noise and redundant information when extracting video features, thereby improving the action recognition effect for factory workers.
The processing of the videos of factory worker operations in step S101 specifically comprises: preprocessing, labeling and classifying the monitoring video stream data, and converting the monitoring video stream into a worker action recognition data set.
The factory video data segment generation in step S102 is specifically as follows: first, image cropping technology is used to crop the video frames to the worker's working area so as to eliminate the influence of other areas; then, video clipping technology is used to cut the factory worker operation video into segments according to action type, taking the action starting point as the start and the action ending point as the finish.
The labeling specification of the workpiece target detection data in step S103 is as follows: the method comprises the steps of outputting a worker operation video to form a frame, sampling pictures, selecting a workpiece operated by a person, not marking all workpieces in the frame, and only detecting the workpiece operated by the worker so as to avoid inputting noise information of irrelevant actions to a neural network.
The neural network for recognizing factory worker behaviors in step S104 is composed of two neural network branches. One is a classical deep learning action recognition algorithm based on 3D-ResNet, built from 3D convolution kernels, which can move along the time dimension, extract temporal features, and directly take a continuous frame sequence to recognize the action. The other is a depth position information coding network: the frame sequence position information extracted by the target detection algorithm is embedded into a four-dimensional matrix and then input into the depth position information coding branch. Finally, the action modeling depth features output by the action recognition branch and the depth position codes output by the position information coding branch are spliced and input into a fully connected layer for prediction.
The design of the frame sequence position information feature embedding matrix for the factory worker action recognition target detection features in step S105 is as follows: target detection is performed on the n frames sampled from the video clip to be recognized, and the detection information of each frame is embedded into a k-channel matrix; the number of channels k depends on the number of target types of interest for action recognition, each channel is a matrix of size 1×4 holding the information of a target detection box, and each channel represents the position information of one type of target, as shown in fig. 8.
Preprocessing the video in steps S104 and S105 makes the recognition more accurate and more efficient, and keeps the method well targeted to the factory scenario.
In step S107, the factory worker action recognition model execution device takes as input the trained model, the video stream transmitted by the video monitoring terminal, and all model parameters, and obtains the worker behavior prediction result for the input factory monitoring video stream based on the trained factory worker action recognition system.
Fig. 3 shows the video action recognition network model. The action recognition neural network model is composed of three sub-neural-network modules, two of which form the two branches. One branch is a classical deep learning action recognition algorithm based on 3D-ResNet, built from 3D convolution kernels, which can move along the time dimension t, extract temporal features, and directly take a continuous frame sequence to recognize the action. The other is a depth position information coding network, in which the frame sequence position information extracted by the YOLOv3 target detection algorithm is embedded into a four-dimensional matrix and then input into the depth position information coding branch. YOLOv3 uses Darknet-53 as its backbone and has very good image recognition performance; the whole network mainly consists of 5 groups of residual blocks, and the sizes of the prior boxes are obtained by K-means clustering. Because the complexity of the position information is far lower than that of the image, the position information coding branch can connect features through fully connected layers, so its temporal receptive field directly covers the whole input sequence, which makes it easy to obtain long-range temporal information and makes temporal modeling easier. Finally, the action modeling depth features output by the action recognition branch and the depth position codes output by the position information coding branch are spliced and input into a fully connected layer for prediction.
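A simplified PyTorch sketch of this two-branch structure, assuming torchvision's r3d_18 as the 3D-ResNet backbone and a small fully connected stack as the position information coding branch; the layer sizes, feature dimensions and number of classes are illustrative assumptions, not the patent's actual configuration:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

class WorkerBehaviorNet(nn.Module):
    def __init__(self, n_classes, pos_dim, pos_feat=128):
        super().__init__()
        backbone = r3d_18(weights=None)          # torchvision >= 0.13 API assumed
        backbone.fc = nn.Identity()              # keep the 512-d clip feature
        self.action_branch = backbone            # 3D-ResNet action recognition branch
        self.pos_branch = nn.Sequential(         # position information coding branch
            nn.Flatten(),
            nn.Linear(pos_dim, 256), nn.ReLU(),
            nn.Linear(256, pos_feat), nn.ReLU(),
        )
        self.head = nn.Linear(512 + pos_feat, n_classes)  # fused fully connected layer

    def forward(self, clip, pos_matrix):
        # clip: (B, 3, T, H, W); pos_matrix: (B, T, k, max_boxes, 4)
        a = self.action_branch(clip)
        p = self.pos_branch(pos_matrix)
        return self.head(torch.cat([a, p], dim=1))   # splice features, then predict

# Hypothetical usage: 16-frame clips, 3 target classes, up to 4 boxes per class.
# model = WorkerBehaviorNet(n_classes=8, pos_dim=16 * 3 * 4 * 4)
```

The YOLOv3 detector is kept outside this module: its boxes are converted into the position matrix beforehand, matching the separation between target detection and position coding described above.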
As shown in fig. 7, the factory worker action recognition model execution electronic device used in step S108 comprises: a video monitoring terminal; a video stream transmission interface; a behavior category library; a model input interface; a worker action recognition model algorithm processor; a data memory; a display; a recognition result transmission interface; and an action recognition model algorithm memory. After the device is powered on and the processor runs the worker action recognition algorithm program, the following steps are executed:
receiving, through the video stream transmission interface and in real time, the video stream transmitted from the video monitoring terminal for the worker action recognition algorithm;
acquiring the trained models from the model input interface, and sending the models, the behavior category library and all model parameters to the worker action recognition algorithm processor;
the worker action recognition algorithm processor carries out prediction recognition on the actions of factory workers based on a trained action recognition model, a YOLO target detection algorithm and a position information coding network;
the worker action recognition model algorithm processor outputs the prediction result of the worker action to a data memory and the recognition result transmission interface for a server program to take;
the display displays the worker action recognition results, the real-time video stream and the result transmission conditions of the recognition result transmission interface which are stored in the data storage.
A video motion recognition system for a smart factory comprises:
the model training program is used for inputting a data set file to the action recognition branch to obtain action information depth vectors and inputting the data set file to the YOLO target detection network to obtain a target position information matrix and then inputting the position information coding network to output position information depth codes;
a label file generation program, which constructs the detailed information of the data set in the form of a dictionary file to be convenient for a training module to use at any time, wherein a schematic structure of the label dictionary file is given in fig. 5;
the model training electronic equipment is used for saving the model parameters obtained by the model training verification cycle as files and outputting training verification data as log files;
the video monitoring terminal is used for acquiring data;
the processing and calculating center is used for processing and identifying the transmitted video data and then transmitting the processed and identified video data to the next terminal;
and the server is used for storing and using the identification result of the transmitted data.
The model training program specifically comprises:
the video sampling module samples frames from an input video either at equal intervals or at a random position within each equal-length interval;
the image preprocessing module is used for converting the format of a video frame, cutting a picture, scaling the size, labeling the category and the like, and converting the original video segment into a worker behavior data set for model training;
the action recognition network module is used for passing the input worker behavior video data set through the neural network to convert it into behavior depth feature vectors, providing the appearance and temporal information in the video for the next step of worker behavior recognition;
the target detection network module is used for detecting the types and positions of targets interested by the algorithm in the video frames and providing the target types and positions to the position information coding module of the next step;
the position information coding module is used for embedding the target type and the position information output by the target detection module into a position information matrix and converting the position information matrix into a position information depth characteristic vector through a position information depth coding network;
and the joint network training module is used for freezing the parameters of designated modules and obtaining the worker action prediction probability from the joint depth information vector formed by splicing the behavior depth feature vector provided by the action recognition module with the position information depth feature vector provided by the position information coding module.
The model training program is used for executing the model training and verification cycle: the data set file is input into the action recognition branch to obtain the action information depth vector, and into the YOLO target detection network to obtain the target position information matrix, which is then input into the position information coding network to output the position information depth code; the action information depth vector and the position information depth code are spliced and input into a fully connected layer to obtain the output prediction result, the quality of the result is judged by a CrossEntropyLoss discriminator, and the parameters are continuously optimized by an SGD optimizer.
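A compressed sketch of one training pass of that cycle, assuming a PyTorch DataLoader yields (clip, position_matrix, label) batches and a fused two-branch model such as the WorkerBehaviorNet sketched above; all names and hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, device, lr=0.01):
    """One pass of the training half of the training/verification cycle."""
    criterion = nn.CrossEntropyLoss()             # judges the quality of the prediction
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    running_loss = 0.0
    for clip, pos, label in loader:
        clip, pos, label = clip.to(device), pos.to(device), label.to(device)
        optimizer.zero_grad()
        logits = model(clip, pos)                 # spliced action + position prediction
        loss = criterion(logits, label)
        loss.backward()
        optimizer.step()                          # SGD parameter update
        running_loss += loss.item()
    return running_loss / max(len(loader), 1)
```

A verification pass would repeat the loop with model.eval() and torch.no_grad(), logging accuracy to the training log file described below.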
And outputting all models and model parameters thereof to a model output interface.
And storing the obtained model parameters of the model training verification cycle as files, and outputting training verification data as log files.
The model training electronic equipment comprises a storage for storing model parameters, a data set and a label file, a processor for training a model algorithm, a model training algorithm storage and a video recording terminal.
After the model training electronic equipment is powered on and the processor runs the program of the algorithm model, the following steps are executed:
executing a model building script, reproducing a model, including an action identification branch and a position information coding branch, and building an initial model structure;
reading a pre-trained model file from the memory, extracting the model parameters and assigning them to the model to obtain an initial action recognition model;
generating a discriminator for discriminating the quality of the output result of the model and the training level of the model;
running a configuration script in a training stage, determining a video preprocessing method in the training stage, a data enhancement mode of a video, loading a data set example, determining the type and parameter setting of an optimizer and the setting type of a training log;
and running a configuration script of the verification stage, determining a video preprocessing method of the verification stage, loading a data set example of the verification stage and setting the verification log.
The processing and computing center communicates with the server and with the external monitoring terminal; the processing center obtains the video data transmitted by the video monitoring terminal, processes and recognizes it, and transmits it to the server side for storing and using the recognition result.
The processing and computing center comprises a behavior recognition network model file library, a computing center, a memory and a server; the identification network model file library stores the parameter data of the deep neural network to be used by a calculation center; the computing center carries out preprocessing, feature extraction and behavior prediction on video data transmitted by the video monitoring terminal, and prediction results are stored in a log file of the memory and are simultaneously transmitted to the server side for use.
Compared with the prior art, the invention has the following three innovation points.
Aiming at the poor training results caused by the particularity of the factory worker action recognition domain and the lack of labeled data, a complete set of specifications and workflows is provided, covering the path from monitoring video stream to data set production as well as the format and specification of the labeling file.
Aiming at the fact that traditional action recognition algorithms generally use only the RGB stream of the video, target position information based on target detection is introduced, providing a new modality of information for action recognition.
The invention provides a mixed neural network model based on deep learning action recognition and target recognition and a video action recognition method facing a smart factory, which comprises an action recognition branch and a target position and type information coding branch.
The foregoing shows and describes the general principles, principal features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (10)
1. A video motion recognition method for an intelligent factory is characterized by comprising the following steps: the method specifically comprises the following steps:
s101, a factory video data segment generation step: processing the videos of factory worker operations by using image preprocessing technology, and converting all original videos into usable factory worker operation data segments;
s102, generating a factory worker operation action data set: marking labels on the operation data segments of factory workers for classification, and making the operation data segments of the factory workers into data for learning of an action recognition model;
s103, a factory work target detection data set generation step: outputting a worker operation video to form a frame, sampling a picture, and performing frame selection and marking on a person, a workbench and an operation workpiece type target;
s104, a factory worker action recognition model modeling step: after frame sampling, cropping and data enhancement, converting the data of the factory worker operation action data set into a standard data sequence acceptable to the model and inputting it into a 3D-ResNet deep neural network suitable for video understanding to train the model;
s105, a factory worker position information coding network training step: after scaling and normalization preprocessing and data enhancement such as flipping, random placement and mosaic, inputting the factory operation target detection data set into a target detection algorithm for training so that it can provide the position information of workers, the operation table and the operated workpiece, and then embedding the position information into a multi-channel matrix and inputting it into the position coding branch for training;
s106, building a factory worker behavior recognition algorithm: splicing the trained action recognition model and the depth characteristics output by the tail part of the position information coding model, enabling the action recognition network and the position information coding network to respectively form an action recognition branch and a position information coding branch to form the depth characteristics which contain position information codes and reflect the behaviors of workers, inputting a full connection layer and freezing the network parameters before training to obtain a complete worker behavior recognition model;
s107, behavior recognition input step: inputting a video needing to identify worker behaviors into a factory worker behavior identification model;
s108, behavior recognition and output: and obtaining a behavior prediction probability vector based on the trained factory worker behavior recognition model, comparing the behavior category vectors to obtain a behavior recognition result, and sending the recognition result to a server in a socket communication mode.
2. The method of claim 1, wherein the video motion recognition method for intelligent factory comprises: the processing of the videos of factory worker operations in step S101 specifically includes: preprocessing, labeling and classifying the monitoring video stream data, and converting the monitoring video stream into a worker action recognition data set.
3. The method of claim 1, wherein the video motion recognition method for intelligent factory comprises: the factory video data segment generation in step S102 is specifically as follows: first, image cropping technology is used to crop the video frames to the worker's working area so as to eliminate the influence of other areas; then, video clipping technology is used to cut the factory worker operation video into segments according to action type, taking the action starting point as the start and the action ending point as the finish.
4. The method of claim 1, wherein the video motion recognition method for intelligent factory comprises: the labeling specification of the workpiece target detection data in step S103 is as follows: outputting a worker operation video to form a frame, sampling pictures, selecting a workpiece operated by a person, not marking all workpieces in the picture, and only detecting the workpiece operated by the worker to avoid inputting noise information of irrelevant actions to a neural network.
5. The method of claim 1, wherein the video motion recognition method for intelligent factory comprises: the neural network for recognizing factory worker behaviors in step S104 is composed of two neural network branches; one is a classical deep learning action recognition algorithm based on 3D-ResNet, built from 3D convolution kernels, which can move along the time dimension, extract temporal features, and directly take a continuous frame sequence to recognize the action; the other is a depth position information coding network, in which the frame sequence position information extracted by the target detection algorithm is embedded into a four-dimensional matrix and then input into the depth position information coding branch; finally, the action modeling depth features output by the action recognition branch and the depth position codes output by the position information coding branch are spliced and input into a fully connected layer for prediction.
6. The method of claim 1, wherein the video motion recognition method for intelligent factory comprises: the specific design steps of the frame sequence position information characteristic embedded matrix for the factory worker action recognition target detection characteristics in the step S105 are as follows: the method comprises the steps of carrying out target detection by adopting n frames sampled on a video clip to be detected, embedding detection information on each frame into a matrix of a k channel, wherein the number of k channels depends on the number of types of targets concerned by motion recognition, each channel is a matrix with the size of 1-4, information of each target detection frame is contained in each channel, and each channel represents position information of one type of targets respectively.
7. The system for identifying video motion of intelligent factory facing to any one of claims 1 to 6, wherein: the method comprises the following steps:
the model training program is used for inputting a data set file to the action recognition branch to obtain action information depth vectors and inputting the data set file to the YOLO target detection network to obtain a target position information matrix and then inputting the position information coding network to output position information depth codes;
a label file generation program constructs the detailed information of the data set in a dictionary file form so as to facilitate the training module to use at any time;
the model training electronic equipment is used for saving the model parameters obtained by the model training verification cycle as files and outputting training verification data as log files;
the video monitoring terminal is used for acquiring data;
the processing and calculating center is used for processing and identifying the transmitted video data and then transmitting the processed and identified video data to the next terminal;
and the server is used for storing and using the identification result of the transmitted data.
8. The system of claim 7, wherein: the model training program specifically comprises:
the video sampling module samples frames in an input video at equal intervals or randomly at equal intervals;
the image preprocessing module is used for converting the format of the video frame, cutting the picture, scaling the size, labeling the category and the like, and converting the original video segment into a worker behavior data set for model training;
the action recognition network module is used for passing the input worker behavior video data set through the neural network to convert it into behavior depth feature vectors, providing the appearance and temporal information in the video for the next step of worker behavior recognition;
the target detection network module is used for detecting the types and positions of targets interested by the algorithm in the video frames and providing the target types and positions to the position information coding module of the next step;
the position information coding module is used for embedding the target type and the position information output by the target detection module into a position information matrix and converting the target type and the position information into a position information depth characteristic vector through a position information depth coding network;
and the joint network training module is used for freezing the parameters of designated modules and obtaining the worker action prediction probability from the joint depth information vector formed by splicing the behavior depth feature vector provided by the action recognition module with the position information depth feature vector provided by the position information coding module.
9. The system of claim 7, wherein: the model training electronic equipment comprises a storage for storing model parameters, a data set and a label file, a processor for training a model algorithm, a model training algorithm storage and a video recording terminal.
10. The system of claim 7, wherein: the processing and computing center comprises a behavior recognition network model file library, a computing center, a memory and a server; the identification network model file library stores the parameter data of the deep neural network to be used by a calculation center; the computing center carries out preprocessing, feature extraction and behavior prediction on video data transmitted by the video monitoring terminal, and prediction results are stored in a log file of the memory and are simultaneously transmitted to the server side for use.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210521070.6A CN114898466B (en) | 2022-05-13 | 2022-05-13 | Intelligent factory-oriented video action recognition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210521070.6A CN114898466B (en) | 2022-05-13 | 2022-05-13 | Intelligent factory-oriented video action recognition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114898466A true CN114898466A (en) | 2022-08-12 |
CN114898466B CN114898466B (en) | 2024-08-23 |
Family
ID=82722522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210521070.6A Active CN114898466B (en) | 2022-05-13 | 2022-05-13 | Intelligent factory-oriented video action recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114898466B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115469627A (en) * | 2022-11-01 | 2022-12-13 | 山东恒远智能科技有限公司 | Intelligent factory operation management system based on Internet of things |
CN116258466A (en) * | 2023-05-15 | 2023-06-13 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116386148A (en) * | 2023-05-30 | 2023-07-04 | 国网江西省电力有限公司超高压分公司 | Knowledge graph guide-based small sample action recognition method and system |
CN116524395A (en) * | 2023-04-04 | 2023-08-01 | 江苏智慧工场技术研究院有限公司 | Intelligent factory-oriented video action recognition method and system |
WO2024065189A1 (en) * | 2022-09-27 | 2024-04-04 | Siemens Aktiengesellschaft | Method, system, apparatus, electronic device, and storage medium for evaluating work task |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109986560A (en) * | 2019-03-19 | 2019-07-09 | 埃夫特智能装备股份有限公司 | A kind of mechanical arm self-adapting grasping method towards multiple target type |
CN112633378A (en) * | 2020-12-24 | 2021-04-09 | 电子科技大学 | Intelligent detection method and system for multimodal image fetus corpus callosum |
CN113903081A (en) * | 2021-09-29 | 2022-01-07 | 北京许继电气有限公司 | Visual identification artificial intelligence alarm method and device for images of hydraulic power plant |
CN114330503A (en) * | 2021-12-06 | 2022-04-12 | 北京无线电计量测试研究所 | Smoke flame identification method and device |
-
2022
- 2022-05-13 CN CN202210521070.6A patent/CN114898466B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109986560A (en) * | 2019-03-19 | 2019-07-09 | 埃夫特智能装备股份有限公司 | A kind of mechanical arm self-adapting grasping method towards multiple target type |
CN112633378A (en) * | 2020-12-24 | 2021-04-09 | 电子科技大学 | Intelligent detection method and system for multimodal image fetus corpus callosum |
CN113903081A (en) * | 2021-09-29 | 2022-01-07 | 北京许继电气有限公司 | Visual identification artificial intelligence alarm method and device for images of hydraulic power plant |
CN114330503A (en) * | 2021-12-06 | 2022-04-12 | 北京无线电计量测试研究所 | Smoke flame identification method and device |
Non-Patent Citations (1)
Title |
---|
ZHOU MAN; LIU ZHIYONG; CHEN MENGCHI; ZHAO YUYANG; YANG LUJIANG: "Application of transfer learning based on AlexNet in image recognition for the process industry" (基于AlexNet的迁移学习在流程工业图像识别中的应用), Industrial Control Computer, no. 11, 25 November 2018 (2018-11-25) *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024065189A1 (en) * | 2022-09-27 | 2024-04-04 | Siemens Aktiengesellschaft | Method, system, apparatus, electronic device, and storage medium for evaluating work task |
CN115469627A (en) * | 2022-11-01 | 2022-12-13 | 山东恒远智能科技有限公司 | Intelligent factory operation management system based on Internet of things |
CN115469627B (en) * | 2022-11-01 | 2023-04-04 | 山东恒远智能科技有限公司 | Intelligent factory operation management system based on Internet of things |
CN116524395A (en) * | 2023-04-04 | 2023-08-01 | 江苏智慧工场技术研究院有限公司 | Intelligent factory-oriented video action recognition method and system |
CN116524395B (en) * | 2023-04-04 | 2023-11-07 | 江苏智慧工场技术研究院有限公司 | Intelligent factory-oriented video action recognition method and system |
CN116258466A (en) * | 2023-05-15 | 2023-06-13 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116258466B (en) * | 2023-05-15 | 2023-10-27 | 国网山东省电力公司菏泽供电公司 | Multi-mode power scene operation specification detection method, system, equipment and medium |
CN116386148A (en) * | 2023-05-30 | 2023-07-04 | 国网江西省电力有限公司超高压分公司 | Knowledge graph guide-based small sample action recognition method and system |
CN116386148B (en) * | 2023-05-30 | 2023-08-11 | 国网江西省电力有限公司超高压分公司 | Knowledge graph guide-based small sample action recognition method and system |
Also Published As
Publication number | Publication date |
---|---|
CN114898466B (en) | 2024-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114898466B (en) | Intelligent factory-oriented video action recognition method and system | |
CN110147726B (en) | Service quality inspection method and device, storage medium and electronic device | |
CN111400547B (en) | Human-computer cooperation video anomaly detection method | |
CN113365147B (en) | Video editing method, device, equipment and storage medium based on music card point | |
CN112861575A (en) | Pedestrian structuring method, device, equipment and storage medium | |
CN110890102A (en) | Engine defect detection algorithm based on RNN voiceprint recognition | |
WO2020056995A1 (en) | Method and device for determining speech fluency degree, computer apparatus, and readable storage medium | |
CN113642474A (en) | Hazardous area personnel monitoring method based on YOLOV5 | |
CN111931809A (en) | Data processing method and device, storage medium and electronic equipment | |
WO2023241102A1 (en) | Label recognition method and apparatus, and electronic device and storage medium | |
CN117058595B (en) | Video semantic feature and extensible granularity perception time sequence action detection method and device | |
CN112417996A (en) | Information processing method and device for industrial drawing, electronic equipment and storage medium | |
CN111563886B (en) | Unsupervised feature learning-based tunnel steel rail surface disease detection method and device | |
CN111461253A (en) | Automatic feature extraction system and method | |
CN117633613A (en) | Cross-modal video emotion analysis method and device, equipment and storage medium | |
CN110728316A (en) | Classroom behavior detection method, system, device and storage medium | |
Nag et al. | CNN based approach for post disaster damage assessment | |
CN116945258A (en) | Die cutting machine control system and method thereof | |
CN117216264A (en) | Machine tool equipment fault analysis method and system based on BERT algorithm | |
CN116453514A (en) | Multi-view-based voice keyword detection and positioning method and device | |
CN115565008A (en) | Transferable image recognition detection system, method and computer readable storage medium | |
CN111798237B (en) | Abnormal transaction diagnosis method and system based on application log | |
CN114140879A (en) | Behavior identification method and device based on multi-head cascade attention network and time convolution network | |
CN118053107B (en) | Time sequence action detection method and device based on potential action interval feature integration | |
CN118503893B (en) | Time sequence data anomaly detection method and device based on space-time characteristic representation difference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |