
CN113688734B - FPGA heterogeneous acceleration-based old people falling detection method - Google Patents

FPGA heterogeneous acceleration-based old people falling detection method Download PDF

Info

Publication number
CN113688734B
CN113688734B (application CN202110980385.2A)
Authority
CN
China
Prior art keywords
human body
network
model
fall
pruning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110980385.2A
Other languages
Chinese (zh)
Other versions
CN113688734A (en)
Inventor
张立国
申前
金梅
秦芊
杨红光
王磊
孟子杰
黄文汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202110980385.2A priority Critical patent/CN113688734B/en
Publication of CN113688734A publication Critical patent/CN113688734A/en
Application granted granted Critical
Publication of CN113688734B publication Critical patent/CN113688734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fall detection method for the elderly based on FPGA heterogeneous acceleration, belonging to the technical field of target recognition. The method comprises a fusion algorithm part and a hardware acceleration part: as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting. The human body detection part adopts a YOLOv3 network, and a portable, improved lightweight YOLOv3 network is obtained through bidirectional pruning and an improved loss function; the fall detection algorithm part adopts a lightweight SqueezeNet network and detects falls of the elderly by jointly judging the height-width ratio of the human body bounding rectangle and the Euclidean distances of the main key points; the hardware part uses the Xilinx MPSoC-architecture Ultra96-V2 board. The invention improves the portability of fall detection equipment for the elderly and reduces its cost.

Description

FPGA heterogeneous acceleration-based old people falling detection method
Technical Field
The invention relates to the technical field of target recognition, and in particular to a fall detection method for the elderly based on FPGA heterogeneous acceleration.
Background
With the development of society, care of the elderly suffers from a large shortage of medical staff and a lack of intelligent detection equipment. Medical research shows that when a fall is treated in time the risk of death can be reduced by 80% and the survival rate of the elderly improved, so detecting fall events accurately and in real time has great social and scientific significance.
Currently, the common methods for detecting falls of the elderly fall into 3 categories:
First, detection based on ambient signals relies mainly on sensors installed in the surrounding environment and detects falls from the sound generated when a human body falls and from changes in wall and floor pressure; it is very easily disturbed by other environmental factors, which causes false alarms, its efficiency is extremely low, and it is rarely adopted.
Second, detection based on wearable devices uses the gyroscope and acceleration sensor built into the device to detect falls, but long-term wearing increases the physical burden on the elderly and interferes with their daily activities.
Third, detection based on computer vision: the traditional machine-vision approach judges fall features directly and is very easily affected by ambient light and background; the artificial-intelligence approach feeds the video captured by the acquisition equipment into a neural network for training and prediction, achieves high recognition accuracy, but places high demands on device performance and therefore leads to high equipment cost.
Given the shortcomings of the above methods, it is necessary to develop a fall detection method for the elderly based on FPGA heterogeneous acceleration.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a fall detection method for the elderly based on FPGA heterogeneous acceleration, in which the network structure and detection algorithm of the traditional artificial-intelligence fall detection method are adjusted to be lightweight and ported to a portable ARM+FPGA embedded system, so that fall detection equipment can be installed portably without affecting recognition accuracy and the cost of use is reduced.
In order to solve the technical problems, the invention adopts the following technical scheme:
the method comprises a fusion algorithm part and a hardware acceleration part, wherein a neural network is taken as a basic frame as a whole, and an FPGA hardware acceleration technology is combined, so that algorithm embedded transplanting is realized through quantitative compiling transplanting; the method specifically comprises the following steps:
step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset;
step 2, improving the YOLOv3 network model;
step 3, training the improved YOLOv3 network model structure of step 2 with the training samples obtained in step 1, and iterating to obtain a lightweight YOLOv3 network model;
step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment;
step 5, training, with the training samples obtained in step 1, on the basis of the lightweight YOLOv3 network model obtained in step 3, and iterating to obtain an improved SqueezeNet network model;
step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5;
step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision;
step 8, compiling the quantized model evaluated in step 7;
step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board;
and step 10, image detection: performing fall detection on the images of step 9 with the network ported to the board, and sending out an early-warning signal when a fall occurs.
The technical scheme of the invention is further improved as follows: the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
The technical scheme of the invention is further improved as follows: in step 1, the COCO2017 dataset is selected as the training sample.
The technical scheme of the invention is further improved as follows: in step 1, the image feature information of the target is the image feature information of the target under non-ideal conditions.
The technical scheme of the invention is further improved as follows: in step 2, improving the YOLOv3 network model specifically includes:
2.1, carrying out bidirectional channel and layer pruning on the backbone network to compress the width and depth of the model;
channel pruning prunes on the basis of the gamma coefficients of the BN layers: the masks of all convolution layers are found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning;
layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned; to preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut; in total 5 shortcuts are cut off;
2.2, improving the loss function; the improved loss function is:
E = E_coord + E_conf
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
The technical scheme of the invention is further improved as follows: in step 4, key-point detection is trained with a lightweight SqueezeNet network structure, and fall detection uses a human body posture discrimination fusion algorithm that judges the human body height-width ratio and the Euclidean distances of key coordinates;
the SqueezeNet network simplifies the network structure without obviously reducing precision, by reducing the computation during model training and testing, shrinking the size of the model network structure, and reducing the number of learnable parameters, thereby obtaining better portability; Euclidean-distance judgment of the key points of each part of the human body is introduced, and the human body posture is judged comprehensively in combination with the aspect ratio of the human body detection rectangle;
the target bounding box of the obtained result after human body detection can be equivalently a rectangle, and the aspect ratio H:W of the rectangle is used as a discrimination condition:
H:W=(H max -H min ):(W max -W min )
where H represents the height of the rectangular frame and W represents the width of the rectangular frame. H max And H min Respectively maximum value and minimum value of human body detection rectangle height, W max And W is min Respectively inquiring the maximum value and the minimum value of the width of the human body detection rectangle;
when a human body normally moves, the height-width ratio tends to be stable and unchanged, the ratio is always kept to be more than 1, when the human body falls, the height-width ratio is dynamically changed greatly, and meanwhile, the ratio tends to be less than 1;
the key points of human body are mainly divided into head and trunk, and the fall determination can be mainly based on head coordinates (X head ,Y head ) Shoulder center coordinates (X) shoulder ,Y shoulder ) And ankle center coordinates (X) ankle ,Y ankle ) The change of the relative position is shown in the Euclidean distanceAnd (3) violent shaking occurs, and a threshold value is set for judging, wherein the Euclidean distance d has the following calculation formula:
wherein d (he, an) represents the Euclidean distance between the head coordinate and the ankle center coordinate, d (sh, an) represents the Euclidean distance between the shoulder center coordinate and the ankle center coordinate, i represents different frames, and the whole formula calculates d (he, an) and d (sh, an) of continuous frames and compares jitter by numerical values;
and carrying out combined judgment on the aspect ratio and Euclidean distance judgment, further judging whether the Euclidean distance in the formula is dithered or not when the aspect ratio value is changed, and considering falling when two judgment conditions are simultaneously met, otherwise, considering erroneous judgment.
The technical scheme of the invention is further improved as follows: in step 6, quantization uses the Vitis AI development stack, and the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation.
The technical scheme of the invention is further improved as follows: in step 8, compiling means that the model produced by quantization must further be converted into a computation graph in the XIR format that the target board can run; this process uses the AI Compiler, which applies heterogeneous optimization to the xmodel produced by the quantization in step 6 and generates optimized machine code for the corresponding board.
The technical scheme of the invention is further improved as follows: in step 9, deploying onto the board means burning the program onto the Ultra96-V2 board; raw images of the indoor human body target are collected through a Logitech C920E camera connected to the board, and the camera acquires images of the moving target under non-ideal conditions.
By adopting the above technical scheme, the invention achieves the following technical progress:
1. The method adopts an FPGA-heterogeneously-accelerated lightweight network, which improves the portability of fall detection equipment for the elderly and reduces cost; by shrinking the network model and improving the key-point action recognition method, the running speed of the detection and recognition algorithm is increased, and the requirements for accuracy and real-time performance can be met in practical, non-ideal scenes.
2. The human body detection part adopts a YOLOv3 network, and a portable, improved lightweight YOLOv3 network is obtained through bidirectional pruning and an improved loss function.
3. The fall detection algorithm part adopts a lightweight SqueezeNet network and detects falls of the elderly by jointly judging the aspect ratio of the human body rectangle and the Euclidean distances of the main key points; compared with other fall detection algorithms, the lightweight YOLOv3 + SqueezeNet network structure and the improved human body posture discrimination fusion algorithm are small in size while keeping accuracy, so the demands on hardware are low; heterogeneous ARM+FPGA acceleration can be achieved through quantization and compilation, making the method more portable and applicable to indoor fall detection for the elderly.
4. The hardware part uses the Xilinx MPSoC-architecture Ultra96-V2 board, and embedded porting of the algorithm is achieved through quantization, compilation and porting.
5. Through the improvements and the porting, the invention not only improves accuracy and real-time performance but also lowers the hardware requirements in practical applications, so that the task of detecting falls of the elderly under non-ideal conditions can be completed well at low cost and with miniaturized equipment.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a lightweight YOLOv3 network architecture diagram of the present invention;
FIG. 3 is a flow chart of a human body posture discrimination fusion algorithm of the present invention;
FIG. 4 is a diagram showing the effect of detecting falls according to the present invention;
FIG. 5 is a second diagram of the fall detection effect of the present invention;
fig. 6 is a third diagram of the fall detection effect of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and examples:
As shown in FIGS. 1-3, the method comprises a fusion algorithm part and a hardware acceleration part; as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting;
the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
In the YOLOv3 network model, the backbone network is Darknet-53, which comprises 21 convolution layers and a fully connected layer; residual network structures are introduced between the convolution layers, which improves deep feature extraction and enables multi-scale prediction.
The improvement of the YOLOv3 network model consists of bidirectional pruning of the channels and layers of the original network, compressing the depth and width of the model so that the complex structure of YOLOv3 is made lightweight without affecting accuracy. At the same time, the invention improves the original loss function: on the basis of the original loss function, the loss term contributed by the class prediction is deleted, reducing the redundancy of the loss function.
The method specifically comprises the following steps of:
Step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset; the image feature information of the target is the image feature information of the target under non-ideal conditions.
A single-class target dataset is produced for training the network model. The currently popular COCO dataset format is selected; the COCO target dataset has complex backgrounds and is better suited to a network model for real detection conditions. The COCO2017 dataset is selected as the training sample.
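As a concrete illustration of this step (not part of the patent text), the single-class "person" subset of COCO2017 can be pulled out with the pycocotools API; the annotation path below is a placeholder for the local COCO2017 layout.

```python
from pycocotools.coco import COCO

# Placeholder path -- adjust to where the COCO2017 annotations are stored locally.
ann_file = "annotations/instances_train2017.json"

coco = COCO(ann_file)
person_cat = coco.getCatIds(catNms=["person"])    # single target category
img_ids = coco.getImgIds(catIds=person_cat)       # images that contain people

# Collect (image file, person boxes) pairs; COCO boxes are [x, y, w, h] in pixels.
samples = []
for img_id in img_ids:
    info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id, catIds=person_cat, iscrowd=False)
    boxes = [a["bbox"] for a in coco.loadAnns(ann_ids)]
    samples.append((info["file_name"], boxes))

print(f"{len(samples)} training images with person annotations")
```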
Step 2, improving the YOLOv3 network model by carrying out bidirectional channel and layer pruning on the backbone network and improving the loss function; specifically:
2.1: Bidirectional channel and layer pruning is applied to the backbone network. Channel pruning prunes on the basis of the gamma coefficients of the BN layers: the mask of each convolution layer is found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning. Layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned. To preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut at the same time. In the invention, 5 shortcuts are cut off, which makes the network lightweight with little loss of precision. Channel pruning and layer pruning compress the width and the depth of the model, respectively.
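A minimal PyTorch sketch of the channel-selection step described above is given here as an illustration; it assumes each prunable convolution is followed by a BatchNorm2d layer, and the pruning ratio and per-layer retention floor are example values, not figures taken from the patent.

```python
import torch
import torch.nn as nn

def channel_prune_masks(model: nn.Module, prune_ratio: float = 0.5, min_keep: int = 8):
    """Build per-BN-layer channel masks from the magnitude of the BN gamma coefficients."""
    # 1. Gather all BN gamma magnitudes to derive a single global threshold.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)

    # 2. Keep channels whose |gamma| exceeds the global threshold, but never let a
    #    layer fall below a minimum channel count (the "limit reserved channels" rule).
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            g = m.weight.detach().abs()
            mask = (g > threshold).float()
            keep = min(min_keep, g.numel())
            if mask.sum() < keep:
                mask = torch.zeros_like(mask)
                mask[g.topk(keep).indices] = 1.0
            masks[name] = mask
    return masks

def merge_shortcut_masks(mask_a: torch.Tensor, mask_b: torch.Tensor) -> torch.Tensor:
    """Layers joined by a shortcut must share one mask: merge with logical OR so the
    residual addition stays shape-consistent after pruning."""
    return torch.clamp(mask_a + mask_b, max=1.0)
```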
2.2: Improving the loss function. The loss function of YOLOv3 is:
E = E_coord + E_conf + E_class    (1)
where E_coord represents the coordinate loss, E_conf the confidence loss, E_class the classification loss, and S (used in the expansions below) the grid size.
Specifically, expanding the three terms: the coordinate loss function E_coord is calculated as:
E_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²] + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]    (2)
where S represents the scale of the feature map; B represents the number of prediction boxes generated in each cell; (x, y) are the center coordinates of the prediction box and (x̂, ŷ) the center coordinates of the real target box; (w, h) are the width and height of the prediction box and (ŵ, ĥ) the width and height of the real target box; λ_coord is the weight of the coordinate loss, usually set to 5; I_{ij}^{obj} is 1 if a target exists in the box at position (i, j) and 0 otherwise. The whole formula means that when the j-th anchor box of the i-th grid cell is responsible for a real target, the bounding box generated by that anchor box is compared with the real box, and the center-coordinate error and the width-height error are calculated.
The confidence loss function E_conf is calculated as:
E_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} (C_i − Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} (C_i − Ĉ_i)²    (3)
where C_i is the confidence of the prediction box and Ĉ_i the true value, which is determined by whether the bounding box of the grid cell is responsible for some target; λ_noobj is the loss weight; I_{ij}^{noobj} is 1 when no target is present and 0 otherwise. In the input image the detected target often occupies only a small part of the image, so the loss contributed by the parts that contain the target is far smaller than that of the parts that do not, which biases the network toward predicting cells that contain no target. Therefore the weight coefficient λ_noobj, usually set to 0.5, is applied to the loss of the regions that do not contain the target to be measured, so that the network can effectively predict the regions that do contain it. The first term in the formula is the confidence error of the bounding boxes that contain an object, and I_{ij}^{obj} means that only the confidence of the prediction box with the relatively largest IOU counts toward the error; the second term is the confidence error of the bounding boxes that contain no object.
The classification loss function E_class is calculated as:
E_class = Σ_{i=0}^{S²} I_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²    (4)
where p_i(c) is the predicted confidence of the target for class c, p̂_i(c) is the true class probability, and I_i^{obj} represents whether the i-th grid cell contains an object. Classification loss is calculated only when an anchor box is responsible for some real target; otherwise no such calculation is performed. Finally, the optimal prediction box is selected from the predicted bounding boxes by NMS (Non-Maximum Suppression).
The invention optimizes this: in the human fall detection task the detected target is always the human body, so the classification error can be removed from the loss function, which reduces the computation of the loss function and the complexity of the network. The improved loss function is shown in Equation 5:
E = E_coord + E_conf    (5)
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
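A compact sketch of this two-term loss is given below for illustration; it assumes predictions and targets have already been matched to anchors and decoded into a common (x, y, w, h, confidence) layout, which is a simplification of the Darknet implementation rather than the patent's exact code.

```python
import torch

def simplified_yolo_loss(pred, target, obj_mask, noobj_mask,
                         lambda_coord: float = 5.0, lambda_noobj: float = 0.5):
    """E = E_coord + E_conf: the class term is dropped for single-class (person) detection.

    pred, target: tensors of shape [N, 5] holding (x, y, w, h, conf) per matched anchor.
    obj_mask / noobj_mask: boolean tensors of shape [N] marking responsible / empty anchors.
    """
    # Coordinate loss -- only anchors responsible for a ground-truth person contribute.
    xy_err = ((pred[obj_mask, 0:2] - target[obj_mask, 0:2]) ** 2).sum()
    wh_err = ((pred[obj_mask, 2:4].clamp(min=0).sqrt()
               - target[obj_mask, 2:4].sqrt()) ** 2).sum()
    e_coord = lambda_coord * (xy_err + wh_err)

    # Confidence loss -- object anchors plus down-weighted background anchors.
    e_conf = ((pred[obj_mask, 4] - target[obj_mask, 4]) ** 2).sum() \
             + lambda_noobj * ((pred[noobj_mask, 4] - target[noobj_mask, 4]) ** 2).sum()

    return e_coord + e_conf
```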
Step 3, training and outputting the model: the improved YOLOv3 network model structure of step 2 is trained with the training samples of step 1, and a lightweight YOLOv3 network model is obtained by iteration.
Based on the single-class target COCO dataset of step 1, the improved lightweight YOLOv3 network model is trained with the Darknet deep learning framework in an end-to-end manner; the initial learning rate is set to 0.001, and the network model is saved after 20000 iterations.
Step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment:
the key point detection adopts a light-weight SquezeNet network structure for training, and the falling detection adopts a human body posture discrimination fusion algorithm for discriminating the human body height-width ratio and the key coordinate Euclidean distance; the method specifically comprises the following steps:
the traditional fall detection mainly carries out gesture judgment through an OPENPOSE algorithm, and mainly uses VGG-19 as a backbone network to extract bottom layer characteristics of an input image; and then inputting the extracted characteristic information into a next layer of neural network, realizing the generation of a confidence coefficient map, setting a confidence threshold value to locate key points of a human body, and then carrying out gesture estimation, wherein the VGG-19 network structure is relatively complex, so that the method is large in learning parameters and network calculation amount and unsuitable for transplanting an embedded environment. The adopted SquezeNet network simplifies the network structure under the condition of not obviously reducing the precision by reducing the calculated amount during model training and testing, reducing the size of the model network structure and reducing the quantity of the learnable parameters, thereby obtaining better portability.
The human body posture judgment fusion algorithm is more suitable for the heterogeneous acceleration embedded environment, and the human body posture is comprehensively judged by combining the Euclidean distance judgment of key points of each part of the human body and the aspect ratio of human body detection rectangle.
The target bounding box obtained from human body detection can be treated as a rectangle, and the rectangle's height-width ratio H:W is used as a discrimination condition:
H:W = (H_max − H_min) : (W_max − W_min)    (6)
where H represents the height of the rectangular box and W its width; H_max and H_min are respectively the maximum and minimum of the height of the human body detection rectangle, and W_max and W_min are respectively the maximum and minimum of its width. When a human body moves normally, the height-width ratio tends to be stable and stays greater than 1; when the human body falls, the height-width ratio changes drastically and tends to become less than 1.
The key points of the human body are mainly divided into the head and the trunk, and the fall determination is mainly based on the change in relative position of the head coordinates (X_head, Y_head), the shoulder-center coordinates (X_shoulder, Y_shoulder) and the ankle-center coordinates (X_ankle, Y_ankle); this change shows up as violent jitter of the Euclidean distances, and a threshold is set for the judgment, the Euclidean distances being calculated as:
d_i(he, an) = √((X_head,i − X_ankle,i)² + (Y_head,i − Y_ankle,i)²)    (7)
d_i(sh, an) = √((X_shoulder,i − X_ankle,i)² + (Y_shoulder,i − Y_ankle,i)²)    (8)
where d(he, an) represents the Euclidean distance between the head coordinates and the ankle-center coordinates, d(sh, an) represents the Euclidean distance between the shoulder-center coordinates and the ankle-center coordinates, and i indexes different frames. d(he, an) and d(sh, an) are computed over consecutive frames, and their values are compared to measure the jitter.
The fusion algorithm combines the aspect-ratio judgment with the Euclidean-distance judgment: when the aspect-ratio value changes, it further judges whether the Euclidean distances in Equations (7) and (8) jitter; a fall is recognized only when both conditions are satisfied at the same time, otherwise the event is regarded as a misjudgment. The algorithm reduces computational complexity and the probability of misjudgment, and increases portability while still achieving fall detection.
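The combined judgment can be sketched in a few lines of Python; the ratio threshold, the jitter threshold and the frame-window length below are illustrative placeholders — the patent sets thresholds but does not fix numeric values here.

```python
import math
from collections import deque

def euclid(p, q):
    """Euclidean distance between two (x, y) key points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

class FallJudge:
    """Fuse the bounding-box aspect ratio with key-point distance jitter over frames."""

    def __init__(self, ratio_thresh=1.0, jitter_thresh=40.0, window=5):
        self.d_head_ankle = deque(maxlen=window)      # d(he, an) over recent frames
        self.d_shoulder_ankle = deque(maxlen=window)  # d(sh, an) over recent frames
        self.ratio_thresh = ratio_thresh
        self.jitter_thresh = jitter_thresh

    def update(self, box, head, shoulder, ankle):
        # box = (x_min, y_min, x_max, y_max) from the person detector.
        h = box[3] - box[1]
        w = box[2] - box[0]
        ratio_fallen = (h / max(w, 1e-6)) < self.ratio_thresh   # H:W drops below 1

        self.d_head_ankle.append(euclid(head, ankle))
        self.d_shoulder_ankle.append(euclid(shoulder, ankle))

        def jitter(d):
            # Spread of the distance across the sliding window of consecutive frames.
            return (max(d) - min(d)) if len(d) == d.maxlen else 0.0

        distance_jitter = (jitter(self.d_head_ankle) > self.jitter_thresh or
                           jitter(self.d_shoulder_ankle) > self.jitter_thresh)

        # A fall is reported only when BOTH conditions hold at the same time.
        return ratio_fallen and distance_jitter
```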
Step 5, model training and output: the improved YOLOv3 network model structure of step 3 is trained with the training samples of step 1, and an improved SqueezeNet network model is obtained by iteration.
Based on the single-class target COCO dataset of step 1, the improved SqueezeNet network model is trained in an end-to-end manner; the initial learning rate is set to 0.001, and the network model is saved after 20000 iterations.
Step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5. The algorithm and the trained network are quantized in order to express the network with low-bit values, compressing the data volume and reducing the demand on storage space. The invention uses the Vitis AI development stack provided by Xilinx for quantization: the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation, and after quantization the model is evaluated and fine-tuned to obtain higher precision.
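One possible quantization flow is sketched below with the Vitis AI PyTorch quantizer (pytorch_nndct); the patent trains YOLOv3 in Darknet, so in practice the model would first be converted to a framework the quantizer supports, and the exact argument names can differ between Vitis AI releases — treat this as an outline, not the patent's tool invocation.

```python
import torch
from pytorch_nndct.apis import torch_quantizer  # Vitis AI PyTorch quantizer

def quantize_model(float_model, calib_loader, output_dir="quantize_result"):
    """Post-training quantization: float32 -> int8 ahead of DPU deployment."""
    dummy_input = torch.randn(1, 3, 416, 416)   # assumed network input shape

    # Calibration pass: collect activation ranges on a small calibration set.
    quantizer = torch_quantizer("calib", float_model, (dummy_input,), output_dir=output_dir)
    quant_model = quantizer.quant_model
    with torch.no_grad():
        for images, _ in calib_loader:          # loader assumed to yield (images, labels)
            quant_model(images)
    quantizer.export_quant_config()

    # Test pass: evaluate the int8 model, then export the xmodel for compilation.
    quantizer = torch_quantizer("test", float_model, (dummy_input,), output_dir=output_dir)
    quant_model = quantizer.quant_model
    with torch.no_grad():
        quant_model(dummy_input)
    quantizer.export_xmodel(deploy_check=False)
```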
Step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision.
step 8, compiling the quantized model evaluated in the step 7; for the model generated by quantization, the model needs to be converted into a computational graph in an XIR format which can be operated by a target board, the process uses an AI Compiler of Xilinx company, and the model generated by S6 quantization uses heterogeneous optimization to generate an optimized machine code of a corresponding board card.
Step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board; the camera acquires images of the moving target under non-ideal conditions.
The program is burned onto the Ultra96-V2 board, and raw images of the indoor human body target are collected through the Logitech C920E camera connected to the board.
Step 10, image detection: fall detection is performed on the images of step 9 with the network ported to the board, and an early-warning signal is sent out when a fall occurs, as shown in FIGS. 4, 5 and 6.
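A board-side detection loop might look as follows; `run_person_detector` and `run_keypoint_net` stand in for the DPU runners created from the compiled xmodels (for example through the VART API) and are placeholders rather than functions provided by the patent or by Vitis AI, and `FallJudge` is the fusion sketch shown earlier.

```python
import cv2

def detect_loop(run_person_detector, run_keypoint_net, judge):
    """Capture frames, detect people, locate key points, and apply the fusion rule."""
    cap = cv2.VideoCapture(0)   # the C920E camera exposed as /dev/video0 on the board
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for box in run_person_detector(frame):                    # lightweight YOLOv3
            head, shoulder, ankle = run_keypoint_net(frame, box)   # SqueezeNet key points
            if judge.update(box, head, shoulder, ankle):           # aspect ratio + jitter
                print("FALL DETECTED - sending early-warning signal")
    cap.release()
```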
In summary, the invention can make accurate judgments on human body recognition and posture detection directly on miniaturized equipment; through the improvements and the porting it not only improves accuracy and real-time performance but also lowers the hardware requirements in practical applications, so that the task of detecting falls of the elderly under non-ideal conditions can be completed well at low cost and with miniaturized equipment.
The above examples only illustrate the preferred embodiments of the present invention and are not intended to limit its scope; various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from the spirit of its design shall fall within the scope of protection defined by the claims.

Claims (9)

1. A fall detection method for the elderly based on FPGA heterogeneous acceleration, characterized in that: the method comprises a fusion algorithm part and a hardware acceleration part; as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting; the method specifically comprises the following steps:
step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset;
step 2, improving the YOLOv3 network model;
step 3, training the improved YOLOv3 network model structure of step 2 with the training samples obtained in step 1, and iterating to obtain a lightweight YOLOv3 network model;
step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment;
step 5, training, with the training samples obtained in step 1, on the basis of the lightweight YOLOv3 network model obtained in step 3, and iterating to obtain an improved SqueezeNet network model;
step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5;
step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision;
step 8, compiling the quantized model evaluated in step 7;
step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board;
and step 10, image detection: performing fall detection on the images of step 9 with the network ported to the board, and sending out an early-warning signal when a fall occurs.
2. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
3. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 1, the COCO2017 dataset is selected as the training sample.
4. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 1, the image feature information of the target is the image feature information of the target under non-ideal conditions.
5. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 2, improving the YOLOv3 network model specifically includes:
2.1, carrying out bidirectional channel and layer pruning on the backbone network to compress the width and depth of the model;
channel pruning prunes on the basis of the gamma coefficients of the BN layers: the masks of all convolution layers are found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning;
layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned; to preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut; in total 5 shortcuts are cut off;
2.2, improving the loss function; the improved loss function is:
E = E_coord + E_conf
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
6. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 4, key-point detection is trained with a lightweight SqueezeNet network structure, and fall detection uses a human body posture discrimination fusion algorithm that judges the human body height-width ratio and the Euclidean distances of key coordinates;
the SqueezeNet network simplifies the network structure without obviously reducing precision, by reducing the computation during model training and testing, shrinking the size of the model network structure, and reducing the number of learnable parameters, thereby obtaining better portability; Euclidean-distance judgment of the key points of each part of the human body is introduced, and the human body posture is judged comprehensively in combination with the aspect ratio of the human body detection rectangle;
the target bounding box obtained from human body detection can be treated as a rectangle, and the rectangle's height-width ratio H:W is used as a discrimination condition:
H:W = (H_max − H_min) : (W_max − W_min)
where H represents the height of the rectangular box and W its width; H_max and H_min are respectively the maximum and minimum of the height of the human body detection rectangle, and W_max and W_min are respectively the maximum and minimum of its width;
when a human body moves normally, the height-width ratio tends to be stable and stays greater than 1; when the human body falls, the height-width ratio changes drastically and tends to become less than 1;
the key points of the human body are mainly divided into the head and the trunk, and the fall determination is mainly based on the change in relative position of the head coordinates (X_head, Y_head), the shoulder-center coordinates (X_shoulder, Y_shoulder) and the ankle-center coordinates (X_ankle, Y_ankle); this change shows up as violent jitter of the Euclidean distances, and a threshold is set for the judgment, the Euclidean distance d being calculated as:
d_i(he, an) = √((X_head,i − X_ankle,i)² + (Y_head,i − Y_ankle,i)²)
d_i(sh, an) = √((X_shoulder,i − X_ankle,i)² + (Y_shoulder,i − Y_ankle,i)²)
where d(he, an) represents the Euclidean distance between the head coordinates and the ankle-center coordinates, d(sh, an) represents the Euclidean distance between the shoulder-center coordinates and the ankle-center coordinates, and i indexes different frames; d(he, an) and d(sh, an) are computed over consecutive frames and their values are compared to measure the jitter;
the aspect-ratio judgment and the Euclidean-distance judgment are combined: when the aspect-ratio value changes, it is further judged whether the Euclidean distances above jitter; a fall is recognized only when both conditions are satisfied at the same time, otherwise the event is regarded as a misjudgment.
7. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 6, quantization uses the Vitis AI development stack, and the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation.
8. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 8, compiling means that the model produced by quantization must further be converted into a computation graph in the XIR format that the target board can run; this process uses the AI Compiler, which applies heterogeneous optimization to the xmodel produced by the quantization in step 6 and generates optimized machine code for the corresponding board.
9. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 9, deploying onto the board means burning the program onto the Ultra96-V2 board; raw images of the indoor human body target are collected through a Logitech C920E camera connected to the board, and the camera acquires images of the moving target under non-ideal conditions.
CN202110980385.2A 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method Active CN113688734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110980385.2A CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110980385.2A CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Publications (2)

Publication Number Publication Date
CN113688734A CN113688734A (en) 2021-11-23
CN113688734B true CN113688734B (en) 2023-09-22

Family

ID=78582493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110980385.2A Active CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Country Status (1)

Country Link
CN (1) CN113688734B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273401B (en) * 2022-08-03 2024-06-14 浙江慧享信息科技有限公司 Method and system for automatically sensing falling of person
CN116451757B (en) * 2023-06-19 2023-09-08 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729876A (en) * 2017-11-09 2018-02-23 重庆医科大学 Fall detection method in old man room based on computer vision
WO2019232894A1 (en) * 2018-06-05 2019-12-12 中国石油大学(华东) Complex scene-based human body key point detection system and method
CN111461042A (en) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 Fall detection method and system
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729876A (en) * 2017-11-09 2018-02-23 重庆医科大学 Fall detection method in old man room based on computer vision
WO2019232894A1 (en) * 2018-06-05 2019-12-12 中国石油大学(华东) Complex scene-based human body key point detection system and method
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111461042A (en) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 Fall detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Human fall detection method based on the YOLO network; 杨雪旗; 唐旭; 章国宝; 黄永明; Journal of Yangzhou University (Natural Science Edition), No. 02; full text *

Also Published As

Publication number Publication date
CN113688734A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN107818571B (en) Ship automatic tracking method and system based on deep learning network and average drifting
CN107352032B (en) Method for monitoring people flow data and unmanned aerial vehicle
JP5515647B2 (en) Positioning device
CN113688734B (en) FPGA heterogeneous acceleration-based old people falling detection method
CN107105159B (en) Embedded moving target real-time detection tracking system and method based on SoC
CN110135476A (en) A kind of detection method of personal safety equipment, device, equipment and system
CN104106260A (en) Geographic map based control
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN206968975U (en) A kind of unmanned plane
CN113962282A (en) Improved YOLOv5L + Deepsort-based real-time detection system and method for ship engine room fire
CN104637242A (en) Elder falling detection method and system based on multiple classifier integration
CN116958584B (en) Key point detection method, regression model training method and device and electronic equipment
CN114721403B (en) Automatic driving control method and device based on OpenCV and storage medium
CN115761537A (en) Power transmission line foreign matter intrusion identification method oriented to dynamic characteristic supplement mechanism
CN110929670A (en) Muck truck cleanliness video identification and analysis method based on yolo3 technology
CN117274375A (en) Target positioning method and system based on transfer learning network model and image matching
CN116416291A (en) Electronic fence automatic generation method, real-time detection method and device
CN117054967A (en) Positioning method based on intelligent positioning of mining safety helmet and product structure
CN115859078A (en) Millimeter wave radar fall detection method based on improved Transformer
CN113837086A (en) Reservoir phishing person detection method based on deep convolutional neural network
CN113792700A (en) Storage battery car boxing detection method and device, computer equipment and storage medium
CN110598599A (en) Method and device for detecting abnormal gait of human body based on Gabor atomic decomposition
CN112732083A (en) Unmanned aerial vehicle intelligent control method based on gesture recognition
CN109190762A (en) Upper limb gesture recognition algorithms based on genetic algorithm encoding
CN116840835B (en) Fall detection method, system and equipment based on millimeter wave radar

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant