
CN113688734B - FPGA heterogeneous acceleration-based old people falling detection method - Google Patents

FPGA heterogeneous acceleration-based old people falling detection method Download PDF

Info

Publication number
CN113688734B
CN113688734B (application CN202110980385.2A)
Authority
CN
China
Prior art keywords
human body
network
model
fall
pruning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110980385.2A
Other languages
Chinese (zh)
Other versions
CN113688734A (en)
Inventor
张立国
申前
金梅
秦芊
杨红光
王磊
孟子杰
黄文汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202110980385.2A priority Critical patent/CN113688734B/en
Publication of CN113688734A publication Critical patent/CN113688734A/en
Application granted granted Critical
Publication of CN113688734B publication Critical patent/CN113688734B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fall detection method for the elderly based on FPGA heterogeneous acceleration, belonging to the technical field of target recognition. The method comprises a fusion algorithm part and a hardware acceleration part: as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting. The human body detection part adopts a YOLOv3 network, and a portable, improved lightweight YOLOv3 network is obtained through bidirectional pruning and an improved loss function; the fall detection algorithm part adopts a lightweight SqueezeNet network and detects falls of the elderly by jointly judging the height-width ratio of the human body bounding rectangle and the Euclidean distances of the main key points; the hardware part uses the Xilinx MPSoC-architecture Ultra96-V2 board. The invention improves the portability of fall detection equipment for the elderly and reduces its cost.

Description

FPGA heterogeneous acceleration-based old people falling detection method
Technical Field
The invention relates to the technical field of target recognition, and in particular to a fall detection method for the elderly based on FPGA heterogeneous acceleration.
Background
With the development of society, care of the elderly suffers from a large shortage of medical staff and a lack of intelligent detection equipment. Medical research shows that when a fall is treated in time the risk of death can be reduced by 80% and the survival rate of the elderly improved, so detecting fall events accurately and in real time has great social and scientific significance.
Currently, the common methods for detecting falls of the elderly fall into 3 categories:
First, detection based on ambient signals relies mainly on sensors installed in the surrounding environment and detects falls from the sound generated when a human body falls and from changes in wall and floor pressure; it is very easily disturbed by other environmental factors, which causes false alarms, its efficiency is extremely low, and it is rarely adopted.
Second, detection based on wearable devices uses the gyroscope and acceleration sensor built into the device to detect falls, but long-term wearing increases the physical burden on the elderly and interferes with their daily activities.
Third, detection based on computer vision: the traditional machine-vision approach judges fall features directly and is very easily affected by ambient light and background; the artificial-intelligence approach feeds the video captured by the acquisition equipment into a neural network for training and prediction, achieves high recognition accuracy, but places high demands on device performance and therefore leads to high equipment cost.
Given the shortcomings of the above methods, it is necessary to develop a fall detection method for the elderly based on FPGA heterogeneous acceleration.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a fall detection method for the elderly based on FPGA heterogeneous acceleration, in which the network structure and detection algorithm of the traditional artificial-intelligence fall detection method are adjusted to be lightweight and ported to a portable ARM+FPGA embedded system, so that fall detection equipment can be installed portably without affecting recognition accuracy and the cost of use is reduced.
In order to solve the technical problems, the invention adopts the following technical scheme:
the method comprises a fusion algorithm part and a hardware acceleration part, wherein a neural network is taken as a basic frame as a whole, and an FPGA hardware acceleration technology is combined, so that algorithm embedded transplanting is realized through quantitative compiling transplanting; the method specifically comprises the following steps:
step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset;
step 2, improving the YOLOv3 network model;
step 3, training the improved YOLOv3 network model structure of step 2 with the training samples obtained in step 1, and iterating to obtain a lightweight YOLOv3 network model;
step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment;
step 5, training, with the training samples obtained in step 1, on the basis of the lightweight YOLOv3 network model obtained in step 3, and iterating to obtain an improved SqueezeNet network model;
step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5;
step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision;
step 8, compiling the quantized model evaluated in step 7;
step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board;
and step 10, image detection: performing fall detection on the images of step 9 with the network ported to the board, and sending out an early-warning signal when a fall occurs.
The technical scheme of the invention is further improved as follows: the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
The technical scheme of the invention is further improved as follows: in step 1, the COCO2017 dataset is selected as the training sample.
The technical scheme of the invention is further improved as follows: in step 1, the image feature information of the target is the image feature information of the target under non-ideal conditions.
The technical scheme of the invention is further improved as follows: in step 2, improving the YOLOv3 network model specifically includes:
2.1, carrying out bidirectional channel and layer pruning on the backbone network to compress the width and depth of the model;
channel pruning prunes on the basis of the gamma coefficients of the BN layers: the masks of all convolution layers are found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning;
layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned; to preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut; in total 5 shortcuts are cut off;
2.2, improving the loss function; the improved loss function is:
E = E_coord + E_conf
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
The technical scheme of the invention is further improved as follows: in step 4, key-point detection is trained with a lightweight SqueezeNet network structure, and fall detection uses a human body posture discrimination fusion algorithm that judges the human body height-width ratio and the Euclidean distances of key coordinates;
the SqueezeNet network simplifies the network structure without obviously reducing precision, by reducing the computation during model training and testing, shrinking the size of the model network structure, and reducing the number of learnable parameters, thereby obtaining better portability; Euclidean-distance judgment of the key points of each part of the human body is introduced, and the human body posture is judged comprehensively in combination with the aspect ratio of the human body detection rectangle;
the target bounding box of the obtained result after human body detection can be equivalently a rectangle, and the aspect ratio H:W of the rectangle is used as a discrimination condition:
H:W=(H max -H min ):(W max -W min )
where H represents the height of the rectangular frame and W represents the width of the rectangular frame. H max And H min Respectively maximum value and minimum value of human body detection rectangle height, W max And W is min Respectively inquiring the maximum value and the minimum value of the width of the human body detection rectangle;
when a human body normally moves, the height-width ratio tends to be stable and unchanged, the ratio is always kept to be more than 1, when the human body falls, the height-width ratio is dynamically changed greatly, and meanwhile, the ratio tends to be less than 1;
the key points of human body are mainly divided into head and trunk, and the fall determination can be mainly based on head coordinates (X head ,Y head ) Shoulder center coordinates (X) shoulder ,Y shoulder ) And ankle center coordinates (X) ankle ,Y ankle ) The change of the relative position is shown in the Euclidean distanceAnd (3) violent shaking occurs, and a threshold value is set for judging, wherein the Euclidean distance d has the following calculation formula:
wherein d (he, an) represents the Euclidean distance between the head coordinate and the ankle center coordinate, d (sh, an) represents the Euclidean distance between the shoulder center coordinate and the ankle center coordinate, i represents different frames, and the whole formula calculates d (he, an) and d (sh, an) of continuous frames and compares jitter by numerical values;
and carrying out combined judgment on the aspect ratio and Euclidean distance judgment, further judging whether the Euclidean distance in the formula is dithered or not when the aspect ratio value is changed, and considering falling when two judgment conditions are simultaneously met, otherwise, considering erroneous judgment.
The technical scheme of the invention is further improved as follows: in step 6, quantization uses the Vitis AI development stack, and the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation.
The technical scheme of the invention is further improved as follows: in step 8, compiling means that the model produced by quantization must further be converted into a computation graph in the XIR format that the target board can run; this process uses the AI Compiler, which applies heterogeneous optimization to the xmodel produced by the quantization in step 6 and generates optimized machine code for the corresponding board.
The technical scheme of the invention is further improved as follows: in step 9, deploying onto the board means burning the program onto the Ultra96-V2 board; raw images of the indoor human body target are collected through a Logitech C920E camera connected to the board, and the camera acquires images of the moving target under non-ideal conditions.
By adopting the above technical scheme, the invention achieves the following technical progress:
1. The method adopts an FPGA-heterogeneously-accelerated lightweight network, which improves the portability of fall detection equipment for the elderly and reduces cost; by shrinking the network model and improving the key-point action recognition method, the running speed of the detection and recognition algorithm is increased, and the requirements for accuracy and real-time performance can be met in practical, non-ideal scenes.
2. The human body detection part adopts a YOLOv3 network, and a portable, improved lightweight YOLOv3 network is obtained through bidirectional pruning and an improved loss function.
3. The fall detection algorithm part adopts a lightweight SqueezeNet network and detects falls of the elderly by jointly judging the aspect ratio of the human body rectangle and the Euclidean distances of the main key points; compared with other fall detection algorithms, the lightweight YOLOv3 + SqueezeNet network structure and the improved human body posture discrimination fusion algorithm are small in size while keeping accuracy, so the demands on hardware are low; heterogeneous ARM+FPGA acceleration can be achieved through quantization and compilation, making the method more portable and applicable to indoor fall detection for the elderly.
4. The hardware part uses the Xilinx MPSoC-architecture Ultra96-V2 board, and embedded porting of the algorithm is achieved through quantization, compilation and porting.
5. Through the improvements and the porting, the invention not only improves accuracy and real-time performance but also lowers the hardware requirements in practical applications, so that the task of detecting falls of the elderly under non-ideal conditions can be completed well at low cost and with miniaturized equipment.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a lightweight YOLOv3 network architecture diagram of the present invention;
FIG. 3 is a flow chart of a human body posture discrimination fusion algorithm of the present invention;
FIG. 4 is a diagram showing the effect of detecting falls according to the present invention;
FIG. 5 is a second diagram of the fall detection effect of the present invention;
fig. 6 is a third diagram of the fall detection effect of the present invention.
Detailed Description
The invention is described in further detail below with reference to the attached drawings and examples:
As shown in FIGS. 1-3, the method comprises a fusion algorithm part and a hardware acceleration part; as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting;
the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
In the YOLOv3 network model, the backbone network is Darknet-53, which comprises 21 convolution layers and a fully connected layer; residual network structures are introduced between the convolution layers, which improves deep feature extraction and enables multi-scale prediction.
The improvement of the YOLOv3 network model consists of bidirectional pruning of the channels and layers of the original network, compressing the depth and width of the model so that the complex structure of YOLOv3 is made lightweight without affecting accuracy. At the same time, the invention improves the original loss function: on the basis of the original loss function, the loss term contributed by the class prediction is deleted, reducing the redundancy of the loss function.
The method specifically comprises the following steps of:
Step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset; the image feature information of the target is the image feature information of the target under non-ideal conditions.
A single-class target dataset is produced for training the network model. The currently popular COCO dataset format is selected; the COCO target dataset has complex backgrounds and is better suited to a network model for real detection conditions. The COCO2017 dataset is selected as the training sample.
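As a concrete illustration of this step (not part of the patent text), the single-class "person" subset of COCO2017 can be pulled out with the pycocotools API; the annotation path below is a placeholder for the local COCO2017 layout.

```python
from pycocotools.coco import COCO

# Placeholder path -- adjust to where the COCO2017 annotations are stored locally.
ann_file = "annotations/instances_train2017.json"

coco = COCO(ann_file)
person_cat = coco.getCatIds(catNms=["person"])    # single target category
img_ids = coco.getImgIds(catIds=person_cat)       # images that contain people

# Collect (image file, person boxes) pairs; COCO boxes are [x, y, w, h] in pixels.
samples = []
for img_id in img_ids:
    info = coco.loadImgs(img_id)[0]
    ann_ids = coco.getAnnIds(imgIds=img_id, catIds=person_cat, iscrowd=False)
    boxes = [a["bbox"] for a in coco.loadAnns(ann_ids)]
    samples.append((info["file_name"], boxes))

print(f"{len(samples)} training images with person annotations")
```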
Step 2, improving the YOLOv3 network model by carrying out bidirectional channel and layer pruning on the backbone network and improving the loss function; specifically:
2.1: Bidirectional channel and layer pruning is applied to the backbone network. Channel pruning prunes on the basis of the gamma coefficients of the BN layers: the mask of each convolution layer is found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning. Layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned. To preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut at the same time. In the invention, 5 shortcuts are cut off, which makes the network lightweight with little loss of precision. Channel pruning and layer pruning compress the width and the depth of the model, respectively.
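A minimal PyTorch sketch of the channel-selection step described above is given here as an illustration; it assumes each prunable convolution is followed by a BatchNorm2d layer, and the pruning ratio and per-layer retention floor are example values, not figures taken from the patent.

```python
import torch
import torch.nn as nn

def channel_prune_masks(model: nn.Module, prune_ratio: float = 0.5, min_keep: int = 8):
    """Build per-BN-layer channel masks from the magnitude of the BN gamma coefficients."""
    # 1. Gather all BN gamma magnitudes to derive a single global threshold.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)

    # 2. Keep channels whose |gamma| exceeds the global threshold, but never let a
    #    layer fall below a minimum channel count (the "limit reserved channels" rule).
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            g = m.weight.detach().abs()
            mask = (g > threshold).float()
            keep = min(min_keep, g.numel())
            if mask.sum() < keep:
                mask = torch.zeros_like(mask)
                mask[g.topk(keep).indices] = 1.0
            masks[name] = mask
    return masks

def merge_shortcut_masks(mask_a: torch.Tensor, mask_b: torch.Tensor) -> torch.Tensor:
    """Layers joined by a shortcut must share one mask: merge with logical OR so the
    residual addition stays shape-consistent after pruning."""
    return torch.clamp(mask_a + mask_b, max=1.0)
```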
2.2: Improving the loss function. The loss function of YOLOv3 is:
E = E_coord + E_conf + E_class    (1)
where E_coord represents the coordinate loss, E_conf the confidence loss, E_class the classification loss, and S (used in the expansions below) the grid size.
Specifically, expanding the three terms: the coordinate loss function E_coord is calculated as:
E_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [(x_i − x̂_i)² + (y_i − ŷ_i)²] + λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} [(√w_i − √ŵ_i)² + (√h_i − √ĥ_i)²]    (2)
where S represents the scale of the feature map; B represents the number of prediction boxes generated in each cell; (x, y) are the center coordinates of the prediction box and (x̂, ŷ) the center coordinates of the real target box; (w, h) are the width and height of the prediction box and (ŵ, ĥ) the width and height of the real target box; λ_coord is the weight of the coordinate loss, usually set to 5; I_{ij}^{obj} is 1 if a target exists in the box at position (i, j) and 0 otherwise. The whole formula means that when the j-th anchor box of the i-th grid cell is responsible for a real target, the bounding box generated by that anchor box is compared with the real box, and the center-coordinate error and the width-height error are calculated.
The confidence loss function E_conf is calculated as:
E_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} (C_i − Ĉ_i)² + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} (C_i − Ĉ_i)²    (3)
where C_i is the confidence of the prediction box and Ĉ_i the true value, which is determined by whether the bounding box of the grid cell is responsible for some target; λ_noobj is the loss weight; I_{ij}^{noobj} is 1 when no target is present and 0 otherwise. In the input image the detected target often occupies only a small part of the image, so the loss contributed by the parts that contain the target is far smaller than that of the parts that do not, which biases the network toward predicting cells that contain no target. Therefore the weight coefficient λ_noobj, usually set to 0.5, is applied to the loss of the regions that do not contain the target to be measured, so that the network can effectively predict the regions that do contain it. The first term in the formula is the confidence error of the bounding boxes that contain an object, and I_{ij}^{obj} means that only the confidence of the prediction box with the relatively largest IOU counts toward the error; the second term is the confidence error of the bounding boxes that contain no object.
The classification loss function E_class is calculated as:
E_class = Σ_{i=0}^{S²} I_i^{obj} Σ_{c∈classes} (p_i(c) − p̂_i(c))²    (4)
where p_i(c) is the predicted confidence of the target for class c, p̂_i(c) is the true class probability, and I_i^{obj} represents whether the i-th grid cell contains an object. Classification loss is calculated only when an anchor box is responsible for some real target; otherwise no such calculation is performed. Finally, the optimal prediction box is selected from the predicted bounding boxes by NMS (Non-Maximum Suppression).
The invention optimizes this: in the human fall detection task the detected target is always the human body, so the classification error can be removed from the loss function, which reduces the computation of the loss function and the complexity of the network. The improved loss function is shown in Equation 5:
E = E_coord + E_conf    (5)
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
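A compact sketch of this two-term loss is given below for illustration; it assumes predictions and targets have already been matched to anchors and decoded into a common (x, y, w, h, confidence) layout, which is a simplification of the Darknet implementation rather than the patent's exact code.

```python
import torch

def simplified_yolo_loss(pred, target, obj_mask, noobj_mask,
                         lambda_coord: float = 5.0, lambda_noobj: float = 0.5):
    """E = E_coord + E_conf: the class term is dropped for single-class (person) detection.

    pred, target: tensors of shape [N, 5] holding (x, y, w, h, conf) per matched anchor.
    obj_mask / noobj_mask: boolean tensors of shape [N] marking responsible / empty anchors.
    """
    # Coordinate loss -- only anchors responsible for a ground-truth person contribute.
    xy_err = ((pred[obj_mask, 0:2] - target[obj_mask, 0:2]) ** 2).sum()
    wh_err = ((pred[obj_mask, 2:4].clamp(min=0).sqrt()
               - target[obj_mask, 2:4].sqrt()) ** 2).sum()
    e_coord = lambda_coord * (xy_err + wh_err)

    # Confidence loss -- object anchors plus down-weighted background anchors.
    e_conf = ((pred[obj_mask, 4] - target[obj_mask, 4]) ** 2).sum() \
             + lambda_noobj * ((pred[noobj_mask, 4] - target[noobj_mask, 4]) ** 2).sum()

    return e_coord + e_conf
```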
Step 3, training and outputting the model: the improved YOLOv3 network model structure of step 2 is trained with the training samples of step 1, and a lightweight YOLOv3 network model is obtained by iteration.
Based on the single-class target COCO dataset of step 1, the improved lightweight YOLOv3 network model is trained with the Darknet deep learning framework in an end-to-end manner; the initial learning rate is set to 0.001, and the network model is saved after 20000 iterations.
Step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment:
the key point detection adopts a light-weight SquezeNet network structure for training, and the falling detection adopts a human body posture discrimination fusion algorithm for discriminating the human body height-width ratio and the key coordinate Euclidean distance; the method specifically comprises the following steps:
the traditional fall detection mainly carries out gesture judgment through an OPENPOSE algorithm, and mainly uses VGG-19 as a backbone network to extract bottom layer characteristics of an input image; and then inputting the extracted characteristic information into a next layer of neural network, realizing the generation of a confidence coefficient map, setting a confidence threshold value to locate key points of a human body, and then carrying out gesture estimation, wherein the VGG-19 network structure is relatively complex, so that the method is large in learning parameters and network calculation amount and unsuitable for transplanting an embedded environment. The adopted SquezeNet network simplifies the network structure under the condition of not obviously reducing the precision by reducing the calculated amount during model training and testing, reducing the size of the model network structure and reducing the quantity of the learnable parameters, thereby obtaining better portability.
The human body posture judgment fusion algorithm is more suitable for the heterogeneous acceleration embedded environment, and the human body posture is comprehensively judged by combining the Euclidean distance judgment of key points of each part of the human body and the aspect ratio of human body detection rectangle.
The target bounding box obtained from human body detection can be treated as a rectangle, and the rectangle's height-width ratio H:W is used as a discrimination condition:
H:W = (H_max − H_min) : (W_max − W_min)    (6)
where H represents the height of the rectangular box and W its width; H_max and H_min are respectively the maximum and minimum of the height of the human body detection rectangle, and W_max and W_min are respectively the maximum and minimum of its width. When a human body moves normally, the height-width ratio tends to be stable and stays greater than 1; when the human body falls, the height-width ratio changes drastically and tends to become less than 1.
The key points of the human body are mainly divided into the head and the trunk, and the fall determination is mainly based on the change in relative position of the head coordinates (X_head, Y_head), the shoulder-center coordinates (X_shoulder, Y_shoulder) and the ankle-center coordinates (X_ankle, Y_ankle); this change shows up as violent jitter of the Euclidean distances, and a threshold is set for the judgment, the Euclidean distances being calculated as:
d_i(he, an) = √((X_head,i − X_ankle,i)² + (Y_head,i − Y_ankle,i)²)    (7)
d_i(sh, an) = √((X_shoulder,i − X_ankle,i)² + (Y_shoulder,i − Y_ankle,i)²)    (8)
where d(he, an) represents the Euclidean distance between the head coordinates and the ankle-center coordinates, d(sh, an) represents the Euclidean distance between the shoulder-center coordinates and the ankle-center coordinates, and i indexes different frames. d(he, an) and d(sh, an) are computed over consecutive frames, and their values are compared to measure the jitter.
The fusion algorithm combines the aspect-ratio judgment with the Euclidean-distance judgment: when the aspect-ratio value changes, it further judges whether the Euclidean distances in Equations (7) and (8) jitter; a fall is recognized only when both conditions are satisfied at the same time, otherwise the event is regarded as a misjudgment. The algorithm reduces computational complexity and the probability of misjudgment, and increases portability while still achieving fall detection.
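The combined judgment can be sketched in a few lines of Python; the ratio threshold, the jitter threshold and the frame-window length below are illustrative placeholders — the patent sets thresholds but does not fix numeric values here.

```python
import math
from collections import deque

def euclid(p, q):
    """Euclidean distance between two (x, y) key points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

class FallJudge:
    """Fuse the bounding-box aspect ratio with key-point distance jitter over frames."""

    def __init__(self, ratio_thresh=1.0, jitter_thresh=40.0, window=5):
        self.d_head_ankle = deque(maxlen=window)      # d(he, an) over recent frames
        self.d_shoulder_ankle = deque(maxlen=window)  # d(sh, an) over recent frames
        self.ratio_thresh = ratio_thresh
        self.jitter_thresh = jitter_thresh

    def update(self, box, head, shoulder, ankle):
        # box = (x_min, y_min, x_max, y_max) from the person detector.
        h = box[3] - box[1]
        w = box[2] - box[0]
        ratio_fallen = (h / max(w, 1e-6)) < self.ratio_thresh   # H:W drops below 1

        self.d_head_ankle.append(euclid(head, ankle))
        self.d_shoulder_ankle.append(euclid(shoulder, ankle))

        def jitter(d):
            # Spread of the distance across the sliding window of consecutive frames.
            return (max(d) - min(d)) if len(d) == d.maxlen else 0.0

        distance_jitter = (jitter(self.d_head_ankle) > self.jitter_thresh or
                           jitter(self.d_shoulder_ankle) > self.jitter_thresh)

        # A fall is reported only when BOTH conditions hold at the same time.
        return ratio_fallen and distance_jitter
```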
Step 5, model training and output: the improved YOLOv3 network model structure of step 3 is trained with the training samples of step 1, and an improved SqueezeNet network model is obtained by iteration.
Based on the single-class target COCO dataset of step 1, the improved SqueezeNet network model is trained in an end-to-end manner; the initial learning rate is set to 0.001, and the network model is saved after 20000 iterations.
Step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5. The algorithm and the trained network are quantized in order to express the network with low-bit values, compressing the data volume and reducing the demand on storage space. The invention uses the Vitis AI development stack provided by Xilinx for quantization: the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation, and after quantization the model is evaluated and fine-tuned to obtain higher precision.
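One possible quantization flow is sketched below with the Vitis AI PyTorch quantizer (pytorch_nndct); the patent trains YOLOv3 in Darknet, so in practice the model would first be converted to a framework the quantizer supports, and the exact argument names can differ between Vitis AI releases — treat this as an outline, not the patent's tool invocation.

```python
import torch
from pytorch_nndct.apis import torch_quantizer  # Vitis AI PyTorch quantizer

def quantize_model(float_model, calib_loader, output_dir="quantize_result"):
    """Post-training quantization: float32 -> int8 ahead of DPU deployment."""
    dummy_input = torch.randn(1, 3, 416, 416)   # assumed network input shape

    # Calibration pass: collect activation ranges on a small calibration set.
    quantizer = torch_quantizer("calib", float_model, (dummy_input,), output_dir=output_dir)
    quant_model = quantizer.quant_model
    with torch.no_grad():
        for images, _ in calib_loader:          # loader assumed to yield (images, labels)
            quant_model(images)
    quantizer.export_quant_config()

    # Test pass: evaluate the int8 model, then export the xmodel for compilation.
    quantizer = torch_quantizer("test", float_model, (dummy_input,), output_dir=output_dir)
    quant_model = quantizer.quant_model
    with torch.no_grad():
        quant_model(dummy_input)
    quantizer.export_xmodel(deploy_check=False)
```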
Step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision.
step 8, compiling the quantized model evaluated in the step 7; for the model generated by quantization, the model needs to be converted into a computational graph in an XIR format which can be operated by a target board, the process uses an AI Compiler of Xilinx company, and the model generated by S6 quantization uses heterogeneous optimization to generate an optimized machine code of a corresponding board card.
Step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board; the camera acquires images of the moving target under non-ideal conditions.
The program is burned onto the Ultra96-V2 board, and raw images of the indoor human body target are collected through the Logitech C920E camera connected to the board.
Step 10, image detection: fall detection is performed on the images of step 9 with the network ported to the board, and an early-warning signal is sent out when a fall occurs, as shown in FIGS. 4, 5 and 6.
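A board-side detection loop might look as follows; `run_person_detector` and `run_keypoint_net` stand in for the DPU runners created from the compiled xmodels (for example through the VART API) and are placeholders rather than functions provided by the patent or by Vitis AI, and `FallJudge` is the fusion sketch shown earlier.

```python
import cv2

def detect_loop(run_person_detector, run_keypoint_net, judge):
    """Capture frames, detect people, locate key points, and apply the fusion rule."""
    cap = cv2.VideoCapture(0)   # the C920E camera exposed as /dev/video0 on the board
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        for box in run_person_detector(frame):                    # lightweight YOLOv3
            head, shoulder, ankle = run_keypoint_net(frame, box)   # SqueezeNet key points
            if judge.update(box, head, shoulder, ankle):           # aspect ratio + jitter
                print("FALL DETECTED - sending early-warning signal")
    cap.release()
```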
In summary, the invention can make accurate judgments on human body recognition and posture detection directly on miniaturized equipment; through the improvements and the porting it not only improves accuracy and real-time performance but also lowers the hardware requirements in practical applications, so that the task of detecting falls of the elderly under non-ideal conditions can be completed well at low cost and with miniaturized equipment.
The above examples only illustrate the preferred embodiments of the present invention and are not intended to limit its scope; various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from the spirit of its design shall fall within the scope of protection defined by the claims.

Claims (9)

1. A fall detection method for the elderly based on FPGA heterogeneous acceleration, characterized in that: the method comprises a fusion algorithm part and a hardware acceleration part; as a whole, a neural network is taken as the basic framework, FPGA hardware acceleration technology is combined, and embedded porting of the algorithm is achieved through quantization, compilation and porting; the method specifically comprises the following steps:
step 1, acquiring training samples, acquiring image feature information of the target, and producing the target dataset;
step 2, improving the YOLOv3 network model;
step 3, training the improved YOLOv3 network model structure of step 2 with the training samples obtained in step 1, and iterating to obtain a lightweight YOLOv3 network model;
step 4, constructing a human body posture discrimination fusion algorithm better suited to the heterogeneous-acceleration embedded environment;
step 5, training, with the training samples obtained in step 1, on the basis of the lightweight YOLOv3 network model obtained in step 3, and iterating to obtain an improved SqueezeNet network model;
step 6, quantizing the human body posture discrimination fusion algorithm constructed in step 4 and the SqueezeNet network model obtained in step 5;
step 7, evaluating the quantized model obtained in step 6 and fine-tuning it to obtain higher precision;
step 8, compiling the quantized model evaluated in step 7;
step 9, deploying the quantized model compiled in step 8 onto the board, inputting images, and collecting images of the indoor human body with a Logitech C920E camera connected to the board;
and step 10, image detection: performing fall detection on the images of step 9 with the network ported to the board, and sending out an early-warning signal when a fall occurs.
2. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: the fusion algorithm part mainly refers to human body detection and action recognition on the images collected by the video acquisition equipment; the corresponding human body detection part is a human body detection method based on the improved lightweight YOLOv3, and the corresponding action recognition part is a fusion algorithm based on the improved YOLOv3 and the SqueezeNet network; the hardware acceleration part mainly refers to porting the network structure of the algorithm part to an MPSoC-architecture board through quantization and compilation, so that the algorithm runs on the embedded platform.
3. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 1, the COCO2017 dataset is selected as the training sample.
4. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 1, the image feature information of the target is the image feature information of the target under non-ideal conditions.
5. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 2, improving the YOLOv3 network model specifically includes:
2.1, carrying out bidirectional channel and layer pruning on the backbone network to compress the width and depth of the model;
channel pruning prunes on the basis of the gamma coefficients of the BN layers: the masks of all convolution layers are found from a global threshold; then, for each group of shortcuts, the pruning masks of the connected convolution layers are collected and merged, and pruning is performed with the merged mask; each related layer is taken into account, the channels retained in each layer are limited, and the activation offset values are additionally processed to reduce the precision loss during pruning;
layer pruning is further pruning on the basis of the channel pruning strategy: the CBL module before each shortcut layer is evaluated, the gamma means of the layers are sorted, and the layers with the smallest means are pruned; to preserve the integrity of the YOLOv3 network structure, each time a shortcut structure is pruned, one shortcut layer and the two convolution layers in front of it are cut; in total 5 shortcuts are cut off;
2.2, improving the loss function; the improved loss function is:
E = E_coord + E_conf
where E_coord represents the coordinate loss and E_conf represents the confidence loss.
6. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 4, key-point detection is trained with a lightweight SqueezeNet network structure, and fall detection uses a human body posture discrimination fusion algorithm that judges the human body height-width ratio and the Euclidean distances of key coordinates;
the SqueezeNet network simplifies the network structure without obviously reducing precision, by reducing the computation during model training and testing, shrinking the size of the model network structure, and reducing the number of learnable parameters, thereby obtaining better portability; Euclidean-distance judgment of the key points of each part of the human body is introduced, and the human body posture is judged comprehensively in combination with the aspect ratio of the human body detection rectangle;
the target bounding box obtained from human body detection can be treated as a rectangle, and the rectangle's height-width ratio H:W is used as a discrimination condition:
H:W = (H_max − H_min) : (W_max − W_min)
where H represents the height of the rectangular box and W its width; H_max and H_min are respectively the maximum and minimum of the height of the human body detection rectangle, and W_max and W_min are respectively the maximum and minimum of its width;
when a human body moves normally, the height-width ratio tends to be stable and stays greater than 1; when the human body falls, the height-width ratio changes drastically and tends to become less than 1;
the key points of the human body are mainly divided into the head and the trunk, and the fall determination is mainly based on the change in relative position of the head coordinates (X_head, Y_head), the shoulder-center coordinates (X_shoulder, Y_shoulder) and the ankle-center coordinates (X_ankle, Y_ankle); this change shows up as violent jitter of the Euclidean distances, and a threshold is set for the judgment, the Euclidean distance d being calculated as:
d_i(he, an) = √((X_head,i − X_ankle,i)² + (Y_head,i − Y_ankle,i)²)
d_i(sh, an) = √((X_shoulder,i − X_ankle,i)² + (Y_shoulder,i − Y_ankle,i)²)
where d(he, an) represents the Euclidean distance between the head coordinates and the ankle-center coordinates, d(sh, an) represents the Euclidean distance between the shoulder-center coordinates and the ankle-center coordinates, and i indexes different frames; d(he, an) and d(sh, an) are computed over consecutive frames and their values are compared to measure the jitter;
the aspect-ratio judgment and the Euclidean-distance judgment are combined: when the aspect-ratio value changes, it is further judged whether the Euclidean distances above jitter; a fall is recognized only when both conditions are satisfied at the same time, otherwise the event is regarded as a misjudgment.
7. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 6, quantization uses the Vitis AI development stack, and the AI Quantizer converts the float32 model into an int8 model so that the trained network model can be deployed on the FPGA for accelerated operation.
8. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 8, compiling means that the model produced by quantization must further be converted into a computation graph in the XIR format that the target board can run; this process uses the AI Compiler, which applies heterogeneous optimization to the xmodel produced by the quantization in step 6 and generates optimized machine code for the corresponding board.
9. The fall detection method for the elderly based on FPGA heterogeneous acceleration according to claim 1, characterized in that: in step 9, deploying onto the board means burning the program onto the Ultra96-V2 board; raw images of the indoor human body target are collected through a Logitech C920E camera connected to the board, and the camera acquires images of the moving target under non-ideal conditions.
CN202110980385.2A 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method Active CN113688734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110980385.2A CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110980385.2A CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Publications (2)

Publication Number Publication Date
CN113688734A CN113688734A (en) 2021-11-23
CN113688734B true CN113688734B (en) 2023-09-22

Family

ID=78582493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110980385.2A Active CN113688734B (en) 2021-08-25 2021-08-25 FPGA heterogeneous acceleration-based old people falling detection method

Country Status (1)

Country Link
CN (1) CN113688734B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273401B (en) * 2022-08-03 2024-06-14 浙江慧享信息科技有限公司 Method and system for automatically sensing falling of person
CN116451757B (en) * 2023-06-19 2023-09-08 山东浪潮科学研究院有限公司 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729876A (en) * 2017-11-09 2018-02-23 重庆医科大学 Fall detection method in old man room based on computer vision
WO2019232894A1 (en) * 2018-06-05 2019-12-12 中国石油大学(华东) Complex scene-based human body key point detection system and method
CN111461042A (en) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 Fall detection method and system
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729876A (en) * 2017-11-09 2018-02-23 重庆医科大学 Fall detection method in old man room based on computer vision
WO2019232894A1 (en) * 2018-06-05 2019-12-12 中国石油大学(华东) Complex scene-based human body key point detection system and method
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111461042A (en) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 Fall detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Human fall detection method based on the YOLO network; 杨雪旗; 唐旭; 章国宝; 黄永明; Journal of Yangzhou University (Natural Science Edition), No. 02; full text *

Also Published As

Publication number Publication date
CN113688734A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN107818571B (en) Ship automatic tracking method and system based on deep learning network and average drifting
CN107352032B (en) Method for monitoring people flow data and unmanned aerial vehicle
JP5515647B2 (en) Positioning device
CN113688734B (en) FPGA heterogeneous acceleration-based old people falling detection method
CN107105159B (en) Embedded moving target real-time detection tracking system and method based on SoC
CN110135476A (en) A kind of detection method of personal safety equipment, device, equipment and system
CN104106260A (en) Geographic map based control
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN206968975U (en) A kind of unmanned plane
CN113962282A (en) Improved YOLOv5L + Deepsort-based real-time detection system and method for ship engine room fire
CN104637242A (en) Elder falling detection method and system based on multiple classifier integration
CN116958584B (en) Key point detection method, regression model training method and device and electronic equipment
CN114721403B (en) Automatic driving control method and device based on OpenCV and storage medium
CN115761537A (en) Power transmission line foreign matter intrusion identification method oriented to dynamic characteristic supplement mechanism
CN110929670A (en) Muck truck cleanliness video identification and analysis method based on yolo3 technology
CN117274375A (en) Target positioning method and system based on transfer learning network model and image matching
CN116416291A (en) Electronic fence automatic generation method, real-time detection method and device
CN117054967A (en) Positioning method based on intelligent positioning of mining safety helmet and product structure
CN115859078A (en) Millimeter wave radar fall detection method based on improved Transformer
CN113837086A (en) Reservoir phishing person detection method based on deep convolutional neural network
CN113792700A (en) Storage battery car boxing detection method and device, computer equipment and storage medium
CN110598599A (en) Method and device for detecting abnormal gait of human body based on Gabor atomic decomposition
CN112732083A (en) Unmanned aerial vehicle intelligent control method based on gesture recognition
CN109190762A (en) Upper limb gesture recognition algorithms based on genetic algorithm encoding
CN116840835B (en) Fall detection method, system and equipment based on millimeter wave radar

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant