CN117934674A - Deep learning and three-dimensional animation interactive cooperation method and system - Google Patents
- Publication number
- CN117934674A CN117934674A CN202410161296.9A CN202410161296A CN117934674A CN 117934674 A CN117934674 A CN 117934674A CN 202410161296 A CN202410161296 A CN 202410161296A CN 117934674 A CN117934674 A CN 117934674A
- Authority
- CN
- China
- Prior art keywords
- module
- dimensional model
- dimensional
- data
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/005—General purpose rendering architectures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Abstract
The invention relates to a deep learning and three-dimensional animation interactive cooperation method and system for carrying out simulated interactive investigation experiments on a biological community and its environmental complex. The system comprises a data acquisition end, an administrator end and a user end. The data acquisition end preprocesses the input data. The administrator end is provided with a pre-task and a post-task: the pre-task models and optimizes the three-dimensional model and uses augmented reality technology to build a simulated natural ecosystem and register it in the real scene; the post-task carries out ecosystem experiments on the data collected through human-computer and human-human interaction. The user end can locate and process the three-dimensional model with machine vision technology through virtual reality equipment. The stable diffusion algorithm adopted by the invention optimizes the ecosystem by designing a space-time attention layer, so that the scene in which the user participates in the experiment is more realistic, and the space-time change process of the ecosystem after an ecological factor is changed can be shown.
Description
Technical Field
The invention relates to the field of teaching experiments, in particular to a method and a system for interactive cooperation of deep learning and three-dimensional animation.
Background
On the one hand, the current biology teaching process has shortcomings. A one-way, guided teaching mode leaves students without a deep understanding of biology or the ability to explore it, and the teaching content is too narrow, so students may only learn superficial knowledge points and cannot fully understand biology. On the other hand, stable diffusion is a generative model based on a diffusion process that can generate high-resolution, realistic images, repair missing parts, and add finer image features to low-resolution images for predicting dynamic changes of images, but its application to space-time shuttling of three-dimensional virtual animation in the field of teaching experiments has not been realized.
Therefore, how to let experimenters participate in investigation experiments on an ecosystem through a teaching-experiment mode combining human-computer and human-human interaction, and how to use a deep learning algorithm to show experimenters, within a short time, the space-time variation process of the ecosystem after its factors are changed, have become technical problems to be solved by those skilled in the art.
Disclosure of Invention
In order to solve at least the above problems in the prior art, the present invention provides a method and a system for interactive collaboration of deep learning and three-dimensional animation, which allow experimenters to collaborate interactively to complete investigation experiments on an ecosystem and to observe, within a short time, the space-time variation process of the ecosystem after certain factors in it are changed.
The invention aims at realizing the following technical scheme: the invention relates to a deep learning and three-dimensional animation interactive cooperation method, which comprises the following steps:
s1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and then carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final ecosystem model through a data processing module and evaluating a current model;
The method further comprises the steps of processing the three-dimensional model, the two-dimensional image and the voice data, and comprises the following specific steps:
SS1, constructing a stable diffusion training data set, comprising: an input rough time sequence three-dimensional model, an input two-dimensional image of an ecological system, an input corresponding voice description and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among the input rough time sequence three-dimensional model, the input two-dimensional image of the ecological system and the input corresponding voice description so as to improve the temporal continuity of the simulated ecological system;
SS3, firstly, passing the input rough three-dimensional ecological system and the two-dimensional image sequence through an input encoder to obtain the latent-space feature Z_X; meanwhile, mapping the corresponding voice description and gesture into the latent space through a condition encoder to obtain the conditional feature Z_C;
SS4, feeding the input feature Z_X and the conditional feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, feeding the reconstructed feature Z_f into a decoder to obtain a predicted time sequence three-dimensional model y, namely the data of the time sequence three-dimensional model of the ecological system;
SS6, calculating the loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating the parameters of the LDM network model by back-propagating the loss;
SS7, pruning and quantizing the network structure of the trained LDM network model so as to reduce the model parameters and accelerate the model;
SS8, building an inference framework from the quantized network model and using it for three-dimensional animation interactive collaboration.
Preferably, in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, which comprises the following steps:
Firstly, the input is X ∈ R^(H×W×3×F), which represents F sampled RGB images of size H×W; each time point is divided into N = HW/P^2 regions, where P denotes the region size and N denotes how many regions there are at each time point. Each region can be expressed as a vector x_(p,t) ∈ R^(3P^2), with p = 1, …, N and t = 1, …, F. The input X is subjected to embedding processing, z_(p,t)^(0) = E·x_(p,t) + e_(p,t)^(pos), where E represents a learnable matrix and e_(p,t)^(pos) represents a learnable spatial position code;
Second, the converter contains L layers of encoding blocks, each block being multi-head attention; for head a of block l, the input of the block is subjected to linear processing to generate a reduced-dimension query, key and value, expressed as follows:
q_(p,t)^(l,a) = W_Q^(l,a) · LN(z_(p,t)^(l-1));
k_(p,t)^(l,a) = W_K^(l,a) · LN(z_(p,t)^(l-1));
v_(p,t)^(l,a) = W_V^(l,a) · LN(z_(p,t)^(l-1));
where a = 1, …, A indexes the A heads of the multi-head attention, D_h denotes the dimension of each head, W_Q, W_K, W_V are learnable projection matrices, and LN(·) denotes layer normalization. The query q of the current region and the keys k of the other regions are used to compute the attention weights α of the region with respect to the other regions, expressed as follows:
α_(p,t)^(l,a) = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p',t')^(l,a) ]_(p'=1,…,N; t'=1,…,F) / √D_h );
Further, instead of attending jointly over vectors at all spatial positions and all time points without grouping, the regions are grouped: first, the attention over regions at the same spatial position across different time dimensions is computed, and then the attention over regions at different spatial positions within the same time point is computed, expressed as follows:
α_(p,t)^(l,a),time = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p,t')^(l,a) ]_(t'=1,…,F) / √D_h );
α_(p,t)^(l,a),space = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p',t)^(l,a) ]_(p'=1,…,N) / √D_h );
The updated value s on a given head is obtained as the weighted sum of the attention weights and the corresponding values, expressed as follows:
s_(p,t)^(l,a) = Σ_(p',t') α_(p,t),(p',t')^(l,a) · v_(p',t')^(l,a);
The updated values of the multiple heads are concatenated, linearly projected, and a residual connection is added to obtain the multi-head updated value z', expressed as follows:
z'_(p,t)^(l) = W_O · [ s_(p,t)^(l,1); …; s_(p,t)^(l,A) ] + z_(p,t)^(l-1);
The final new feature of each region is obtained through operations such as a regularization (layer normalization) layer, a fully connected layer and a residual connection, expressed as follows:
z_(p,t)^(l) = MLP( LN( z'_(p,t)^(l) ) ) + z'_(p,t)^(l).
Preferably, the method for establishing the initial ecosystem model by inputting the three-dimensional model, the two-dimensional image and the voice data and processing them through the material library module, the three-dimensional model optimization module and the voice prompt module comprises the following steps:
s11, firstly, building a material library, and storing data content of the material library in a material library module, wherein the step of building the material library comprises the steps of inputting a rough time sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting corresponding voice description;
s12, processing the input rough time sequence three-dimensional model which is modeled through a three-dimensional model optimization module, wherein the method comprises the following steps of:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecological system and its space-time evolution from the obtained simulation three-dimensional models through optimization with the stable diffusion algorithm;
S122, rendering the ecosystem model by the three-dimensional modeling software through data optimized by a stable diffusion algorithm to generate simulation ecosystem evolution generated along with time;
S123, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
And S13, when the construction of the initial ecological system and the three-dimensional model is completed, the established voice prompt module carries out interactive guidance on the terminal user so as to complete the investigation experiment step.
Preferably, the processing of the voice prompt module, the biometric feature module and the positioning unit module is used for recognizing the gesture and voice of the terminal user, and the method comprises the following steps:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, establishing the gesture recognition module, which recognizes gestures based on a detr model, the structure of which comprises a 1st sub-layer that is a multi-head attention layer and a 2nd sub-layer that is a fully connected layer, with a residual connection used between every two sub-layers;
s212, establishing a voice recognition module, and recognizing voice by adopting a DNN-HMM acoustic model frame;
S22, realizing the function of processing space data and tasks by the positioning unit module by using a 3D SLAM algorithm;
S23, triggering script prompt words through recognition of gestures and voices of the terminal user, and calling the script to receive and process feedback information sent by the voice prompt module so as to guide the terminal user to complete experiment contents of the ecosystem along with the script.
Preferably, the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and the three-dimensional model optimization module performs data fusion, and the data obtained above generates a final ecosystem model by the data processing module and evaluates the current model, comprising the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the steps, namely, data information generated by recognizing the gestures and the voices of the terminal users is processed and analyzed through a population dynamics related model;
S32, fusing and optimizing the data generated in the above steps by using the three-dimensional model optimization module based on the stable diffusion algorithm;
S33, using the data processed in the above steps, the created data processing module evaluates the influence on the ecosystem from the feedback data and extracts the data information in the data processing module; when the end user keeps interfering with the three-dimensional model in the ecosystem, the three-dimensional model optimization module is restarted for correction; when the end user stops interfering with the three-dimensional model in the ecosystem, the experimental interaction can be ended according to the suggestion of the voice prompt module.
Preferably, the deep learning and three-dimensional animation interactive collaboration system comprises a data acquisition end, an administrator end and a user end;
The data acquisition end: the system comprises a material library module, a database module and a database module, wherein the module has the function of preprocessing a three-dimensional model, a two-dimensional image and voice data;
The manager integrates four functional modules: the system comprises a three-dimensional model optimization module, a voice prompt module, a data acquisition and analysis module and a data processing module, wherein the modules are responsible for processing and deforming a three-dimensional model and classifying and analyzing data, and jointly support an administrator terminal to realize efficient management;
The user terminal: the system comprises a biological identification feature module and a positioning unit module, wherein received data input by a terminal user are transmitted back to an administrator side in real time through wearing virtual reality equipment.
Preferably, the administrator terminal and the user terminal set the IP address through developing a script, so as to ensure that the administrator terminal and the user terminal establish a TCP/IP protocol connection in the same network.
Preferably, the user terminal includes a virtual reality device, where the virtual reality device includes an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the sound input device provides the voice prompt module and the biometric feature module with the function of collecting the user's voice information; the sound output device contains the voice prompt module, which is the key component realizing its voice prompt function, is responsible for guiding the content and flow of the ecological investigation experiment, and presents to the end user the final environmental impact of the interference on the ecosystem;
the video device: the system comprises a projection device and a video acquisition device; the projection equipment is responsible for presenting the three-dimensional model subjected to the deep learning process to an end user; the video acquisition equipment is responsible for acquiring gesture information of a terminal user and leading the gesture information into the manager side data acquisition and analysis data module;
The positioning device: works cooperatively with the above modules to accurately locate the end user.
The beneficial effects of the invention are as follows:
According to the deep learning and three-dimensional animation interactive collaboration method and system provided by the invention, the improved stable diffusion algorithm makes the scene in which the end user participates in the investigation experiment more realistic, and the algorithm can infer and display, within a short time, the space-time change process of the ecosystem after its factors are changed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a method and system for interactive collaboration of deep learning and three-dimensional animation in accordance with the present invention;
FIG. 2 is a schematic flow chart of a method for interactive cooperation of deep learning and three-dimensional animation based on an improved stable diffusion algorithm;
FIG. 3 is a flow chart of a method for designing a spatiotemporal attention layer in interactive collaboration of deep learning and three-dimensional animation according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the protection scope of the present invention as defined by the claims.
As shown in FIG. 1, the invention provides a deep learning and three-dimensional animation interactive collaboration method, which comprises the following steps:
S1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial forest ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final forest ecosystem model through a data processing module and evaluating the current forest ecosystem model;
the above steps further include processing the "three-dimensional model, two-dimensional image and voice data", as shown in fig. 2, specifically including the steps of:
SS1, constructing a stable diffusion training data set, comprising: an input rough time sequence three-dimensional model, an input two-dimensional image of an ecological system, an input corresponding voice description and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among the input rough time sequence three-dimensional model, the input two-dimensional image of the ecological system and the input corresponding voice description so as to improve the temporal continuity of the simulated ecological system;
SS3, firstly, passing the input rough three-dimensional ecological system and the two-dimensional image sequence through an input encoder to obtain the latent-space feature Z_X; meanwhile, mapping the corresponding voice description and gesture into the latent space through a condition encoder to obtain the conditional feature Z_C;
SS4, feeding the input feature Z_X and the conditional feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, feeding the reconstructed feature Z_f into a decoder to obtain a predicted time sequence three-dimensional model y, namely the data of the time sequence three-dimensional model of the ecological system;
SS6, calculating the loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating the parameters of the LDM network model by back-propagating the loss;
SS7, pruning and quantizing the network structure of the trained LDM network model so as to reduce the model parameters and accelerate the model;
SS8, building an inference framework from the quantized network model and using it for three-dimensional animation interactive collaboration; a minimal training sketch illustrating steps SS2 to SS7 is given below.
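The following is a minimal, illustrative sketch of how steps SS2 to SS7 could be wired together, assuming PyTorch and flattened dummy tensors for the coarse time sequence three-dimensional model and the fused voice/gesture condition; the class SpaceTimeLDM, all layer sizes, the truncated denoising loop and the pruning ratio are placeholder assumptions for illustration only, not the patented implementation.

```python
# Sketch of SS2-SS7: encode inputs to the latent space, fuse Z_X and Z_C inside an
# LDM-style denoiser with space-time attention, decode the predicted model y, back-
# propagate the loss against y_gt, then prune and quantize the trained network.
# Every class name, tensor size and hyperparameter here is an illustrative placeholder.
import torch
import torch.nn as nn
from torch.nn.utils import prune

class SpaceTimeLDM(nn.Module):
    def __init__(self, latent_dim=256, denoise_steps=4):
        super().__init__()
        self.input_encoder = nn.Linear(1024, latent_dim)   # coarse 3D model + 2D image sequence -> Z_X
        self.cond_encoder = nn.Linear(512, latent_dim)      # voice description + gesture -> Z_C
        self.denoiser = nn.TransformerEncoder(              # stand-in for the LDM denoiser with space-time attention
            nn.TransformerEncoderLayer(latent_dim, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.Linear(latent_dim, 1024)           # Z_f -> predicted time-sequence 3D model y
        self.denoise_steps = denoise_steps

    def forward(self, x_seq, cond):
        z_x = self.input_encoder(x_seq)                      # SS3: latent feature Z_X
        z_c = self.cond_encoder(cond)                        # SS3: conditional feature Z_C
        z = z_x + torch.randn_like(z_x)                      # noised latent
        for _ in range(self.denoise_steps):                  # SS4: (truncated) T-step denoising with fusion
            z = self.denoiser(torch.cat([z, z_c], dim=1))[:, : z_x.size(1)]
        return self.decoder(z)                               # SS5: predicted time-sequence model y

model = SpaceTimeLDM()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x_seq = torch.randn(2, 16, 1024)   # dummy batch: 16 time steps of flattened coarse geometry
cond = torch.randn(2, 4, 512)      # dummy fused voice/gesture condition tokens
y_gt = torch.randn(2, 16, 1024)    # dummy ground-truth time-sequence model

y_pred = model(x_seq, cond)
loss = nn.functional.mse_loss(y_pred, y_gt)                  # SS6: loss between y and y_gt
loss.backward()                                              # SS6: back-propagation updates the LDM parameters
optimizer.step()

prune.l1_unstructured(model.decoder, name="weight", amount=0.3)   # SS7: zero out 30% of decoder weights
prune.remove(model.decoder, "weight")                             # make the pruning permanent
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)  # SS7: quantize
```

In a real pipeline the inference framework of step SS8 would then be exported from the pruned and quantized model; the single optimization step above stands in for the full training loop.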
Specifically, in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, which comprises the following steps:
Firstly, the input is X ∈ R^(H×W×3×F), which represents F sampled RGB images of size H×W; each time point is divided into N = HW/P^2 regions, where P denotes the region size and N denotes how many regions there are at each time point. Each region can be expressed as a vector x_(p,t) ∈ R^(3P^2), with p = 1, …, N and t = 1, …, F. The input X is subjected to embedding processing, z_(p,t)^(0) = E·x_(p,t) + e_(p,t)^(pos), where E represents a learnable matrix and e_(p,t)^(pos) represents a learnable spatial position code;
Second, the converter contains L layers of encoding blocks, each block being multi-head attention; for head a of block l, the input of the block is subjected to linear processing to generate a reduced-dimension query, key and value, expressed as follows:
q_(p,t)^(l,a) = W_Q^(l,a) · LN(z_(p,t)^(l-1));
k_(p,t)^(l,a) = W_K^(l,a) · LN(z_(p,t)^(l-1));
v_(p,t)^(l,a) = W_V^(l,a) · LN(z_(p,t)^(l-1));
where a = 1, …, A indexes the A heads of the multi-head attention, D_h denotes the dimension of each head, W_Q, W_K, W_V are learnable projection matrices, and LN(·) denotes layer normalization. The query q of the current region and the keys k of the other regions are used to compute the attention weights α of the region with respect to the other regions, expressed as follows:
α_(p,t)^(l,a) = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p',t')^(l,a) ]_(p'=1,…,N; t'=1,…,F) / √D_h );
As shown in fig. 3, rather than attending jointly over vectors at all spatial positions and all time points without grouping, the regions are grouped: first, the attention over regions at the same spatial position across different time dimensions is computed, and then the attention over regions at different spatial positions within the same time point is computed, expressed as follows:
α_(p,t)^(l,a),time = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p,t')^(l,a) ]_(t'=1,…,F) / √D_h );
α_(p,t)^(l,a),space = softmax( (q_(p,t)^(l,a))ᵀ · [ k_(p',t)^(l,a) ]_(p'=1,…,N) / √D_h );
The updated value s on a given head is obtained as the weighted sum of the attention weights and the corresponding values, expressed as follows:
s_(p,t)^(l,a) = Σ_(p',t') α_(p,t),(p',t')^(l,a) · v_(p',t')^(l,a);
The updated values of the multiple heads are concatenated, linearly projected, and a residual connection is added to obtain the multi-head updated value z', expressed as follows:
z'_(p,t)^(l) = W_O · [ s_(p,t)^(l,1); …; s_(p,t)^(l,A) ] + z_(p,t)^(l-1);
The final new feature of each region is obtained through operations such as a regularization (layer normalization) layer, a fully connected layer and a residual connection, expressed as follows:
z_(p,t)^(l) = MLP( LN( z'_(p,t)^(l) ) ) + z'_(p,t)^(l).
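To make the grouped attention described above concrete, the following is a minimal sketch of a divided space-time attention block, assuming PyTorch and a (batch, time, regions, channels) tensor layout; the module name, dimensions and layer choices are illustrative assumptions rather than the patent's exact network.

```python
# Sketch of divided space-time attention: attend first across time for each spatial
# region, then across space within each time point (names and shapes are illustrative).
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z):                                     # z: (B, T, N, D) regions per time point
        B, T, N, D = z.shape
        # temporal attention: same spatial position p, across the T time points
        zt = self.norm1(z).permute(0, 2, 1, 3).reshape(B * N, T, D)
        zt, _ = self.time_attn(zt, zt, zt)
        z = z + zt.reshape(B, N, T, D).permute(0, 2, 1, 3)    # residual connection
        # spatial attention: same time point t, across the N regions
        zs = self.norm2(z).reshape(B * T, N, D)
        zs, _ = self.space_attn(zs, zs, zs)
        z = z + zs.reshape(B, T, N, D)                        # residual connection
        # regularization layer + fully connected layer + residual, as in the final step above
        return z + self.mlp(self.norm3(z))

block = DividedSpaceTimeAttention()
out = block(torch.randn(2, 8, 49, 256))                        # e.g. 8 time points, 7x7 = 49 regions
```

Splitting the attention this way keeps the cost linear in T·N rather than quadratic in their product, which is the motivation for the grouping described in the formulas above.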
Specifically, the method for establishing the initial ecosystem model by inputting the three-dimensional model, the two-dimensional image and the voice data and processing by the material library module, the three-dimensional model optimizing module and the voice prompt module comprises the following steps:
s11, firstly, building a material library, and storing data content of the material library in a material library module, wherein the step of building the material library comprises the steps of inputting a rough time sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting corresponding voice description;
The input rough time sequence three-dimensional model comprises a plant rough time sequence three-dimensional model, an animal rough time sequence three-dimensional model and an environment rough time sequence three-dimensional model;
The plant rough time sequence three-dimensional model comprises rough time sequence three-dimensional models of trees, shrubs, vines and herbaceous plants in different periods of embryo period, juvenile period, growing period, flowering period and dormancy period;
The input animal rough time sequence three-dimensional model includes three-dimensional models, over different growth phases, of more than three classes of organisms ranging from insects to mammals, such as snakes, birds and rodents;
the input environment rough time sequence three-dimensional model comprises three-dimensional models of soil, moisture, illumination and climate in different seasons;
The input corresponding text description specifically comprises detailed descriptions such as the climate zone of the earth in which the forest ecosystem is distributed, the role the time sequence three-dimensional model plays in the food chain, and the relative position of the three-dimensional model in the application scene;
the output three-dimensional model is a three-dimensional model of a simulated forest ecological system;
s12, processing the input rough time sequence three-dimensional model which is modeled through a three-dimensional model optimization module, wherein the method comprises the following steps of:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecological system and its space-time evolution from the obtained simulation three-dimensional models through optimization with the stable diffusion algorithm;
S122, rendering the ecosystem model by the three-dimensional modeling software through data optimized by a stable diffusion algorithm to generate simulation ecosystem evolution generated along with time;
S123, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
S13, when the construction of the initial ecosystem and its three-dimensional model is completed, that is, when the output is a time-sequenced three-dimensional model of the simulated forest ecosystem, the established voice prompt module interactively guides the end user;
That is, once the initial forest ecosystem model is completed, the system can issue voice prompts such as: "Let us now carry out a three-dimensional simulation experiment of the ecosystem; please participate in the evolution process of the ecosystem" and "What is presented in front of you is a simulated scene of a forest ecosystem; there are many animals and plants on your right side for you to choose from."
Specifically, the method for recognizing the gesture and the voice of the terminal user through the processing of the voice prompt module, the biological recognition feature module and the positioning unit module comprises the following steps:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, a gesture recognition module is established; dynamic gesture images of the end user are acquired through the virtual reality equipment, and gestures are recognized based on a detr model, the structure of which comprises a 1st sub-layer that is a multi-head attention layer and a 2nd sub-layer that is a fully connected layer, with a residual connection used between every two sub-layers (a minimal sketch of this block structure is given after step S23);
S212, a voice recognition module is established; voice data of the end user are collected through the virtual reality equipment, and voice is recognized by a DNN-HMM acoustic model framework;
S22, realizing the function of processing space data and tasks by a positioning unit module by using a 3D SLAM algorithm, and enhancing the simultaneous positioning and mapping of virtual and reality;
S23, script prompt words are triggered by recognizing the gestures and voice of the end user, and the script is called to receive and process the feedback information sent by the voice prompt module, so as to guide the end user to complete the experiment content of the ecosystem along with the script; for example, after hearing the system voice prompt "You may select one or more animals and plants, 'pick them up' and 'place' them into the forest ecosystem in front of you", the end user can successively 'pick up' an animal or plant and 'place' it into the simulated forest ecosystem in front of him or her according to the voice prompt.
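As referenced in step S211, the following is a minimal sketch of a DETR-style block in which the first sub-layer is multi-head attention, the second sub-layer is fully connected, and a residual connection is used between every two sub-layers; the dimensions, depth, pooling and classifier head are assumptions for illustration, not the patented gesture recognizer.

```python
# Illustrative DETR-style encoder block for gesture recognition (step S211):
# sub-layer 1 is multi-head attention, sub-layer 2 is fully connected, and a
# residual connection is used between every two sub-layers. Sizes are assumed.
import torch
import torch.nn as nn

class GestureBlock(nn.Module):
    def __init__(self, dim=256, heads=8, hidden=1024):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, tokens, dim) image-patch features
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)                  # residual after the attention sub-layer
        x = self.norm2(x + self.ffn(x))        # residual after the fully connected sub-layer
        return x

class GestureRecognizer(nn.Module):
    def __init__(self, num_gestures=10, dim=256, depth=4):
        super().__init__()
        self.blocks = nn.Sequential(*[GestureBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_gestures)

    def forward(self, tokens):
        return self.head(self.blocks(tokens).mean(dim=1))   # pooled gesture logits

logits = GestureRecognizer()(torch.randn(1, 196, 256))      # e.g. 14x14 patch tokens from one frame
```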
Specifically, the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and then the three-dimensional model optimization module performs data fusion, the obtained data is used for generating a final ecosystem model by the data processing module and evaluating the current model, and the method comprises the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the above steps, that is, the data information generated by recognizing the gestures and voice of the end user is processed and analyzed through a population dynamics model. For example, a competition relationship exists between two species: snakes and birds of prey both feed on rodents, so a Lotka-Volterra competition model is used for a simple analysis (a numerical sketch of this model is given after step S33). For the population of species 1, the rate of change is expressed as dN1/dt = r1·N1·(1 - N1/K1 - α·N2/K1), where N1 is the population size of species 1, r1 its natural growth rate, K1 its environmental carrying capacity, α the competition coefficient of species 2 acting on species 1, and N2 the population size of species 2. For the population of species 2, the rate of change is expressed as dN2/dt = r2·N2·(1 - N2/K2 - β·N1/K2), where r2 is the natural growth rate of species 2, K2 its environmental carrying capacity, and β the competition coefficient of species 1 acting on species 2; the competition coefficients α and β are set empirically according to the specific species and environmental conditions. If K1 > K2/β and K2 < K1/α, species 1 wins and species 2 disappears; if K1 < K2/β and K2 > K1/α, species 2 wins and species 1 disappears; if K1 > K2/β and K2 > K1/α, species 1 and 2 may coexist, but the coexistence is unstable; if K1 < K2/β and K2 < K1/α, species 1 and 2 form a stable coexistence. The number of rodents in the current three-dimensional model of the forest ecosystem is relatively stable; if the end user increases the number of birds of prey in the current forest ecosystem, the birds of prey will prey on more rodents, and the resulting reduction in the rodent population will in turn reduce the snake population, and vice versa;
S32, carrying out fusion and optimization processing on the data generated in the steps by using a three-dimensional model optimization module of a stable diffusion algorithm, namely inputting gesture and voice data generated by a terminal user and data preprocessed by a material library module;
S33, using the data processed in the above steps, the created data processing module evaluates the influence on the ecosystem from the feedback data and extracts the data information in the data processing module; when the end user keeps interfering with the three-dimensional model in the ecosystem, the three-dimensional model optimization module is restarted for correction; when the end user stops interfering with the three-dimensional model in the ecosystem, the experimental interaction can be ended according to the suggestion of the voice prompt module;
That is, on hearing the system voice prompt "If you have finished adding, please say 'done' and we will end the participation process", the end user may continue to 'pick up' an animal or plant and 'place' it into the simulated forest ecosystem in front of him or her according to the voice prompts; if more than one minute passes without interaction, or the user declares the investigation finished, the investigation experiment step ends, the system calls the data processing module to perform data fusion and three-dimensional model optimization on the disturbed forest ecosystem, that is, on the new data generated by the gestures and actions of the end user, and outputs the time-sequenced evolution process of the forest ecosystem to the end user; when an obvious influence appears, for example when some animal or plant in the ecosystem disappears, the evolution of the forest ecosystem stops at that extreme state, and the evaluation of the final evolution result is presented to the end user by voice broadcast.
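As referenced in step S31, the following is a small numerical sketch of the Lotka-Volterra competition model, integrated with simple Euler steps; the parameter values are arbitrary examples chosen only to satisfy the stable-coexistence condition K1 < K2/β and K2 < K1/α, not data from the patent.

```python
# Euler-integration sketch of the Lotka-Volterra competition model used in step S31.
# Parameter values below are arbitrary illustrations, not values from the patent.
def lotka_volterra_competition(n1, n2, r1, r2, k1, k2, alpha, beta, dt=0.01, steps=10000):
    """Return the trajectory of two competing predator populations (e.g. snakes vs. birds of prey)."""
    traj = [(n1, n2)]
    for _ in range(steps):
        dn1 = r1 * n1 * (1 - n1 / k1 - alpha * n2 / k1)   # dN1/dt
        dn2 = r2 * n2 * (1 - n2 / k2 - beta * n1 / k2)    # dN2/dt
        n1 = max(n1 + dn1 * dt, 0.0)
        n2 = max(n2 + dn2 * dt, 0.0)
        traj.append((n1, n2))
    return traj

# Example: K1 < K2/beta (100 < 225) and K2 < K1/alpha (90 < 200), so the two species coexist stably.
history = lotka_volterra_competition(n1=20, n2=30, r1=0.8, r2=0.6,
                                     k1=100, k2=90, alpha=0.5, beta=0.4)
print(history[-1])
```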
Specifically, the deep learning and three-dimensional animation interactive collaboration system comprises a data acquisition end, an administrator end and a user end, as shown in fig. 1;
The data acquisition end: the system comprises a material library module, a database module and a database module, wherein the module has the function of preprocessing a three-dimensional model, a two-dimensional image and voice data;
The manager integrates four functional modules: the system comprises a three-dimensional model optimization module, a voice prompt module, a data acquisition and analysis module and a data processing module, wherein the modules are responsible for processing and deforming a three-dimensional model and classifying and analyzing data, and jointly support an administrator terminal to realize efficient management;
The user terminal: the system comprises a biological identification feature module and a positioning unit module, wherein received data input by a terminal user are transmitted back to an administrator side in real time through wearing virtual reality equipment.
Specifically, the administrator side and the user side set an IP address through development scripts, so as to ensure that the administrator side and the user side establish TCP/IP protocol connection in the same network.
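A minimal sketch of the TCP/IP link between the administrator end and the user end is given below, assuming Python sockets on a shared local network; the IP address, port and JSON payload format are placeholder assumptions, not values specified by the patent.

```python
# Sketch of the administrator-end / user-end TCP/IP link (IP, port and payload
# format are placeholder assumptions; both ends are assumed to be on one LAN).
import json
import socket

HOST, PORT = "192.168.1.10", 9000          # administrator-end address set by the development script

def administrator_server():
    """Administrator end: accept a user-end connection and receive gesture/voice data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            payload = json.loads(conn.recv(4096).decode("utf-8"))
            print("received user-end data:", payload)

def user_client(gesture, voice_text):
    """User end: send the recognized gesture and voice data back in real time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(json.dumps({"gesture": gesture, "voice": voice_text}).encode("utf-8"))

# e.g. user_client("pick_up", "place the bird into the forest ecosystem")
```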
Specifically, the user side includes a virtual reality device, where the virtual reality device includes an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the sound input device provides the voice prompt module and the biometric feature module with the function of collecting the user's voice information; the sound output device contains the voice prompt module, which is the key component realizing its voice prompt function, is responsible for guiding the content and flow of the ecological investigation experiment, and presents to the end user the final environmental impact of the interference on the ecosystem;
The video device: comprises a projection device and a video acquisition device; the projection device is responsible for presenting the three-dimensional model processed by deep learning to the end user; the video acquisition device is responsible for acquiring the gesture information of the end user and importing it into the administrator-end data acquisition and analysis module;
The positioning device: works cooperatively with the above modules to accurately locate the end user.
In the embodiment of the invention, a rough time sequence three-dimensional model, a two-dimensional image of the ecosystem and a voice description are taken as input; under the system voice prompts, the end user influences the forest ecosystem factors by gesture and voice; a population dynamics model serves as the theoretical basis and reference for predicting how the affected populations of the forest ecosystem change over time; and the improved stable diffusion algorithm makes the scene of the investigation experiment more realistic and, in particular, can infer and display within a short time the space-time change process of the forest ecosystem after its factors are changed.
The foregoing is merely a specific embodiment of the present invention, and the present invention is not limited thereto; any variation or substitution that can readily occur to a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (7)
1. A deep learning and three-dimensional animation interactive cooperation method, characterized by comprising the following steps:
s1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and then carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final ecosystem model through a data processing module and evaluating a current model;
The method further comprises the steps of processing the three-dimensional model, the two-dimensional image and the voice data, and comprises the following specific steps:
SS1, constructing a stable diffusion training data set, comprising: an input rough time sequence three-dimensional model, an input two-dimensional image of an ecological system, an input corresponding voice description and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among the input rough time sequence three-dimensional model, the input two-dimensional image of the ecological system and the input corresponding voice description;
SS3, firstly, passing the input rough three-dimensional ecological system and the two-dimensional image sequence through an input encoder to obtain the latent-space feature Z_X; meanwhile, mapping the corresponding voice description and gesture into the latent space through a condition encoder to obtain the conditional feature Z_C;
SS4, feeding the input feature Z_X and the conditional feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, feeding the reconstructed feature Z_f into a decoder to obtain a predicted time sequence three-dimensional model y;
SS6, calculating the loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating the parameters of the LDM network model by back-propagating the loss;
SS7, pruning and quantizing the network structure of the LDM network model trained in the above steps;
SS8, building an inference framework from the quantized network model and using it for three-dimensional animation interactive collaboration.
2. The interactive collaboration method of deep learning and three-dimensional animation according to claim 1, wherein in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, and the space-time attention layer is processed step by step, including:
calculating the attention layer of the space-time attention layer in the regions with the same spatial position in different time dimensions;
and calculating the attention layers of the areas with different spatial positions at the same time point.
3. The interactive cooperation method of deep learning and three-dimensional animation according to claim 1, wherein the initial ecosystem model is built by inputting three-dimensional model, two-dimensional image and voice data and processing by a material library module, a three-dimensional model optimizing module and a voice prompt module, comprising the steps of:
s11, firstly, building a material library, and storing data content of the material library in a material library module, wherein the step of building the material library comprises the steps of inputting a rough time sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting corresponding voice description;
s12, processing the input rough time sequence three-dimensional model which is modeled through a three-dimensional model optimization module, wherein the method comprises the following steps of:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecological system and its space-time evolution from the obtained simulation three-dimensional models through optimization with the stable diffusion algorithm;
S122, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
And S13, when the construction of the initial ecological system and the three-dimensional model is completed, the established voice prompt module carries out interactive guidance on the terminal user so as to complete the investigation experiment step.
4. The interactive deep learning and three-dimensional animation collaboration method of claim 1, wherein the recognition of the gestures and voices of the end user by the processing of the voice prompt module, the biometric feature module and the location unit module comprises the steps of:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, establishing a gesture recognition module, and recognizing gestures based on detr models;
s212, establishing a voice recognition module, and recognizing voice by adopting a DNN-HMM acoustic model frame;
S22, realizing the function of processing space data and tasks by the positioning unit module by using a 3D SLAM algorithm;
S23, triggering script prompt words through recognition of gestures and voices of the terminal user, and calling the script to receive and process feedback information sent by the voice prompt module so as to guide the terminal user to complete experiment contents of the ecosystem along with the script.
5. The interactive cooperation method of deep learning and three-dimensional animation according to claim 1, wherein the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and the data fusion is performed by the three-dimensional model optimization module, and the data obtained above is used for generating a final ecosystem model by the data processing module and evaluating the current model, and the method comprises the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the steps, namely, data information generated by recognizing the gestures and the voices of the terminal users is processed and analyzed through a population dynamics related model;
S32, fusing and optimizing the data generated in the above steps by using the three-dimensional model optimization module based on the stable diffusion algorithm;
S33, using the data processed in the above steps, the created data processing module evaluates the influence on the ecosystem from the feedback data and extracts the data information in the data processing module; when the end user keeps interfering with the three-dimensional model in the ecosystem, the three-dimensional model optimization module is restarted for correction; when the end user stops interfering with the three-dimensional model in the ecosystem, the experimental interaction can be ended according to the suggestion of the voice prompt module.
6. The deep learning and three-dimensional animation interactive collaboration system of claim 1, comprising:
And a data acquisition end: the system comprises a material library module, a database module and a database module, wherein the module has the function of preprocessing a three-dimensional model, a two-dimensional image and voice data;
Administrator side: integrating a plurality of functional modules: the system comprises a three-dimensional model optimization module, a voice prompt module, a data acquisition and analysis module and a data processing module, wherein the modules support an administrator terminal to realize efficient management of a three-dimensional model, voice prompt, data acquisition and analysis and data processing;
the user terminal: comprises a biometric feature module and a location unit module for receiving and processing user input and data.
7. The interactive deep learning and three-dimensional animation collaboration system of claim 6, wherein the client comprises a virtual reality device comprising an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the sound input device provides the voice prompt module and the biometric feature module with the function of collecting the user's voice information; the sound output device contains the voice prompt module, which is the key component realizing its voice prompt function;
The video device: comprises a projection device and a video acquisition device; the projection device is responsible for presenting the three-dimensional model processed by deep learning to the end user; the video acquisition device is responsible for acquiring the gesture information of the end user and importing it into the administrator-end data acquisition and analysis module;
The positioning device: works cooperatively with the above modules to accurately locate the end user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410161296.9A CN117934674B (en) | 2024-02-05 | 2024-02-05 | Deep learning and three-dimensional animation interactive cooperation method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410161296.9A CN117934674B (en) | 2024-02-05 | 2024-02-05 | Deep learning and three-dimensional animation interactive cooperation method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117934674A true CN117934674A (en) | 2024-04-26 |
CN117934674B CN117934674B (en) | 2024-09-17 |
Family
ID=90768457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410161296.9A Active CN117934674B (en) | 2024-02-05 | 2024-02-05 | Deep learning and three-dimensional animation interactive cooperation method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117934674B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107515674A (en) * | 2017-08-08 | 2017-12-26 | 山东科技大学 | It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality |
CN116343334A (en) * | 2023-03-27 | 2023-06-27 | 青岛科技大学 | Motion recognition method of three-stream self-adaptive graph convolution model fused with joint capture |
CN116682293A (en) * | 2023-05-31 | 2023-09-01 | 东华大学 | Experiment teaching system based on augmented reality and machine vision |
CN117115695A (en) * | 2023-09-01 | 2023-11-24 | 厦门大学 | Human-object interaction detection method based on virtual enhancement |
CN117333645A (en) * | 2023-10-31 | 2024-01-02 | 深圳天际云数字创意展示有限公司 | Annular holographic interaction system and equipment thereof |
- 2024-02-05: CN202410161296.9A patent/CN117934674B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107515674A (en) * | 2017-08-08 | 2017-12-26 | 山东科技大学 | It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality |
CN116343334A (en) * | 2023-03-27 | 2023-06-27 | 青岛科技大学 | Motion recognition method of three-stream self-adaptive graph convolution model fused with joint capture |
CN116682293A (en) * | 2023-05-31 | 2023-09-01 | 东华大学 | Experiment teaching system based on augmented reality and machine vision |
CN117115695A (en) * | 2023-09-01 | 2023-11-24 | 厦门大学 | Human-object interaction detection method based on virtual enhancement |
CN117333645A (en) * | 2023-10-31 | 2024-01-02 | 深圳天际云数字创意展示有限公司 | Annular holographic interaction system and equipment thereof |
Non-Patent Citations (1)
Title |
---|
ZHANG Yujun; MENG Xiaojun; BAI Mantao: "Research on Gesture Recognition for Virtual Scene Control Based on HMM", Electronic Design Engineering, No. 24, 20 December 2018 (2018-12-20) *
Also Published As
Publication number | Publication date |
---|---|
CN117934674B (en) | 2024-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||