
CN117934674A - Deep learning and three-dimensional animation interactive cooperation method and system - Google Patents


Info

Publication number
CN117934674A
CN117934674A
Authority
CN
China
Prior art keywords
module
dimensional model
dimensional
data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410161296.9A
Other languages
Chinese (zh)
Other versions
CN117934674B (en)
Inventor
林远志
温鼎铭
周航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Mengxiang Culture Communication Co ltd
Original Assignee
Shenzhen Mengxiang Culture Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Mengxiang Culture Communication Co ltd filed Critical Shenzhen Mengxiang Culture Communication Co ltd
Priority to CN202410161296.9A priority Critical patent/CN117934674B/en
Publication of CN117934674A publication Critical patent/CN117934674A/en
Application granted granted Critical
Publication of CN117934674B publication Critical patent/CN117934674B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computer Graphics (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a deep learning and three-dimensional animation interactive cooperation method and system for carrying out simulated interactive investigation experiments on a biological community and its environmental complex. The system comprises a data acquisition end, an administrator end and a user end. The data acquisition end preprocesses the input data. The administrator end is provided with a pre-task and a post-task: the pre-task covers modeling and optimizing the three-dimensional model, realizing a simulated natural ecosystem with augmented reality technology and registering it in reality; the post-task carries out ecosystem experiments on the data collected through human-computer and human-human interaction. The user end can use machine vision technology to position and process the three-dimensional model through virtual reality equipment. The stable diffusion algorithm adopted by the invention optimizes the ecosystem by designing a space-time attention layer, so that the scene in which the user participates in the experiment is more realistic, and the process of space-time change of the ecosystem after ecological factors are changed can be shown.

Description

Deep learning and three-dimensional animation interactive cooperation method and system
Technical Field
The invention relates to the field of teaching experiments, in particular to a method and a system for interactive cooperation of deep learning and three-dimensional animation.
Background
On the one hand, the current biology teaching process has some disadvantages. The prevailing guided teaching mode leaves students lacking deep understanding of biology and exploratory ability, and the teaching content is too narrow: students may only know some surface-level knowledge points and cannot fully understand biology. On the other hand, stable diffusion is a generative model based on a diffusion process; it can generate high-resolution, realistic images, repair missing parts, and add finer image features to low-resolution images for predicting dynamic image changes, but it has not yet been applied to space-time traversal of three-dimensional virtual animation in the field of teaching experiments.
Therefore, how to let experimenters participate in investigation experiments of the ecosystem through a teaching-experiment mode of human-computer and human-human interaction, and how to use a deep learning algorithm to show experimenters, within a short time, the space-time variation process of the ecosystem after its factors are changed, has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to solve at least the above problems in the prior art, the present invention provides a deep learning and three-dimensional animation interactive collaboration method and system, which allow experimenters to collaboratively complete investigation experiments of an ecosystem and to observe, within a short time, the space-time variation process of the ecosystem after certain factors in the ecosystem are changed.
The aim of the invention is realized by the following technical scheme: a deep learning and three-dimensional animation interactive cooperation method, comprising the following steps:
S1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and then carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final ecosystem model through a data processing module and evaluating a current model;
The method further comprises the steps of processing the three-dimensional model, the two-dimensional image and the voice data, and comprises the following specific steps:
SS1. Construct a stable diffusion training data set, comprising: an input coarse time-sequence three-dimensional model, an input two-dimensional image of the ecosystem, an input corresponding voice description, and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among an input coarse time sequence three-dimensional model, an input two-dimensional image of an ecological system and an input corresponding voice description so as to improve the time continuity of a simulation ecological system;
SS3, for the input coarse time-sequence three-dimensional ecosystem model and two-dimensional image sequence, first obtaining the hidden-space feature Z_X through an input encoder; meanwhile, mapping the corresponding voice description and gestures to the hidden space through a condition encoder to obtain the condition feature Z_C;
SS4, feeding the input feature Z_X and the condition feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, inputting the reconstructed feature Z_f into a decoder to obtain a predicted time-sequence three-dimensional model y, namely the time-sequence three-dimensional model data of the ecosystem;
SS6, calculating loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating parameters of the LDM network model through loss back propagation;
SS7, pruning and quantizing the network structure of the trained LDM network model so as to reduce the model parameters and improve model efficiency;
And SS8, building an inference framework according to the quantized network model, and using the inference framework for three-dimensional animation interactive collaboration.
Preferably, in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, which comprises the following steps:
Firstly, the input is X ∈ R^(H×W×3×F), representing F RGB frames of size H×W sampled over time; each frame is divided into N = HW/P^2 regions, where P is the region size and N is the number of regions per time point. Each region is flattened into a vector x_(p,t) ∈ R^(3P^2), with p = 1, ..., N and t = 1, ..., F. The input X is embedded as
z_(p,t)^(0) = E·x_(p,t) + e_(p,t)^pos,
where E is a learnable matrix and e_(p,t)^pos is a learnable spatial position code.
Second, the converter contains L encoder layers, each block being a multi-head attention. For head a of block l, the block input is linearly processed to generate a reduced-dimension query, key and value for each region:
q_(p,t)^(l,a) = W_Q^(l,a)·LN(z_(p,t)^(l-1)), k_(p,t)^(l,a) = W_K^(l,a)·LN(z_(p,t)^(l-1)), v_(p,t)^(l,a) = W_V^(l,a)·LN(z_(p,t)^(l-1)),
where a = 1, ..., A indexes the heads of the multi-head attention and D_h is the dimension of each head. The query q of the current region is compared with the keys k of the other regions to obtain the attention values α of this region with respect to the other regions:
α_(p,t)^(l,a) = softmax( q_(p,t)^(l,a)·{k_(p',t')^(l,a)} / sqrt(D_h) ).
Further, instead of attending jointly over vectors at all spatial positions and time points, the vectors are grouped: first the attention over regions at the same spatial position across different time dimensions is calculated, and then the attention over regions at different spatial positions within the same time point:
α_(p,t)^(l,a),time = softmax( q_(p,t)^(l,a)·{k_(p,t')^(l,a), t' = 1, ..., F} / sqrt(D_h) ),
α_(p,t)^(l,a),space = softmax( q_(p,t)^(l,a)·{k_(p',t)^(l,a), p' = 1, ..., N} / sqrt(D_h) ).
The updated value s on one head is obtained by weighted summation of the attention values and the corresponding values:
s_(p,t)^(l,a) = Σ_(p',t') α_(p,t),(p',t')^(l,a) · v_(p',t')^(l,a).
The updated values of the A heads are concatenated, linearly projected, and a residual connection is added to obtain the multi-head update z':
z'_(p,t)^(l) = W_O·[s_(p,t)^(l,1); ...; s_(p,t)^(l,A)] + z_(p,t)^(l-1).
The final new feature of each region is obtained through a regularization (layer normalization) layer, a fully connected layer and a residual connection:
z_(p,t)^(l) = MLP(LN(z'_(p,t)^(l))) + z'_(p,t)^(l).
Preferably, the initial ecosystem model is established by inputting the three-dimensional model, two-dimensional images and voice data and processing them through the material library module, the three-dimensional model optimization module and the voice prompt module, comprising the following steps:
S11, firstly, building a material library and storing its data content in the material library module, wherein building the material library comprises inputting a coarse time-sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting a corresponding voice description;
S12, processing the input coarse time-sequence three-dimensional model after modeling through the three-dimensional model optimization module, comprising the following steps:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecosystem and its space-time evolution by optimizing the obtained simulated three-dimensional models with the stable diffusion algorithm;
S122, rendering the ecosystem model by the three-dimensional modeling software through data optimized by a stable diffusion algorithm to generate simulation ecosystem evolution generated along with time;
S123, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
And S13, when the construction of the initial ecological system and the three-dimensional model is completed, the established voice prompt module carries out interactive guidance on the terminal user so as to complete the investigation experiment step.
Preferably, the processing of the voice prompt module, the biometric feature module and the positioning unit module is used for recognizing the gesture and voice of the terminal user, and the method comprises the following steps:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, establishing the gesture recognition module, which recognizes gestures based on a DETR model; the DETR model structure comprises a 1st sub-layer which is a multi-head attention layer and a 2nd sub-layer which is a fully connected layer, with residual connections between every two sub-layers;
S212, establishing the voice recognition module, which recognizes voice by adopting a DNN-HMM acoustic model framework;
S22, realizing the function of processing space data and tasks by the positioning unit module by using a 3D SLAM algorithm;
S23, triggering script prompt words through recognition of gestures and voices of the terminal user, and calling the script to receive and process feedback information sent by the voice prompt module so as to guide the terminal user to complete experiment contents of the ecosystem along with the script.
Preferably, the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and the three-dimensional model optimization module performs data fusion, and the data obtained above generates a final ecosystem model by the data processing module and evaluates the current model, comprising the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the steps, namely, data information generated by recognizing the gestures and the voices of the terminal users is processed and analyzed through a population dynamics related model;
S32, fusing and optimizing the data generated in the above steps by using the three-dimensional model optimization module of the stable diffusion algorithm;
S33, using the data processed by the steps, evaluating the influence of the ecosystem by using feedback data through the created data processing module, and extracting data information in the data processing module; when the three-dimensional model in the ecological system is continuously interfered by the terminal user, the three-dimensional model optimizing module is restarted to correct; when the terminal user stops interfering the three-dimensional model in the ecological system, the experimental interaction can be ended according to the suggestion of the voice prompt module.
Preferably, the deep learning and three-dimensional animation interactive collaboration system comprises a data acquisition end, an administrator end and a user end;
The data acquisition end: comprises a material library module, which preprocesses the input three-dimensional model, two-dimensional images and voice data;
The administrator end: integrates four functional modules, namely the three-dimensional model optimization module, the voice prompt module, the data acquisition and analysis module and the data processing module, which are responsible for processing and deforming the three-dimensional model and for classifying and analyzing data, and jointly support the administrator end in realizing efficient management;
The user end: comprises the biometric feature module and the positioning unit module; the data input by the end user through the worn virtual reality equipment is received and transmitted back to the administrator end in real time.
Preferably, the administrator end and the user end set their IP addresses through a development script, ensuring that the administrator end and the user end establish a TCP/IP protocol connection within the same network.
Preferably, the user terminal includes a virtual reality device, where the virtual reality device includes an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the sound input device provides the voice prompt module and the biometric feature module with the function of collecting the user's voice information; the sound output device comprises the voice prompt module, a key component for realizing its voice prompt function, which is responsible for the experimental guidance of the ecological investigation experiment's decisions and flow and presents the final environmental impact of the interference on the ecosystem to the end user;
The video device: comprises a projection device and a video acquisition device; the projection device is responsible for presenting the deep-learning-processed three-dimensional model to the end user; the video acquisition device is responsible for acquiring the end user's gesture information and importing it into the administrator-end data acquisition and analysis module;
The positioning device: works in cooperation with the above two modules to accurately position the end user.
The beneficial effects of the invention are as follows:
According to the deep learning and three-dimensional animation interactive collaboration method and system provided by the invention, the improved stable diffusion algorithm makes the scene in which the end user participates in the investigation experiment more realistic, and the algorithm can infer and display, within a short time, the space-time change process of the ecosystem after the ecosystem factors are changed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of a method and system for interactive collaboration of deep learning and three-dimensional animation in accordance with the present invention;
FIG. 2 is a schematic flow chart of a method for interactive cooperation of deep learning and three-dimensional animation based on an improved stable diffusion algorithm;
FIG. 3 is a flow chart of a method for designing a spatiotemporal attention layer in interactive collaboration of deep learning and three-dimensional animation according to the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the protection scope of the present invention.
As shown in FIG. 1, the invention provides a deep learning and three-dimensional animation interactive collaboration method, which comprises the following steps:
S1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial forest ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final forest ecosystem model through a data processing module and evaluating the current forest ecosystem model;
the above steps further include processing the "three-dimensional model, two-dimensional image and voice data", as shown in fig. 2, specifically including the steps of:
SS1. Construct a stable diffusion training data set, comprising: an input coarse time-sequence three-dimensional model, an input two-dimensional image of the ecosystem, an input corresponding voice description, and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among an input coarse time sequence three-dimensional model, an input two-dimensional image of an ecological system and an input corresponding voice description so as to improve the time continuity of a simulation ecological system;
SS3, for the input coarse time-sequence three-dimensional ecosystem model and two-dimensional image sequence, first obtaining the hidden-space feature Z_X through an input encoder; meanwhile, mapping the corresponding voice description and gestures to the hidden space through a condition encoder to obtain the condition feature Z_C;
SS4, feeding the input feature Z_X and the condition feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, inputting the reconstructed feature Z_f into a decoder to obtain a predicted time-sequence three-dimensional model y, namely the time-sequence three-dimensional model data of the ecosystem;
SS6, calculating loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating parameters of the LDM network model through loss back propagation;
SS7, pruning and quantizing the network structure of the trained LDM network model so as to reduce the model parameters and improve model efficiency;
And SS8, building an inference framework according to the quantized network model, and using the inference framework for three-dimensional animation interactive collaboration.
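For concreteness, the following is a minimal PyTorch sketch of the SS3-SS6 training loop: encode the inputs into a hidden-space feature Z_X, encode the voice/gesture conditions into Z_C, fuse them over T denoising steps, decode a prediction y, and back-propagate the loss against y_gt. All module names, tensor sizes and the simple cross-attention fusion are illustrative assumptions, not the patent's actual implementation.

```python
# Minimal sketch of the SS3-SS6 training loop. Module names, layer sizes,
# and the cross-attention fusion below are illustrative assumptions.
import torch
import torch.nn as nn

D = 256  # hidden-space dimension (assumed)

class DenoiseStep(nn.Module):
    """One SS4 fusion/denoising step: cross-attend latent Z_X to condition Z_C."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim))

    def forward(self, z_x, z_c):
        fused, _ = self.attn(z_x, z_c, z_c)  # query: latent; key/value: condition
        z = z_x + fused                      # residual connection
        return z + self.mlp(z)

input_encoder = nn.Linear(1024, D)  # stand-in for the input encoder (SS3)
cond_encoder = nn.Linear(512, D)    # stand-in for the condition encoder (SS3)
decoder = nn.Linear(D, 1024)        # stand-in for the decoder (SS5)
steps = nn.ModuleList(DenoiseStep(D) for _ in range(4))  # T = 4 steps (SS4)

params = [p for m in (input_encoder, cond_encoder, decoder, steps)
          for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)

# Dummy batch: 16 tokens of flattened coarse-model/image features, 8 tokens
# of flattened voice/gesture condition features, and a ground-truth target.
x = torch.randn(2, 16, 1024)
c = torch.randn(2, 8, 512)
y_gt = torch.randn(2, 16, 1024)

z_x, z_c = input_encoder(x), cond_encoder(c)  # SS3: map to the hidden space
for step in steps:                            # SS4: T fusion/denoising steps
    z_x = step(z_x, z_c)
y = decoder(z_x)                              # SS5: predicted time-sequence model y
loss = nn.functional.mse_loss(y, y_gt)        # SS6: loss against y_gt
opt.zero_grad()
loss.backward()                               # SS6: back-propagate and update
opt.step()
```

The SS7 pruning and quantization step is omitted here; it would be applied to the trained modules before building the SS8 inference framework.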
Specifically, in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, which comprises the following steps:
Firstly, the input is X ∈ R^(H×W×3×F), representing F RGB frames of size H×W sampled over time; each frame is divided into N = HW/P^2 regions, where P is the region size and N is the number of regions per time point. Each region is flattened into a vector x_(p,t) ∈ R^(3P^2), with p = 1, ..., N and t = 1, ..., F. The input X is embedded as
z_(p,t)^(0) = E·x_(p,t) + e_(p,t)^pos,
where E is a learnable matrix and e_(p,t)^pos is a learnable spatial position code.
Second, the converter contains L encoder layers, each block being a multi-head attention. For head a of block l, the block input is linearly processed to generate a reduced-dimension query, key and value for each region:
q_(p,t)^(l,a) = W_Q^(l,a)·LN(z_(p,t)^(l-1)), k_(p,t)^(l,a) = W_K^(l,a)·LN(z_(p,t)^(l-1)), v_(p,t)^(l,a) = W_V^(l,a)·LN(z_(p,t)^(l-1)),
where a = 1, ..., A indexes the heads of the multi-head attention and D_h is the dimension of each head. The query q of the current region is compared with the keys k of the other regions to obtain the attention values α of this region with respect to the other regions:
α_(p,t)^(l,a) = softmax( q_(p,t)^(l,a)·{k_(p',t')^(l,a)} / sqrt(D_h) ).
As shown in fig. 3, instead of attending jointly over vectors at all spatial positions and time points, the vectors are grouped: first the attention over regions at the same spatial position across different time dimensions is calculated, and then the attention over regions at different spatial positions within the same time point:
α_(p,t)^(l,a),time = softmax( q_(p,t)^(l,a)·{k_(p,t')^(l,a), t' = 1, ..., F} / sqrt(D_h) ),
α_(p,t)^(l,a),space = softmax( q_(p,t)^(l,a)·{k_(p',t)^(l,a), p' = 1, ..., N} / sqrt(D_h) ).
The updated value s on one head is obtained by weighted summation of the attention values and the corresponding values:
s_(p,t)^(l,a) = Σ_(p',t') α_(p,t),(p',t')^(l,a) · v_(p',t')^(l,a).
The updated values of the A heads are concatenated, linearly projected, and a residual connection is added to obtain the multi-head update z':
z'_(p,t)^(l) = W_O·[s_(p,t)^(l,1); ...; s_(p,t)^(l,A)] + z_(p,t)^(l-1).
The final new feature of each region is obtained through a regularization (layer normalization) layer, a fully connected layer and a residual connection:
z_(p,t)^(l) = MLP(LN(z'_(p,t)^(l))) + z'_(p,t)^(l).
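The grouped attention above can be sketched compactly. The following is a minimal PyTorch implementation of divided temporal-then-spatial attention over a (batch, time, region, dim) tensor; the shapes, head count and MLP width are assumptions for illustration, not the patent's configuration.

```python
# Sketch of the grouped (divided) space-time attention: temporal attention
# over regions at the same spatial position, then spatial attention within
# each time point. Shapes, head count and MLP width are assumptions.
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    def __init__(self, dim, heads):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.space_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, z):              # z: (batch, time F, regions N, dim D)
        b, f, n, d = z.shape
        # 1) temporal attention: same spatial position p, varying time t
        t = self.norm1(z).permute(0, 2, 1, 3).reshape(b * n, f, d)
        t, _ = self.time_attn(t, t, t)
        z = z + t.reshape(b, n, f, d).permute(0, 2, 1, 3)  # residual
        # 2) spatial attention: same time point t, varying spatial position p
        s = self.norm2(z).reshape(b * f, n, d)
        s, _ = self.space_attn(s, s, s)
        z = z + s.reshape(b, f, n, d)                      # residual
        # 3) regularization layer + fully connected layer + residual
        return z + self.mlp(self.norm3(z))

layer = DividedSpaceTimeAttention(dim=64, heads=4)
out = layer(torch.randn(2, 8, 16, 64))  # 8 time points, 16 regions each
print(out.shape)                        # torch.Size([2, 8, 16, 64])
```

The design rationale for the grouping is efficiency: each region attends to F temporal neighbors plus N spatial neighbors instead of all N·F space-time positions, which is what makes the time-continuity modeling tractable for longer sequences.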
Specifically, the method for establishing the initial ecosystem model by inputting the three-dimensional model, the two-dimensional image and the voice data and processing by the material library module, the three-dimensional model optimizing module and the voice prompt module comprises the following steps:
S11, firstly, building a material library and storing its data content in the material library module, wherein building the material library comprises inputting a coarse time-sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting a corresponding voice description;
The input rough time sequence three-dimensional model comprises a plant rough time sequence three-dimensional model, an animal rough time sequence three-dimensional model and an environment rough time sequence three-dimensional model;
The plant coarse time-sequence three-dimensional models comprise coarse time-sequence three-dimensional models of trees, shrubs, vines and herbaceous plants in different periods: the embryonic period, juvenile period, growth period, flowering period and dormancy period;
The animal coarse time-sequence three-dimensional models comprise three-dimensional models of more than three classes of organisms, ranging from insects to mammals, over different growth phases, such as snakes, birds and rodents;
The input environment coarse time-sequence three-dimensional models comprise three-dimensional models of soil, moisture, illumination and climate in different seasons;
The input corresponding voice description specifically comprises detailed descriptions such as the climate zone of the earth in which the forest ecosystem is distributed, the role type the time-sequence three-dimensional model plays in the food-chain relations, and the relative position of the three-dimensional model in the application scene;
The output three-dimensional model is a three-dimensional model of the simulated forest ecosystem;
S12, processing the input coarse time-sequence three-dimensional model after modeling through the three-dimensional model optimization module, comprising the following steps:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecosystem and its space-time evolution by optimizing the obtained simulated three-dimensional models with the stable diffusion algorithm;
S122, rendering the ecosystem model by the three-dimensional modeling software through data optimized by a stable diffusion algorithm to generate simulation ecosystem evolution generated along with time;
S123, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
S13, when the construction of the initial ecosystem three-dimensional model is completed, that is, when the output is a time-sequenced three-dimensional model of the simulated forest ecosystem, the established voice prompt module interactively guides the end user;
That is, the system sends out a voice prompt when the initial forest ecosystem model is completed, such as: "Let us now perform a three-dimensional simulation experiment of the ecosystem; please participate in the evolution process of the ecosystem." "What is presented to you is a simulated scene of the forest ecosystem; there are many animals and plants on your right side for you to choose from."
Specifically, the method for recognizing the gesture and the voice of the terminal user through the processing of the voice prompt module, the biological recognition feature module and the positioning unit module comprises the following steps:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, a gesture recognition module is established, dynamic gesture images of the end user are acquired through the virtual reality equipment, and gestures are recognized based on a DETR model; the DETR model structure comprises a 1st sub-layer which is a multi-head attention layer and a 2nd sub-layer which is a fully connected layer, with residual connections between every two sub-layers;
S212, a voice recognition module is established, voice data of the end user are collected through the virtual reality equipment, and voice is recognized through a DNN-HMM acoustic model framework;
S22, realizing the function of processing space data and tasks by a positioning unit module by using a 3D SLAM algorithm, and enhancing the simultaneous positioning and mapping of virtual and reality;
S23, recognition of the end user's gestures and voice triggers script prompt words, and the script is called to receive and process the feedback information sent by the voice prompt module, so as to guide the end user through the experiment content of the ecosystem along with the script. For example, after hearing the system voice prompt that one or more animals and plants may be selected, "picked up" and "placed" into the forest ecosystem in front of them, the end user can, following the voice prompt, successively "pick up" an animal or plant and "place" it into the simulated forest ecosystem in front of them; a toy sketch of this trigger logic is given below.
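The following minimal Python sketch shows one way such trigger logic could be wired up, assuming recognition results arrive as (kind, label) string pairs; the trigger names and prompt texts are illustrative assumptions, not the patent's script.

```python
# Toy sketch of the S23 trigger logic: recognized gestures/voice trigger
# script prompt words. Trigger names and prompt texts are assumptions.
from typing import Optional

SCRIPT_PROMPTS = {
    ("gesture", "pick_up"): "You picked up an organism; place it into the ecosystem.",
    ("gesture", "place"): "Organism placed. Watch how the ecosystem responds.",
    ("voice", "done"): "Thank you. We will now end the participation process.",
}

def on_recognition(kind: str, label: str) -> Optional[str]:
    """Map a recognition event to the voice prompt the script should play."""
    prompt = SCRIPT_PROMPTS.get((kind, label))
    if prompt is not None:
        # In the system this would be routed to the sound output device.
        print(f"[voice prompt] {prompt}")
    return prompt

on_recognition("gesture", "pick_up")
on_recognition("voice", "done")
```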
Specifically, the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and then the three-dimensional model optimization module performs data fusion, the obtained data is used for generating a final ecosystem model by the data processing module and evaluating the current model, and the method comprises the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the above steps; that is, the data information generated by recognizing the end user's gestures and voice is processed and analyzed through a population-dynamics-related model. For example, a competition relationship exists between two species: snakes and birds of prey both feed on mice, so a Lotka-Volterra competition model is used for a simple analysis. For the population of species 1, the rate of change is expressed as:
dN1/dt = r1·N1·(1 − N1/K1 − α·N2/K1),
where N1 is the population size of species 1, r1 is its natural growth rate, K1 is its environmental carrying capacity, α is the competition coefficient of species 2 against species 1, and N2 is the population size of species 2. For the population of species 2, the rate of change is expressed as:
dN2/dt = r2·N2·(1 − N2/K2 − β·N1/K2),
where r2 is the natural growth rate of species 2, K2 is its environmental carrying capacity, and β is the competition coefficient of species 1 against species 2. The competition coefficients α and β are set empirically according to the specific species and environmental conditions. The outcomes are: if K1 > K2/β and K2 < K1/α, species 1 wins and species 2 vanishes; if K1 < K2/β and K2 > K1/α, species 2 wins and species 1 vanishes; if K1 > K2/β and K2 > K1/α, species 1 and 2 may coexist, but unstably; if K1 < K2/β and K2 < K1/α, species 1 and 2 may form a stable coexistence. The number of rodents in the current three-dimensional model of the forest ecosystem is relatively stable; if the end user increases the number of birds of prey in the current forest ecosystem, the birds of prey will prey on more rodents, and the reduction in the number of rodents will in turn reduce the number of snakes, and vice versa. A numerical sketch of this model follows;
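As a worked example, the competition model above can be integrated numerically. The parameter values below are illustrative assumptions chosen to satisfy the stable-coexistence condition (K1 < K2/β and K2 < K1/α), not values from the patent.

```python
# Forward-Euler integration of the Lotka-Volterra competition model above.
# Parameter values are illustrative assumptions chosen so that K1 < K2/beta
# and K2 < K1/alpha, i.e. the stable-coexistence case.
r1, r2 = 0.8, 0.6       # natural growth rates
K1, K2 = 100.0, 80.0    # environmental carrying capacities
alpha, beta = 0.5, 0.4  # competition coefficients (empirically set)

def step(n1: float, n2: float, dt: float = 0.01) -> tuple[float, float]:
    dn1 = r1 * n1 * (1 - n1 / K1 - alpha * n2 / K1)
    dn2 = r2 * n2 * (1 - n2 / K2 - beta * n1 / K2)
    return n1 + dn1 * dt, n2 + dn2 * dt

n1, n2 = 10.0, 10.0     # initial population sizes
for _ in range(100_000):
    n1, n2 = step(n1, n2)
print(f"equilibrium: N1 = {n1:.1f}, N2 = {n2:.1f}")
```

With these parameters the populations converge to the analytic coexistence equilibrium N1* = (K1 − αK2)/(1 − αβ) = 75 and N2* = (K2 − βK1)/(1 − αβ) = 50, consistent with the stable-coexistence case listed above.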
S32, the three-dimensional model optimization module using the stable diffusion algorithm fuses and optimizes the data generated in the above steps, namely the gesture and voice data generated by the end user and the data preprocessed by the material library module;
S33, using the data processed by the steps, evaluating the influence of the ecosystem by using feedback data through the created data processing module, and extracting data information in the data processing module; when the three-dimensional model in the ecological system is continuously interfered by the terminal user, the three-dimensional model optimizing module is restarted to correct; when the terminal user stops interfering the three-dimensional model in the ecological system, the experimental interaction can be ended according to the suggestion of the voice prompt module;
That is, on hearing the system voice prompt "If you have finished adding, please say 'done', and we will end the participation process", the end user may continue to pick up an animal or plant and place it into the simulated forest ecosystem in front of them as prompted. If one minute passes without input, or the user indicates the investigation is finished, the investigation experiment step ends; the system then calls the data processing module to perform data fusion and three-dimensional model optimization on the interfered forest ecosystem, that is, on the new data generated by the end user's gestures and actions, and outputs the time-sequenced evolution process of the forest ecosystem to the end user. When an obvious impact appears, for example when an animal or plant in the ecosystem disappears, the evolution of the forest ecosystem stops at that extreme state, and the final evolution result evaluation is presented to the end user by voice broadcast.
Specifically, the deep learning and three-dimensional animation interactive collaboration system comprises a data acquisition end, an administrator end and a user end, as shown in fig. 1;
The data acquisition end: comprises a material library module, which preprocesses the input three-dimensional model, two-dimensional images and voice data;
The administrator end: integrates four functional modules, namely the three-dimensional model optimization module, the voice prompt module, the data acquisition and analysis module and the data processing module, which are responsible for processing and deforming the three-dimensional model and for classifying and analyzing data, and jointly support the administrator end in realizing efficient management;
The user end: comprises the biometric feature module and the positioning unit module; the data input by the end user through the worn virtual reality equipment is received and transmitted back to the administrator end in real time.
Specifically, the administrator end and the user end set their IP addresses through development scripts, ensuring that the administrator end and the user end establish a TCP/IP protocol connection within the same network, as sketched below.
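A minimal sketch of this connection follows, using Python's standard socket library; the IP address, port, and message format are illustrative assumptions standing in for whatever the development script configures.

```python
# Minimal sketch of the administrator/user TCP/IP connection. The IP
# address, port, and message format are illustrative assumptions set here
# in place of the development script the patent mentions.
import socket

ADMIN_IP, PORT = "192.168.1.10", 9000  # assumed same-network address/port

def admin_server() -> None:
    """Administrator end: accept one user-end connection and acknowledge data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((ADMIN_IP, PORT))
        srv.listen(1)
        conn, _addr = srv.accept()
        with conn:
            data = conn.recv(4096)        # e.g. gesture/voice data from the user end
            conn.sendall(b"ack:" + data)  # e.g. updated model state sent back

def user_client(payload: bytes) -> bytes:
    """User end: transmit end-user input to the administrator end in real time."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((ADMIN_IP, PORT))
        cli.sendall(payload)
        return cli.recv(4096)
```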
Specifically, the user side includes a virtual reality device, where the virtual reality device includes an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the sound input device provides the voice prompt module and the biometric feature module with the function of collecting the user's voice information; the sound output device comprises the voice prompt module, a key component for realizing its voice prompt function, which is responsible for the experimental guidance of the ecological investigation experiment's decisions and flow and presents the final environmental impact of the interference on the ecosystem to the end user;
The video device: comprises a projection device and a video acquisition device; the projection device is responsible for presenting the deep-learning-processed three-dimensional model to the end user; the video acquisition device is responsible for acquiring the end user's gesture information and importing it into the administrator-end data acquisition and analysis module;
The positioning device: works in cooperation with the above two modules to accurately position the end user.
In the embodiment of the invention, a coarse time-sequence three-dimensional model, ecosystem two-dimensional images and voice descriptions are taken as input; under the system's voice prompts, the end user influences forest ecosystem factors through gestures and voice; population-dynamics-related models serve as the theoretical basis and reference for predicting how the influenced population sizes in the forest ecosystem change over time; and the improved stable diffusion algorithm makes the scene of the investigation experiment more realistic, in particular allowing the space-time change process of the forest ecosystem after the ecosystem factors are changed to be inferred and displayed within a short time.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It is to be understood that the same or similar parts in the above embodiments may be referred to each other, and that in some embodiments, the same or similar parts in other embodiments may be referred to.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (7)

1. A deep learning and three-dimensional animation interactive cooperation method, characterized by comprising the following steps:
S1, inputting a three-dimensional model, two-dimensional images and voice data, and establishing an initial ecosystem model through processing of a material library module, a three-dimensional model optimization module and a voice prompt module;
S2, recognizing gestures and voices of the terminal user through processing of the voice prompt module, the biological recognition feature module and the positioning unit module;
S3, acquiring and analyzing the generated three-dimensional model, two-dimensional image and voice data through a data acquisition and analysis module, and then carrying out data fusion through a three-dimensional model optimization module, wherein the obtained data are used for generating a final ecosystem model through a data processing module and evaluating a current model;
The method further comprises the steps of processing the three-dimensional model, the two-dimensional image and the voice data, and comprises the following specific steps:
SS1. Construct a stable diffusion training data set, comprising: an input coarse time-sequence three-dimensional model, an input two-dimensional image of the ecosystem, an input corresponding voice description, and an output three-dimensional model;
SS2, constructing a stable diffusion deep learning open source framework, and designing an LDM network model based on a space-time attention layer in the framework, wherein the space-time attention layer models the correlation among an input coarse time sequence three-dimensional model, an input ecological system two-dimensional image and an input corresponding voice description;
SS3, for the input coarse time-sequence three-dimensional ecosystem model and two-dimensional image sequence, first obtaining the hidden-space feature Z_X through an input encoder; meanwhile, mapping the corresponding voice description and gestures to the hidden space through a condition encoder to obtain the condition feature Z_C;
SS4, feeding the input feature Z_X and the condition feature Z_C into the LDM network model, fusing the two features with the proposed space-time attention layer, and obtaining the denoised feature Z_f after T denoising steps;
SS5, inputting the reconstructed feature Z_f into a decoder to obtain a predicted time-sequence three-dimensional model y;
SS6, calculating loss between the predicted time sequence three-dimensional model y and the real time sequence three-dimensional model y_gt, and updating parameters of the LDM network model through loss back propagation;
SS7, pruning and quantizing the network structure of the LDM network model trained in the above steps;
and SS8, building an inference framework according to the quantized network model and using the inference framework for three-dimensional animation interactive collaboration.
2. The interactive collaboration method of deep learning and three-dimensional animation according to claim 1, wherein in the step SS2, a stable diffusion deep learning open source framework is built, and an LDM network model based on a space-time attention layer is designed in the framework, and the space-time attention layer is processed step by step, including:
calculating the attention layer of the space-time attention layer in the regions with the same spatial position in different time dimensions;
and calculating the attention layers of the areas with different spatial positions at the same time point.
3. The interactive cooperation method of deep learning and three-dimensional animation according to claim 1, wherein the initial ecosystem model is built by inputting three-dimensional model, two-dimensional image and voice data and processing by a material library module, a three-dimensional model optimizing module and a voice prompt module, comprising the steps of:
S11, firstly, building a material library and storing its data content in the material library module, wherein building the material library comprises inputting a coarse time-sequence three-dimensional model, inputting an ecosystem two-dimensional image and inputting a corresponding voice description;
S12, processing the input coarse time-sequence three-dimensional model after modeling through the three-dimensional model optimization module, comprising the following steps:
S121, performing simulation development on the three-dimensional models of the material library by using three-dimensional modeling software, Microsoft 3D Tools and plug-ins, and constructing three-dimensional models of the initial ecosystem and its space-time evolution by optimizing the obtained simulated three-dimensional models with the stable diffusion algorithm;
S122, simultaneously constructing an initial ecological system and a space-time evolution three-dimensional model through optimization of a stable diffusion algorithm to finish three-dimensional registration, so that the three-dimensional model is matched, positioned and fused with a real scene;
And S13, when the construction of the initial ecological system and the three-dimensional model is completed, the established voice prompt module carries out interactive guidance on the terminal user so as to complete the investigation experiment step.
4. The interactive deep learning and three-dimensional animation collaboration method of claim 1, wherein the recognition of the gestures and voices of the end user by the processing of the voice prompt module, the biometric feature module and the location unit module comprises the steps of:
S21, a biological recognition feature module is established, wherein the biological recognition feature module comprises a gesture recognition module and a voice recognition module;
S211, establishing a gesture recognition module, and recognizing gestures based on a DETR model;
s212, establishing a voice recognition module, and recognizing voice by adopting a DNN-HMM acoustic model frame;
S22, realizing the function of processing space data and tasks by the positioning unit module by using a 3D SLAM algorithm;
S23, triggering script prompt words through recognition of gestures and voices of the terminal user, and calling the script to receive and process feedback information sent by the voice prompt module so as to guide the terminal user to complete experiment contents of the ecosystem along with the script.
5. The interactive cooperation method of deep learning and three-dimensional animation according to claim 1, wherein the data acquisition and analysis module acquires and analyzes the generated three-dimensional model, two-dimensional image and voice data, and the data fusion is performed by the three-dimensional model optimization module, and the data obtained above is used for generating a final ecosystem model by the data processing module and evaluating the current model, and the method comprises the following steps:
S31, the data acquisition and analysis module collects and analyzes the data generated in the steps, namely, data information generated by recognizing the gestures and the voices of the terminal users is processed and analyzed through a population dynamics related model;
S32, fusing and optimizing the data generated in the above steps by using the three-dimensional model optimization module of the stable diffusion algorithm;
S33, using the data processed by the steps, evaluating the influence of the ecosystem by using feedback data through the created data processing module, and extracting data information in the data processing module; when the three-dimensional model in the ecological system is continuously interfered by the terminal user, the three-dimensional model optimizing module is restarted to correct; when the terminal user stops interfering the three-dimensional model in the ecological system, the experimental interaction can be ended according to the suggestion of the voice prompt module.
6. The deep learning and three-dimensional animation interactive collaboration system of claim 1, comprising:
And a data acquisition end: comprising a material library module, which preprocesses the three-dimensional model, two-dimensional image and voice data;
Administrator side: integrating a plurality of functional modules, namely a three-dimensional model optimization module, a voice prompt module, a data acquisition and analysis module and a data processing module, which support the administrator side in realizing efficient management of the three-dimensional model, voice prompts, data acquisition and analysis, and data processing;
the user terminal: comprises a biometric feature module and a location unit module for receiving and processing user input and data.
7. The interactive deep learning and three-dimensional animation collaboration system of claim 6, wherein the client comprises a virtual reality device comprising an audio device, a video device, and a positioning device:
The audio equipment comprises a sound input device and a sound output device; the voice input device provides a function of collecting voice information of a user for the voice prompt module and the biological recognition feature module; the voice output equipment comprises a voice prompt module which is a key component for realizing a voice prompt function;
The video device: comprises a projection device and a video acquisition device; the projection device is responsible for presenting the deep-learning-processed three-dimensional model to the end user; the video acquisition device is responsible for acquiring the end user's gesture information and importing it into the administrator-end data acquisition and analysis module;
The positioning device: works in cooperation with the above two modules to accurately position the end user.
CN202410161296.9A 2024-02-05 2024-02-05 Deep learning and three-dimensional animation interactive cooperation method and system Active CN117934674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410161296.9A CN117934674B (en) 2024-02-05 2024-02-05 Deep learning and three-dimensional animation interactive cooperation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410161296.9A CN117934674B (en) 2024-02-05 2024-02-05 Deep learning and three-dimensional animation interactive cooperation method and system

Publications (2)

Publication Number Publication Date
CN117934674A true CN117934674A (en) 2024-04-26
CN117934674B CN117934674B (en) 2024-09-17

Family

ID=90768457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410161296.9A Active CN117934674B (en) 2024-02-05 2024-02-05 Deep learning and three-dimensional animation interactive cooperation method and system

Country Status (1)

Country Link
CN (1) CN117934674B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515674A (en) * 2017-08-08 2017-12-26 山东科技大学 It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality
CN116343334A (en) * 2023-03-27 2023-06-27 青岛科技大学 Motion recognition method of three-stream self-adaptive graph convolution model fused with joint capture
CN116682293A (en) * 2023-05-31 2023-09-01 东华大学 Experiment teaching system based on augmented reality and machine vision
CN117115695A (en) * 2023-09-01 2023-11-24 厦门大学 Human-object interaction detection method based on virtual enhancement
CN117333645A (en) * 2023-10-31 2024-01-02 深圳天际云数字创意展示有限公司 Annular holographic interaction system and equipment thereof

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515674A (en) * 2017-08-08 2017-12-26 山东科技大学 It is a kind of that implementation method is interacted based on virtual reality more with the mining processes of augmented reality
CN116343334A (en) * 2023-03-27 2023-06-27 青岛科技大学 Motion recognition method of three-stream self-adaptive graph convolution model fused with joint capture
CN116682293A (en) * 2023-05-31 2023-09-01 东华大学 Experiment teaching system based on augmented reality and machine vision
CN117115695A (en) * 2023-09-01 2023-11-24 厦门大学 Human-object interaction detection method based on virtual enhancement
CN117333645A (en) * 2023-10-31 2024-01-02 深圳天际云数字创意展示有限公司 Annular holographic interaction system and equipment thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Yujun; MENG Xiaojun; BAI Mantao: "Research on Gesture Recognition for Virtual Scene Control Based on HMM", Electronic Design Engineering, no. 24, 20 December 2018 (2018-12-20) *

Also Published As

Publication number Publication date
CN117934674B (en) 2024-09-17

Similar Documents

Publication Publication Date Title
Tsolakis et al. Agros: A robot operating system based emulation tool for agricultural robotics
CN108961369B (en) Method and device for generating 3D animation
Takagi Interactive evolutionary computation: System optimization based on human subjective evaluation
KR20190082270A (en) Understanding and creating scenes using neural networks
Mania et al. A framework for self-training perceptual agents in simulated photorealistic environments
Devo et al. Deep reinforcement learning for instruction following visual navigation in 3D maze-like environments
WO2019057019A1 (en) Robot interaction method and device
CN115578770A (en) Small sample facial expression recognition method and system based on self-supervision
CN118587623A (en) Instance-level scene recognition using visual language models
CN112242002B (en) Object identification and panoramic roaming method based on deep learning
Chen et al. Vision-language models provide promptable representations for reinforcement learning
CN117934674B (en) Deep learning and three-dimensional animation interactive cooperation method and system
Breckling Individual‐Based Modelling Potentials and Limitations
Qu et al. A spatially explicit agent-based simulation platform for investigating effects of shared pollination service on ecological communities
KR20230065339A (en) Model data processing method, device, electronic device and computer readable medium
CN113538643A (en) Grid model processing method, storage medium and equipment
CN111461253A (en) Automatic feature extraction system and method
KR20090126237A (en) Human transparency paradigm
US20220391752A1 (en) Generating labeled synthetic images to train machine learning models
Bai [Retracted] Sustainable Development Garden Landscape Design Based on Data Mining and Virtual Reality
Haefner Two metaphors of the niche
Tomkins et al. Where the wild things will be: Adaptive visualisation with spatial computing
Bird et al. The blurring of art and alife
Kruse Interactive evolutionary computation in design applications for virtual worlds
Duhart et al. Deep learning for environmental sensing toward social wildlife database

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant