CN118762112A - Virtual digital person management method, device, equipment and storage medium

Info

Publication number: CN118762112A
Application number: CN202411248495.XA
Authority: CN
Inventors: 史济轩; 辛明海; 曾毅
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2024-09-06
Filing date: 2024-09-06
Publication date: 2024-10-11

Abstract

The application provides a virtual digital person management method, device, equipment and storage medium, applied to the technical field of data processing. The method comprises: acquiring target user information and a training sample set; preprocessing the training sample set to generate a training sample set with identification information; performing feature extraction processing on the training sample set with the identification information to generate an original feature vector; processing the original feature vector based on a preset processing rule to generate a training set and a verification set; acquiring an initial digital person driving model matched with the identification information; training the initial digital person driving model based on the training set and the verification set to generate a target digital person driving model; processing the target user information to generate attribute information of the target user; and processing the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information represents the information with which the target digital person replies to the target user.

Description

Virtual digital person management method, device, equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for managing virtual digital people.

Background

A virtual digital person is a multimodal intelligent human-computer interaction technology that integrates computer vision, speech recognition, speech synthesis, natural language processing, terminal display and other technologies to create a highly anthropomorphic virtual avatar that interacts and communicates with people like a real person. Major drawbacks of existing virtual digital person and intelligent product-promotion systems include:

1. Poor interactive experience: virtual digital persons have shortcomings in speech recognition, natural language processing and speech synthesis, which cause delayed and disfluent conversations. The recommendation algorithms of intelligent product-promotion systems have low accuracy, which harms user experience and sales conversion.

2. Difficult technology integration and poor adaptability: such systems combine multiple technologies but face compatibility and integration challenges, resulting in high maintenance costs and computational demands. Virtual digital person systems are also insufficiently adaptable and stable across different application scenarios.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The application aims to provide a virtual digital person management method, device, equipment and storage medium that overcome, at least to some extent, the problems existing in the prior art. To create a digital person capable of replying to users, user data is collected and preprocessed, features such as inverse document frequency values and emotion features are extracted, a digital person model is trained and optimized, and digital person information that can understand and respond to user needs is finally generated.

Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.

According to an aspect of the present application, there is provided a virtual digital person management method, including: acquiring target user information and a training sample set, wherein the target user information comprises voice data, limb motion data and facial expression data of a target user in a preset time period, and the training sample set comprises other user information matched with the target user information; preprocessing the training sample set to generate a training sample set with identification information, wherein the identification information indicates that other user data are in an abnormal state; performing feature extraction processing on the training sample set with the identification information to generate an original feature vector, wherein the original feature vector comprises an initial text word frequency, an initial inverse document frequency value and initial emotion features, and the initial emotion features are generated based on the voice data, limb motion data and facial expression data of other users; processing the original feature vector based on a preset processing rule to generate a training set and a verification set; acquiring an initial digital person driving model matched with the identification information; training the initial digital person driving model based on the training set and the verification set to generate a target digital person driving model; processing the target user information to generate attribute information of the target user; and processing the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information represents the information with which the target digital person replies to the target user.

In one embodiment of the present application, performing feature extraction processing on the training sample set with the identification information to generate an original feature vector includes calculating the initial emotion features, wherein the calculation formula is as follows:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

Wherein $i_t$, $f_t$ and $o_t$ are respectively the input gate, the forgetting gate and the output gate; $C_t$ is the cell state, i.e. the initial emotion feature; $h_t$ is the hidden state; $W$ and $b$ are the weight and bias parameters; $\sigma$ is the sigmoid function; $\tanh$ is the hyperbolic tangent function; $W_i$ is the feature weight matrix of the input gate; $h_{t-1}$ is the hidden state of the previous time step; $x_t$ is the input of the current time step; $b_i$ is the bias term of the input gate; $f_t$ is the forgetting gate; $W_f$ is the feature weight matrix of the forgetting gate; $b_f$ is the bias term of the forgetting gate; $W_o$ is the feature weight matrix of the output gate; $b_o$ is the bias term of the output gate; $\tilde{C}_t$ is the candidate memory cell state; $W_C$ is the weight matrix of the candidate memory cell state; $b_C$ is the bias term of the candidate memory cell state; and $C_{t-1}$ is the cell state of the previous time step.

In one embodiment of the present application, processing the original feature vector based on a preset processing rule to generate a training set and a verification set includes:

Acquiring a preset judgment matrix, feature weight information matched with the original feature vector and a plurality of engine information;

Processing the original feature vector and feature weight information matched with the original feature vector to generate a target feature vector, wherein the target feature vector comprises word frequency, an inverse document frequency value and emotion features;

Processing the target feature vector and a plurality of engine information to generate an engine selection result;

Generating a training set and a verification set based on the engine selection result;

the method further comprises a calculation formula for calculating the characteristic weight information, wherein the calculation formula is as follows:

；

Wherein the variables of the formula are the elements of the preset judgment matrix, the number of criteria, and the feature weight matrix of the input gate;

The method further comprises a calculation formula for calculating the target feature vector, wherein the calculation formula is as follows:

；

wherein the variables of the formula are the word frequency, the inverse document frequency value and the emotion feature, together with their respective weights.

In one embodiment of the present application, processing the target feature vector and the plurality of engine information to generate an engine selection result includes calculating, based on the target feature vector, the probability of selecting the target engine, wherein the calculation formula is as follows:

$$
P(E_j \mid F) = \frac{P(F \mid E_j)\,P(E_j)}{P(F)}
$$

wherein $P(E_j \mid F)$ is the probability of selecting engine $E_j$ given feature $F$, $P(F \mid E_j)$ is the conditional probability of feature $F$ under engine $E_j$, $P(E_j)$ is the prior probability of engine $E_j$, and $P(F)$ is the prior probability of feature $F$.

In one embodiment of the application, training the initial digital person driving model based on the training set and the verification set to generate a target digital person driving model includes: training the initial digital person driving model based on the training set to generate a prediction result label; performing prediction processing on the initial digital person driving model based on the verification set to generate a target result label; processing the prediction result label and the target result label to generate a model loss function; processing the model loss function to generate model adjustment parameters; and processing the initial digital person driving model based on the model adjustment parameters to generate the target digital person driving model. The method further comprises a calculation formula for calculating the model loss function, the variables of which are the model loss function, the target result label, the prediction result label, the total number of samples N, and the sample index i.

In one embodiment of the present application, processing the target user information to generate attribute information of the target user includes: acquiring real-time data processing information of a target server, wherein the real-time data processing information represents the data processing progress of the target server; performing data cleaning processing on the target user information to generate real-time physiological parameter information of the target user and to-do item information of the target user; processing the real-time physiological parameter information and the to-do item information of the target user based on the data processing progress information of the target server to generate priority information of the to-do item information; and generating attribute information of the target user based on the priority information of the to-do item information, wherein the attribute information of the target user represents the real-time to-do item information of the target user.

In one embodiment of the present application, processing attribute information of the target user based on the target digital person driving model to generate target digital person information includes: processing the attribute information of the target user based on the target digital person driving model to generate a data processing result matched with the real-time to-do item information of the target user; processing the data processing result based on the target digital person driving model to generate text information of the digital person, action information of the digital person and expression information of the digital person; and processing the text information, the action information and the expression information of the digital person to generate the target digital person information.

In another aspect of the present application, a virtual digital person management apparatus includes: an acquisition module, configured to acquire target user information and a training sample set, wherein the target user information comprises voice data, limb motion data and facial expression data of a target user in a preset time period, and the training sample set comprises other user information matched with the target user information, and further configured to acquire an initial digital person driving model matched with the identification information; and a processing module, configured to preprocess the training sample set to generate a training sample set with identification information, wherein the identification information indicates that other user data are in an abnormal state; perform feature extraction processing on the training sample set with the identification information to generate an original feature vector, wherein the original feature vector comprises an initial text word frequency, an initial inverse document frequency value (TF-IDF value) and initial emotion features, and the initial emotion features are generated based on the voice data, limb motion data and facial expression data of other users; process the original feature vector based on a preset processing rule to generate a training set and a verification set; train the initial digital person driving model based on the training set and the verification set to generate a target digital person driving model; process the target user information to generate attribute information of the target user; and process the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information represents the information with which the target digital person replies to the target user.

According to still another aspect of the present application, there is provided an electronic apparatus, comprising: a first processor; and a memory for storing executable instructions of the first processor; wherein the first processor is configured to perform the above virtual digital person management method by executing the executable instructions.

According to still another aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a second processor, implements the above-described virtual digital person management method.

According to a further aspect of the present application, there is provided a computer program product comprising a computer program, characterized in that the computer program, when executed by a third processor, implements the method of managing virtual digital persons as described above.

The application provides a virtual digital person management method, device, equipment and storage medium. A server collects the voice, limb motion and facial expression data of users, constructs training samples, cleans the data, and extracts features such as text word frequency, TF-IDF values and emotion features. A training set and a validation set are created from the features and used to train a digital person driving model. Through model training and parameter optimization, a target digital person driving model that accurately reflects user needs is generated. Finally, the user information is processed and, in combination with the model, digital person information that effectively replies to the user is generated, so that the digital person can provide personalized and accurate service.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

FIG. 1 is a flow chart of a method for managing virtual digital people according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a virtual digital person management device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a storage medium according to an embodiment of the present application.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

A method of managing a virtual digital person according to an exemplary embodiment of the present application is described below with reference to fig. 1. It should be noted that the following application scenarios are only shown for facilitating understanding of the spirit and principles of the present application, and embodiments of the present application are not limited in this respect. Rather, embodiments of the application may be applied to any scenario where applicable.

In an implementation manner, the application further provides a virtual digital person management method, a device, equipment and a storage medium. Fig. 1 schematically shows a flow diagram of a method of managing virtual digital persons according to an embodiment of the application. As shown in fig. 1, the method is applied to a server, and includes:

S101, acquiring target user information and a training sample set.

In one embodiment, the target user information includes voice data, limb motion data and facial expression data of the target user over a preset time period. Analyzing the target user information involves jointly processing the voice data, limb motion data and facial expression data to determine what the target user intends to express, including the matching voice, motion and expression information.

To process the target user information, voice, motion and expression data may be converted into a format suitable for a Bayesian classifier using methods similar to text classification. For example, feature extraction techniques may be used to convert speech into a spectrogram and to convert limb motions and facial expressions into quantifiable feature vectors. These feature vectors may then be used as inputs to train a naive Bayes classifier to identify and predict user intent.

The training sample set needs to include other user information that matches the target user information, to ensure the generalization ability of the model. During training, the model may be optimized by adjusting the hyperparameters of the Bayesian classifier (e.g., the Laplace smoothing coefficient in multinomial naive Bayes). For model evaluation and tuning, cross-validation and grid search may be used to find the best model parameters. In addition, the performance of the model can be evaluated through indicators such as the confusion matrix, accuracy and recall. Through comprehensive analysis of the target user information, a naive Bayes classifier can thus improve the accuracy of recognizing and predicting user intent.
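As an illustration of the classifier described above, the following is a minimal sketch of multinomial naive Bayes with Laplace smoothing. It assumes that voice, motion and expression data have already been discretized into tokens; the token names, the intent labels and the class name `MultinomialNaiveBayes` are hypothetical and not taken from the application.

```python
import math
from collections import Counter


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over token lists, with Laplace smoothing (alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing coefficient (a tunable hyperparameter)

    def fit(self, samples, labels):
        self.classes = sorted(set(labels))
        self.vocabulary = {token for sample in samples for token in sample}
        self.class_counts = Counter(labels)
        self.token_counts = {label: Counter() for label in self.classes}
        for sample, label in zip(samples, labels):
            self.token_counts[label].update(sample)
        return self

    def log_scores(self, sample):
        total = sum(self.class_counts.values())
        scores = {}
        for label in self.classes:
            # log prior of the class
            score = math.log(self.class_counts[label] / total)
            denominator = (sum(self.token_counts[label].values())
                           + self.alpha * len(self.vocabulary))
            for token in sample:
                if token in self.vocabulary:  # tokens unseen in training are ignored
                    count = self.token_counts[label][token]
                    score += math.log((count + self.alpha) / denominator)
            scores[label] = score
        return scores

    def predict(self, sample):
        scores = self.log_scores(sample)
        return max(scores, key=scores.get)


# Hypothetical discretized multimodal tokens and intent labels.
train_samples = [["hello", "wave", "smile"], ["hi", "smile"],
                 ["price", "point"], ["buy", "price", "nod"]]
train_labels = ["greeting", "greeting", "purchase", "purchase"]
model = MultinomialNaiveBayes(alpha=1.0).fit(train_samples, train_labels)
predicted_intent = model.predict(["hello", "smile"])
```

In a grid search, the `alpha` value would be one of the parameters varied and compared by cross-validation.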

S102, preprocessing the training sample set to generate a training sample set with identification information.

In one embodiment, preprocessing is critical when handling the multimodal target user information; it includes cleaning and normalizing the voice data, limb motion data and facial expression data to ensure data quality. To generate the identification information for the abnormal state, the data is first cleaned thoroughly to remove noise and outliers, including processing the voice and video data with a filtering or noise reduction algorithm. Statistical methods or machine learning models are then used to identify outliers in the data. For example, statistical measures such as the Z-score and IQR (interquartile range), or machine learning based algorithms such as isolation forest, DBSCAN and LOF (local outlier factor), may be used.

Features that characterize abnormal states, such as sound volume, magnitude of limb motion and degree of distortion of facial expression, are constructed and used as model inputs to help identify the abnormal state. In anomaly detection, the detection results of multiple algorithms can be combined using Bayesian theory to improve detection accuracy. For example, if multiple algorithms all identify a sample as anomalous, the probability that the sample is truly anomalous may be very high. The pattern of the normal state is learned from the training sample set, and the resulting model is then used to identify sample points that differ significantly from the normal pattern; this can be achieved by training a naive Bayes classifier or another machine learning model. In multimodal learning, special attention is required to how data from different modalities are fused and applied. For example, in multimodal learning of images and text, CNNs may be used to extract image features, RNNs or Transformers may be used to process text data, and the features may then be fused.
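The statistical outlier checks mentioned above can be sketched as follows. This is only an illustration using the Python standard library: the threshold values, the sample volume readings and the `abnormal` flag used as identification information are assumptions.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def iqr_outliers(values, k=1.5):
    """Return indices outside the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def label_samples(values, threshold=3.0, k=1.5):
    """Attach identification information: True when either check flags the sample."""
    flagged = set(zscore_outliers(values, threshold)) | set(iqr_outliers(values, k))
    return [{"value": v, "abnormal": i in flagged} for i, v in enumerate(values)]


# Hypothetical sound volume readings; the sixth reading is unusually loud.
volumes = [60.1, 61.0, 59.8, 60.5, 60.2, 95.0, 60.7, 59.9]
labeled = label_samples(volumes)
```

With very small samples the Z-score is bounded and may never exceed a threshold of 3, which is one reason the sketch combines it with the IQR check.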
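The Bayesian combination of several detectors described above can be illustrated as follows. The sketch assumes that the detectors are conditionally independent and that each detector's true positive rate and false positive rate are known; the numeric rates and the prior are hypothetical.

```python
def posterior_anomaly(prior, detectors, votes):
    """Posterior probability that a sample is anomalous, given detector votes.

    prior:     prior probability that a sample is anomalous
    detectors: list of (true_positive_rate, false_positive_rate), one per detector
    votes:     list of booleans, True when the detector flagged the sample
    """
    likelihood_anomalous = 1.0
    likelihood_normal = 1.0
    for (tpr, fpr), vote in zip(detectors, votes):
        likelihood_anomalous *= tpr if vote else (1.0 - tpr)
        likelihood_normal *= fpr if vote else (1.0 - fpr)
    numerator = prior * likelihood_anomalous
    evidence = numerator + (1.0 - prior) * likelihood_normal
    return numerator / evidence


# Hypothetical detector characteristics (e.g. Z-score, isolation forest, LOF).
detectors = [(0.90, 0.10), (0.80, 0.05), (0.85, 0.10)]
all_agree = posterior_anomaly(0.05, detectors, [True, True, True])
one_flag = posterior_anomaly(0.05, detectors, [True, False, False])
```

When all three detectors flag the sample the posterior rises far above the prior, whereas a single isolated flag leaves it low, which matches the reasoning given in the text.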

By the method, the training sample set can be effectively preprocessed, and the training sample set with the identification information is generated to represent that other user data are in an abnormal state, so that a solid foundation is provided for subsequent model training and abnormality detection.

S103, performing feature extraction processing on the training sample set with the identification information to generate an original feature vector.

In one embodiment, the original feature vector includes an initial text word frequency, an initial inverse document frequency value (TF-IDF value), and an initial emotion feature that is generated based on voice data, limb motion data, and facial expression data of other users.

The initial text word frequency is obtained by counting the occurrences of each word in the input text. Specifically,

$$
TF(t, d) = \frac{f_{t,d}}{N_d}
$$

wherein $t$ represents a term, $d$ represents a document, $f_{t,d}$ represents the number of occurrences of term $t$ in document $d$, and $N_d$ represents the total number of words in document $d$.

The inverse document frequency value measures the importance of a word across all documents; it reflects how rarely the word appears in the document collection. Specifically,

$$
IDF(t) = \log\frac{N}{n_t}
$$

wherein $N$ represents the total number of documents and $n_t$ represents the number of documents containing term $t$.
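The word frequency and inverse document frequency described above can be computed as in the following sketch. It assumes the unsmoothed form, i.e. occurrences divided by document length and the natural logarithm of the total number of documents over the number of documents containing the term; returning 0 for a term found in no document is an additional assumption, and the small corpus is hypothetical.

```python
import math
from collections import Counter


def term_frequency(term, document):
    """Occurrences of the term divided by the total word count of the document."""
    if not document:
        return 0.0
    return Counter(document)[term] / len(document)


def inverse_document_frequency(term, corpus):
    """log(total documents / documents containing the term)."""
    containing = sum(1 for document in corpus if term in document)
    if containing == 0:
        return 0.0  # assumption: a term found in no document gets 0
    return math.log(len(corpus) / containing)


# Hypothetical tokenized user utterances.
corpus = [["good", "service", "good"], ["bad", "service"], ["fast", "delivery"]]
tf_good = term_frequency("good", corpus[0])
idf_good = inverse_document_frequency("good", corpus)
idf_service = inverse_document_frequency("service", corpus)
```

A term that appears in fewer documents ("good") receives a larger inverse document frequency than one that appears in more documents ("service").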

Feature extraction of initial emotion features is an important task in Natural Language Processing (NLP) that can help us understand emotion tendencies and emotion intensities in text data. The application can finish the extraction of the emotion characteristics in the following way.

The emotional tendency of text is evaluated using predefined positive and negative vocabulary lists; for example, words such as "good" and "excellent" may be labeled positive, and words such as "bad" and "terrible" may be labeled negative. A machine learning model may also be trained to recognize the emotional tendency of text automatically, usually on labeled data sets; the model can be based on traditional machine learning, such as a support vector machine (SVM) or random forest, or on deep learning, such as a convolutional neural network (CNN) or recurrent neural network (RNN). In addition to emotional tendency, emotional intensity is an important aspect of emotion analysis; emotional intensity analysis tools such as VADER (Valence Aware Dictionary and sEntiment Reasoner) not only distinguish positive from negative emotions but also provide a score for emotional intensity.

Syntactic analysis is used to understand sentence structure and to identify the context and strength of an emotion word; for example, a negation (e.g., "not") or a degree adverb (e.g., "very", "extremely") may change the meaning and strength of the emotion word. The dependency relationships among the words in the text are analyzed to determine the modifiers of emotion words and the factors affecting emotional strength. The overall context of the text, including irony, metaphor and the like, is also considered, since it may affect the accurate judgment of emotion. If the text is combined with images or other modal data, emotion feature extraction may also need to take such multimodal information into account.
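The lexicon-based evaluation with negations and degree adverbs can be sketched as follows. The word lists, the intensifier multipliers and the two-token look-back window are illustrative assumptions, not values from the application, and the sketch does not handle irony or metaphor.

```python
POSITIVE = {"good", "excellent"}
NEGATIVE = {"bad", "terrible"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}  # hypothetical multipliers


def sentiment_score(tokens):
    """Sum the polarity of emotion words, adjusted by nearby modifiers."""
    score = 0.0
    for i, token in enumerate(tokens):
        if token in POSITIVE:
            polarity = 1.0
        elif token in NEGATIVE:
            polarity = -1.0
        else:
            continue
        # Look back up to two tokens for a negation or a degree adverb.
        for previous in tokens[max(0, i - 2):i]:
            if previous in INTENSIFIERS:
                polarity *= INTENSIFIERS[previous]
            elif previous in NEGATIONS:
                polarity = -polarity
        score += polarity
    return score


plain = sentiment_score("the service is good".split())
intensified = sentiment_score("the service is very good".split())
negated = sentiment_score("the service is not very good".split())
```

The sign of the score gives the emotional tendency and its magnitude gives a rough emotional intensity.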

The emotion feature extraction comprises text preprocessing (such as word segmentation and stop word removal), feature extraction, model training and application, and the emotion feature extraction can help to realize emotion analysis, product comment analysis, social media monitoring and other functions.

The method further comprises a calculation formula for calculating the initial emotion features, wherein the calculation formula is as follows:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

Wherein $i_t$, $f_t$ and $o_t$ are respectively the input gate, the forgetting gate and the output gate, which respectively determine the influence of the current input on the cell state, which information in the cell state needs to be forgotten, and which information is output as the hidden state; $C_t$ is the cell state, i.e. the initial emotion feature; $h_t$ is the hidden state; $W$ and $b$ are the weight and bias parameters; $\sigma$ is the sigmoid function; $\tanh$ is the hyperbolic tangent function; $W_i$ is the feature weight matrix of the input gate, which is multiplied by the input $[h_{t-1}, x_t]$; $h_{t-1}$ is the hidden state of the previous time step; $x_t$ is the input of the current time step; $b_i$ is the bias term of the input gate; $f_t$ is the forgetting gate, which determines how much old information is kept in the cell state at the current time step $t$ and is likewise calculated by a neural network layer with a sigmoid activation function; $W_f$ is the feature weight matrix of the forgetting gate; $b_f$ is the bias term of the forgetting gate; $W_o$ is the feature weight matrix of the output gate; $b_o$ is the bias term of the output gate; $\tilde{C}_t$ is the candidate memory cell state; $W_C$ is the weight matrix of the candidate memory cell state; $b_C$ is the bias term of the candidate memory cell state; and $C_{t-1}$ is the cell state of the previous time step.

Emotion change features over the time sequence are extracted from the user's input and processed to generate the initial emotion features. For example, when a user inputs a piece of text, the text is processed word by word while earlier emotion information is retained, and the emotion state of the current time step is determined by calculating the values of the input gate, the forgetting gate and the output gate. The cell state preserves the long-term dependency information of the emotion, while the hidden state outputs the emotion feature at the current moment.
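A single time step of the gated computation described above can be sketched as follows, assuming the standard long short-term memory formulation in which every gate is computed from the concatenation of the previous hidden state and the current input. The dimensions, the random initialization and the input sequence are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, vector, bias):
    """Compute W . v + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One step: returns the new hidden state h_t and cell state c_t."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]   # input gate
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]   # forgetting gate
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]   # output gate
    c_hat = [math.tanh(a) for a in affine(params["W_C"], z, params["b_C"])]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def random_params(hidden_size, input_size, seed=0):
    """Small random weights and zero biases (hypothetical initialization)."""
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "C"):
        params["W_" + gate] = [[rng.uniform(-0.1, 0.1)
                                for _ in range(hidden_size + input_size)]
                               for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    return params


# Process a hypothetical sequence of three word embeddings of dimension 3.
params = random_params(hidden_size=2, input_size=3)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1]):
    h, c = lstm_step(x, h, c, params)
```

After the loop, `c` carries the accumulated long-term emotion information and `h` is the emotion feature output at the last step.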

S104, processing the original feature vector based on a preset processing rule to generate a training set and a verification set.

In one embodiment, a preset judgment matrix, feature weight information matched with the original feature vector, and a plurality of engine information are obtained. The feature weight information is calculated based on a calculation formula whose variables are the elements of the judgment matrix, the number of criteria, and the feature weight matrix of the input gate.
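One common way to derive weights from a pairwise judgment matrix is the column normalization and row averaging approximation used in the analytic hierarchy process. The sketch below assumes that approach purely for illustration; it is not stated to be the formula of the application, and the example matrix values are hypothetical.

```python
def judgment_matrix_weights(matrix):
    """Normalize each column to sum to 1, then average each row."""
    n = len(matrix)  # number of criteria
    column_sums = [sum(matrix[row][col] for row in range(n)) for col in range(n)]
    return [sum(matrix[row][col] / column_sums[col] for col in range(n)) / n
            for row in range(n)]


# Hypothetical pairwise comparisons among word frequency, inverse document
# frequency and emotion feature: entry [i][j] states how much more important
# criterion i is than criterion j.
judgment_matrix = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
feature_weights = judgment_matrix_weights(judgment_matrix)
```

For a perfectly consistent matrix such as this one, the weights are proportional to any column of the matrix and sum to 1.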

The original feature vector and the feature weight information matched with the original feature vector are processed to generate a target feature vector, wherein the target feature vector comprises word frequency, an inverse document frequency value and emotion features. Assume that the application has three features: word frequency, inverse document frequency value and emotion feature, each with its own weight. The target feature vector is calculated based on a calculation formula whose variables are these three features and their respective weights.

The target feature vector and the plurality of engine information are processed to generate an engine selection result. The probability of selecting an engine is calculated based on the following formula:

$$
P(E_j \mid F) = \frac{P(F \mid E_j)\,P(E_j)}{P(F)}
$$

wherein $P(E_j \mid F)$ is the probability of selecting engine $E_j$ given feature $F$; $P(F \mid E_j)$ is the conditional probability of feature $F$ under engine $E_j$, which can be estimated from historical data; $P(E_j)$ is the prior probability of engine $E_j$, representing the probability of selecting $E_j$ in the absence of any feature information; and $P(F)$ is the prior probability of feature $F$.
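The engine selection by Bayes' rule can be illustrated as follows. The sketch assumes that the conditional probabilities of the observed feature under each engine and the engine priors are already estimated from historical data; the engine names and probability values are hypothetical, and the prior probability of the feature is obtained by the law of total probability.

```python
def engine_posteriors(likelihoods, priors):
    """Posterior probability of each engine given the observed feature.

    likelihoods[engine]: probability of the feature under that engine
    priors[engine]:      prior probability of selecting that engine
    """
    # Prior probability of the feature, by the law of total probability.
    evidence = sum(likelihoods[engine] * priors[engine] for engine in priors)
    return {engine: likelihoods[engine] * priors[engine] / evidence
            for engine in priors}


def select_engine(likelihoods, priors):
    """Return the engine with the highest posterior probability."""
    posteriors = engine_posteriors(likelihoods, priors)
    return max(posteriors, key=posteriors.get)


# Hypothetical engines and probabilities.
priors = {"engine_a": 0.5, "engine_b": 0.3, "engine_c": 0.2}
likelihoods = {"engine_a": 0.1, "engine_b": 0.4, "engine_c": 0.3}
posteriors = engine_posteriors(likelihoods, priors)
selected = select_engine(likelihoods, priors)
```

Although `engine_a` has the largest prior, the observed feature is much more likely under `engine_b`, so `engine_b` obtains the highest posterior.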

A training set and a validation set are generated based on the engine selection result, and an appropriate algorithm or model is selected according to the type of problem and the data characteristics. For example, for classification problems, naive Bayes, support vector machines or random forests might be chosen, and data splitting techniques are used to divide the data set into a training set and a validation set. In general, the "train_test_split" function implemented in the "sklearn" library may be used; it allows the proportion of the test set to be specified (e.g., 'test_size=0.2' uses 20% of the data as the validation set).
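The data split can be sketched without external dependencies as follows. This is an illustrative shuffle-and-split helper written for this description, not the library function itself; the function name `split_train_validation`, the seed and the sample feature vectors are assumptions.

```python
import math
import random


def split_train_validation(samples, validation_size=0.2, seed=42):
    """Shuffle the samples and split them into (training, validation) lists."""
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_validation = math.ceil(len(shuffled) * validation_size)
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten hypothetical feature vectors, split 80% / 20%.
feature_vectors = [[i, i * 0.5] for i in range(10)]
training_set, validation_set = split_train_validation(feature_vectors, validation_size=0.2)
```

Shuffling before splitting avoids a validation set that only contains the most recently collected samples.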

The original feature vectors are divided into a training set and a verification set, generally by random allocation in a fixed proportion (such as 80% training set and 20% verification set). The training set is used to train the initial digital person driving model, and the model parameters are optimized iteratively. Model performance is evaluated on the verification set, and the model structure or parameters are adjusted to improve accuracy and generalization ability; the model is further optimized according to its performance on the verification set using techniques such as cross-validation and hyperparameter tuning. After the training and verification processes are completed, an optimized digital person driving model is obtained, which can process data and generate digital person driving information. The digital person driving model is then evaluated comprehensively using indicators such as accuracy, recall and F1 score.
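The evaluation indicators named above can be computed as in the following sketch for a binary labeling task. The label values and the example predictions are hypothetical; precision is included because the F1 score is defined from precision and recall.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    true_positive = sum(1 for t, p in pairs if t == positive and p == positive)
    false_positive = sum(1 for t, p in pairs if t != positive and p == positive)
    false_negative = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    predicted_positive = true_positive + false_positive
    actual_positive = true_positive + false_negative
    precision = true_positive / predicted_positive if predicted_positive else 0.0
    recall = true_positive / actual_positive if actual_positive else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical verification set labels and model predictions.
target_labels = [1, 0, 1, 1, 0, 1, 0, 0]
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(target_labels, predicted_labels)
```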

In another embodiment, adjacent features are generated based on distances between any number of data features and other number of data features of the same class, wherein the adjacent features include a preset number of any number of data features; in the field of data analysis and machine learning, generating adjacent features based on distances between data features is a feature engineering method aimed at discovering correlations and patterns between features. Specifically, a certain number of data features are selected from the original feature data, distances between any data feature and other data features in the same class are calculated by using distance measures (such as euclidean distance, manhattan distance or cosine similarity), other data features closest to any data feature are determined according to the calculated distances, the features are considered to be adjacent, a preset number is set, namely the number of data features which are expected to be contained in adjacent feature sets, the data features closest to the preset number are selected for each data feature to form adjacent feature sets, adjacent feature sets are analyzed, potential relations and modes between the features are identified, the relations may be critical to understanding the use condition of the electronic certificate, importance of each data feature in the adjacent feature sets is evaluated, and which features are most critical to the predicted target variable is determined. That is, the user may go to which venues during which time periods, and whether there is a sequence of occurrence between different venues, etc.
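The selection of adjacent features can be sketched as follows, assuming Euclidean distance as the distance measure and a simple list representation of the data features. The coordinates, the class labels and the preset number are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def adjacent_features(index, features, labels, preset_number):
    """Indices of the nearest same-class features to features[index]."""
    candidates = [j for j in range(len(features))
                  if j != index and labels[j] == labels[index]]
    candidates.sort(key=lambda j: euclidean(features[index], features[j]))
    return candidates[:preset_number]


# Hypothetical two-dimensional data features with class labels.
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [0.5, 0.5], [6.0, 5.0]]
labels = ["a", "a", "a", "b", "a", "b"]
neighbors_of_first = adjacent_features(0, features, labels, preset_number=2)
```

Replacing `euclidean` with a Manhattan or cosine based measure changes only the sort key, which makes it easy to compare distance measures when tuning the preset number.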

If an adjacent feature set contains too many features, a dimension reduction technique (e.g., principal component analysis, PCA) may be required to reduce the number of features while retaining important information. The generated adjacent feature sets may be used to train a machine learning model to predict or classify the usage of the electronic certificate; the performance of the model is evaluated using metrics such as accuracy, recall, and F1 score, and the model is optimized as needed. The preset number and the distance metric may need to be adjusted several times to find the optimal feature set. In this way, the complexity of how the electronic certificate is used can be better understood and the predictive capability of the early warning model is improved, which is important for the security management and risk assessment of electronic certificates. The method also helps the model recognize abnormal behavior and supports the safety and reliability of the electronic certificate system.

A sampling proportion is determined based on the number of data features in the training set, and a sampling ratio is determined based on the sampling proportion. The adjacent features are sampled based on the sampling ratio to generate a preset number of sampling features, and a plurality of data groups are generated based on any one data feature and each sampling feature, wherein each data group contains a preset number of data samples and at least one data sample contains identification information.

In data analysis and machine learning, the sampling proportion and the sampling ratio are determined from the number of data features in the training set. The number of data features in the training set is counted to understand how the different features are distributed, and the sampling proportion is determined according to that number and the scale of the data set. The sampling proportion helps decide how many samples to extract from each feature, and a specific sampling ratio is calculated from it; the sampling ratio is the ratio of the number of samples actually drawn to the total number of samples. The adjacent features are sampled using the sampling ratio to generate a preset number of sampling features, which ensures that the samples are representative and diverse. The resulting set of sampling features is used for subsequent data analysis and model training: multiple data groups are constructed based on any one data feature and each sampling feature, and each data group contains a preset number of data samples. Ensuring that the samples in each data group are diverse and include at least one sample with identification information helps the model learn feature representations under different conditions. The data groups are used to train the model and to evaluate the contribution of different features to the prediction target; a machine learning model is trained using the sampling features and its performance is evaluated on the verification set. Based on the training and verification results, the sampling proportion and the sampling ratio may need to be adjusted to optimize the predictive power of the model. In this way, large-scale data sets can be processed efficiently while maintaining data quality and the generalization ability of the model, which helps identify potential security risks and abnormal behavior.
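A minimal sketch of the sampling step follows, assuming the sampling ratio is already known and that each data group simply pairs the chosen data feature with one sampled adjacent feature. How the ratio is derived from the sampling proportion is not specified in the text, so it is passed in as a parameter; all names are hypothetical.

```python
import random


def sample_adjacent(adjacent, sampling_ratio, seed=0):
    """Draw round(sampling_ratio * len(adjacent)) adjacent features without replacement."""
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling_ratio must be in (0, 1]")
    count = max(1, round(sampling_ratio * len(adjacent)))
    return random.Random(seed).sample(adjacent, count)


def build_data_groups(anchor, sampled):
    """Form one data group per sampled feature by pairing it with the anchor feature."""
    return [(anchor, neighbour) for neighbour in sampled]


# Hypothetical anchor feature and its adjacent features.
anchor = (0.0, 0.0)
adjacent = [(0.5, 0.5), (1.0, 0.0), (0.0, 3.0), (2.0, 2.0)]
sampled = sample_adjacent(adjacent, sampling_ratio=0.5)
groups = build_data_groups(anchor, sampled)
print(groups)
```

With a ratio of 0.5 and four adjacent features, two features are drawn and two data groups are formed.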

S105, acquiring an initial digital person driving model matched with the identification information.

In one embodiment, acquiring an initial digital person driving model matched with the identification information involves detecting and handling user anomalies, which may include absence of sound, excessive limb movement amplitude, or facial distortion. A system is needed that analyzes user needs and preferences accurately in order to provide personalized merchandise and health advice, and that optimizes itself based on immediate user feedback so that the advice becomes more relevant and practical. In personalized recommendation, recommendation algorithms based on user behavior analysis play a vital role. For example, user behavior information is converted into a user scoring matrix, and bias information is added using an improved regularized non-negative matrix factorization algorithm, so that behavior information such as clicking, purchasing, browsing, and collecting is fully utilized and items of interest are actively recommended to the user.
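The conversion of behavior information into a user scoring matrix can be illustrated as below. The behavior weights are hypothetical values chosen only for the example, and the matrix factorization step itself is not shown.

```python
from collections import defaultdict

# Hypothetical weights: stronger signals of interest receive higher scores.
BEHAVIOR_WEIGHTS = {"browse": 1.0, "click": 2.0, "collect": 3.0, "purchase": 5.0}


def build_score_matrix(events):
    """Accumulate (user, item, behavior) events into a nested user -> item -> score mapping."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, behavior in events:
        scores[user][item] += BEHAVIOR_WEIGHTS[behavior]
    return {user: dict(items) for user, items in scores.items()}


# Hypothetical event log.
events = [
    ("u1", "tea", "browse"),
    ("u1", "tea", "purchase"),
    ("u2", "tea", "click"),
]
score_matrix = build_score_matrix(events)
print(score_matrix)
```

The resulting sparse matrix is the kind of input a factorization-based recommender would then decompose.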

Based on the above, a corresponding digital person is generated and information is displayed according to the user's requirements. The facial expressions and limb motions of the digital person can be generated from audio by an AI algorithm, so a unified framework is required to process facial expressions and limb motions simultaneously and to keep the animation coordinated as a whole.

In addition, digital twin technology provides new opportunities for real-time monitoring of data processing: changes in the processing and operating environments can be taken into account, which helps improve anomaly detection accuracy and fault diagnosis results. The framework consists of three parts, namely the physical product, the virtual product, and the data stream connecting them, and can be applied to real-time tool condition monitoring (TCM). The Wav2Lip algorithm is adopted to generate video with speech-driven lip movement, and a dedicated lip-sync discriminator improves the accuracy of synchronization between mouth shape and audio.

S106, training the initial digital person driving model based on the training set and the verification set to generate a target digital person driving model.

In one embodiment, the initial digital person driving model is trained based on the training set to generate a prediction result label, prediction is performed with the initial digital person driving model based on the verification set to generate a target result label, and the prediction result label and the target result label are processed to generate a model loss function. Before the model is trained, the data of the training set and the verification set need to be preprocessed, which may include normalization, denoising, feature extraction, and the like, to improve training efficiency and accuracy. The training set data is then used to train the digital person driving model; at this stage, the parameters of the model are optimized by learning the patterns and rules in the training data. After training is completed, the model can generate prediction result labels for the training set data; these labels are the model's classification or regression predictions based on what it has learned. The trained model is then run on the verification set data to generate target result labels, which are used to evaluate the model's performance on unseen data, that is, its generalization capability. The loss function measures the difference between the model's predicted values and the actual values on the verification set, and calculating it quantifies the prediction error of the model. Common loss functions include mean squared error (MSE), cross-entropy loss, and the like, and the application is not limited in this respect.

The value of the loss function is used to evaluate the performance of the model: a higher value indicates a larger difference between the model's predictions and the actual labels, meaning the model's performance needs to be improved. The parameters or structure of the model are adjusted according to the loss, for example by changing hyperparameters such as the number of network layers, the number of neurons, and the learning rate, so as to optimize model performance. Depending on the performance feedback on the verification set, the model may need to be adjusted and retrained multiple times to gradually reduce the loss and improve prediction accuracy. After each adjustment, the verification set is used again to evaluate the model, and this may take multiple iterations until the model reaches a satisfactory level of performance. Finally, a separate test set may be used to test the final model and confirm its generalization ability on new data.

Throughout the process, the goal is to train a digital person driving model that predicts or classifies accurately while neither over-fitting (performing well on training data but poorly on new data) nor under-fitting (performing poorly even on training data). The performance and application effect of the digital person driving model can be improved by reasonably selecting the loss function and optimization algorithm and by adjusting the model structure and hyperparameters.

The model loss function is processed to generate model adjustment parameters. The method further comprises a calculation formula for the model loss function, which is defined in terms of the target result label and the prediction result label of each sample, where N is the total number of samples and i is the sample index. The initial digital person driving model is processed based on the model adjustment parameters to generate the target digital person driving model; through this dynamic adjustment mechanism, the system can continuously improve its selection strategy over long-term use and improve response quality. The loss function measures the difference between the predicted output of the model and the actual target result. After the loss has been calculated, the gradient with respect to each parameter is computed by the backpropagation algorithm. The gradient indicates how sensitive the loss function is to each parameter and is the key information for adjusting the parameters during optimization.
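As an illustration of a loss computed from target result labels and prediction result labels over N samples, the sketch below assumes mean squared error, which is one of the common losses the description names. The text here does not fix the choice of loss, so this is an assumption, and the function name is hypothetical.

```python
def mean_squared_error(targets, predictions):
    """Average of squared differences between target and predicted labels over N samples."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("targets and predictions must be non-empty and of equal length")
    n = len(targets)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / n


# Hypothetical labels for three samples.
loss = mean_squared_error([1.0, 0.0, 1.0], [0.5, 0.0, 1.0])
print(loss)
```

A loss of zero means the predictions match the targets exactly; larger values indicate larger prediction error.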

The parameters of the model are updated using gradient descent or one of its variants (e.g., Adam, RMSprop). By iterating these steps continuously during training, the parameters are gradually adjusted to minimize the loss function. This dynamic adjustment mechanism allows the model to keep learning and adapting to new data during long-term operation, thereby improving its performance and response quality. In addition to the model parameters, the choice of hyperparameters such as the learning rate, batch size, and optimizer type can also have a significant impact on model performance. The optimal hyperparameter settings usually need to be determined by experiment or by automated hyperparameter optimization techniques (e.g., grid search, random search, Bayesian optimization).
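The parameter update loop can be sketched with plain gradient descent on a deliberately tiny model. The linear model, the learning rate, and the step count below are assumptions chosen so the example stays self-contained; a real digital person driving model would be a neural network trained with a framework optimizer.

```python
def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

The learning rate is the hyperparameter that matters most here: too large a value makes the updates diverge, while too small a value needs many more steps.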

During training, the performance of the model needs to be evaluated on the verification set regularly, and the best model version is selected according to performance indexes (such as accuracy and F1 score). After the model is deployed, feedback and new data can continue to be collected, and the model is updated through online learning or periodic retraining so as to adapt to environmental changes and shifts in the data distribution. In this way, the digital person driving model can be continuously optimized, improving its accuracy and naturalness in speech synthesis, facial expression generation, limb action simulation, and the like, and thereby providing a more realistic and engaging interaction experience.
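Selecting a model version by its verification-set metrics can be sketched as follows, assuming binary labels. The checkpoint names and label values are hypothetical.

```python
def precision_recall_f1(targets, predictions, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical verification labels and predictions from two model versions.
targets = [1, 1, 0, 0, 1]
candidates = {
    "version_a": [1, 0, 0, 1, 1],
    "version_b": [1, 1, 0, 0, 1],
}
best = max(candidates, key=lambda name: precision_recall_f1(targets, candidates[name])[2])
print(best)
```

Ranking by F1 rather than accuracy is a design choice that suits imbalanced data, such as data in which abnormal states are rare.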

S107, processing the target user information to generate attribute information of the target user.

In one embodiment, real-time data processing information of a target server is obtained, wherein the real-time data processing information is used for representing data processing progress information of the target server, and data cleaning processing is performed on the target user information to generate real-time physiological parameter information of the target user and to-do item information of the target user.

The acquisition of real-time data processing information of a target server and the data cleaning processing of target user information are two key steps, which ensure the progress and quality of data processing. First, for real-time data processing information of a target server, real-time acquisition, analysis and processing of data are typically involved. An efficient real-time data processing system needs to be able to handle highly concurrent data streams and guarantee low latency, and the core goal of real-time data processing is to be able to process, store and analyze data from the source immediately in order to provide valuable information immediately when needed.

Second, performing data cleaning on the target user information is another important step for ensuring data quality. Data cleaning involves identifying and correcting erroneous, incomplete, inaccurate, irrelevant, or duplicated data so that the data is of higher quality and better suited to data analysis or data mining. The difficulties of data cleaning include data quality problems, large data volumes, multiple data sources, and time cost. To address them, a specialized data processing tool such as FineDataLink can be used; it provides functions such as data filtering, adding calculated columns, and data association, and helps users complete data cleaning and processing quickly without writing complex SQL statements, thereby improving development efficiency. By monitoring the data processing progress of the target server in real time and thoroughly cleaning the user information, the real-time physiological parameter information and to-do item information of the target user can be generated to support decision making.
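Two of the cleaning operations mentioned above, dropping incomplete records and removing duplicates, can be sketched in plain Python. The record fields (`user`, `heart_rate`, `todo`) are hypothetical and are not taken from the application.

```python
def clean_records(records, required_fields):
    """Drop records with missing required fields and exact duplicates (first occurrence kept)."""
    seen = set()
    cleaned = []
    for record in records:
        if any(record.get(field) in (None, "") for field in required_fields):
            continue  # incomplete record
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical raw user records.
records = [
    {"user": "u1", "heart_rate": 72, "todo": "buy medicine"},
    {"user": "u1", "heart_rate": 72, "todo": "buy medicine"},
    {"user": "u2", "heart_rate": None, "todo": "see doctor"},
    {"user": "u3", "heart_rate": 95, "todo": ""},
    {"user": "u4", "heart_rate": 110, "todo": "see doctor"},
]
cleaned = clean_records(records, ["heart_rate", "todo"])
print([record["user"] for record in cleaned])
```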

The real-time physiological parameter information and the to-do item information of the target user are processed based on the data processing progress information of the target server to generate priority information for the to-do item information, and attribute information of the target user is generated based on that priority information, wherein the attribute information of the target user is used for representing the real-time to-do item information of the target user. That is, the server can process various data of the target user to complete matters such as commodity recommendation and seeing a doctor, but the to-do item that the target server should currently treat as most urgent needs to be determined by considering the target user's current voice, limb motion data, facial expression, and to-do items, and that most urgent to-do item is taken as the attribute of the target user.

Based on the data processing progress information of the target server and the real-time physiological parameter information and to-do item information of the target user, the priority of the target user's to-do items can be generated. This process involves the real-time acquisition, analysis, and processing of data, as well as the monitoring and assessment of the user's physiological parameters.

First, data needs to be collected in real time from multiple sources, such as physiological parameter information derived from the user's voice, limb movements, and facial expressions, together with the progress and status of the to-do items; real-time collection of this data is critical to understanding the user's current needs and their urgency. The collected data is then analyzed by an intelligent algorithm to determine which physiological parameters are outside the normal range and which to-do items are approaching their deadlines, so as to evaluate the urgency of each to-do item. For example, a digital person system that integrates multiple AI capabilities can dynamically adjust and optimize the user's task priorities based on the user's real-time physiological parameters and to-do item information. In addition, a user portrait generation tool such as InstantPersonas can be used to further understand the behavior patterns, needs, and pain points of the target user, so that the attribute information of the target user can be generated more accurately to represent the user's real-time to-do item information. Finally, a to-do follow-up template can be used to prioritize the user's to-do items, manage them in a task calendar, and count them, ensuring that the most urgent to-do item is handled first. The server can thus complete matters such as commodity recommendation and seeing a doctor according to the real-time data and priority information of the target user, and it identifies the currently most urgent to-do item and treats it as the attribute of the target user.

By comprehensively analyzing the data processing progress information of the target server together with the real-time physiological parameter information and to-do item information of the target user, priority information for the target user's to-do items can be generated effectively, and the attribute information of the target user is generated from that priority information to represent the user's real-time to-do items. This process requires not only real-time data processing capability but also a deep understanding of the user's behavior and needs.
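The prioritization can be illustrated with a deliberately simple scoring rule. The rule below is hypothetical: it rewards nearer deadlines and boosts health-related items when the heart rate is outside an assumed normal range. The server's data processing progress, which the method also takes into account, is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class TodoItem:
    name: str
    hours_until_due: float
    health_related: bool


def priority_score(item, heart_rate, normal_range=(60, 100)):
    """Hypothetical score: nearer deadlines score higher; abnormal vitals boost health items."""
    deadline_score = 1.0 / (1.0 + max(item.hours_until_due, 0.0))
    low, high = normal_range
    abnormal = heart_rate < low or heart_rate > high
    health_boost = 1.0 if abnormal and item.health_related else 0.0
    return deadline_score + health_boost


def most_urgent(items, heart_rate):
    """Return the to-do item with the highest priority score."""
    return max(items, key=lambda item: priority_score(item, heart_rate))


# Hypothetical to-do items.
items = [
    TodoItem("buy groceries", hours_until_due=2.0, health_related=False),
    TodoItem("see doctor", hours_until_due=24.0, health_related=True),
]
print(most_urgent(items, heart_rate=120).name)
print(most_urgent(items, heart_rate=75).name)
```

The item returned by `most_urgent` corresponds to what the description calls the attribute of the target user.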

S108, processing the attribute information of the target user based on the target digital person driving model to generate target digital person information.

In one embodiment, the attribute information of the target user is processed based on the target digital person driving model to generate a data processing result matched with the real-time to-do item information of the target user. This processing involves multiple steps and the application of several technologies.

First, the real-time physiological parameter information and to-do item information of the target user need to be cleaned and preprocessed to ensure that the data is accurate and usable. This step is the basis of data processing and can be completed with a specialized data processing tool such as FineDataLink, which provides functions such as data filtering and adding calculated columns and helps users complete data cleaning and processing quickly without writing complex SQL statements. Then, using the digital person driving model in combination with the user's real-time physiological parameters and to-do item information, priority information for the target user's to-do items can be generated. This can be implemented by intelligent algorithms; for example, the Whenable application uses AI technology to translate a user's tasks and goal priorities automatically into a personalized life roadmap.

Further, attribute information of the target user may be generated based on the priority information of the target user's to-do item information. The attribute information represents the real-time to-do item information of the target user. For example, the InstantPersonas platform uses AI technology to convert complex user data into clear and easily understood user portraits, helping enterprises better locate target markets, optimize product designs, and formulate more effective marketing strategies. Finally, through the dynamic adjustment mechanism, the digital person system can continuously improve its selection strategy over long-term use and improve response quality. For example, the digital person can process various data of the target user, complete matters such as commodity recommendation and seeing a doctor, and optimize itself according to immediate user feedback to make its recommendations more relevant and practical.

In addition, when the digital person system is constructed, multiple links such as voice input and recognition, AI interaction processing, and speech synthesis need to be considered; applying these technologies enables the digital person to interact with the user more naturally. Processing the attribute information of the target user based on the target digital person driving model thus involves data cleaning, the application of intelligent algorithms, user portrait generation, system optimization, and other aspects, and through these steps a data processing result matched with the real-time to-do item information of the target user can be generated.

The data processing result is processed based on the target digital person driving model to generate text information of the digital person, action information of the digital person, and expression information of the digital person. First, a large language model (LLM), the EMAGE framework, and the VideoReTalking model are integrated into the target digital person driving model. The text information of the digital person is generated by the integrated LLM, which can understand and generate natural language text, and the LLM is invoked through FastAPI, improving the interactivity and realism of the digital person. Second, generating the action information of the digital person typically relies on motion capture data and advanced animation techniques. For example, the EMAGE framework can generate whole-body human motion, including facial, local limb, hand, and global movements, from audio and motion masks, which provides strong technical support for generating the action information of the digital person.

Finally, facial expression recognition and synthesis techniques are required for digital person expression information generation. VideoReTalking is an open-source speech-driven facial expression model, which can generate mouth shapes and facial expressions synchronous with speech according to sound, and realize highly-realistic dynamic portrait video. In addition, by combining the deep learning model with the audio and the facial marker points, a highly realistic dynamic portrait video is created, which supports multiple languages and is suitable for multiple scenes. By integrating the technologies, the digital person driving model can process the input text, audio and video data to generate digital persons with rich expressions and actions, and a more natural and real interactive experience is provided.

The text information, action information, and expression information of the digital person are processed to generate target digital person information, wherein the target digital person information is used for representing an information reply to the target user based on the target digital person. The digital person needs to be able to "hear" the user's instructions, typically by using automatic speech recognition (ASR) to convert the speech input into text. The "brain" of the digital person is provided by a large language model (LLM), which understands the natural language input and generates appropriate response content. The digital person also needs to be able to "speak" by means of a text-to-speech (TTS) model, which converts text into an audio stream; a synthetic timbre can be selected, or the synthesis model can be trained on a real person's voice. Finally, the synthesized audio stream is used to drive the digital person so that it "moves" while it "speaks", combining sound, motion, and expression. Depending on the type of digital person, this may require different engines or AI models, such as SadTalker or RAD-NeRF.
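The hear, think, and speak chain can be sketched as a composition of three pluggable stages. The stage functions below are stand-ins written for this example; they do not call any real ASR, LLM, or TTS service, and their names are hypothetical.

```python
from typing import Callable, Dict, Union


def reply_to_user(
    audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Dict[str, Union[str, bytes]]:
    """Run speech recognition, response generation, and speech synthesis in sequence."""
    heard = asr(audio)            # speech input -> text
    reply_text = llm(heard)       # text -> response text
    reply_audio = tts(reply_text) # response text -> audio stream
    return {"heard": heard, "reply_text": reply_text, "reply_audio": reply_audio}


# Stand-in stages (hypothetical); real engines would replace these.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return "You said: " + text


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = reply_to_user(b"hello", fake_asr, fake_llm, fake_tts)
print(result["reply_text"])
```

Passing the stages in as callables keeps the pipeline independent of any particular engine, which matches the remark that different digital person types may need different models.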

Through communication technologies such as WebSocket, the audio streams, emotion data, and other data generated by the back-end controller are sent to the front-end digital person to drive it to produce the corresponding actions and expressions. The digital person's image, behavior logic, technology selection, and engineering optimization are tailored to the application scenario to improve the stability and response speed of the system. Through these steps, a highly interactive digital person can be created that understands the target user's information and replies to it, providing a personalized interactive experience.

In another embodiment, text feedback and voice feedback are provided. Text feedback: based on the emotion classification result, the system generates an appropriate text reply using a rule-based approach. For example, for a user who is angry, the system may generate placating text. Voice feedback: the system generates a voice response with emotional color using speech synthesis techniques.

Speech speed = reference speed + α × emotion intensity;

Speech intonation = reference pitch + β × emotion intensity;

wherein α and β are the adjustment coefficients of the reference speed and the reference pitch, respectively, and the emotion intensity is determined by the emotion probability value output by the emotion recognition module.
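The rule-based text reply and the emotion-dependent speech parameters can be sketched together. The sketch assumes that each adjustment coefficient multiplies the emotion intensity; the names `alpha` and `beta`, their default values, and the reply texts are hypothetical.

```python
# Hypothetical rule table mapping an emotion class to a reply.
REPLY_RULES = {
    "angry": "I am sorry for the trouble. Let me help sort this out.",
    "sad": "That sounds difficult. I am here to help.",
}
DEFAULT_REPLY = "Thank you. How can I help you next?"


def text_reply(emotion):
    """Pick a reply by rule from the emotion classification result."""
    return REPLY_RULES.get(emotion, DEFAULT_REPLY)


def speech_parameters(emotion_intensity, reference_speed=1.0, reference_pitch=1.0,
                      alpha=0.2, beta=0.1):
    """Shift speed and pitch away from their reference values in proportion to emotion intensity."""
    speed = reference_speed + alpha * emotion_intensity
    pitch = reference_pitch + beta * emotion_intensity
    return speed, pitch


# Emotion intensity taken as the classifier's probability for the detected emotion (assumed 0.5).
speed, pitch = speech_parameters(0.5)
print(text_reply("angry"))
print(speed, pitch)
```

A negative coefficient would slow or lower the voice instead, which may suit placating replies; the sign and size of the coefficients are tuning choices.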

The method comprises the steps that a server acquires target user information and a training sample set, wherein the target user information comprises voice data, limb motion data and facial expression data of a target user in a preset time period, and the training sample set comprises other user information matched with the target user information; preprocessing a training sample set to generate a training sample set with identification information, wherein the identification information is used for representing that other user data are in an abnormal state; acquiring a preset judgment matrix, feature weight information matched with an original feature vector and a plurality of engine information; processing the original feature vector and feature weight information matched with the original feature vector to generate a target feature vector, wherein the target feature vector comprises word frequency, an inverse document frequency value and emotion features; processing the target feature vector and a plurality of engine information to generate an engine selection result; a training set and a validation set are generated based on the engine selection result.

Acquiring an initial digital man-driven model matched with the identification information; training the initial digital man-driven model based on the training set to generate a prediction result label; performing prediction processing on the initial digital man-driven model based on the verification set to generate a target result label; processing the predicted result label and the target result label to generate a model loss function; processing the model loss function to generate model adjustment parameters; and processing the initial digital human driving model based on the model adjustment parameters to generate a target digital human driving model.

Acquiring real-time data processing information of a target server, wherein the real-time data processing information of the target server is used for representing data processing progress information of the target server; performing data cleaning processing on the target user information to generate real-time physiological parameter information of the target user and to-do item information of the target user; processing the real-time physiological parameter information and the to-do item information of the target user based on the data processing progress information of the target server to generate priority information of the to-do item information; generating attribute information of the target user based on the priority information of the to-do item information, wherein the attribute information of the target user is used for representing real-time to-do item information of the target user; processing the attribute information of the target user based on the target digital person driving model to generate a data processing result matched with the real-time to-do item information of the target user; processing the data processing result based on the target digital person driving model to generate text information of the digital person, action information of the digital person, and expression information of the digital person; and processing the text information, the action information, and the expression information of the digital person to generate target digital person information, wherein the target digital person information is used for representing an information reply to the target user based on the target digital person.

In summary, voice, limb action, and facial expression data of the user are collected, training samples are constructed, the data is cleaned, and features such as text word frequency, TF-IDF values, and emotion features are extracted. A training set and a verification set are created from the features and used to train the digital person driving model. Through model training, the parameters are optimized and a target digital person driving model that accurately reflects user needs is generated. Finally, the user information is processed and, in combination with the model, digital person information that replies effectively to the user is generated, ensuring that the digital person provides personalized and accurate service.
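The word frequency and TF-IDF features mentioned above can be computed as in the sketch below. It assumes the plain, unsmoothed definition (term frequency multiplied by the logarithm of the inverse document frequency), which is one of several conventions in use; the tokenized documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized user utterances.
documents = [["order", "late", "late"], ["order", "thanks"]]
weights = tf_idf(documents)
print(weights[0])
```

A term that occurs in every document receives a weight of zero under this definition, so only distinguishing words contribute to the feature vector.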

In one embodiment, as shown in fig. 2, the present application further provides a management device for a virtual digital person, including:

An obtaining module 201, configured to obtain target user information and a training sample set, where the target user information includes voice data, limb motion data, and facial expression data of a target user in a preset period of time, and the training sample set includes other user information matched with the target user information; acquiring an initial digital man-driven model matched with the identification information;

The processing module 202 is configured to pre-process the training sample set to generate a training sample set with identification information, where the identification information is used to characterize that other user data is in an abnormal state; performing feature extraction processing on the training sample set with the identification information to generate an original feature vector, wherein the original feature vector comprises an initial text word frequency, an initial inverse document frequency value (TF-IDF value) and initial emotion features, and the initial emotion features are generated based on voice data, limb motion data and facial expression data of other users; processing the original feature vector based on a preset processing rule to generate a training set and a verification set; training the initial digital man-driven model based on the training set and the verification set to generate a target digital man-driven model; processing the target user information to generate attribute information of a target user; and processing the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information is used for representing information reply to the target user based on the target digital person.

The embodiment of the application provides an electronic device, as shown in fig. 3, the electronic device 3 includes a first processor 300, a memory 301, a bus 302 and a communication interface 303, where the first processor 300, the communication interface 303 and the memory 301 are connected through the bus 302; the memory 301 stores a computer program that can be executed on the first processor 300, and when the first processor 300 executes the computer program, the virtual digital person management method provided in any of the foregoing embodiments of the present application is executed.

The memory 301 may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 303 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used.

Bus 302 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. The memory 301 is configured to store a program, and the first processor 300 executes the program after receiving an execution instruction. The method for managing a virtual digital person disclosed in any of the foregoing embodiments of the present application may be applied to, or implemented by, the first processor 300.

The first processor 300 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in software form in the first processor 300. The first processor 300 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The first processor 300 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 301, and the first processor 300 reads the information in the memory 301 and, in combination with its hardware, performs the steps of the above method.

Because the electronic device provided by the above embodiment of the present application shares the same inventive concept as the virtual digital person management method provided by the embodiment of the present application, it has the same beneficial effects as the method adopted, run, or implemented by the application program stored therein.

An embodiment of the present application provides a computer readable storage medium, as shown in fig. 4, where the computer readable storage medium 401 stores a computer program, and when the computer program is read and executed by the second processor 402, the method for managing a virtual digital person as described above is implemented.

The technical solution of the embodiment of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be an air conditioner, a refrigeration device, a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the method of the embodiment of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.

Because the computer readable storage medium provided by the above embodiment of the present application shares the same inventive concept as the method for managing the virtual digital person provided by the embodiment of the present application, it has the same beneficial effects as the method adopted, run, or implemented by the application program stored therein.

Embodiments of the present application provide a computer program product comprising a computer program for execution by a third processor to implement a method as described above.

Because the computer program product provided by the above embodiment of the present application shares the same inventive concept as the method for managing a virtual digital person provided by the embodiment of the present application, it has the same beneficial effects as the method adopted, run, or implemented by the application program stored therein.

It is noted that in the present application, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The embodiments of the present application are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the embodiments of the management apparatus, the electronic device, and the readable storage medium for the virtual digital person are substantially similar to the embodiments of the management method for the virtual digital person described above, they are described relatively briefly; for relevant details, refer to the corresponding description of the embodiments of the management method for the virtual digital person.

Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the application, and the scope of protection of the application is therefore defined by the appended claims.

Claims

1. A method of managing virtual digital persons, comprising:

Acquiring target user information and a training sample set, wherein the target user information comprises voice data, limb motion data and facial expression data of a target user in a preset time period, and the training sample set comprises other user information matched with the target user information;

Preprocessing the training sample set to generate a training sample set with identification information, wherein the identification information is used for representing that other user data are in an abnormal state;

Performing feature extraction processing on the training sample set with the identification information to generate an original feature vector, wherein the original feature vector comprises an initial text word frequency, an initial inverse document frequency value and initial emotion features, and the initial emotion features are generated based on voice data, limb motion data and facial expression data of other users;

Processing the original feature vector based on a preset processing rule to generate a training set and a verification set;

acquiring an initial digital man-driven model matched with the identification information;

training the initial digital man-driven model based on the training set and the verification set to generate a target digital man-driven model;

processing the target user information to generate attribute information of a target user;

And processing the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information is used for representing information reply to the target user based on the target digital person.
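The step of claim 1 that turns the original feature vectors into a training set and a verification set can be illustrated with a simple shuffled split. This is only a sketch: the preset processing rule is not specified in the claim, so the 80/20 ratio, the fixed seed, and the name `split_features` are assumptions made for illustration.

```python
import random


def split_features(vectors, validation_ratio=0.2, seed=0):
    """Shuffle feature vectors and split them into (training, validation)."""
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be between 0 and 1")
    shuffled = list(vectors)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical two-dimensional feature vectors.
vectors = [[float(i), float(i) * 2.0] for i in range(10)]
training_set, validation_set = split_features(vectors)
```

Every vector ends up in exactly one of the two sets, so the verification set is never seen during training.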

2. The method of claim 1, wherein performing feature extraction processing on the training sample set with the identification information to generate an original feature vector comprises:

；

3. The method of claim 1, wherein processing the original feature vector based on a preset processing rule to generate a training set and a validation set comprises:

；

4. The method of claim 3, wherein processing the target feature vector and several pieces of engine information to generate an engine selection result comprises:

The method further comprises a calculation formula for calculating a probability of selecting the target engine based on the target feature vector, wherein the calculation formula is as follows:

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)};$$

Wherein $P(E \mid F)$ is the probability of selecting engine $E$ given feature $F$, $P(F \mid E)$ is the conditional probability of feature $F$ under engine $E$, $P(E)$ is the prior probability of engine $E$, and $P(F)$ is the prior probability of feature $F$.
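The quantities named in claim 4 (a conditional probability of the feature under an engine, a prior probability of the engine, and a prior probability of the feature) have the shape of Bayes' rule. The sketch below applies that rule to a set of candidate engines. The engine names, the probability values, and the choice to obtain the feature probability by summing over the candidate engines are illustrative assumptions.

```python
def engine_posteriors(likelihoods, priors):
    """Apply Bayes' rule: P(E | F) = P(F | E) * P(E) / P(F)."""
    # Assumption: P(F) is obtained by the law of total probability
    # over all candidate engines.
    evidence = sum(likelihoods[e] * priors[e] for e in priors)
    if evidence == 0:
        raise ValueError("feature has zero probability under every engine")
    return {e: likelihoods[e] * priors[e] / evidence for e in priors}


# Hypothetical engines and probabilities.
priors = {"engine_a": 0.5, "engine_b": 0.5}
likelihoods = {"engine_a": 0.8, "engine_b": 0.2}
posteriors = engine_posteriors(likelihoods, priors)
best_engine = max(posteriors, key=posteriors.get)
```

Selecting the engine with the largest posterior is one plausible way to turn the probabilities into an engine selection result.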

5. The method of claim 1, wherein training the initial digital person-driven model based on the training set and the validation set to generate a target digital person-driven model comprises:

training the initial digital man-driven model based on a training set to generate a prediction result label;

performing prediction processing on the initial digital man-driven model based on the verification set to generate a target result label;

Processing the predicted result label and the target result label to generate a model loss function;

Processing the model loss function to generate model adjustment parameters;

Processing the initial digital man-driven model based on the model adjustment parameters to generate a target digital man-driven model;

the method further comprises a calculation formula for calculating a model loss function, the calculation formula being:

Wherein the terms of the formula are, respectively, the model loss function, the target result label, and the prediction result label; N is the total number of samples, and i is the index of the sample.
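The loop in claim 5 (compute a loss from the target and prediction labels, derive adjustment parameters, update the model) can be sketched as follows. The mean-squared-error loss, the one-parameter linear model, and the learning rate are assumptions chosen purely for illustration; the loss formula of the claim may differ.

```python
def mean_squared_error(targets, predictions):
    """Average squared difference over N samples (assumed loss form)."""
    n = len(targets)
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / n


def adjust_weight(weight, inputs, targets, learning_rate=0.1):
    """One gradient-descent step for the toy model prediction = weight * x."""
    n = len(inputs)
    # Derivative of the mean squared error with respect to the weight.
    gradient = sum(2 * (weight * x - y) * x for x, y in zip(inputs, targets)) / n
    return weight - learning_rate * gradient


# Hypothetical data generated by the rule y = 2 * x.
inputs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]
weight = 0.0
loss_before = mean_squared_error(targets, [weight * x for x in inputs])
for _ in range(50):
    weight = adjust_weight(weight, inputs, targets)
loss_after = mean_squared_error(targets, [weight * x for x in inputs])
```

Here the gradient plays the role of the model adjustment parameter, and the loss falls towards zero as the weight approaches 2.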

6. The method of claim 1, wherein processing the target user information to generate attribute information for a target user comprises:

Acquiring real-time data processing information of a target server, wherein the real-time data processing information of the target server is used for representing data processing progress information of the target server;

Performing data cleaning processing on the target user information to generate real-time physiological parameter information of the target user and to-be-handled matter information of the target user;

processing the real-time physiological parameter information of the target user and the to-be-handled matter information of the target user based on the data processing progress information of the target server to generate priority information of the to-be-handled matter information of the target user;

And generating attribute information of the target user based on the priority information of the to-be-handled matter information of the target user, wherein the attribute information of the target user is used for representing the real-time to-be-handled matter information of the target user.
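One way to picture the prioritization in claim 6 is a priority queue whose scores combine a to-be-handled matter's base urgency with a physiological reading and the server's processing progress. The scoring rule, the heart-rate threshold, and all names below are hypothetical; the claim does not state how the three inputs are combined.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class TodoItem:
    priority: float
    description: str = field(compare=False)


def prioritize(todos, heart_rate, server_progress):
    """Return to-do descriptions ordered from most to least urgent.

    Hypothetical rule: health-related items gain urgency when the heart
    rate is elevated, and every item is scaled by the server's progress.
    """
    heap = []
    for description, base_urgency, health_related in todos:
        urgency = base_urgency
        if health_related and heart_rate > 100:
            urgency += 5.0
        urgency *= server_progress  # 0.0 (just started) to 1.0 (finished)
        # heapq is a min-heap, so store the negated urgency.
        heapq.heappush(heap, TodoItem(-urgency, description))
    return [heapq.heappop(heap).description for _ in range(len(heap))]


# Hypothetical to-do items: (description, base urgency, health related).
todos = [
    ("reply to order inquiry", 3.0, False),
    ("rest reminder", 1.0, True),
    ("update profile", 2.0, False),
]
ordered = prioritize(todos, heart_rate=110, server_progress=0.5)
```

With an elevated heart rate the health-related reminder moves ahead of items that have a higher base urgency.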

7. The method of claim 6, wherein processing the attribute information of the target user based on the target digital person driven model to generate target digital person information comprises:

Processing attribute information of the target user based on the target digital person driving model to generate a data processing result matched with the real-time to-be-handled matter information of the target user;

processing the data processing result based on the target digital person driving model to generate text information of the digital person, action information of the digital person and expression information of the digital person;

And processing the text information of the digital person, the action information of the digital person and the expression information of the digital person to generate target digital person information.

8. A virtual digital person management apparatus, the apparatus comprising:

An acquisition module, used for acquiring target user information and a training sample set, wherein the target user information comprises voice data, limb action data and facial expression data of a target user in a preset time period, and the training sample set comprises other user information matched with the target user information; and for acquiring an initial digital man-driven model matched with the identification information;

The processing module is used for preprocessing the training sample set to generate a training sample set with identification information, wherein the identification information is used for representing that other user data are in an abnormal state; performing feature extraction processing on the training sample set with the identification information to generate an original feature vector, wherein the original feature vector comprises an initial text word frequency, an initial inverse document frequency value and initial emotion features, and the initial emotion features are generated based on voice data, limb motion data and facial expression data of other users; processing the original feature vector based on a preset processing rule to generate a training set and a verification set; training the initial digital man-driven model based on the training set and the verification set to generate a target digital man-driven model; processing the target user information to generate attribute information of a target user; and processing the attribute information of the target user based on the target digital person driving model to generate target digital person information, wherein the target digital person information is used for representing information reply to the target user based on the target digital person.

9. An electronic device, comprising:

A first processor; and a memory for storing executable instructions of the first processor;

wherein the first processor is configured to perform the method of managing virtual digital persons of any one of claims 1 to 7 via execution of the executable instructions.

10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a second processor, implements the method of managing virtual digital persons according to any one of claims 1 to 7.