WO2024188277A1 - Text semantic matching method and refrigeration device system - Google Patents
- Publication number
- WO2024188277A1 (application PCT/CN2024/081468)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- data
- text data
- semantic matching
- matching method
Classifications
- G06F16/3344: Query execution using natural language analysis
- G06F16/3329: Natural language query formulation or dialogue systems
- G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/22: Matching criteria, e.g. proximity measures
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/30: Semantic analysis
- G06N3/045: Combinations of networks
- G06N3/088: Non-supervised learning, e.g. competitive learning
Definitions
- The present invention relates to the technical field of refrigeration equipment, and in particular to a text semantic matching method and a refrigeration equipment system.
- An object of the present invention is to provide a text semantic matching method and a refrigeration equipment system with a fast interactive response and accurate feedback information.
- An embodiment of the present invention provides a text semantic matching method, comprising the following steps:
- The text data for which slot extraction failed is transmitted to the deep fusion network model for feature extraction, and the text semantic matching result is then calculated from the feature extraction result;
- The deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model;
- The text vectorization model vectorizes the text data;
- The multi-dimensional feature extraction model extracts multi-dimensional interaction features and correlation features.
- The step of performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction includes:
- When the rule engine detects a problem, the rule definition is automatically analyzed and repaired by the quick repair module, and result matching is performed again by the rule engine.
- The present invention further comprises the step of:
- If the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot match a result after the rule is repaired, the slot extraction match fails; the problem includes an inaccurate rule definition, a rule conflict, or inefficient rule execution.
- The steps further include:
- The text data for which slot extraction failed is transmitted to the deep fusion network model for feature extraction, and interaction features are extracted;
- The text semantic matching result is calculated from the feature information, the aggregated interaction feature information, and the differentiated interaction feature information.
- The step of calculating aggregated interaction feature information between the interaction features includes:
- The step of calculating differentiated interaction feature information between the interaction features includes:
- The differentiated interaction feature information between the interaction features is calculated.
- The step of annotating the annotatable text in the total text data includes:
- The annotatable text is subjected to pre-annotation, formal annotation and annotation quality inspection in sequence;
- If the annotation quality inspection judges the value of the formally annotated text to be lower than a preset threshold, the text is returned to the pre-annotation process for re-annotation.
- The step of annotating the annotatable text in the total text data includes:
- Annotating the annotatable texts in the total text data and storing the annotated texts as training data and test data respectively;
- The step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction includes:
- Training the deep fusion network model with the training data, and predicting results with the deep fusion network model using the test data.
- The present invention further comprises the steps of:
- The step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction includes:
- Training the deep fusion network model on the unlabeled text with an unsupervised learning algorithm.
- The total text data is obtained by transcribing all multimodal data and/or multi-source heterogeneous data into text data and aggregating it.
- The present invention further comprises the steps of:
- The multimodal data includes text, audio and video data;
- The preprocessing includes cleaning, format conversion and storage of the multimodal data;
- The text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data are aggregated into the total text data.
- The step of transcribing the video data into text data comprises:
- The step of recognizing an image based on spatiotemporal and long-distance dependency features and transcribing it into text data includes:
- A student model for image recognition based on spatiotemporal and long-distance dependency features is generated through a fusion of knowledge distillation and a diffusion model, and the image data is transcribed into text data by the student model.
- The step of transcribing the video data into text data comprises:
- Recognizing the key frame image to generate text data.
- The step of transcribing the audio data into text data comprises:
- The historical text data includes historical record data and historical interaction data, wherein the historical record data includes the user's ingredient preference data, interest data and comment data, and the historical interaction data includes interaction records obtained from the client or the interaction end of the refrigeration equipment.
- The text vectorization model encodes text at the character, word, phrase and sentence levels;
- The multi-dimensional feature extraction model uses a multi-head attention mechanism to extract the interaction and association features of characters, words, phrases and sentences, as well as contextual semantic information.
- The step of calculating the text semantic matching result from the feature extraction result includes: passing the feature-extracted vector sequentially through a fully connected layer and a self-attention mechanism to calculate the text semantic matching result.
- The present invention further comprises the steps of:
- An embodiment of the present invention provides a refrigeration equipment system, comprising:
- a storage module storing a computer program; and
- a processing module which, when executing the computer program, implements the steps of the above text semantic matching method.
- The text semantic matching method first annotates and then performs slot extraction. Some text data become easier to slot-extract once annotated, while the unannotated data are generally short phrases that are easy to slot-extract anyway. Slot extraction therefore has three outcomes: unannotated text is matched successfully after slot extraction; annotated text is matched successfully after slot extraction; or annotated text fails to match after slot extraction, in which case it is matched again by computation with the deep fusion network model. This greatly improves the overall matching speed while keeping matching accuracy high, which improves the user experience.
- FIG. 1 is a schematic structural diagram of a refrigeration equipment system according to an embodiment of the present invention.
- FIG. 2 is a schematic flow chart of a text semantic matching method according to an embodiment of the present invention.
- FIG. 3 is a partial flow chart of a text semantic matching method according to an embodiment of the present invention.
- FIG. 4 is a flow chart of slot extraction according to an embodiment of the present invention.
- FIG. 5 is a data flow diagram of a text semantic matching method according to an embodiment of the present invention.
- FIG. 6 is a schematic module diagram according to an embodiment of the present invention.
- FIG. 7 is a block diagram of a refrigeration system according to an embodiment of the present invention.
- 100, refrigeration equipment; 10, interactive screen; 20, camera; 30, microphone; 40, speaker; 50, processing module; 60, storage module; 70, communication bus; 200, client.
- An embodiment of the present invention provides a text semantic matching method and a refrigeration equipment system with fast interactive response speed and accurate feedback information.
- The refrigeration equipment system may include a refrigeration equipment 100 and a client 200 corresponding to the refrigeration equipment 100.
- The refrigeration equipment 100 may be a refrigerator, and the client 200 may be a mobile phone or an app on the mobile phone.
- As shown in FIG. 1, the refrigeration equipment 100 and the client 200 may be connected via a wireless signal.
- The refrigeration equipment 100 is described below using a refrigerator as an example.
- The refrigeration equipment 100 may be a refrigerator with audio acquisition, video acquisition and user-interface interaction functions.
- The refrigerator is provided with a microphone 30 for collecting audio, a camera 20 for shooting video, a speaker 40 for voice interaction with the user, and an interactive screen 10 for text or graphical interaction with the user.
- The interactive screen 10 may be provided on the refrigerator door. After the user opens the door, the camera 20 records the user's operations to form video data. The speaker 40 and the microphone 30 work together to interact with the user through question-and-answer audio.
- Through the client 200 on the mobile phone, the user can communicate with the refrigerator by text or voice, manage the food information in the refrigerator, and control its operating status.
- The refrigeration equipment system may also include other external devices, such as external temperature sensors, cameras 20, microphones 30 and speakers 40 provided by other devices, smart speakers, etc. These devices can be connected to the refrigeration equipment 100 or the client 200 via wireless signals.
- The data of each device in the refrigeration equipment system form multi-source heterogeneous data.
- The multi-source heterogeneous data collected by multiple devices can be transmitted via wired connections, WiFi, Bluetooth, etc.
- Various types of data such as text, audio and video constitute multimodal data. These data can be real-time online or offline data, or stored historical data.
- The refrigeration equipment scenario includes the interaction between the user and the refrigeration equipment 100 and between the user and the corresponding client 200, such as interactions about ingredients, instruction interactions, videos recorded while the user operates the refrigeration equipment 100, the user's control of the temperature and humidity inside the refrigeration equipment 100, and the user's comments on ingredients and preferences on the client 200.
- The data generated by directly operating the refrigeration equipment 100 and the data generated by the related client 200 are all data of the refrigeration equipment scenario.
- This embodiment uses the multimodal real-time and offline data generated in the usage scenario of the refrigeration equipment 100, together with a large accumulated body of historical text data, to fully mine the semantic, grammatical and contextual information needed for natural language understanding, both within each piece of data and between pieces of data, so that text semantic matching in the refrigeration equipment scenario is more accurate.
- Semantic matching converts text into semantic vector representations and finds other vectors that are close to them in distance or similarity, in order to determine the relevance between texts. It helps systems understand and process natural language text, and can be applied in scenarios such as text classification, knowledge graph construction and intelligent customer service.
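A minimal sketch of the vector-similarity step just described, assuming each text has already been turned into a semantic vector. Cosine similarity and the names `cosine_similarity` and `top_k_matches` are illustrative choices, not taken from the source, and the three toy vectors are hypothetical values rather than model output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_matches(query_vec, candidates, k=1):
    """Rank (label, vector) candidates by cosine similarity to the query vector."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional semantic vectors for three intents.
candidates = [
    ("set_temperature", [0.9, 0.1, 0.0]),
    ("query_food", [0.1, 0.8, 0.3]),
    ("play_music", [0.0, 0.1, 0.9]),
]
print(top_k_matches([0.8, 0.2, 0.1], candidates, k=1))
```

In practice the vectors would come from the text vectorization model and the candidate set would be indexed for fast nearest-neighbour search; the linear scan here only shows the ranking logic.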
- The present invention greatly speeds up matching by annotating in advance and then performing slot extraction.
- With annotation followed by slot extraction, there are three situations: (1) unannotated texts are successfully matched after slot extraction; (2) annotated texts are successfully matched after slot extraction; (3) annotated texts fail to match after slot extraction, and this category is matched again through the deep fusion network model and calculation.
- Matching is faster for the first two categories, and the remaining category that fails slot extraction is also matched much faster by the neural network computation than by existing semantic matching; the matching accuracy of both slot extraction and the neural network is also high. The present invention therefore greatly speeds up semantic matching and improves its accuracy.
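The three situations amount to a "slot extraction first, neural model on failure" dispatch. The sketch below assumes two pluggable callables; `toy_slot_extract` and `toy_neural_match` are hypothetical stand-ins for the rule engine and the deep fusion network model. It also assumes any slot-extraction failure goes to the neural model, whereas the source only describes annotated texts failing.

```python
from typing import Callable, Dict, Optional, Tuple

SlotFn = Callable[[str, bool], Optional[Dict]]
NeuralFn = Callable[[str], Dict]

def match_text(text: str, annotated: bool,
               slot_extract: SlotFn, neural_match: NeuralFn) -> Tuple[str, Dict]:
    """Try slot extraction first; fall back to the neural model only on failure."""
    slots = slot_extract(text, annotated)
    if slots is not None:
        # Situations (1) and (2): unannotated or annotated text matched by slot extraction.
        return "slot_extraction", slots
    # Situation (3): slot extraction failed, so the deep fusion network computes the match.
    return "deep_fusion_network", neural_match(text)

def toy_slot_extract(text: str, annotated: bool) -> Optional[Dict]:
    # Hypothetical rule: "<integer> degrees" fills a temperature slot.
    words = text.split()
    if "degrees" in words:
        i = words.index("degrees")
        if i > 0 and words[i - 1].lstrip("-").isdigit():
            return {"intent": "set_temperature", "value": int(words[i - 1])}
    return None

def toy_neural_match(text: str) -> Dict:
    # Stand-in for the deep fusion network model (hypothetical output).
    return {"intent": "unknown", "score": 0.0}

print(match_text("set fridge to 4 degrees", False, toy_slot_extract, toy_neural_match))
print(match_text("something smells odd", True, toy_slot_extract, toy_neural_match))
```

The speed benefit claimed in the text comes from the fact that the expensive `neural_match` call only runs for the residual texts that the cheap rule path cannot handle.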
- The matching method is further described below.
- A text semantic matching method provided by an embodiment of the present invention is described below.
- The present application provides the method steps shown in the following embodiments or flowcharts. Based on routine or non-inventive work, for steps that have no necessary logical causal relationship, the execution order is not limited to the order given in the embodiments of the present application.
- Steps S20, S30 and S40 below can be performed in any order or simultaneously.
- Step S10: Collect and preprocess multimodal and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio and video data related to the refrigeration equipment scenario.
- The preprocessing includes cleaning, format conversion and storage of the multimodal data.
- The format conversion includes parsing the data format.
- Text is collected through the client 200 and/or the interactive screen 10 on the refrigerator.
- The client 200 may include devices such as mobile phones, pads and PCs.
- Besides the text data generated in the application, the text also includes text from customer service, the web, mini-programs and official accounts.
- Preprocessing of the text data may include stop-word removal, deduplication, etc.
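A minimal sketch of stop-word removal plus deduplication, assuming whitespace-delimited ASCII text for illustration. The stop-word list is hypothetical, and Chinese input, as in the original application, would first need a word segmenter instead of the regular-expression tokenizer used here.

```python
import re

# Hypothetical stop-word list; a real system would use a curated, language-specific list.
STOP_WORDS = {"the", "a", "an", "please", "to", "of"}

def clean_texts(texts):
    """Lower-case, tokenize, drop stop words, and remove duplicates (order-preserving)."""
    seen = set()
    cleaned = []
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        normalized = " ".join(t for t in tokens if t not in STOP_WORDS)
        # Deduplicate on the normalized form so near-identical inputs collapse.
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned

print(clean_texts(["Please set the fridge to 4 degrees", "set fridge to 4 degrees!", "Any milk left?"]))
```

Deduplicating after normalization, rather than on the raw strings, is a design choice: it lets punctuation and politeness words differ without creating duplicate training or matching entries.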
- Audio data can be collected by the microphone 30 on the mobile phone and/or the refrigerator, or voice can be obtained over WiFi from sound collection units on other wirelessly connected devices.
- The microphone 30 can be a single microphone or a microphone array.
- Video data can be collected by the client 200 and/or the camera 20 on the refrigerator, for example through a mobile phone app, the camera 20, Bluetooth, etc. Audio and video can also be separated by a script to obtain valid voice and video data.
- Multimodal data collection across multiple channels and terminals ensures data integrity and preserves the cognitive characteristics of each modality.
- The text sources of the present invention include data of various modalities generated by users, which are used for further training and text semantic matching, giving higher accuracy.
- Step S20: Transcribe the video data into text data.
- This step includes the following two embodiments. In one embodiment, it includes the following steps:
- Text recognized in image data can be transcribed directly into text data, while for non-text content, what the user said can be recognized from multiple images showing changes in the user's mouth movements.
- Because sentences recognized from the speaker's image features alone may be complex, the present invention also takes sentence length factors and context relevance into account during recognition.
- The sentence length factors include the features of different sentence lengths and different word compositions; recognition based on spatiotemporal and long-distance dependency features is used to mine the rich semantic feature information of sentence sequences.
- The image recognition model based on spatiotemporal and long-distance dependency features is generated through a fusion of knowledge distillation and a diffusion model, which transfers the knowledge of the original large (teacher) model to the student network.
- The image data is transcribed into text data through the student model.
- The student model uses discrete time steps and a small number of steps.
- The student model can be distilled to half the number of steps of the teacher model.
- In another embodiment, the key frame images in the video data may be obtained by a diffusion model.
- The key frame images are recognized to generate text data.
- Step S30: Transcribe the audio data into text data, taking into account the spatiotemporal characteristics of speech and the contextual relationship features in the refrigeration equipment 100 scenario.
- The audio data in this step may be directly collected audio, or audio segmented from video data.
- This embodiment takes into account the spatiotemporal characteristics of voice data in usage scenarios of refrigeration equipment 100 such as refrigerators, and uses end-to-end learning to build a deep recurrent convolutional network model that fuses MMCNN-RNN, CTC and attention to transcribe audio data into text data. This captures rich high-level speech features and improves the accuracy of the model's speech-to-text transcription.
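The model above includes a CTC output layer, and a CTC model emits one probability distribution per audio frame that must be decoded into text. The source does not say which decoder is used; the sketch below shows greedy (best-path) CTC decoding, the simplest option, assuming the blank symbol is index 0 and the remaining indices map onto a hypothetical `alphabet` string. Beam search with a language model is a common, more accurate alternative.

```python
from typing import List, Sequence

BLANK = 0  # index of the CTC blank symbol (assumed to be 0 here)

def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    previous = None
    for idx in best_path:
        # A symbol is emitted only when it differs from the previous frame and is not blank.
        if idx != previous and idx != BLANK:
            chars.append(alphabet[idx - 1])  # alphabet excludes the blank
        previous = idx
    return "".join(chars)

def one_hot(i: int, n: int = 4) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(n)]

# Frames whose argmax sequence is a a _ a b b _ c (blank written as _).
frames = [one_hot(i) for i in [1, 1, 0, 1, 2, 2, 0, 3]]
print(ctc_greedy_decode(frames, "abc"))  # aabc
```

The blank between the two runs of `a` is what allows a genuinely repeated character to survive the collapse step.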
- Step S40: Acquire historical text data, wherein the historical text data includes historical record data and historical interaction data.
- The multimodal data collected in step S10 above is real-time data,
- whereas the data obtained in step S40 is historical data.
- On the one hand, the information in the historical text data itself can be used; on the other hand, it complements and relates to the real-time data, so that the semantic information of the text data is fully captured.
- The historical text data may be a large amount of unlabeled text accumulated on the refrigeration equipment 100 or the client 200. After it is obtained, the collected data may be cleaned, formatted and processed uniformly, so that the text format of the historical text data matches that of the text transcribed from real-time data, ensuring that the data features are both comprehensive and specific.
- The historical record data includes the user's food preference data, interest data and comment data;
- The historical interaction data includes the interaction records obtained from the client 200 or the interaction end of the refrigeration equipment 100.
- The historical text data also includes data collected by the refrigeration equipment 100 and data on the client 200 corresponding to the refrigeration equipment 100.
- Step S50: Preprocess the total text data.
- The text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data are aggregated into the total text data.
- The total text data reflects that this embodiment uses multi-source heterogeneous data: real-time and offline voice, video, images and text, plus text data such as users' historical comments, food preferences and food interests.
- The total text data can include two types of data:
- the first type is data that can be pre-annotated;
- the second type is data that cannot be or does not need to be annotated.
- Data that cannot be or does not need to be annotated are generally short phrases, which are also easy to slot-extract in subsequent steps. Longer sentences generally require pre-annotation; annotating them facilitates subsequent classification and recognition, and annotated data are more likely to yield a match during slot extraction in subsequent steps.
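The split between annotatable sentences and short phrases can be sketched as a simple length rule. The cut-off `max_phrase_tokens` and the whitespace token count are hypothetical; the source only says that non-annotated data are generally short phrases and that longer sentences are pre-annotated.

```python
from typing import List, Tuple

def split_for_annotation(texts: List[str], max_phrase_tokens: int = 4) -> Tuple[List[str], List[str]]:
    """Return (annotatable sentences, short phrases that skip annotation).

    max_phrase_tokens is a hypothetical threshold, not a value from the source.
    """
    annotatable, phrases = [], []
    for text in texts:
        if len(text.split()) <= max_phrase_tokens:
            phrases.append(text)      # short phrase: goes straight to slot extraction
        else:
            annotatable.append(text)  # longer sentence: pre-annotate first
    return annotatable, phrases

total_text = ["open door", "milk expiry", "what can I cook tonight with the eggs and spinach left"]
print(split_for_annotation(total_text))
```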
- Step S51: Annotate the annotatable texts in the total text data, and store the annotated texts as training data and test data respectively.
- The annotatable text is pre-annotated, formally annotated, quality-checked and stored in sequence; if the annotation quality check judges the value of a formally annotated text to be lower than a preset threshold, the text is returned to the pre-annotation process for re-annotation.
- The annotated data may include training data and test data.
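A minimal sketch of the annotation loop and the train/test storage step. The three annotator callables, the threshold of 0.8, the `max_rounds` cap (added so the loop always terminates) and the 20% test ratio are all hypothetical; the source specifies only the order pre-annotation, formal annotation, quality check, with a return to pre-annotation below a preset threshold.

```python
import random
from typing import Callable, List, Tuple

def annotate_with_qc(texts: List[str],
                     pre_annotate: Callable[[str], str],
                     formal_annotate: Callable[[str, str], str],
                     quality_score: Callable[[str, str], float],
                     threshold: float = 0.8,
                     max_rounds: int = 3) -> List[Tuple[str, str]]:
    """Pre-annotate, formally annotate, then quality-check each text.

    A text scoring below `threshold` goes back to pre-annotation; texts that never
    pass within `max_rounds` (a hypothetical cap) are left out.
    """
    accepted = []
    for text in texts:
        for _ in range(max_rounds):
            draft = pre_annotate(text)
            label = formal_annotate(text, draft)
            if quality_score(text, label) >= threshold:
                accepted.append((text, label))
                break
    return accepted

def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle reproducibly and split into (training data, test data)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

# Toy annotators (hypothetical): the label is the upper-cased first word,
# and only labels from a known intent set pass the quality check.
KNOWN = {"OPEN", "SET"}
accepted = annotate_with_qc(
    ["open the door", "hmm ok"],
    pre_annotate=lambda t: t.split()[0],
    formal_annotate=lambda t, draft: draft.upper(),
    quality_score=lambda t, label: 1.0 if label in KNOWN else 0.0,
)
train, test = train_test_split(range(10))
print(accepted, len(train), len(test))
```

With human annotators the re-annotation round would normally produce a different label; the deterministic toy functions only demonstrate the control flow.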
- Step S52: Clean the text in the total text data that cannot be annotated.
- In step S52, the text data that cannot be annotated are directly subjected to data cleaning, format conversion and other tasks, and then participate directly in the subsequent slot extraction without being annotated.
- Step S60: Perform result matching on both the unannotatable text and the annotated text in the total text data through slot extraction, and judge the matching result, as shown in FIG. 3.
- When the rule engine detects a problem, the rule definition is automatically analyzed and repaired by the quick repair module, and result matching is performed again by the rule engine.
- A rule engine is a software tool that defines and executes various rules in a system, such as business rules, process rules and data validation rules.
- A rule engine may have various problems, such as inaccurate rule definitions, rule conflicts and inefficient rule execution.
- When a rule engine detects a rule violation, it can generate a warning or error message.
- Quick fix techniques can rapidly identify and resolve many of these issues.
- Quick fix techniques usually rely on methods such as automated testing, code analysis and debugging tools to locate issues quickly and provide solutions.
- This embodiment achieves rapid rule repair by combining the rule engine with quick repair technology.
- The quick repair technology automatically analyzes and repairs rule definitions, reducing the time and cost of manual repair and improving the reliability and efficiency of the system. Rules can be updated and repaired in real time, which avoids rules going stale offline and the large effort otherwise needed to define and maintain the rule set.
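A minimal sketch of regex-based slot extraction with conflict detection and one repair-and-retry pass. The `Rule` fields, the three example rules and the repair policy (keep only the highest-priority rule among the conflicting ones) are hypothetical; the source does not say how the quick repair module analyzes rules, and this sketch handles only the rule-conflict case, not inaccurate or inefficient rules.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Rule:
    name: str
    pattern: str       # regular expression; named groups become slots
    intent: str
    priority: int = 0  # hypothetical field used only by the repair policy

def run_rules(text: str, rules: List[Rule]) -> Tuple[Optional[Dict], List[Rule]]:
    """Rule engine: return (match, []) on success, or (None, conflicting_rules)."""
    hits = []
    for rule in rules:
        m = re.search(rule.pattern, text)
        if m:
            hits.append((rule, m.groupdict()))
    if len({rule.intent for rule, _ in hits}) > 1:
        # Rule conflict: several rules claim the text with different intents.
        return None, [rule for rule, _ in hits]
    if hits:
        rule, slots = hits[0]
        return {"intent": rule.intent, **slots}, []
    return None, []

def extract_slots(text: str, rules: List[Rule]) -> Optional[Dict]:
    """Match with the rule engine; on a conflict, repair the rule set and match again."""
    result, conflicting = run_rules(text, rules)
    if conflicting:
        # Hypothetical quick-repair policy: keep only the highest-priority conflicting rule.
        winner = max(conflicting, key=lambda r: r.priority)
        repaired = [r for r in rules if r not in conflicting or r is winner]
        result, _ = run_rules(text, repaired)
    return result  # None means slot extraction failed: hand over to the neural model

RULES = [
    Rule("temperature", r"(?P<value>-?\d+) ?degrees", "set_temperature", priority=2),
    Rule("add_food", r"(?P<count>\d+) (?P<food>eggs|apples)", "add_food", priority=1),
    Rule("weather", r"degrees", "ask_weather", priority=0),
]
print(extract_slots("set fridge to 4 degrees", RULES))
print(extract_slots("add 6 eggs", RULES))
print(extract_slots("something smells odd", RULES))
```

Returning `None` rather than raising keeps the failure path explicit, which matches the flow in which a failed slot extraction is passed on to step S70.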
- Semantic matching is then performed on the text for which slot extraction fails, as described below.
- Step S70: Calculate the semantic matching result through the neural network.
- As shown in FIG. 3, step S70 specifically includes steps S71, S72 and S73.
- Step S71: Extract features from the total text data through a deep fusion network model, train the deep fusion network model with the training data, and predict results with the deep fusion network model using the test data.
- The annotated data may include training data and test data.
- In step S71, the constructed deep fusion model may be pre-trained on the training data and then used to predict results on the test data, capturing rich semantic feature information. The best-performing model is retained, which yields optimal predictions and more accurate feedback to the user.
- In step S71, the text vectorization model encodes text at the character, word, phrase and sentence levels; the multi-dimensional feature extraction model uses a multi-head attention mechanism to extract the interaction and association features of characters, words, phrases and sentences, as well as contextual semantic information.
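The multi-head attention used by the feature extraction model can be sketched in plain Python. This is scaled dot-product self-attention in which each token vector is split into `num_heads` slices, attended per head, and concatenated. For brevity it assumes identity projections: a real model learns separate query, key, value and output weight matrices, which are omitted here. The token vectors are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, row by row."""
    d = len(q[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out

def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split each token vector into num_heads slices, attend per head, concatenate.

    Learned Q/K/V and output projections are omitted (identity) to keep the sketch short.
    """
    dim = len(x[0])
    if dim % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    size = dim // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * size:(h + 1) * size] for row in x]
        heads.append(attention(part, part, part))
    return [[value for head in heads for value in head[t]] for t in range(len(x))]

# Three hypothetical token embeddings (e.g. character-, word- or phrase-level units).
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, num_heads=2)
print(len(out), len(out[0]))
```

Each head sees a different slice of the embedding, so different heads can attend to different token relationships; the output keeps one vector per input token with the original width.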
- Step S72: Transmit the text data for which slot extraction failed to the deep fusion network model for feature extraction, and extract the interaction features;
- The aggregated interaction feature information and the differentiated interaction feature information between the interaction features are calculated.
- Aggregating interaction feature information combines the interaction information between different features into a richer feature representation. This can improve the model's prediction performance, especially when features have complex interactions. The feature information can be learned and aggregated through attention-weighted summation.
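A minimal sketch of attention-weighted summation between the token features of two texts. The dot-product scoring is an assumption in the style of soft alignment (as in decomposable-attention or ESIM matching models); the source states only that features are aggregated by attention-weighted summation. The token vectors are hypothetical.

```python
import math
from typing import List

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_interactions(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """For each token vector of text a, return the attention-weighted sum of b's token vectors."""
    aggregated = []
    for ai in a:
        # Dot-product scores between this token of a and every token of b.
        weights = softmax([sum(x * y for x, y in zip(ai, bj)) for bj in b])
        aggregated.append([sum(w * bj[c] for w, bj in zip(weights, b)) for c in range(len(b[0]))])
    return aggregated

text_a = [[1.0, 0.0], [0.0, 1.0]]               # hypothetical token vectors of text A
text_b = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]   # hypothetical token vectors of text B
print(aggregate_interactions(text_a, text_b))
```

Each row of the result summarizes text B from the point of view of one token in text A, which is the richer interaction representation the paragraph above refers to.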
- Differentiated interaction feature information constructs a new feature representation by calculating the differences between features. It captures important interactions between features, improving the prediction ability of the model.
- The differentiated interaction feature information can be calculated using an attention mechanism to enhance the differences between interaction features. It can be used as model input and learned and aggregated in subsequent neural network layers to obtain more accurate predictions.
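The source does not give a formula for the difference features. One common construction, used in ESIM-style matching models, concatenates each token vector a with its attention-aligned counterpart ã as [a; ã; a - ã; a ⊙ ã], where the subtraction captures differences and the element-wise product captures agreement. The sketch below assumes the aligned vectors have already been computed by attention; the input values are hypothetical.

```python
from typing import List

def difference_features(a: List[List[float]], aligned: List[List[float]]) -> List[List[float]]:
    """Per token, concatenate [a, aligned, a - aligned, a * aligned] (ESIM-style enhancement)."""
    enhanced = []
    for ai, ali in zip(a, aligned):
        diff = [x - y for x, y in zip(ai, ali)]  # differentiated interaction
        prod = [x * y for x, y in zip(ai, ali)]  # element-wise agreement
        enhanced.append(ai + ali + diff + prod)
    return enhanced

print(difference_features([[1.0, 2.0]], [[0.5, 2.0]]))
# [[1.0, 2.0, 0.5, 2.0, 0.5, 0.0, 0.5, 4.0]]
```

The enhanced vectors are four times wider than the inputs and would be fed to the following layers, as the paragraph above describes.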
- Step S73 Calculate the text semantic matching result according to the feature information, the aggregated interaction feature information, and the differentiated interaction feature information.
- step S73 the feature-extracted vector is sequentially passed through the fully connected layer and the self-attention mechanism to obtain the semantic matching result of the text.
- the semantic matching result is first calculated through the fully connected layer, and then the self-attention mechanism is used to calculate the semantic relationship quantification result with stronger text interactivity.
- step S73 can also be calculated using semantic matching similarity with disambiguation, which can be achieved based on the threshold value control of the distance between terms, so that some redundant information in the text can be removed.
- Step S80: Result delivery.
- Results can be delivered through a variety of built-in or external channels, such as outbound calls, SMS, email notifications, large-screen display, voice broadcast, text output, smart speakers, pop-up UIs, apps, tablets (PAD), and web pages, to meet the needs of result presentation and digital display.
- This embodiment has the following beneficial effects:
- The text semantic matching method uses an annotate-first, then-slot-extraction mechanism. Once annotated, some text data becomes easier to slot-extract, and the unannotated text is generally short phrases, which are also easy to slot-extract. Slot extraction therefore has three outcomes: unannotated text is matched successfully after slot extraction; annotated text is matched successfully after slot extraction; or annotated text fails to match after slot extraction, in which case the failed category is matched again through the deep fusion network model and calculation. This greatly improves the overall matching speed of the text while keeping matching accuracy high, which improves the user experience.
- The present invention also proposes a refrigeration equipment system, which includes a storage module 60 and a processing module 50.
- When the processing module 50 executes the computer program, it implements the steps of the above text semantic matching method, that is, the steps of any one of the technical solutions of that method.
- The refrigeration equipment system may also include the following modules, as shown in FIG. 6, whose functions are as follows:
- a collection module, used to collect and preprocess multimodal data and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio, and video data related to the refrigeration equipment scenario;
- a video transcription module, used to transcribe video data into text data;
- an audio transcription module, used to transcribe audio data into text data by combining the spatiotemporal characteristics of speech and the contextual relationship features in the scenario of the refrigeration equipment 100;
- an acquisition module, used to acquire historical text data, wherein the historical text data includes historical record data and historical interaction data;
- an intelligent annotation module, used to aggregate the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into total text data, and to annotate the annotatable text in the total text data;
- a slot extraction module, used to perform result matching on both the unannotatable text and the annotated text in the total text data through slot extraction;
- a feature extraction module, used to extract features from the corresponding text data through a deep fusion network model after the matching fails, wherein the deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model;
- an aggregation-difference module, used to calculate the aggregated interaction feature information and the differentiated interaction feature information between the interaction features;
- a semantic matching module, used to calculate the semantic matching result.
- The refrigeration equipment system may also include the refrigeration equipment 100 and computing devices such as mobile phones, computers, notebooks, PDAs, and cloud servers. It includes but is not limited to a processing module 50, a storage module 60, and a computer program stored in the storage module 60 and executable on the processing module 50, such as a program implementing the above text semantic matching method.
- When the processing module 50 executes the computer program, it implements the steps in the above embodiments of the text semantic matching method, such as the steps shown in FIGS. 2 to 5.
- the refrigeration equipment system may further include a signal transmission module and a communication bus 70.
- The signal transmission module is used to send data to the processing module 50 or the server. For example, data is transmitted through the signal transmission module between the refrigeration equipment 100 and the mobile phone, or between the refrigeration equipment 100 and the server. The signal transmission module may transmit data over a wireless connection, such as Bluetooth, WiFi, or ZigBee.
- The communication bus 70 connects the signal transmission module, the processing module 50, and the storage module 60. It may include a path for transmitting information among them.
- The processing module 50 and the storage module 60 may be integrated into the refrigeration equipment 100, or may be part of a mobile phone, a local terminal device, or a cloud server.
- the processing module 50 can be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the general-purpose processor can be a microprocessor or any conventional processor.
- the processing module 50 is the control center of the refrigeration equipment system, and uses various interfaces and lines to connect various parts of the entire refrigeration equipment system.
- the storage module 60 can be used to store the computer program and/or module.
- the processing module 50 realizes various functions of the refrigeration equipment system by running or executing the computer program and/or module stored in the storage module 60 and calling the data stored in the storage module 60.
- the storage module 60 can mainly include a storage program area and a storage data area, wherein the storage program area can store an operating system, at least one application required for a function, etc.
- the storage module 60 can include a high-speed random access memory, and can also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or other volatile solid-state storage devices.
- the computer program may be divided into one or more modules/units, which are stored in the storage module 60 and executed by the processing module 50 to implement the present invention.
- the one or more modules/units may be a series of computer program instruction segments capable of implementing specific functions, and the instruction segments are used to describe the execution process of the computer program in the refrigeration equipment system.
- an embodiment of the present invention provides a readable storage medium storing a computer program, which, when executed by the processing module 50, can implement the steps in the above-mentioned text semantic matching method, that is, implement the steps in any one of the technical solutions in the above-mentioned text semantic matching method.
- the module integrated in the text semantic matching method can be stored in a computer-readable storage medium.
- All or part of the processes in the above method embodiments of the present invention may also be implemented by a computer program instructing the relevant hardware.
- the computer program can be stored in a computer-readable storage medium.
- the computer program includes computer program code, which may be in source code form, object code form, executable file or some intermediate form, etc.
- the computer readable medium may include: any entity or device capable of carrying the computer program code, recording medium, disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electric carrier signals and telecommunication signals.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
Disclosed in the present invention are a text semantic matching method and a refrigeration device system. The method comprises: annotating annotatable text in total text data; performing result matching on both unannotatable text and annotated text in the total text data by means of slot extraction; if the matching is successful, outputting a matching result; and if the matching fails, calculating a matching result by means of a neural network. In the method, matching is performed by using a mechanism of performing annotation first and then performing slot extraction, such that the overall matching speed of text is greatly increased.
Description
This application claims priority to Chinese Patent Application No. 202310247263.1, filed on March 15, 2023, entitled "Text Semantic Matching Method and Refrigeration Equipment System". The entire contents of that application are incorporated herein by reference.
The present invention relates to the technical field of refrigeration equipment, and in particular to a text semantic matching method and a refrigeration equipment system.
With the advancement of artificial intelligence, there is a desire to introduce artificial intelligence into refrigerators to make them more intelligent. Making refrigerators intelligent involves extensive optimization for refrigerator scenarios. This includes optimizing the various types of interaction between users and refrigerators, such as spoken, text, and video interactions. During this optimization, the inventors found the following problem in the prior art:
Existing interactions respond slowly and are not accurate enough to meet users' need for instant, clear communication. Users clearly feel the inconvenience of human-computer dialogue, which lacks the naturalness of a conversation between people, and the user experience is poor.
To solve at least one of the above problems in the prior art, an object of the present invention is to provide a text semantic matching method and a refrigeration equipment system with fast interactive response and accurate feedback information.
To achieve the above object, an embodiment of the present invention provides a text semantic matching method, comprising the following steps:
annotating the annotatable text in the total text data;
performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction, and determining the matching result;
if the matching succeeds, outputting the matching result;
if the matching fails, transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and then calculating the text semantic matching result based on the feature extraction result. The deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model. The text vectorization model vectorizes the text data, and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and association features.
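The control flow of these steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation. `can_annotate`, `annotate`, `slot_extract`, and `deep_model` are hypothetical callables. They stand in for the annotatability check, the annotation step, the slot-extraction matcher (assumed to return `None` on failure), and the deep fusion network model.

```python
from typing import Callable, Optional, Tuple


def semantic_match(
    text: str,
    can_annotate: Callable[[str], bool],
    annotate: Callable[[str], str],
    slot_extract: Callable[[str], Optional[str]],
    deep_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (matching result, path taken) for one piece of text.

    All four callables are hypothetical placeholders; only the cascade
    (annotate -> slot extraction -> deep model on failure) is illustrated.
    """
    # Step 1: annotate the text if it is annotatable; otherwise use it as-is.
    prepared = annotate(text) if can_annotate(text) else text
    # Step 2: try the fast path first.
    result = slot_extract(prepared)
    if result is not None:
        return result, "slot_extraction"
    # Step 3: only texts that failed to match reach the slower deep model.
    return deep_model(prepared), "deep_fusion_model"
```

In this sketch the deep model runs only on the failure branch. Texts resolved by slot extraction therefore skip the expensive feature extraction, which is where the description locates its speed-up.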
As a further improvement of the present invention, the step of performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction comprises:
matching both the unannotatable text and the annotated text in the total text data by means of a rule engine;
when the rule engine detects a problem, automatically analyzing and repairing the rule definitions through a quick repair module, and performing the result matching through the rule engine again.
As a further improvement of the present invention, the method further comprises the step of:
when the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot match a result after the rules are repaired, determining that the slot extraction matching has failed. The problem includes an inaccurate rule definition, a rule conflict, or inefficient rule execution.
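A toy version of the rule engine with quick repair is sketched below. It assumes each rule is a hypothetical `(slot name, regular expression)` pair and models only two of the listed problem types:

- an inaccurate rule definition (a pattern that does not compile);
- a rule conflict (two rules filling one slot with different values).

The repair strategy (drop broken rules, keep the first rule per slot) is an illustrative assumption. Inefficient rule execution is not modeled.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (slot name, regular expression for the slot value).
Rule = Tuple[str, str]


def run_rules(text: str, rules: List[Rule]) -> Tuple[Dict[str, str], List[str]]:
    """Apply every rule to the text; return filled slots and detected problems."""
    slots: Dict[str, str] = {}
    problems: List[str] = []
    for slot, pattern in rules:
        try:
            match = re.search(pattern, text)
        except re.error:
            # Inaccurate rule definition: the pattern does not compile.
            problems.append(f"invalid rule for {slot!r}")
            continue
        if match is None:
            continue
        value = match.group(0)
        if slot in slots and slots[slot] != value:
            # Rule conflict: two rules fill the same slot differently.
            problems.append(f"conflict on {slot!r}")
        else:
            slots[slot] = value
    return slots, problems


def quick_repair(rules: List[Rule]) -> List[Rule]:
    """Toy repair: drop rules that do not compile and keep only the first
    rule per slot, so earlier rules win a conflict."""
    repaired: List[Rule] = []
    seen = set()
    for slot, pattern in rules:
        try:
            re.compile(pattern)
        except re.error:
            continue
        if slot not in seen:
            seen.add(slot)
            repaired.append((slot, pattern))
    return repaired


def slot_match(text: str, rules: List[Rule]) -> Optional[Dict[str, str]]:
    """Return filled slots, or None when slot-extraction matching fails."""
    slots, problems = run_rules(text, rules)
    if problems:
        slots, problems = run_rules(text, quick_repair(rules))
        if problems:
            return None  # the repair did not resolve the problem
    return slots or None  # filling no slot also counts as a failed match
```

A `None` result here corresponds to the "slot extraction matching fails" branch that hands the text to the deep fusion network model.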
As a further improvement of the present invention, when the matching fails, the method further comprises:
transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction to extract interaction features;
calculating aggregated interaction feature information between the interaction features;
calculating differentiated interaction feature information between the interaction features;
calculating the text semantic matching result based on the feature information, the aggregated interaction feature information, and the differentiated interaction feature information.
As a further improvement of the present invention, the step of calculating the aggregated interaction feature information between the interaction features comprises:
calculating the aggregated interaction feature information between the interaction features by attention-weighted summation;
the step of calculating the differentiated interaction feature information between the interaction features comprises:
calculating the differentiated interaction feature information between the interaction features with attention-mechanism enhancement.
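The patent does not give equations for these two steps. The sketch below substitutes a well-known formulation, ESIM-style soft alignment, which is our choice rather than the patent's confirmed method. The two parts are:

- Aggregation is an attention-weighted sum of text B's vectors for each vector of text A.
- Differentiation concatenates the difference and the element-wise product between each vector and its attention-aligned counterpart.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def aggregate(a_seq: List[Vec], b_seq: List[Vec]) -> List[Vec]:
    """Attention-weighted sum: align each vector of text A with text B."""
    aligned = []
    for a in a_seq:
        weights = softmax([dot(a, b) for b in b_seq])
        aligned.append([sum(w * b[k] for w, b in zip(weights, b_seq)) for k in range(len(a))])
    return aligned


def differentiate(a_seq: List[Vec], aligned: List[Vec]) -> List[Vec]:
    """Difference and element-wise product with the attention-aligned vector."""
    return [
        [x - y for x, y in zip(a, t)] + [x * y for x, y in zip(a, t)]
        for a, t in zip(a_seq, aligned)
    ]
```

The differentiated features double the vector width; a subsequent layer would learn from them together with the original and aggregated features.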
As a further improvement of the present invention, the step of annotating the annotatable text in the total text data comprises:
subjecting the annotatable text to pre-annotation, formal annotation, and annotation quality inspection in sequence. When the annotation quality inspection judges the score of a formally annotated text to be below a preset threshold, the text is returned to pre-annotation for re-annotation.
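The annotation loop with quality inspection can be sketched as follows. The three stages are passed in as hypothetical callables. `pre_annotate` receives the attempt number so that a re-annotation can differ from the previous one. The `max_rounds` cap is our assumption: the description states no limit, and without one a text that never passes would loop forever.

```python
from typing import Callable, Dict, List, Tuple


def annotate_with_qc(
    texts: List[str],
    pre_annotate: Callable[[str, int], str],
    formal_annotate: Callable[[str, str], str],
    qc_score: Callable[[str, str], float],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Pre-annotate, formally annotate and quality-check each text.

    A text whose QC score is below `threshold` goes back to pre-annotation.
    Texts that never pass within `max_rounds` (an assumed limit) are
    returned separately.
    """
    accepted: Dict[str, str] = {}
    rejected: List[str] = []
    for text in texts:
        for attempt in range(max_rounds):
            draft = pre_annotate(text, attempt)      # pre-annotation
            label = formal_annotate(text, draft)     # formal annotation
            if qc_score(text, label) >= threshold:   # annotation quality inspection
                accepted[text] = label
                break
        else:  # no attempt passed quality inspection
            rejected.append(text)
    return accepted, rejected
```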
As a further improvement of the present invention, the step of annotating the annotatable text in the total text data comprises:
annotating the annotatable text in the total text data, and storing the annotated text separately as training data and test data;
the step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction comprises:
for annotated text data that fails to match, training the deep fusion network model with the training data, and predicting the result through the deep fusion network model with the test data.
As a further improvement of the present invention, the method further comprises the step of:
performing data cleaning on the unannotatable text in the total text data;
the step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction comprises:
for unannotatable text data that fails to match, training the deep fusion network model on the unannotatable text with an unsupervised learning algorithm.
As a further improvement of the present invention, the total text data is obtained by transcribing all multimodal data and/or multi-source heterogeneous data into text data and aggregating it.
As a further improvement of the present invention, the method further comprises the steps of:
collecting and preprocessing multimodal data and/or multi-source heterogeneous data. The multimodal data includes text, audio, and video data, and the preprocessing includes cleaning, format conversion, and storage of the multimodal data;
transcribing video data into text data;
transcribing audio data into text data;
acquiring historical text data;
aggregating the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into the total text data.
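The aggregation into total text data might look like the sketch below. The source keys (`"text"`, `"audio"`, `"video"`, `"history"`) are hypothetical labels, and the cleaning shown (whitespace normalization, dropping empty and exactly duplicated strings) is a deliberately minimal assumption. The description's actual cleaning and format conversion are unspecified.

```python
import re
from typing import Dict, Iterable, List, Tuple


def clean(text: str) -> str:
    """Minimal cleaning: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_total_text(sources: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Merge text from every source into one list of (source, text) records,
    dropping empty strings and exact duplicates while keeping first-seen order."""
    seen = set()
    total: List[Tuple[str, str]] = []
    for source, texts in sources.items():
        for raw in texts:
            text = clean(raw)
            if text and text not in seen:
                seen.add(text)
                total.append((source, text))
    return total
```

Keeping the source tag on each record lets later steps treat, for example, historical comments differently from live voice commands.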
As a further improvement of the present invention, the step of transcribing the video data into text data comprises:
separating the audio and images in the video data to obtain audio data and image data;
recognizing the textual information in the image data and transcribing it into text data;
recognizing the image data based on spatiotemporal and long-range dependency features and transcribing it into text data;
wherein the step of recognizing images based on spatiotemporal and long-range dependency features and transcribing them into text data comprises:
generating, through a fusion model of knowledge distillation and a diffusion model, a student model that recognizes images based on spatiotemporal and long-range dependency features, and transcribing the image data into text data through the student model.
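The knowledge-distillation half of this step can be illustrated with the standard temperature-softened loss. In that loss the student matches the teacher's softened output distribution, scaled by T², optionally mixed with ordinary cross-entropy on the true label. This sketch omits the diffusion-model component. The logits, `alpha`, and `temperature` values are hypothetical, and the patent does not state which distillation loss it uses.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def student_loss(student_logits: List[float], teacher_logits: List[float],
                 true_index: int, alpha: float = 0.5, temperature: float = 2.0) -> float:
    """Mix hard-label cross-entropy with the soft distillation term."""
    hard = -math.log(softmax(student_logits)[true_index])
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return alpha * hard + (1 - alpha) * soft
```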
As a further improvement of the present invention, the step of transcribing the video data into text data comprises:
acquiring key frame images in the video data through a diffusion network model;
recognizing the key frame images to generate text data.
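The description obtains key frames through a diffusion network model, which is beyond a short example. The sketch below plainly substitutes simple frame differencing. A frame becomes a key frame when its mean absolute pixel difference from the previous key frame exceeds a threshold. Frames are assumed to be equal-length flat lists of grayscale values, and `threshold` is a hypothetical tuning parameter.

```python
from typing import List, Sequence


def select_key_frames(frames: Sequence[Sequence[int]], threshold: float) -> List[int]:
    """Return indices of key frames using frame differencing (not a diffusion model)."""
    if not frames:
        return []
    keys = [0]  # the first frame is always kept
    for i in range(1, len(frames)):
        reference = frames[keys[-1]]
        mean_diff = sum(abs(a - b) for a, b in zip(frames[i], reference)) / len(reference)
        if mean_diff > threshold:
            keys.append(i)
    return keys
```

The selected frames would then go to the recognition step that generates text data.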
As a further improvement of the present invention, the step of transcribing the audio data into text data comprises:
combining the spatiotemporal characteristics of speech and the contextual relationship features in the scenario in which the data is acquired, and building a deep recurrent convolutional network model based on the fused neural networks MMCNN-RNN, CTC, and Attention to transcribe the audio data into text data.
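Of the components named above, CTC decoding is the easiest to illustrate in isolation. The sketch shows greedy (best-path) CTC decoding in three steps:

1. Take the most likely symbol for each audio frame.
2. Collapse consecutive repeats.
3. Drop the blank symbol.

The per-frame probabilities would come from the MMCNN-RNN/Attention acoustic model, which is not reproduced here. The alphabet and the blank index 0 are assumptions.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: List[str],
                      blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, remove blanks."""
    best_path = [max(range(len(p)), key=lambda i: p[i]) for p in frame_probs]
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    return "".join(alphabet[s] for s in collapsed if s != blank)
```

The blank is what lets CTC emit genuine double letters: "a, blank, a" decodes to "aa", while "a, a" collapses to "a". Production systems often use beam search with a language model instead of this greedy path.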
As a further improvement of the present invention, the historical text data includes historical record data and historical interaction data. The historical record data includes the user's food preference data, interest data, and comment data, and the historical interaction data includes interaction records obtained from the client or from the interaction terminal of the refrigeration equipment.
As a further improvement of the present invention, the text vectorization model encodes text at the character, word, phrase, and sentence levels;
the multi-dimensional feature extraction model uses a multi-head attention mechanism to separately extract character-, word-, phrase-, and sentence-level interaction and association features, as well as contextual semantic information.
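A minimal sketch of multi-head self-attention over a sequence of vectors follows; the vectors could be character-, word-, phrase-, or sentence-level encodings. For brevity it assumes identity query/key/value projections and no output projection, so each head attends over its own slice of the feature dimensions. A trained model would learn separate projection matrices per head, and the patent does not specify its configuration.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x: List[List[float]], num_heads: int) -> List[List[float]]:
    """Scaled dot-product self-attention, computed independently per head.

    Assumes identity Q/K/V projections: head h works on feature slice
    [h*d_head, (h+1)*d_head) of every input vector.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    out: List[List[float]] = [[] for _ in x]
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        for i, query in enumerate(part):
            scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_head) for key in part]
            weights = softmax(scores)
            # Concatenate this head's output onto token i's output vector.
            out[i].extend(sum(w * v[c] for w, v in zip(weights, part)) for c in range(d_head))
    return out
```

Each output row has the same width as its input row: the per-head results are concatenated, so heads looking at different feature slices can capture different kinds of interaction.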
As a further improvement of the present invention, the step of calculating the text semantic matching result based on the feature extraction result comprises: passing the feature-extracted vectors sequentially through a fully connected layer and a self-attention mechanism to calculate the text semantic matching result.
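The fully-connected-then-self-attention step might look like the following sketch. It makes three assumptions:

- a ReLU dense layer;
- attention pooling with a learned context vector `u`, one simple form of self-attention over the sequence;
- a sigmoid output giving a matching probability.

All weights (`W`, `b`, `u`, `w_out`, `b_out`) are hypothetical parameters that a trained model would supply.

```python
import math
from typing import List


def dense_relu(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias) for row, bias in zip(W, b)]


def match_probability(features: List[List[float]], W: List[List[float]], b: List[float],
                      u: List[float], w_out: List[float], b_out: float) -> float:
    """Dense layer -> attention pooling -> sigmoid matching probability."""
    hidden = [dense_relu(f, W, b) for f in features]
    scores = [sum(h_k * u_k for h_k, u_k in zip(h, u)) for h in hidden]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]  # attention weight per position
    pooled = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(len(hidden[0]))]
    logit = sum(w * p for w, p in zip(w_out, pooled)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))
```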
As a further improvement of the present invention, the method further comprises the step of:
delivering (result reach) both the matching results of successful matches and the matching results recalculated after failed matches.
To achieve one of the above objects, an embodiment of the present invention provides a refrigeration equipment system, comprising:
a storage module storing a computer program;
a processing module which, when executing the computer program, implements the steps of the above text semantic matching method.
Compared with the prior art, the present invention has the following beneficial effects. The text semantic matching method uses an annotate-first, then-slot-extraction mechanism. Once annotated, some text data becomes easier to slot-extract, and the unannotated text is generally short phrases, which are also easy to slot-extract. Slot extraction therefore has three outcomes: unannotated text is matched successfully after slot extraction; annotated text is matched successfully after slot extraction; or annotated text fails to match after slot extraction, in which case the failed category is matched again through the deep fusion network model and calculation. This greatly improves the overall matching speed of the text while keeping matching accuracy high, which improves the user experience.
FIG. 1 is a schematic structural diagram of a refrigeration equipment system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a text semantic matching method according to an embodiment of the present invention;
FIG. 3 is a partial schematic flow chart of a text semantic matching method according to an embodiment of the present invention;
FIG. 4 is a flow chart of slot extraction according to an embodiment of the present invention;
FIG. 5 is a data flow diagram of a text semantic matching method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the modules according to an embodiment of the present invention;
FIG. 7 is a structural block diagram of a refrigeration equipment system according to an embodiment of the present invention.
Reference numerals: 100, refrigeration equipment; 10, interactive screen; 20, camera; 30, microphone; 40, speaker; 50, processing module; 60, storage module; 70, communication bus; 200, client.
The present invention is described in detail below with reference to the specific embodiments shown in the accompanying drawings. These embodiments do not, however, limit the present invention. Any structural, methodological, or functional changes made by a person of ordinary skill in the art based on these embodiments fall within the scope of protection of the present invention.
An embodiment of the present invention provides a text semantic matching method and a refrigeration equipment system with fast interactive response and accurate feedback information.
The refrigeration equipment system may include refrigeration equipment 100 and a client 200 corresponding to the refrigeration equipment 100. The refrigeration equipment 100 may be a refrigerator, and the client 200 may be a mobile phone or an app on a mobile phone. As shown in FIG. 1, the refrigeration equipment 100 and the client 200 may be connected via a wireless signal. The refrigeration equipment 100 is described below taking a refrigerator as an example.
Continuing with FIG. 1, the refrigeration equipment 100 may be a refrigerator capable of audio acquisition, video acquisition, and user-interface interaction. The refrigerator is provided with a microphone 30 for collecting audio, a camera 20 for shooting video, a speaker 40 for voice interaction with the user, and an interactive screen 10 for text or graphical interaction with the user. The interactive screen 10 may be arranged on the refrigerator door. After the user opens the refrigerator door, the camera 20 records the user's operations to form video data. The speaker 40 and the microphone 30 work together to interact with the user by audio in question-and-answer form.
Taking a mobile phone as an example of the client 200, the user can communicate with the refrigerator by text or voice through the phone, manage the food information in the refrigerator, and control the operating state of the refrigerator.
In addition, the refrigeration equipment system may include other external devices, such as an external temperature sensor, the camera 20, microphone 30, and speaker 40 built into other devices, smart speakers, etc. These devices can be connected to the refrigeration equipment 100 or the client 200 via wireless signals.
The data from the devices in the refrigeration equipment system constitutes multi-source heterogeneous data, which can be transmitted via wired connections, WiFi, Bluetooth, etc. Data of different types, such as text, audio, and video, constitutes multimodal data. This data may be real-time online data, offline data, or stored historical data.
The refrigeration equipment scenario includes interactions between the user and the refrigeration equipment 100, and between the user and the client 200 corresponding to the refrigeration equipment 100. Examples include food-related interactions, instruction interactions, video recorded when the user operates the refrigeration equipment 100, the user's control of the temperature and humidity inside the refrigeration equipment 100, and the user's food comments and preferences on the client 200. Both the data generated by directly operating the refrigeration equipment 100 and the data generated by the associated client 200 are data in the refrigeration equipment scenario.
This embodiment uses the multimodal real-time and offline data generated while the refrigeration equipment 100 is in use, together with a large accumulated body of historical text data. It fully mines the semantic, grammatical, and contextual information relevant to natural language understanding, both within the data and across data, so that the text semantic matching results in the refrigeration equipment scenario are more accurate.
The core idea of semantic matching is to convert text into semantic vector representations and find other vectors that are close in distance or similarity, in order to judge the relevance between texts. Semantic matching helps machines better understand and process natural-language text. It can be applied in various scenarios, such as text classification, knowledge graph construction, and intelligent customer service.
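The vector-similarity idea can be sketched with cosine similarity plus a minimum-similarity threshold, so that no candidate is returned when nothing is close enough. The intent names, the vectors, and the 0.8 threshold are illustrative assumptions. In the described method, the vectors would come from the text vectorization model.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def best_match(query: List[float], candidates: Dict[str, List[float]],
               threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return (key, similarity) of the closest candidate, or None if below threshold."""
    if not candidates:
        return None
    key, score = max(((k, cosine(query, v)) for k, v in candidates.items()),
                     key=lambda kv: kv[1])
    return (key, score) if score >= threshold else None
```

This brute-force scan compares the query against every candidate. That cost grows with the number of stored texts, which is the slowness the following paragraph attributes to conventional semantic matching.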
In general, semantic matching requires computing similarities across a large number of texts, so it is slow and takes a long time. The present invention greatly speeds up matching by combining pre-annotation with slot extraction. Slot extraction produces three cases:
(1) unannotated texts match successfully after slot extraction;
(2) annotated texts match successfully after slot extraction;
(3) annotated texts fail to match after slot extraction, and this category is matched again by computation through the deep fusion network model.
Two of the three categories are therefore matched faster. The remaining category is matched by neural network computation, which is also considerably faster than existing semantic matching. Both slot extraction and the neural network match with high accuracy, so the invention improves both the speed and the accuracy of semantic matching. The matching method is described in more detail below.
A text semantic matching method according to an embodiment of the present invention is described below with reference to FIG. 2 and FIG. 5. The present application provides the method steps shown in the following embodiments or flowcharts. However, based on routine or non-inventive effort, steps that have no necessary logical causal relationship need not follow the order given in the embodiments of the present application. For example, steps S20, S30 and S40 below may be performed in any order or simultaneously, with no required sequence in time.
Step S10: Collect and preprocess multimodal and/or multi-source heterogeneous data. The multimodal data include text, audio and video data related to the refrigeration equipment scenario.
The preprocessing includes cleaning, format conversion and storage of the multimodal data. Format conversion includes parsing the data format.
Text is collected through the client 200 and/or the interactive screen 10 on the refrigerator. The client 200 may be a mobile phone, a tablet (pad), a PC or a similar device. Besides text generated in the application, the text data also include customer-service text, web text, and text from mini-programs or official (public) accounts. Text preprocessing may include stop-word removal, deduplication and similar operations.
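Below is a minimal sketch of the stop-word removal and deduplication mentioned above. It assumes whitespace tokenization and a small, hypothetical English stop-word list. Chinese text would first need a word segmenter, which is not shown.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; a production system would load a curated list.
STOP_WORDS: Set[str] = {"the", "a", "an", "please", "to", "of"}


def preprocess(texts: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Lower-case, drop stop words, and remove duplicate texts, keeping first-seen order."""
    seen: Set[str] = set()
    cleaned: List[str] = []
    for text in texts:
        tokens = [tok for tok in text.lower().split() if tok not in stop_words]
        normalized = " ".join(tokens)
        # Skip texts that become empty or duplicate an earlier text after normalization.
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned
```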
Audio data can be collected by the microphone 30 on the mobile phone and/or the refrigerator. Alternatively, voice can be obtained over a wireless connection such as WiFi from sound-pickup units on other devices. The microphone 30 may be a single microphone or a microphone array.
Video data can be collected by the client 200 and/or the camera 20 on the refrigerator, for example through a mobile phone app, the camera 20 or Bluetooth. A script can also separate the voice from the video to obtain usable voice and video data.
These means together collect multimodal data from multiple channels and terminals, which keeps the data complete and preserves their multimodal cognitive characteristics.
Most existing technologies use text alone. Other data are therefore ignored during later learning, and recognition accuracy is low when the user interacts with the refrigerator in other forms. In the present invention, the text sources cover data of every modality the user generates, and training and text semantic matching on these sources gives higher accuracy.
Step S20: Transcribe the video data into text data.
This step has two embodiments. In one embodiment, it includes the following steps:
separating the audio and the images in the video data to obtain audio data and image data;
recognizing text information in the image data and transcribing it into text data;
recognizing the image data based on spatiotemporal and long-distance dependency features and transcribing the result into text data.
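The audio/image separation step can be sketched with the ffmpeg command-line tool. The source does not name ffmpeg; it is an assumed stand-in for the separation script. The helpers below, whose names are hypothetical, only build argument lists. Where ffmpeg is installed, the lists can be run with `subprocess.run(cmd, check=True)`.

In these commands, `-vn` drops the video stream and `-an` drops the audio stream. The `fps` filter samples still frames, which can then be used for text recognition or lip reading.

```python
from typing import List


def audio_extract_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """ffmpeg arguments that drop the video stream (-vn) and write mono WAV audio."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", str(sample_rate), wav_path]


def frame_extract_cmd(video_path: str, frame_pattern: str, fps: int = 1) -> List[str]:
    """ffmpeg arguments that drop the audio stream (-an) and sample `fps` frames per second."""
    return ["ffmpeg", "-y", "-i", video_path, "-an", "-vf", f"fps={fps}", frame_pattern]


# Example argument lists for one recorded clip (file names are hypothetical).
audio_cmd = audio_extract_cmd("door_open.mp4", "door_open.wav")
frames_cmd = frame_extract_cmd("door_open.mp4", "frames/%05d.png", fps=2)
```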
Text that appears in the image data can be transcribed directly into text data. Non-text content can be recognized from several images that show changes in the user's mouth movements, in order to recognize what the user said. Sentences recognized from the speaker's image features alone may be complex, so the present invention also takes sentence length and contextual relevance into account. The sentence-length factor covers variation in sentence length and in word composition. Recognition based on spatiotemporal and long-distance dependency features then mines the rich semantic feature information of the sentence sequence.
In addition, to make the model respond faster, a student model is generated from the image recognition model based on spatiotemporal and long-distance dependency features. The generation uses a fusion model of knowledge distillation and a diffusion model. The knowledge of the original large model is transferred to the student network, and the student model then transcribes the image data into text data. The student model uses discrete time steps and a short step count, and it can be distilled to half the number of steps of the teacher model.
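The source states only that the student can be distilled to half the teacher's step count. One common way to achieve this is the step-halving idea of progressive distillation, in which each student step is trained to reproduce two consecutive teacher steps. The sketch below shows only that timestep pairing, under this assumption. It is not the patented fusion model, and `student_schedule` is a hypothetical name.

```python
from typing import List, Tuple


def student_schedule(teacher_steps: int) -> List[Tuple[int, int, int]]:
    """For each student step, return (start, teacher_midpoint, end) timesteps.

    The teacher needs two steps (start -> midpoint -> end), while the student is
    trained to jump from start to end in one step, so it needs half as many steps.
    """
    if teacher_steps <= 0 or teacher_steps % 2 != 0:
        raise ValueError("teacher_steps must be a positive even number")
    return [(start, start - 1, start - 2) for start in range(teacher_steps, 0, -2)]


# An 8-step teacher gives a 4-step student: (8,7,6), (6,5,4), (4,3,2), (2,1,0).
schedule = student_schedule(8)
```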
In another embodiment, key frame images in the video data may be obtained through a diffusion model;
the key frame images are then recognized to generate text data.
Here, the meaning of each image is recognized and its content is described in text, which completes the transcription of image data into text data.
Step S30: Transcribe the audio data into text data, taking into account the spatiotemporal characteristics of speech and the contextual relationship features of the refrigeration equipment 100.
The audio data in this step may be audio collected directly, and may also include audio separated from the video data.
This embodiment takes into account the spatiotemporal characteristics of voice data in usage scenarios of refrigeration equipment 100 such as refrigerators. Through end-to-end learning, it builds a deep recurrent convolutional network model to transcribe audio data into text data. The model is based on a fused MMCNN-RNN neural network, CTC (connectionist temporal classification) and attention. This yields rich high-level speech feature information and improves the accuracy of the model's speech-to-text transcription.
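CTC training lets the network emit one label, or a blank, per audio frame. A standard way to turn these frame-level outputs into a label sequence is best-path (greedy) CTC decoding, sketched below in plain Python. The sketch assumes the common convention that index 0 is the blank. A production system that combines CTC with attention would more typically use beam search.

```python
from typing import List, Optional, Sequence

BLANK = 0  # CTC blank label index (a common convention, assumed here)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: take the argmax per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame's label,
        # so a blank between two identical labels keeps them as separate outputs.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

For frame-wise argmax labels 1, 1, 0, 1, 2, 2, 0, the decoder outputs [1, 1, 2]. The blank separates the two 1s, so both are kept.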
Step S40: Acquire historical text data. The historical text data include historical record data and historical interaction data.
The multimodal data collected in step S10 are real-time data, whereas step S40 acquires historical data. Historical text data are useful in two ways: for the information they contain, and because they complement and relate to the real-time data. Together, the two fully capture the semantic information of the text data.
The historical text data may be a large amount of unannotated text accumulated on the refrigeration equipment 100 or the client 200. After acquisition, the data can be cleaned, format-converted and otherwise processed uniformly, so that they share a text format with the text transcribed from real-time data. This preserves both the comprehensiveness and the specificity of the data characteristics.
Further, the historical record data include the user's food preference data, interest data and comment data. The historical interaction data include interaction records obtained from the client 200 or from the interaction terminal of the refrigeration equipment 100. The historical text data likewise include both data collected by the refrigeration equipment 100 and data on the corresponding client 200.
Step S50: Preprocess the total text data.
The text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data are aggregated into total text data.
This aggregation means that the embodiment draws on multi-source heterogeneous data: real-time and offline voice, video, images and text, plus text data such as the user's historical comments, food preferences and food interests.
The total text data can contain two types of data: data that can be pre-annotated, and data that cannot or need not be annotated. Data that cannot or need not be annotated are generally short phrases, which slot extraction handles easily in later steps. Data that need pre-annotation are generally longer sentences. Annotating these sentences helps later classification and recognition, and annotated data are more likely to produce a match during slot extraction in later steps.
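Routing long sentences to annotation and short phrases directly to slot extraction can be sketched as a length test. The token cut-off `min_tokens` is a hypothetical parameter. The source only says that long sentences are generally annotated and short phrases generally are not.

```python
from typing import List, Tuple


def split_for_annotation(texts: List[str], min_tokens: int = 4) -> Tuple[List[str], List[str]]:
    """Return (texts to annotate, texts sent straight to slot extraction).

    `min_tokens` is a hypothetical length threshold measured in whitespace tokens.
    """
    to_annotate: List[str] = []
    direct: List[str] = []
    for text in texts:
        if len(text.split()) >= min_tokens:
            to_annotate.append(text)  # longer sentence: pre-annotate first
        else:
            direct.append(text)       # short phrase: slot extraction handles it directly
    return to_annotate, direct
```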
For annotatable text data, the following step can be used:
Step S51: Annotate the annotatable text in the total text data, and store the annotated text as training data and test data.
Specifically, as shown in FIG. 5, annotatable text passes in turn through pre-annotation, formal annotation, annotation quality inspection and data storage. If the quality inspection judges that the score of a formally annotated text is below a preset threshold, the text is returned to pre-annotation and annotated again. The annotated data may include training data and test data.
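The annotation loop of FIG. 5 and the split into training and test data can be sketched as follows. The loop runs pre-annotation, formal annotation and quality inspection, and returns a text to pre-annotation when its score is below the threshold. The annotator and quality-scoring callables, the 0.8 threshold, the retry limit `max_rounds` and the 20% test ratio are all hypothetical placeholders.

```python
import random
from typing import Callable, List, Tuple

Annotated = Tuple[str, str]  # (text, label)


def annotate_with_qc(
    texts: List[str],
    pre_annotate: Callable[[str], str],
    formal_annotate: Callable[[str, str], str],
    quality_score: Callable[[str, str], float],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> List[Annotated]:
    """Pre-annotate, formally annotate, then quality-check each text.

    A text scoring below `threshold` goes back to pre-annotation, up to
    `max_rounds` attempts; texts that never pass are left out.
    """
    accepted: List[Annotated] = []
    for text in texts:
        for _ in range(max_rounds):
            draft = pre_annotate(text)
            label = formal_annotate(text, draft)
            if quality_score(text, label) >= threshold:
                accepted.append((text, label))
                break
    return accepted


def split_train_test(
    data: List[Annotated], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Annotated], List[Annotated]]:
    """Shuffle deterministically, then hold out `test_ratio` of the data for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```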
For text data that cannot or need not be annotated, the following step can be used:
Step S52: Clean the non-annotatable text in the total text data.
In step S52, text data that cannot be annotated go directly through data cleaning, format conversion and similar tasks. They then enter the subsequent slot extraction without annotation.
Step S60: Match results for both the non-annotatable text and the annotated text in the total text data through slot extraction, and evaluate the matching result, as shown in FIG. 3.
If the match succeeds, proceed to step S80;
if the match fails, proceed to step S70.
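The branching of step S60 amounts to a fast path with a fallback. Below is a minimal sketch that assumes two hypothetical callables: `slot_extract`, which returns `None` on failure, and `neural_match`.

```python
from typing import Callable, Dict, Optional


def match_text(
    text: str,
    slot_extract: Callable[[str], Optional[Dict[str, str]]],
    neural_match: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Try the fast slot-extraction path first; use the neural model only on failure."""
    result = slot_extract(text)   # step S60: rule-based slot extraction
    if result is not None:
        return result             # match succeeded -> step S80 (result delivery)
    return neural_match(text)     # match failed -> step S70 (deep fusion network)
```

Only texts that fall through to `neural_match` pay the cost of neural computation, which is the source of the speed-up the description claims.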
Further, as shown in FIG. 4, slot extraction uses a rule engine to perform result matching on both the non-annotatable text and the annotated text in the total text data;
when the rule engine detects a problem, a quick repair module automatically analyzes and repairs the rule definitions, and the rule engine performs result matching again.
A rule engine is a software tool for defining and executing rules in a system, such as business rules, process rules and data validation rules. A rule engine can have problems such as inaccurate rule definitions, conflicting rules and inefficient rule execution. When it detects a rule violation, it can generate a warning or an error message.
Quick repair (quick fix) techniques can identify and resolve many such problems rapidly. They typically rely on methods such as automated testing, code analysis and debugging tools to identify problems and propose solutions.
This embodiment therefore combines the rule engine with quick repair techniques so that rules can be repaired rapidly. When the rule engine finds a problem, the quick repair technique automatically analyzes and repairs the rule definition. This reduces the time and cost of manual repair and improves the reliability and efficiency of the system. Rules are updated and repaired in real time, which avoids offline rule sets that are never updated and take considerable time and effort to define and maintain.
Slot extraction fails when the quick repair module cannot resolve the problem detected by the rule engine, or when the rule engine still cannot produce a match after the rules are repaired.
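The rule-engine behavior described above can be sketched with regular-expression rules from Python's `re` module. The sketch covers matching, quick repair and the two failure conditions. Everything in it is illustrative and is not the source's implementation. The `RuleEngine` class, the use of regex compile errors as the detected "problem", and the pluggable `repair` callable are all assumptions.

```python
import re
from typing import Callable, Dict, List, Optional


class RuleEngine:
    """Minimal regex-based rule engine for slot extraction (illustrative only)."""

    def __init__(self, rules: Dict[str, str]) -> None:
        # rule name -> regular expression whose named groups are the slots
        self.rules: Dict[str, str] = dict(rules)

    def check(self) -> List[str]:
        """Return the names of rules whose pattern does not compile (the detected problems)."""
        broken: List[str] = []
        for name, pattern in self.rules.items():
            try:
                re.compile(pattern)
            except re.error:
                broken.append(name)
        return broken

    def match(self, text: str) -> Optional[Dict[str, Optional[str]]]:
        """Return the named groups of the first rule that matches `text`, or None."""
        for pattern in self.rules.values():
            try:
                found = re.search(pattern, text)
            except re.error:
                continue  # a broken rule cannot match anything
            if found:
                return found.groupdict()
        return None


def extract_slots(
    engine: RuleEngine, text: str, repair: Callable[[str, str], Optional[str]]
) -> Optional[Dict[str, Optional[str]]]:
    """Repair broken rules, then match; None means slot extraction failed."""
    for name in engine.check():
        fixed = repair(name, engine.rules[name])
        if fixed is None:
            return None               # quick repair could not solve the problem
        engine.rules[name] = fixed
    if engine.check():
        return None                   # rules are still broken after repair
    return engine.match(text)         # None here means no rule matched
```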
The following describes semantic matching for texts on which slot extraction fails.
Step S70: Compute the semantic matching result with a neural network.
As shown in FIG. 3, step S70 specifically includes steps S71, S72 and S73.
Step S71: Extract features from the total text data through the deep fusion network model. Train the deep fusion network model with the training data, and use it to predict results on the test data.
As noted for step S51, the annotated data may include training data and test data. In step S71, the constructed deep fusion model can be pre-trained on the training data and then used to predict results on the test data. This captures rich semantic feature information and yields the most effective model, so that the prediction results are optimal and the feedback to the user is more accurate.
In step S71, the deep fusion network model combines a text vectorization model and a multi-dimensional feature extraction model. The text vectorization model encodes text at the character, word, phrase and sentence levels. The multi-dimensional feature extraction model uses a multi-head attention mechanism to extract interaction and association features at the character, word, phrase and sentence levels, as well as contextual semantic information.
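For reference, the multi-head attention used by the multi-dimensional feature extraction model can be sketched in NumPy as standard scaled dot-product self-attention. The rows of `x` stand for encoded characters, words, phrases or sentences. The random weight matrices in the usage lines are placeholders for trained parameters, and the function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention split across `num_heads` heads.

    x: (seq_len, d_model); every weight matrix: (d_model, d_model).
    Returns the output (seq_len, d_model) and attention weights (num_heads, seq_len, seq_len).
    """
    seq_len, d_model = x.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (num_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))  # 5 encoded text units, d_model = 8
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out, attn = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads=2)
```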
Step S72: Pass the text data on which matching failed to the deep fusion network model for feature extraction, and extract the interaction features;
compute aggregated interaction feature information among the interaction features by attention-weighted summation;
compute differentiated interaction feature information among the interaction features by attention-based enhancement.
Aggregated interaction feature information combines the interaction information between different features into a richer feature representation. It can improve the predictive performance of the model, especially when the features interact in complex ways. Attention-weighted summation learns from and aggregates the feature information to produce this richer representation.
Differentiated interaction feature information builds a new feature representation by computing the differences between features. It captures important interactions between different features, which improves the predictive ability of the model. It can be computed by using an attention mechanism to enhance the computed differences between interaction features. The result can serve as input to later models, such as neural networks, that learn from and aggregate it to produce more accurate predictions.
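One plausible reading of "attention-weighted summation" and "attention-based enhancement of differences" is the local-inference enhancement used in ESIM-style matching models. In that approach, each token of one text is softly aligned with the other text. The aligned vector is then combined with the original vector by subtraction and by element-wise product. This is an assumed substitute technique, sketched here in NumPy, because the source does not give the exact formulas.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_and_differentiate(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cross-attention between two encoded texts a (m, d) and b (n, d).

    Aggregated features: for each token of a, an attention-weighted sum of b's tokens.
    Differentiated features: the difference and element-wise product with that sum.
    Returns (m, 4d) = [a, aggregated, a - aggregated, a * aggregated].
    """
    scores = a @ b.T                        # (m, n) token-to-token alignment scores
    aligned = softmax(scores, axis=1) @ b   # (m, d) attention-weighted sum over b
    return np.concatenate([a, aligned, a - aligned, a * aligned], axis=1)
```

The concatenated output keeps the original features next to the aggregated and differentiated ones. Step S73 can therefore draw on all three kinds of information.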
Step S73: Compute the text semantic matching result from the feature information, the aggregated interaction feature information and the differentiated interaction feature information.
In step S73, the feature vectors produced by feature extraction pass through a fully connected layer and then a self-attention mechanism to give the semantic matching result of the text. The fully connected layer first computes a semantic matching result. The self-attention mechanism then computes a quantified semantic relationship that reflects stronger interaction between the texts.
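Step S73 can be sketched as a fully connected layer followed by attention pooling and a sigmoid output. As a simplifying assumption, the self-attention step is shown as self-attentive pooling, in which a learned vector scores each position. The weight shapes and names are hypothetical, and the output is a match probability between 0 and 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def match_score(features, w_fc, b_fc, w_att, w_out) -> float:
    """Score a text pair from its fused interaction features.

    features: (seq_len, d_in); w_fc: (d_in, d_hidden); b_fc, w_att, w_out: (d_hidden,).
    """
    # 1. Fully connected layer with ReLU: (seq_len, d_in) -> (seq_len, d_hidden).
    hidden = np.maximum(0.0, features @ w_fc + b_fc)
    # 2. Self-attentive pooling: score each position, normalize, take the weighted sum.
    att_logits = hidden @ w_att                     # (seq_len,)
    att = np.exp(att_logits - att_logits.max())
    att = att / att.sum()
    pooled = att @ hidden                           # (d_hidden,)
    # 3. Linear output and sigmoid: probability that the two texts match.
    return float(sigmoid(pooled @ w_out))
```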
Alternatively, step S73 can compute a semantic matching similarity with disambiguation. This is done by applying a threshold to the distance between terms, which removes some redundant information from the text.
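Threshold-based removal of redundant terms can be sketched as follows. The sketch assumes that each term already has an embedding vector and that "distance between terms" means Euclidean distance. Both assumptions, the threshold value and the function name are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def remove_redundant_terms(
    terms: List[str], vectors: Dict[str, Sequence[float]], threshold: float
) -> List[str]:
    """Keep a term only if it is at least `threshold` away from every term already kept."""
    kept: List[str] = []
    for term in terms:
        if all(math.dist(vectors[term], vectors[k]) >= threshold for k in kept):
            kept.append(term)
    return kept


vectors = {"fridge": [1.0, 0.0], "refrigerator": [0.95, 0.05], "milk": [0.0, 1.0]}
# "refrigerator" lies about 0.07 from "fridge", so it is treated as redundant.
deduplicated = remove_redundant_terms(["fridge", "refrigerator", "milk"], vectors, threshold=0.2)
```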
Step S80: Deliver the result.
The semantic matching result obtained from the above steps is then delivered. Delivery can take many built-in or external forms, such as outbound calls, SMS, email notifications, large-screen display, voice broadcast, text output, smart speakers, pop-up UI, the app, tablets (PAD) and the web. These options meet different needs for presenting results and for digital display.
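Result delivery can be sketched as a dispatch table that maps channel names to sender callables. The channel names and senders below are hypothetical stand-ins for SMS, large-screen display, voice broadcast and the other channels listed above.

```python
from typing import Callable, Dict, List

Channel = Callable[[str], str]  # takes the result, returns a delivery receipt


def deliver(result: str, channels: Dict[str, Channel], preferred: List[str]) -> List[str]:
    """Send the matching result through each preferred channel that is registered."""
    receipts: List[str] = []
    for name in preferred:
        sender = channels.get(name)
        if sender is not None:  # unknown channels are skipped rather than raising
            receipts.append(sender(result))
    return receipts


channels = {
    "sms": lambda r: f"sms:{r}",
    "screen": lambda r: f"screen:{r}",
}
receipts = deliver("milk expires tomorrow", channels, ["screen", "email", "sms"])
```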
Compared with the prior art, this embodiment has the following beneficial effects:
The text semantic matching method annotates first and then performs slot extraction. Some text data become easier to process by slot extraction once annotated. Unannotated text is generally short phrases, which slot extraction also handles easily. Slot extraction therefore produces three cases: unannotated text that matches after slot extraction; annotated text that matches after slot extraction; and annotated text that fails to match after slot extraction, which is then matched by computation through the deep fusion network model. As a result, overall text matching is much faster and highly accurate, which improves the user experience.
In one embodiment, the present invention further provides a refrigeration equipment system comprising a storage module 60, which stores a computer program, and a processing module 50. When the processing module 50 executes the computer program, it can implement the steps of the text semantic matching method described above, that is, the steps of any one of the technical solutions of that method.
As shown in FIG. 6, the refrigeration equipment system may also include the following modules, with the following functions:
a collection module, for collecting and preprocessing multimodal data and/or multi-source heterogeneous data, where the multimodal data include text, audio and video data related to the refrigeration equipment scenario;
a video transcription module, for transcribing video data into text data;
an audio transcription module, for transcribing audio data into text data, taking into account the spatiotemporal characteristics of speech and the contextual relationship features of the refrigeration equipment 100;
an acquisition module, for acquiring historical text data, where the historical text data include historical record data and historical interaction data;
an intelligent annotation module, for aggregating the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into total text data, and for annotating the annotatable text in the total text data;
a slot extraction module, for matching results for both the non-annotatable text and the annotated text in the total text data through slot extraction;
a feature extraction module, for extracting features from the total text data through a deep fusion model after matching fails, where the deep fusion model is a fusion of a text vectorization model and a multi-dimensional feature extraction model;
an aggregation-difference module, for computing the aggregated interaction feature information and the differentiated interaction feature information among the interaction features;
a semantic matching module, for computing the semantic matching result;
a delivery module, for delivering the result.
For any details of the refrigeration equipment system not disclosed in the embodiments of the present invention, refer to the details disclosed for the text semantic matching method of the embodiments of the present invention.
The refrigeration equipment system may also include the refrigeration equipment 100 and computing devices such as mobile phones, computers, notebooks, handheld computers and cloud servers. It includes, but is not limited to, the processing module 50, the storage module 60, and a computer program stored in the storage module 60 and executable on the processing module 50, such as a program for the text semantic matching method described above. When executing the computer program, the processing module 50 implements the steps of the text semantic matching method embodiments described above, such as the steps shown in FIGS. 2 to 5.
The refrigeration equipment system may further include a signal transmission module and a communication bus 70. As shown in FIG. 7, the signal transmission module sends data to the processing module 50 or to a server. For example, data pass through the signal transmission module between the refrigeration equipment 100 and a mobile phone, or between the refrigeration equipment 100 and a server. The signal transmission module may transmit data over a wireless connection such as Bluetooth, WiFi or ZigBee. The communication bus 70 connects the signal transmission module, the processing module 50 and the storage module 60, and may include a path that carries information among them.
The processing module 50 and the storage module 60 may be integrated into the refrigeration equipment 100, or may be part of a mobile phone, a local terminal device or a cloud server.
The processing module 50 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processing module 50 is the control center of the refrigeration equipment system and connects all parts of the system through various interfaces and lines.
The storage module 60 can store the computer program and/or modules. The processing module 50 implements the functions of the refrigeration equipment system by running or executing the computer program and/or modules stored in the storage module 60 and by calling data stored in the storage module 60. The storage module 60 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, applications required for at least one function, and the like. In addition, the storage module 60 may include high-speed random access memory and may also include non-volatile memory. Examples include a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
By way of example, the computer program may be divided into one or more modules/units, which are stored in the storage module 60 and executed by the processing module 50 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions. The instruction segments describe how the computer program executes in the refrigeration equipment system.
进一步地,本发明一实施例提供了一种可读存储介质,其存储有计算机程序,该计算机程序被处理模块50执行时可实现上述的文本语义匹配方法中的步骤,也就是说,实现上述文本语义匹配方法中的任意一个技术方案中的步骤。Furthermore, an embodiment of the present invention provides a readable storage medium storing a computer program, which, when executed by the processing module 50, can implement the steps in the above-mentioned text semantic matching method, that is, implement the steps in any one of the technical solutions in the above-mentioned text semantic matching method.
所述文本语义匹配方法集成的模块如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实现上述实施例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理模块50执行时,可实现上述各个方法实施例的步骤。If the module integrated in the text semantic matching method is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention implements all or part of the processes in the above-mentioned embodiment method, and can also be completed by instructing related hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processing module 50, the steps of each of the above-mentioned method embodiments can be implemented.
其中,所述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、∪盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。The computer program includes computer program code, which may be in source code form, object code form, executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, recording medium, disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electric carrier signals and telecommunication signals.
It should be understood that although this specification is organized by embodiments, each embodiment does not necessarily contain only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the various embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
The detailed descriptions above are merely specific descriptions of feasible embodiments of the present invention and are not intended to limit its scope of protection. Any equivalent embodiment or modification made without departing from the technical spirit of the present invention shall fall within the scope of protection of the present invention.
Claims (15)
- A text semantic matching method, characterized by comprising the following steps: annotating the annotatable texts in the total text data; performing result matching, through slot extraction, on both the non-annotatable texts and the annotated texts in the total text data, and determining the matching result; if the matching succeeds, outputting the matching result; and if the matching fails, transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and then calculating the text semantic matching result from the feature extraction result; wherein the deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model, the text vectorization model vectorizes the text data, and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and correlation features.
- The text semantic matching method according to claim 1, characterized in that the step of performing result matching, through slot extraction, on both the non-annotatable texts and the annotated texts in the total text data comprises: performing result matching on both the non-annotatable texts and the annotated texts in the total text data by means of a rule engine; and, when the rule engine detects a problem, automatically analyzing and repairing the rule definitions by means of a quick repair module, and performing the result matching again through the rule engine.
- The text semantic matching method according to claim 2, characterized by further comprising the step of: determining that the slot extraction matching has failed when the quick repair module cannot resolve the problem detected by the rule engine, or when the rule engine still cannot complete the result matching after the rules have been repaired; wherein the problem includes an inaccurate rule definition, a rule conflict, or inefficient rule execution.
- The text semantic matching method according to claim 1, characterized in that the step performed if the matching fails further comprises: transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction, to extract interaction features; calculating aggregated interaction feature information between the interaction features; calculating differentiated interaction feature information between the interaction features; and calculating the text semantic matching result from the feature information, the aggregated interaction feature information, and the differentiated interaction feature information; wherein the step of calculating the aggregated interaction feature information comprises calculating it by attention-weighted summation, and the step of calculating the differentiated interaction feature information comprises calculating it with attention-mechanism enhancement.
- The text semantic matching method according to claim 1, characterized in that the step of annotating the annotatable texts in the total text data comprises: subjecting the annotatable texts, in sequence, to pre-annotation, formal annotation, and annotation quality inspection, wherein, when the annotation quality inspection judges the value of a formally annotated text to be below a preset threshold, that text is returned to pre-annotation for re-annotation.
- The text semantic matching method according to claim 1, characterized in that the step of annotating the annotatable texts in the total text data comprises: annotating the annotatable texts in the total text data and storing the annotated texts separately as training data and test data; and the step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction comprises: for annotated text data that fails to match, training the deep fusion network model with the training data and predicting results with the deep fusion network model on the test data.
- The text semantic matching method according to claim 1, characterized by further comprising the step of: performing data cleaning on the non-annotatable texts in the total text data; wherein the step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction comprises: for non-annotatable text data that fails to match, training the deep fusion network model on the non-annotatable texts with an unsupervised learning algorithm.
- The text semantic matching method according to claim 1, characterized in that the total text data is obtained by transcribing all multimodal data and/or multi-source heterogeneous data into text data and aggregating it; the text semantic matching method further comprising the steps of: collecting and preprocessing multimodal data and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio, and video data, and the preprocessing includes cleaning, format conversion, and storage of the multimodal data; transcribing the video data into text data; transcribing the audio data into text data; obtaining historical text data; and aggregating the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into the total text data.
- The text semantic matching method according to claim 8, characterized in that the step of transcribing the video data into text data comprises: separating the audio and the images in the video data to obtain audio data and image data; recognizing text information in the image data and transcribing it into text data; and recognizing the image data based on spatiotemporal and long-distance dependency features and transcribing it into text data; wherein the step of recognizing images based on spatiotemporal and long-distance dependency features and transcribing them into text data comprises: generating a student model from the spatiotemporal and long-distance dependency image recognition model through a fusion model of knowledge distillation and a diffusion model, and transcribing the image data into text data through the student model.
- The text semantic matching method according to claim 8, characterized in that the step of transcribing the video data into text data comprises: acquiring key frame images from the video data through a diffusion network model; and recognizing the key frame images to generate text data.
- The text semantic matching method according to claim 8, characterized in that the step of transcribing the audio data into text data comprises: combining the spatiotemporal characteristics of speech and the contextual relationship features of the scenario in which the data is acquired, building a deep recurrent convolutional network model based on the fused neural network MMCNN-RNN, CTC, and Attention, and transcribing the audio data into text data with that model.
- The text semantic matching method according to claim 8, characterized in that the historical text data includes historical record data and historical interaction data, wherein the historical record data includes the user's food preference data, interest data, and comment data, and the historical interaction data includes interaction records obtained from a client or from the interaction terminal of the refrigeration device.
- The text semantic matching method according to claim 1, characterized in that the text vectorization model encodes text at the character, word, phrase, and sentence levels; and the multi-dimensional feature extraction model uses a multi-head attention mechanism to extract, respectively, the character-, word-, phrase-, and sentence-level interaction and association features and the contextual semantic information.
- The text semantic matching method according to claim 1, characterized in that the step of calculating the text semantic matching result from the feature extraction result comprises: passing the feature-extracted vectors sequentially through a fully connected layer and a self-attention mechanism to calculate the text semantic matching result; the text semantic matching method further comprising the step of: delivering (result reach) both the matching results of successful matches and the recalculated matching results of failed matches.
- A refrigeration device system, characterized by comprising: a storage module storing a computer program; and a processing module which, when executing the computer program, implements the steps of the text semantic matching method according to any one of claims 1 to 14.
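The two-stage flow of claim 1 (slot extraction first, the deep fusion network model only for texts that fail to match) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: `SLOT_RULES`, `slot_extract`, `match`, and `MatchResult` are hypothetical names, the regex rules are invented refrigerator commands, and the deep fusion network model is reduced to a caller-supplied function.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class MatchResult:
    intent: str
    slots: dict
    source: str  # "slot_extraction" or "deep_model"

# Hypothetical slot rules: intent name -> regex with named groups as slots.
SLOT_RULES = {
    "set_temperature": re.compile(r"set (?P<zone>fridge|freezer) to (?P<temp>-?\d+)"),
    "query_food": re.compile(r"do i have (?P<food>\w+)"),
}

def slot_extract(text: str) -> Optional[MatchResult]:
    """First stage: rule-based slot extraction; returns None when no rule matches."""
    for intent, pattern in SLOT_RULES.items():
        m = pattern.search(text.lower())
        if m:
            return MatchResult(intent, m.groupdict(), "slot_extraction")
    return None

def match(text: str, deep_model: Callable[[str], MatchResult]) -> MatchResult:
    """Output the slot-extraction result on success; otherwise fall back to the model."""
    result = slot_extract(text)
    if result is not None:
        return result
    return deep_model(text)  # stand-in for the deep fusion network model
```

With a stub model, `match("Set freezer to -18", stub)` is resolved by the rules (slots `zone="freezer"`, `temp="-18"`), while a free-form request such as "I want something cold" falls through to the model.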
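The rule engine with a quick repair module (claims 2 and 3) can be sketched as a detect, repair, re-run loop. Assumptions: rules are `(intent, regex)` pairs ordered by priority; only two of the claimed problem types are modeled (an invalid pattern stands in for an inaccurate rule definition, and several intents matching one text is a rule conflict; inefficient execution is not modeled); and the repair policy (drop broken rules, keep the top-priority rule in a conflict) is hypothetical. `None` signals that slot extraction matching has failed.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (intent name, regex pattern); earlier rules have higher priority

def detect_problems(rules: List[Rule], text: str):
    """Run every rule; report invalid definitions and conflicting matches."""
    problems, hits = [], []
    for intent, pattern in rules:
        try:
            if re.search(pattern, text):
                hits.append(intent)
        except re.error:  # stands in for an "inaccurate rule definition"
            problems.append(("invalid_rule", intent))
    distinct = tuple(dict.fromkeys(hits))  # unique intents, priority order kept
    if len(distinct) > 1:
        problems.append(("conflict", distinct))
    return problems, distinct

def quick_repair(rules: List[Rule], problems) -> List[Rule]:
    """Hypothetical policy: drop broken rules, keep only the top-priority rule in a conflict."""
    drop = set()
    for kind, detail in problems:
        if kind == "invalid_rule":
            drop.add(detail)
        else:  # conflict
            drop.update(detail[1:])
    return [(intent, pattern) for intent, pattern in rules if intent not in drop]

def run_rules(rules: List[Rule], text: str) -> Optional[str]:
    """Return the matched intent, or None when slot extraction matching fails."""
    problems, hits = detect_problems(rules, text)
    if problems:
        rules = quick_repair(rules, problems)
        problems, hits = detect_problems(rules, text)
        if problems:  # repair did not resolve the problem
            return None
    return hits[0] if hits else None  # no rule matched: also a failure
```

For example, with rules `open_door: "open"`, a broken `"(unclosed"`, and `open_drawer: "open the"`, the text "open the door" first triggers both problems; after repair only `open_door` remains and is returned.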
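Claim 4 computes aggregated interaction features by attention-weighted summation and differentiated features by attention enhancement. A common way to do this, borrowed here from ESIM-style matching models and not confirmed by the source, is to softly align each vector of text A with text B, then combine each vector with its aligned summary through the difference and the element-wise product. The function names are hypothetical and no learned weights are used.

```python
import math
from typing import List

Vec = List[float]

def dot(u: Vec, v: Vec) -> float:
    return sum(x * y for x, y in zip(u, v))

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def aggregate(a: List[Vec], b: List[Vec]) -> List[Vec]:
    """Aggregated features: for each vector in a, the attention-weighted sum of b."""
    out = []
    for ai in a:
        w = softmax([dot(ai, bj) for bj in b])  # attention of ai over b
        out.append([sum(wj * bj[k] for wj, bj in zip(w, b)) for k in range(len(ai))])
    return out

def differentiate(a: List[Vec], a_aligned: List[Vec]) -> List[Vec]:
    """Differentiated features: [a - a_aligned ; a * a_aligned] per position."""
    return [[x - y for x, y in zip(u, v)] + [x * y for x, y in zip(u, v)]
            for u, v in zip(a, a_aligned)]
```

The matching head would then consume the original features together with both outputs, as the claim describes.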
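The annotation loop of claim 5 (pre-annotation, formal annotation, quality inspection, and a return to pre-annotation below the threshold) can be sketched with the three stages passed in as functions. The names, the `threshold=0.8` default, and the `max_rounds` cap are assumptions; the claim does not bound the number of re-annotation rounds.

```python
from typing import Callable, Optional, Tuple

def annotate_with_qc(
    text: str,
    pre_annotate: Callable[[str], str],
    formal_annotate: Callable[[str, str], str],
    qc_score: Callable[[str, str], float],
    threshold: float = 0.8,   # hypothetical preset threshold
    max_rounds: int = 3,      # hypothetical cap on re-annotation
) -> Tuple[Optional[str], int]:
    """Return (label, rounds used), or (None, max_rounds) if QC never passes."""
    for round_no in range(1, max_rounds + 1):
        draft = pre_annotate(text)              # pre-annotation
        label = formal_annotate(text, draft)    # formal annotation
        if qc_score(text, label) >= threshold:  # annotation quality inspection
            return label, round_no
        # below threshold: the text goes back to pre-annotation
    return None, max_rounds
```

If the quality scores for successive rounds were 0.5 and then 0.9, the label would be accepted in round 2.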
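Claim 6 stores the annotated texts separately as training data and test data. A reproducible random hold-out is one simple way to do that. The `test_ratio=0.2` and `seed=42` values are assumptions, since the source gives no split proportion.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_annotated(samples: Sequence[T], test_ratio: float = 0.2,
                    seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed, then hold out a fraction as test data."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed: the same split every run
    n_test = round(len(items) * test_ratio)
    return items[n_test:], items[:n_test]  # (training data, test data)
```

With 10 annotated texts this yields 8 training and 2 test samples, and repeated calls return the same split.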
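Claim 7 requires data cleaning of the non-annotatable texts but does not say what it involves. The sketch below assumes four common steps: Unicode NFKC normalization (which folds full-width characters, frequent in Chinese input, to their ASCII forms), whitespace collapsing, dropping empty strings, and exact deduplication in first-seen order.

```python
import re
import unicodedata
from typing import Iterable, List

def clean_texts(texts: Iterable[str]) -> List[str]:
    """Normalize, trim, drop empties, and deduplicate, keeping first-seen order."""
    seen = set()
    cleaned = []
    for t in texts:
        t = unicodedata.normalize("NFKC", t)  # e.g. full-width "ＡＢＣ" -> "ABC"
        t = re.sub(r"\s+", " ", t).strip()    # collapse runs of whitespace
        if t and t not in seen:
            seen.add(t)
            cleaned.append(t)
    return cleaned
```

For example, `clean_texts(["  hello   world ", "hello world", "", "\t", "ＡＢＣ"])` returns `["hello world", "ABC"]`.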
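The aggregation step of claim 8 merges native text, audio and video transcripts, and historical text into the total text data. In this sketch the transcribers are caller-supplied functions (placeholders for the models of claims 9 to 11), `RawData` and `build_total_text` are hypothetical names, and tagging each entry with its source is an added assumption that keeps provenance visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

@dataclass
class RawData:
    """Hypothetical container for preprocessed multimodal inputs."""
    texts: List[str] = field(default_factory=list)
    audio: List[bytes] = field(default_factory=list)
    video: List[bytes] = field(default_factory=list)

def build_total_text(
    raw: RawData,
    transcribe_audio: Callable[[bytes], str],
    transcribe_video: Callable[[bytes], str],
    history: Iterable[str],
) -> List[Tuple[str, str]]:
    """Aggregate all text sources into one corpus of (source, text) pairs."""
    total = [("text", t) for t in raw.texts]
    total += [("audio", transcribe_audio(a)) for a in raw.audio]
    total += [("video", transcribe_video(v)) for v in raw.video]
    total += [("history", h) for h in history]
    return total
```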
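Claim 9 derives a student model through a fusion of knowledge distillation and a diffusion model. The sketch covers only the distillation half, using the standard soft-target objective (temperature-softened teacher and student distributions compared by KL divergence, scaled by T², plus ordinary cross-entropy on the hard label). The diffusion component is not shown, and `T=2.0` and `alpha=0.5` are illustrative values, not figures from the source.

```python
import math
from typing import List

def softmax(logits: List[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; T > 1 gives a softer distribution."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      hard_label: int, T: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[hard_label])
    return alpha * T * T * kl + (1 - alpha) * ce
```

When the student already reproduces the teacher, the KL term is zero and only the hard-label term remains.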
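Claim 11 names CTC as part of the speech transcription model but does not say how CTC output is decoded. Greedy (best-path) decoding is assumed in this sketch: take the most likely symbol per frame, merge consecutive repeats, then remove the blank symbol. The alphabet, with the blank at index 0, is hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: List[str], blank: int = 0) -> str:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [k for k, _ in groupby(best)]  # merge consecutive duplicates
    return "".join(alphabet[i] for i in collapsed if i != blank)

alphabet = ["-", "a", "b"]  # index 0 is the CTC blank
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
         [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(ctc_greedy_decode(probs, alphabet))  # prints "aab"
```

The per-frame path `a a - a b b` decodes to "aab": the blank is what lets CTC emit the repeated "a" instead of merging it.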
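The multi-head attention mechanism of claim 13 would be applied at each text level (characters, words, phrases, sentences). The sketch shows only its core: splitting each vector into heads, running scaled dot-product self-attention per head, and concatenating the results. Real multi-head attention also applies learned projections (W_Q, W_K, W_V per head and W_O after concatenation); treating them as identities here is a simplifying assumption.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out

def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split d_model into heads, attend within each head, concatenate the heads."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sub = [row[h * d_head:(h + 1) * d_head] for row in x]  # identity projections
        heads.append(attention(sub, sub, sub))
    return [sum((head[i] for head in heads), []) for i in range(len(x))]
```

Each head attends over its own slice of the vector, so different heads can pick up different interaction patterns between tokens.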
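Claim 14 passes the extracted feature vectors through a fully connected layer and then a self-attention mechanism to obtain the matching result. The claim does not say how the attended vectors become a single score, so mean pooling followed by a sigmoid is assumed here. `W`, `b`, and `w_out` are hypothetical placeholders for trained parameters, and the ReLU after the fully connected layer is also an assumption.

```python
import math
from typing import List

Vector = List[float]

def linear(x: Vector, W: List[Vector], b: Vector) -> Vector:
    """Fully connected layer: W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with Q = K = V = xs."""
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * k for a, k in zip(q, key)) / math.sqrt(d) for key in xs]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        out.append([sum(ei / z * key[c] for ei, key in zip(e, xs)) for c in range(d)])
    return out

def match_score(features: List[Vector], W: List[Vector], b: Vector, w_out: Vector) -> float:
    """FC + ReLU, then self-attention, mean pooling, and a sigmoid output."""
    hidden = [[max(0.0, h) for h in linear(f, W, b)] for f in features]
    attended = self_attention(hidden)
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logit = sum(w * p for w, p in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # matching probability in (0, 1)
```

A score above a chosen cut-off (for example 0.5) would be read as "semantically matched".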
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310247263.1A CN116521821A (en) | 2023-03-15 | 2023-03-15 | Text semantic matching method and refrigeration equipment system |
CN202310247263.1 | 2023-03-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024188277A1 (en) | 2024-09-19 |
Family ID: 87401935
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2024/081468 WO2024188277A1 (en) | 2023-03-15 | 2024-03-13 | Text semantic matching method and refrigeration device system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116521821A (en) |
WO (1) | WO2024188277A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116521821A (en) * | 2023-03-15 | 2023-08-01 | 青岛海尔电冰箱有限公司 | Text semantic matching method and refrigeration equipment system |
CN116431805A (en) * | 2023-03-15 | 2023-07-14 | 青岛海尔电冰箱有限公司 | Text classification method and refrigeration equipment system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150331852A1 (en) * | 2012-12-27 | 2015-11-19 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
CN111046674A (en) * | 2019-12-20 | 2020-04-21 | 科大讯飞股份有限公司 | Semantic understanding method and device, electronic equipment and storage medium |
WO2022083094A1 (en) * | 2020-10-23 | 2022-04-28 | 平安科技(深圳)有限公司 | Text semantic recognition method and apparatus, electronic device, and storage medium |
CN115098765A (en) * | 2022-05-20 | 2022-09-23 | 青岛海尔电冰箱有限公司 | Information pushing method, device and equipment based on deep learning and storage medium |
CN116521821A (en) * | 2023-03-15 | 2023-08-01 | 青岛海尔电冰箱有限公司 | Text semantic matching method and refrigeration equipment system |
- 2023-03-15: CN202310247263.1A filed in CN (CN116521821A), status active, pending
- 2024-03-13: PCT/CN2024/081468 filed with WO (WO2024188277A1), status unknown
Also Published As
Publication number | Publication date |
---|---|
CN116521821A (en) | 2023-08-01 |
Similar Documents
Publication | Title |
---|---|
CN110147726B (en) | Service quality inspection method and device, storage medium and electronic device |
CN107832286B (en) | Intelligent interaction method, equipment and storage medium |
WO2024188277A1 (en) | Text semantic matching method and refrigeration device system |
CN110517689B (en) | Voice data processing method, device and storage medium |
US20220383854A1 (en) | Intent recognition method and intent recognition system having self learning capability |
KR102041621B1 (en) | System for providing artificial intelligence based dialogue type corpus analyze service, and building method therefor |
JP2017534941A (en) | Orphan utterance detection system and method |
WO2018045646A1 (en) | Artificial intelligence-based method and device for human-machine interaction |
JP2022088304A (en) | Method for processing video, device, electronic device, medium, and computer program |
US20220414463A1 (en) | Automated troubleshooter |
WO2024193596A1 (en) | Natural language understanding method and refrigerator |
US20240064383A1 (en) | Method and Apparatus for Generating Video Corpus, and Related Device |
WO2024140432A1 (en) | Ingredient recommendation method based on knowledge graph, and device and storage medium |
CN110956958A (en) | Searching method, searching device, terminal equipment and storage medium |
WO2024188276A1 (en) | Text classification method and refrigeration device system |
KR20200130400A (en) | Voice-based search for digital content on the network |
CN114065720A (en) | Conference summary generation method and device, storage medium and electronic equipment |
WO2024093578A1 (en) | Voice recognition method and apparatus, and electronic device, storage medium and computer program product |
TW202211077A (en) | Multi-language speech recognition and translation method and system |
CN118114679A (en) | Service dialogue quality control method, system, electronic equipment and storage medium |
Yu et al. | Incorporating multimodal sentiments into conversational bots for service requirement elicitation |
CN115129865A (en) | Work order classification method and device, electronic equipment and storage medium |
CN109739970B (en) | Information processing method and device and electronic equipment |
US20210012791A1 (en) | Image representation of a conversation to self-supervised learning |
CN114120425A (en) | Emotion recognition method and device, electronic equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24769968; Country of ref document: EP; Kind code of ref document: A1 |