WO2024188277A1

WO2024188277A1 - Text semantic matching method and refrigeration device system

Info

Publication number: WO2024188277A1
Application number: PCT/CN2024/081468
Authority: WO
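Inventors: 曾谁飞 (Zeng Shuifei); 李华刚 (Li Huagang); 孔令磊 (Kong Linglei); 张景瑞 (Zhang Jingrui); 刘卫强 (Liu Weiqiang); 李敏 (Li Min)
Original assignee: 青岛海尔电冰箱有限公司 (Qingdao Haier Refrigerator Co., Ltd.); 青岛海尔智能技术研发有限公司 (Qingdao Haier Smart Technology R&D Co., Ltd.); 海尔智家股份有限公司 (Haier Smart Home Co., Ltd.)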
Priority date: 2023-03-15
Filing date: 2024-03-13
Publication date: 2024-09-19
Also published as: CN116521821A

Abstract

Disclosed in the present invention are a text semantic matching method and a refrigeration device system. The method comprises: annotating annotatable text in total text data; performing result matching on both unannotatable text and annotated text in the total text data by means of slot extraction; if the matching is successful, outputting a matching result; and if the matching fails, calculating a matching result by means of a neural network. In the method, matching is performed by using a mechanism of performing annotation first and then performing slot extraction, such that the overall matching speed of text is greatly increased.

Description

Text semantic matching method and refrigeration equipment system

This application claims the priority of a Chinese patent application filed on March 15, 2023, with application number 202310247263.1, and invention name “Text Semantic Matching Method and Refrigeration Equipment System”, all contents of which are incorporated by reference into this application.

Technical Field

The present invention relates to the technical field of refrigeration equipment, and in particular to a text semantic matching method and a refrigeration equipment system.

Background Art

With the advancement of artificial intelligence, people hope to introduce artificial intelligence into the refrigerator field to make refrigerators more intelligent. Making refrigerators intelligent involves a large number of optimizations for refrigerator scenarios, including optimization of the various types of interaction between users and refrigerators, such as voice, text, and video interaction. During this optimization, the inventors found that the existing technology has the following problems:

Existing interactions respond slowly and are not accurate enough to meet users' need for instant and clear communication. Users clearly feel the inconvenience of a human-computer dialogue rather than the naturalness of a human-to-human dialogue, resulting in a poor user experience.

Summary of the Invention

In order to solve at least one of the above-mentioned problems in the prior art, an object of the present invention is to provide a text semantic matching method and a refrigeration equipment system with fast interactive response speed and accurate feedback information.

To achieve the above-mentioned object of the invention, an embodiment of the present invention provides a text semantic matching method, comprising the following steps:

Annotating the annotatable text in the total text data;

Performing result matching on both the unannotatable text and the annotated text in the total text data by means of slot extraction, and determining the matching result;

If the matching succeeds, outputting the matching result;

If the matching fails, transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and then calculating the text semantic matching result based on the feature extraction result; wherein the deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model, the text vectorization model vectorizes the text data, and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and correlation features.

As a further improvement of the present invention, the step of performing result matching on both the unannotatable text and the annotated text in the total text data by means of slot extraction includes:

Performing result matching on the unannotatable text and the annotated text in the total text data by means of a rule engine;

When the rule engine detects a problem, automatically analyzing and repairing the rule definition by means of a quick repair module, and performing result matching again through the rule engine.

As a further improvement of the present invention, the method further comprises the step of:

When the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot match a result after the rule is repaired, determining that the slot extraction matching has failed; wherein the problem includes inaccurate rule definition, rule conflict, or inefficient rule execution.

As a further improvement of the present invention, if the matching fails, the steps further include:

Transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction, and extracting the interaction features;

Calculating aggregated interaction feature information between the interaction features;

Calculating differentiated interaction feature information between the interaction features;

Calculating the text semantic matching result based on the feature information, the aggregated interaction feature information, and the differentiated interaction feature information.

As a further improvement of the present invention, the step of calculating the aggregated interaction feature information between the interaction features includes:

Calculating the aggregated interaction feature information between the interaction features by attention-weighted summation;

The step of calculating the differentiated interaction feature information between the interaction features includes:

Calculating the differentiated interaction feature information between the interaction features with enhancement by an attention mechanism.

As a further improvement of the present invention, the step of annotating the annotatable text in the total text data includes:

Subjecting the annotatable text to pre-annotation, formal annotation, and annotation quality inspection in sequence, wherein when the annotation quality inspection judges the value of the formally annotated text to be lower than a preset threshold, the text is returned to the pre-annotation process for re-annotation;

Annotating the annotatable text in the total text data, and storing the annotated text as training data and test data respectively;

The step of transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction includes:

For annotated text data that fails to match, training the deep fusion network model with the training data, and predicting the result with the deep fusion network model using the test data;

Performing data cleaning on the unannotatable text in the total text data;

For unannotated text data that fails to match, training the deep fusion network model on the unannotated text with an unsupervised learning algorithm.

As a further improvement of the present invention, the total text data is obtained by transcribing all multimodal data and/or multi-source heterogeneous data into text data and summarizing them.

Collecting multimodal data and/or multi-source heterogeneous data and preprocessing them, wherein the multimodal data includes text, audio and video data, and the preprocessing includes cleaning, format conversion and storage of the multimodal data;

Transcribing video data into text data;

Transcribing audio data into text data;

Acquiring historical text data;

The text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data are aggregated into the total text data.

As a further improvement of the present invention, the step of transcribing the video data into text data comprises:

Separating the audio and image in the video data to obtain audio data and image data;

Recognizing text information in the image data and transcribing it into text data;

Recognizing the image data based on spatiotemporal and long-distance dependency features and transcribing it into text data;

The step of recognizing the image data based on spatiotemporal and long-distance dependency features and transcribing it into text data includes:

Generating a student model for image recognition based on spatiotemporal and long-distance dependency features through a fusion model of knowledge distillation and a diffusion model, and transcribing the image data into text data through the student model;

Acquiring key frame images in the video data through a diffusion network model;

The key frame image is recognized to generate text data.

As a further improvement of the present invention, the step of transcribing the audio data into text data comprises:

Combining the spatiotemporal characteristics of speech and contextual relationship features in the scenario where the data is acquired, a deep recurrent convolutional network model based on the fusion neural network MMCNN-RNN, CTC, and Attention is established to transcribe audio data into text data.

As a further improvement of the present invention, the historical text data includes historical record data and historical interaction data, wherein the historical record data includes the user's ingredient preference data, interest data and comment data, and the historical interaction data includes interaction records obtained from the client or the interaction end of the refrigeration equipment.

As a further improvement of the present invention, the text vectorization model encodes text at the character, word, phrase, and sentence levels;

The multi-dimensional feature extraction model uses a multi-head attention mechanism to extract the interaction and association features of characters, words, phrases, and sentences, as well as contextual semantic information.

As a further improvement of the present invention, the step of calculating the text semantic matching result based on the feature extraction result includes: the vector after feature extraction is sequentially passed through a fully connected layer and a self-attention mechanism to calculate the text semantic matching result.

Both the matching results of successful matches and the recalculated matching results of failed matches are delivered to the user.

To achieve one of the above-mentioned purposes of the invention, an embodiment of the present invention provides a refrigeration equipment system, comprising:

A storage module storing a computer program;

A processing module, which implements the steps of the above text semantic matching method when executing the computer program.

Compared with the prior art, the present invention has the following beneficial effects: the text semantic matching method annotates first and then performs slot extraction. Some text data become easier to slot-extract once annotated, while unannotated text is generally short phrases that are also easy to slot-extract. Slot extraction therefore yields three cases: unannotated text is matched successfully after slot extraction; annotated text is matched successfully after slot extraction; or annotated text fails to match after slot extraction, in which case the failed category is matched again by calculation through the deep fusion network model. The overall matching speed of the text is greatly improved, the matching accuracy is high, and the user experience is improved.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic structural diagram of a refrigeration equipment system according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a text semantic matching method according to an embodiment of the present invention;

FIG. 3 is a partial schematic flow chart of a text semantic matching method according to an embodiment of the present invention;

FIG. 4 is a flow chart of slot extraction according to an embodiment of the present invention;

FIG. 5 is a data flow diagram of a text semantic matching method according to an embodiment of the present invention;

FIG. 6 is a schematic module diagram according to an embodiment of the present invention;

FIG. 7 is a block diagram of a refrigeration system according to an embodiment of the present invention.

Reference numerals: 100, refrigeration equipment; 10, interactive screen; 20, camera; 30, microphone; 40, speaker; 50, processing module; 60, storage module; 70, communication bus; 200, client.

DETAILED DESCRIPTION

The present invention will be described in detail below in conjunction with the specific embodiments shown in the accompanying drawings. However, these embodiments do not limit the present invention, and any structural, methodological, or functional changes made by a person skilled in the art based on these embodiments are all within the scope of protection of the present invention.

An embodiment of the present invention provides a text semantic matching method and a refrigeration equipment system with fast interactive response speed and accurate feedback information.

The refrigeration equipment system may include a refrigeration equipment 100 and a client 200 corresponding to the refrigeration equipment 100. The refrigeration equipment 100 may be a refrigerator, and the client 200 may be a mobile phone or an app on the mobile phone. The refrigeration equipment system is shown in FIG. 1, and the refrigeration equipment 100 and the client 200 may be connected via a wireless signal. The refrigeration equipment 100 is described below using a refrigerator as an example.

Continuing with FIG. 1, the refrigeration equipment 100 may be a refrigerator with audio acquisition, video acquisition, and user-interface interaction functions. The refrigerator is provided with a microphone 30 for collecting audio, a camera 20 for shooting video, a speaker 40 for voice interaction with the user, and an interactive screen 10 for text or graphical interaction with the user. The interactive screen 10 may be provided on the refrigerator door. After the user opens the door, the camera 20 records the user's operations to form video data. The speaker 40 and the microphone 30 work together to interact with the user through question-and-answer audio.

Taking a mobile phone as an example of the client 200, the user can communicate with the refrigerator by text or voice through the mobile phone, or manage the food information in the refrigerator and control the operating status of the refrigerator through the mobile phone.

In addition, the refrigeration equipment system may also include other external devices, such as external temperature sensors, cameras, microphones, and speakers provided by other devices, smart speakers, etc. These devices can be connected to the refrigeration equipment 100 or the client 200 via wireless signals.

The data of each device in the refrigeration equipment system form multi-source heterogeneous data. The multi-source heterogeneous data collected by multiple devices can be transmitted via wired connections, Wi-Fi, Bluetooth, etc. Various types of data such as text, audio, and video constitute multimodal data. These data can be real-time online or offline data, or stored historical data.

The refrigeration equipment scenario includes the interaction between the user and the refrigeration equipment 100, and the interaction between the user and the corresponding client 200 of the refrigeration equipment 100, such as the interaction of ingredients, the interaction of instructions, the recording of videos when the user operates the refrigeration equipment 100, the user's control of the temperature and humidity inside the refrigeration equipment 100, and the user's comments on the ingredients on the client 200, the user's preferences, etc. The data generated by directly operating the refrigeration equipment 100 and the data generated by the client 200 related to the refrigeration equipment 100 are all data in the refrigeration equipment scenario.

This embodiment uses the real-time and offline multimodal data generated in usage scenarios of the refrigeration equipment 100, together with a large body of accumulated historical text data, to fully mine the semantic, grammatical, and contextual information needed for natural language understanding, both within each piece of data and across data, so that the text semantic matching results in the refrigeration equipment scenario are more accurate.

The core idea of semantic matching is to convert text into semantic vector representations and then find other vectors that are close to them in distance or similarity, thereby determining the relevance between texts. Semantic matching helps in understanding and processing natural language text and can be applied in scenarios such as text classification, knowledge graph construction, and intelligent customer service.

Semantic matching generally requires comparing similarities across a large number of texts, which makes computation slow and matching time long. The present invention instead greatly speeds up matching by annotating in advance and then performing slot extraction. Slot extraction yields three cases: (1) unannotated text is matched successfully after slot extraction; (2) annotated text is matched successfully after slot extraction; (3) annotated text fails to match after slot extraction, and this failed category is matched again by calculation through the deep fusion network model. In other words, two of the three categories are matched quickly by slot extraction, and even the remaining failed category is matched much faster by the neural network calculation than by existing semantic matching; slot extraction and the neural network both also achieve high matching accuracy. The present invention therefore greatly speeds up semantic matching and improves its accuracy. The matching method is further described below.
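The vector-space idea above can be sketched as a nearest-neighbour search by cosine similarity. This is a minimal illustration rather than the patent's model: the intent labels and the three-dimensional embeddings are hypothetical stand-ins for vectors that a text vectorization model would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query_vec, candidates):
    """Return (label, score) of the candidate vector most similar to the query."""
    return max(
        ((label, cosine_similarity(query_vec, vec)) for label, vec in candidates.items()),
        key=lambda item: item[1],
    )


# Hypothetical embeddings; a real system would obtain them from a vectorization model.
intents = {
    "set_temperature": [0.9, 0.1, 0.0],
    "query_food": [0.1, 0.8, 0.3],
}
label, score = best_match([0.8, 0.2, 0.1], intents)
print(label, round(score, 3))
```

Cosine similarity ignores vector length, so it compares direction (meaning) rather than magnitude; a distance metric such as Euclidean distance could be substituted.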
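A minimal sketch of this three-case flow, assuming slot extraction is implemented with regular-expression rules and the neural fallback is any callable. `RULES`, `neural_fallback`, and the intent and slot names are hypothetical placeholders for the rule engine and the deep fusion network model.

```python
import re

# Hypothetical slot-extraction rules: intent name -> pattern with named slot groups.
RULES = {
    "set_temperature": re.compile(
        r"set (?:the )?(?P<zone>fridge|freezer) to (?P<value>-?\d+) degrees"),
    "query_food": re.compile(r"do i have (?:any )?(?P<food>\w+)"),
}


def slot_extract(text, rules):
    """Return a match result if any rule fires, else None."""
    for intent, pattern in rules.items():
        m = pattern.search(text.lower())
        if m:
            return {"intent": intent, "slots": m.groupdict(), "source": "slot_extraction"}
    return None


def neural_fallback(text):
    """Placeholder for the deep fusion network model used when slot extraction fails."""
    return {"intent": "unknown", "slots": {}, "source": "neural_network"}


def match_text(text, rules=RULES, fallback=neural_fallback):
    result = slot_extract(text, rules)
    if result is not None:
        return result        # cases (1) and (2): slot extraction succeeded
    return fallback(text)    # case (3): recompute with the neural network


print(match_text("Set the freezer to -18 degrees"))
print(match_text("What should I cook tonight?")["source"])
```

Because the cheap rule path runs first, the costlier model is invoked only for the texts that the rules cannot handle.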

A text semantic matching method provided by an embodiment of the present invention is described below with reference to FIG. 2 and FIG. 5. Although the present application provides the method steps shown in the following embodiment or flowchart, on the basis of routine or non-inventive effort, steps that have no necessary logical causal relationship are not limited to the execution order provided in the embodiments of the present application. For example, steps S20, S30, and S40 below may be performed in any order or simultaneously, with no required sequence in time.

Step S10: Collect and pre-process multimodal and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio and video data related to the refrigeration equipment scenario.

The preprocessing includes cleaning, format conversion and storage of the multimodal data. The format conversion includes parsing the data format.

Text is collected through the client 200 and/or the interactive screen 10 on the refrigerator. The client 200 may include devices such as mobile phones, tablets (PADs), and PCs. Besides the text data generated in the app, the collected text also includes text from customer service, the web, mini-programs, and official accounts. Preprocessing of the text data may include stop-word removal, deduplication, etc.
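As an illustration of the stop-word removal and deduplication mentioned above, the sketch below assumes whitespace-tokenized English text and a hypothetical stop-word list; Chinese text would first need word segmentation.

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "please", "is"}


def preprocess(texts):
    """Lower-case, drop stop words, and remove duplicates while keeping order."""
    seen = set()
    cleaned = []
    for text in texts:
        tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
        normalized = " ".join(tokens)
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


print(preprocess(["Please open the freezer", "open the freezer", "Is the milk fresh"]))
```

Deduplicating after stop-word removal lets near-duplicates that differ only in filler words collapse into one entry.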

Audio data may be collected by a microphone on the mobile phone and/or the microphone 30 on the refrigerator, or voice may be obtained from sound collection units on other devices connected wirelessly, for example via Wi-Fi. The microphone 30 may be a single microphone or a microphone array.

Video data may be collected by the client 200 and/or the camera 20 on the refrigerator, for example through a mobile phone app, the camera 20, Bluetooth, etc.; a script may also separate the voice from the video to obtain valid voice and video data.

In this way, multimodal data collection across multiple channels and terminals is completed, ensuring data integrity and preserving multimodal cognitive characteristics.

Most existing technologies use text as the sole data type, so other data are neglected in later learning, which in turn lowers recognition accuracy when the user interacts with the refrigerator in other forms. The text sources of the present invention include data of multiple modalities generated by users, which are then used for training and text semantic matching, yielding higher accuracy.

Step S20: transcribe the video data into text data.

This step includes the following two embodiments. In one embodiment, it includes the following steps:

Recognize text information in the image data and transcribe it into text data;

Recognize image data based on spatiotemporal and long-distance dependent features and transcribe them into text data.

Text recognized in the image data can be transcribed directly into text data, while for non-text image content, what the user said can be recognized from multiple images showing changes in the user's mouth movements. Moreover, because sentences recognized from the speaker's image features alone may be complex, the present invention also takes sentence length and context relevance into account during recognition. The sentence length factor covers features of different sentence lengths and different word compositions, and recognition based on spatiotemporal and long-distance dependency features is used to mine the rich semantic feature information of sentence sequences.

In addition, to speed up model response, the image recognition model based on spatiotemporal and long-distance dependency features is generated through a fusion model of knowledge distillation and a diffusion model, which transfers the knowledge of the original large (teacher) model to a student network; the image data is then transcribed into text data through the student model. The student model uses discrete time steps and fewer sampling steps, and can be distilled to half the number of steps of the teacher model.
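The step-halving idea can be illustrated with a toy version of progressive distillation, in which each student learns to reproduce two teacher steps with a single step and then becomes the teacher for the next round. This is a hypothetical sketch: the "sampler" is Euler integration of dx/dt = -x with one scalar parameter, standing in for a diffusion model, and a closed-form least-squares fit stands in for network training.

```python
import random


def euler_factor(n_steps):
    """Per-step multiplier of Euler integration of dx/dt = -x over [0, 1]."""
    return 1.0 - 1.0 / n_steps


def sample(x0, factor, n_steps):
    """Run the toy sampler: apply the per-step multiplier n_steps times."""
    x = x0
    for _ in range(n_steps):
        x = factor * x
    return x


def distill_round(teacher_factor, inputs):
    """Fit student multiplier s so one student step matches two teacher steps.

    Closed-form least squares for y = s * x, where y = teacher(teacher(x)).
    """
    targets = [teacher_factor * (teacher_factor * x) for x in inputs]
    return sum(x * y for x, y in zip(inputs, targets)) / sum(x * x for x in inputs)


rng = random.Random(0)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(100)]

n_steps = 8
factor = euler_factor(n_steps)
teacher_output = sample(1.0, factor, n_steps)

while n_steps > 1:
    factor = distill_round(factor, inputs)  # the student replaces the teacher
    n_steps //= 2                           # half as many steps each round
    print(n_steps, round(sample(1.0, factor, n_steps), 6))
```

Each round halves the number of sampling steps while the final output stays equal to the original eight-step teacher's output, which is the speed-up the distillation is meant to buy.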

In another embodiment, the key frame image in the video data may be obtained by a diffusion model;

The key frame image is recognized to generate text data.

Here, the meaning of the image is recognized and the content of the image is described in the form of text, thereby completing the transcription of image data into text data.
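The source obtains key frames with a diffusion model. As a much simpler stand-in that only illustrates what key-frame selection produces, the sketch below keeps a frame when its mean absolute pixel difference from the last key frame exceeds a threshold. The frames (flat lists of grayscale values) and the threshold are hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-size grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_key_frames(frames, threshold):
    """Return indices of frames that differ enough from the previous key frame."""
    if not frames:
        return []
    keys = [0]  # the first frame is always a key frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[keys[-1]]) > threshold:
            keys.append(i)
    return keys


frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],        # nearly identical to frame 0
    [200, 200, 0, 0],    # scene change
    [201, 199, 0, 0],    # nearly identical to frame 2
    [0, 0, 255, 255],    # scene change
]
print(select_key_frames(frames, threshold=50))
```

Only the selected key frames would then be passed on for recognition and description in text, reducing the number of images to process.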

Step S30: transcribe the audio data into text data by combining the spatiotemporal characteristics of speech and the contextual relationship features in the usage scenario of the refrigeration equipment 100.

The audio data in this step may be directly collected audio data, or may include audio data segmented from video data.

This embodiment combines the spatiotemporal characteristics of voice data in the usage scenarios of refrigeration equipment 100 such as refrigerators, and establishes a deep recurrent convolutional network model based on the fusion neural network MMCNN-RNN, CTC, and Attention through an end-to-end learning method to transcribe audio data into text data, thereby obtaining rich high-level speech feature information and improving the accuracy of the model's speech-to-text transcription.
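Of the components named above, the CTC part can be illustrated in isolation. The sketch shows greedy (best-path) CTC decoding: take the most probable symbol per audio frame, collapse consecutive repeats, then drop the blank symbol. The alphabet and frame probabilities are hypothetical, and the MMCNN-RNN and attention components that would produce them are not shown.

```python
BLANK = "_"  # hypothetical blank symbol


def ctc_greedy_decode(frame_probs, alphabet):
    """Best-path CTC decoding.

    frame_probs: one probability list per audio frame, aligned with alphabet
    (which includes BLANK).
    """
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded, previous = [], None
    for symbol in best_path:
        if symbol != previous and symbol != BLANK:  # collapse repeats, drop blanks
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


alphabet = [BLANK, "i", "c", "e"]


def one_hot(symbol, peak=0.9):
    """Hypothetical frame distribution that strongly favours one symbol."""
    rest = (1.0 - peak) / (len(alphabet) - 1)
    return [peak if s == symbol else rest for s in alphabet]


path = ["i", "i", BLANK, "c", "e", "e"]
print(ctc_greedy_decode([one_hot(s) for s in path], alphabet))
```

The blank symbol is what lets CTC emit a genuinely doubled letter: "c", blank, "c" decodes to "cc", whereas "c", "c" collapses to a single "c".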

Step S40: Acquire historical text data, wherein the historical text data includes historical record data and historical interaction data.

Compared with historical text data, the multimodal data collected in the above step S10 is real-time data, and what is obtained in step S40 is historical data. By obtaining historical text data, on the one hand, the information of the historical text data itself can be utilized, and on the other hand, a complementary and related relationship can be formed with the real-time data to fully obtain the semantic information of the text data.

The historical text data may be a large amount of unannotated text accumulated on the refrigeration equipment 100 or the client 200. After the historical text data is obtained, it may be cleaned, formatted, and processed uniformly so that it shares a single text format with the text transcribed from the real-time data, ensuring that the data features are both comprehensive and specific.

Furthermore, the historical record data includes the user's food preference data, interest data and comment data; the historical interaction data includes the interaction records obtained from the client 200 or the interaction end of the refrigeration device 100. The historical text data also includes the data collected by the refrigeration device 100 and the data on the client 200 corresponding to the refrigeration device 100.

Step S50: pre-processing the total text data.

The text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data are aggregated into total text data.

The aggregated total text data reflects that this embodiment draws on multi-source heterogeneous data: real-time and offline voice, video, images, and text, as well as text data such as users' historical comments, food preferences, and food interests.

The total text data includes two types of data: data that can be pre-annotated, and data that cannot or need not be annotated. Data that cannot or need not be annotated are generally short phrases, from which slots are easy to extract in subsequent steps. Longer sentences generally require pre-annotation; annotating them facilitates subsequent classification and recognition, and annotated data yields better result matching during slot extraction in subsequent steps.
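A minimal sketch of aggregating the sources into total text data and splitting it into the two types described above. Using a token-count threshold to separate longer sentences from short phrases is an assumption made for illustration; the source does not specify the criterion.

```python
def build_total_text(*sources):
    """Concatenate the text lists from each source into the total text data."""
    total = []
    for source in sources:
        total.extend(source)
    return total


def split_for_annotation(texts, min_tokens=4):
    """Split texts into (annotatable sentences, short phrases) by token count.

    min_tokens is a hypothetical threshold.
    """
    annotatable, phrases = [], []
    for text in texts:
        if len(text.split()) >= min_tokens:
            annotatable.append(text)
        else:
            phrases.append(text)
    return annotatable, phrases


total = build_total_text(
    ["open freezer"],                           # text from the multimodal data
    ["please set the fridge to four degrees"],  # transcribed from audio
    ["milk"],                                   # transcribed from video
    ["I prefer low fat yogurt for breakfast"],  # historical text data
)
annotatable, phrases = split_for_annotation(total)
print(annotatable)
print(phrases)
```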

For annotated text data, the following steps can be used:

Step S51: annotate the annotatable texts in the total text data, and store the annotated texts as training data and test data respectively.

Specifically, as shown in FIG. 5, the annotatable text undergoes pre-annotation, formal annotation, annotation quality inspection, and storage in sequence; when the annotation quality inspection judges the value of the formally annotated text to be lower than a preset threshold, the text is returned to the pre-annotation process for re-annotation. The annotated data may include training data and test data.
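The annotation loop with its quality-inspection threshold, followed by storage as training and test data, can be sketched as below. The annotator and reviewer functions, the 0.8 threshold, the `max_rounds` cap, and the 80/20 split are all hypothetical; the source specifies only that low-scoring texts return to pre-annotation.

```python
import random


def annotate_with_qc(texts, pre_annotate, formal_annotate, quality_score,
                     threshold=0.8, max_rounds=3):
    """Pre-annotate, formally annotate, and quality-check each text.

    Texts scoring below the threshold go back to pre-annotation; max_rounds
    is a hypothetical cap so the loop always terminates.
    """
    accepted, rejected = [], []
    for text in texts:
        for _ in range(max_rounds):
            label = formal_annotate(text, pre_annotate(text))
            if quality_score(text, label) >= threshold:
                accepted.append((text, label))
                break
        else:  # no attempt passed quality inspection
            rejected.append(text)
    return accepted, rejected


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle deterministically and split into (training data, test data)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_ratio)
    return items[n_test:], items[:n_test]


# Hypothetical annotators: the reviewer rejects each text's first attempt.
attempts = {}


def pre_annotate(text):
    attempts[text] = attempts.get(text, 0) + 1
    return "draft"


def formal_annotate(text, draft):
    return "set_temperature" if "degrees" in text else "other"


def quality_score(text, label):
    return 0.5 if attempts[text] == 1 else 0.9


texts = [f"set the fridge to {d} degrees" for d in range(2, 8)] + [
    "what can I cook with eggs", "remind me to buy milk",
    "is the yogurt expired", "show me recipes"]
accepted, rejected = annotate_with_qc(texts, pre_annotate, formal_annotate, quality_score)
train, test = train_test_split(accepted)
print(len(train), len(test))
```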

For text data that cannot be annotated or does not need to be annotated, the following steps can be used:

Step S52: cleaning the unannotatable text in the total text data.

In step S52, the unannotatable text data is directly subjected to data cleaning, format conversion, and similar tasks, and then participates directly in the subsequent slot extraction without being annotated.

Step S60: performing result matching on both the unannotatable text and the annotated text in the total text data by means of slot extraction, and judging the matching result, as shown in FIG. 3.

If the match is successful, jump to the subsequent step S80;

If the matching fails, proceed to the subsequent step S70.

Furthermore, the slot extraction process is shown in FIG. 4: a rule engine is used to perform result matching on both the unannotated text and the annotated text in the total text data.

A rule engine is a software tool that can define and execute various rules in a system, such as business rules, process rules, data validation rules, etc. A rule engine may have various problems, such as inaccurate rule definitions, rule conflicts, inefficient rule execution, etc. When a rule engine detects a violation of a rule, it can generate a warning or error message.

Quick fix techniques can quickly identify and resolve many issues. Quick fix techniques usually involve methods such as automated testing, code analysis, and debugging tools to quickly identify issues and provide solutions.

Therefore, this embodiment achieves rapid rule repair by combining the rule engine with quick fix technology. When the rule engine finds a problem, the quick fix technology can automatically analyze and repair the rule definition, which reduces the time and cost of manual repair, improves the reliability and efficiency of the system, and updates and repairs rules in real time. This avoids rules going stale offline without updates, as well as the considerable time and effort otherwise needed to define and maintain the rule set.

When the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot match the result after the rule is repaired, the slot extraction match fails.
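The detect, repair, re-match, or fail sequence can be sketched with regular-expression rules. Everything here is hypothetical: "detecting a problem" is reduced to finding patterns that fail to compile (an inaccurate rule definition), the "quick repair" simply treats such a pattern as a literal string, and a text hit by more than one rule is treated as a rule conflict, i.e. a failed match.

```python
import re


def detect_problems(rules):
    """Return names of rules whose patterns fail to compile (inaccurate definitions)."""
    broken = []
    for name, pattern in rules.items():
        try:
            re.compile(pattern)
        except re.error:
            broken.append(name)
    return broken


def quick_repair(rules, broken):
    """Hypothetical automatic repair: treat each broken pattern as a literal string."""
    repaired = dict(rules)
    for name in broken:
        repaired[name] = re.escape(rules[name])
    return repaired


def run_rules(text, rules):
    """Return the single matching rule name; no hit or a conflict counts as failure."""
    hits = [name for name, pattern in rules.items() if re.search(pattern, text)]
    return hits[0] if len(hits) == 1 else None


def slot_extraction_match(text, rules):
    broken = detect_problems(rules)
    if broken:
        rules = quick_repair(rules, broken)
        if detect_problems(rules):
            return None  # quick repair could not solve the problem
    return run_rules(text, rules)  # None here means matching failed after repair


RULES = {
    "buy_milk": "milk (1L",  # unbalanced parenthesis: an invalid regex
    "set_temp": r"(?P<value>-?\d+) degrees",
}
print(detect_problems(RULES))
print(slot_extraction_match("remind me to buy milk (1L)", RULES))
print(slot_extraction_match("set the fridge to 4 degrees", RULES))
print(slot_extraction_match("hello", RULES))
```

A `None` result corresponds to a failed slot extraction, at which point the text would be handed to the neural network in step S70.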

The following describes semantic matching for text that fails slot extraction.

Step S70: Calculate the semantic matching result through the neural network.

As shown in FIG. 3, step S70 specifically includes steps S71, S72, and S73.

Step S71: extract features from the total text data through a deep fusion network model, train the deep fusion network model with the training data, and predict results with the deep fusion network model using the test data.

In combination with step S51 above, the annotated data may include training data and test data. In step S71, the constructed deep fusion model may be pre-trained on the training data and then used to predict results on the test data, yielding rich semantic feature information and the best-performing model, which ensures optimal prediction results and more accurate feedback information for the user.

The text vectorization model of step S71 encodes text at the character, word, phrase, and sentence levels; the multi-dimensional feature extraction model uses a multi-head attention mechanism to extract character, word, phrase, sentence interaction and association features, and contextual semantic information.
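A simplified sketch of multi-head self-attention over a sequence of vectors (which could be character-, word-, phrase-, or sentence-level encodings). To stay short and dependency-free, it assumes identity query/key/value projections and simply splits the feature dimension across heads; a trained model would learn separate projection matrices per head plus an output projection. The input vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, computed row by row."""
    d = len(queries[0])
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def multi_head_self_attention(x, n_heads):
    """Split each vector into n_heads slices, attend per head, concatenate."""
    dim = len(x[0])
    if dim % n_heads:
        raise ValueError("feature dimension must be divisible by n_heads")
    size = dim // n_heads
    heads = []
    for h in range(n_heads):
        part = [row[h * size:(h + 1) * size] for row in x]
        heads.append(scaled_dot_product_attention(part, part, part))
    return [[value for head in heads for value in head[pos]] for pos in range(len(x))]


# Hypothetical 4-dimensional encodings for a three-token input.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, n_heads=2)
print(out)
```

Each head attends over a different slice of the features, so different heads can capture different interaction patterns between the same positions.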

Step S72: Transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction, and extracting the interaction features;

Aggregated interaction feature information combines the interaction information between different features into a richer feature representation. It can improve the prediction performance of the model, especially when there are complex interactions between features. The feature information can be learned and aggregated through attention-weighted summation to obtain this richer representation.

Differentiated interaction feature information constructs a new feature representation from the differences between features. It captures important interactions between features and thereby improves the prediction ability of the model. It may be calculated by using the attention mechanism to enhance the computed differences between interaction features. The differentiated interaction feature information can serve as input to the model and be learned and aggregated in subsequent neural network layers to obtain more accurate predictions.
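Attention-weighted summation can be sketched as follows: each feature vector is scored against a query, the scores are normalized with softmax, and the features are summed with those weights. The query vector and features are hypothetical inputs; in a trained model the query or scoring function would be learned.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(features, query):
    """Aggregate feature vectors by attention-weighted summation.

    Scores are dot products with a (hypothetical) query vector.
    """
    weights = softmax([sum(q * f for q, f in zip(query, feat)) for feat in features])
    dim = len(features[0])
    aggregated = [sum(w * feat[j] for w, feat in zip(weights, features)) for j in range(dim)]
    return aggregated, weights


features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
aggregated, weights = attention_aggregate(features, query=[2.0, 0.0])
print([round(w, 3) for w in weights], [round(a, 3) for a in aggregated])
```

The feature most aligned with the query receives the largest weight, so the aggregate leans toward it while still blending in the others.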
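One plausible reading of "attention-enhanced differences" is the local-inference enhancement used in ESIM-style matching models: each feature of one text is soft-aligned to the other text by attention, and the element-wise difference and product are concatenated with the original and aligned vectors. The sketch below uses that swapped-in technique on hypothetical inputs; it is not necessarily the patent's exact formulation.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_align(a_vectors, b_vectors):
    """For each vector of text A, the attention-weighted average of text B's vectors."""
    aligned = []
    for a in a_vectors:
        weights = softmax([dot(a, b) for b in b_vectors])
        aligned.append([sum(w * b[j] for w, b in zip(weights, b_vectors))
                        for j in range(len(a))])
    return aligned


def enhance_with_differences(a_vectors, b_vectors):
    """ESIM-style enhancement: concatenate [a, a~, a - a~, a * a~] per position."""
    enhanced = []
    for a, a_tilde in zip(a_vectors, soft_align(a_vectors, b_vectors)):
        difference = [x - y for x, y in zip(a, a_tilde)]
        product = [x * y for x, y in zip(a, a_tilde)]
        enhanced.append(a + a_tilde + difference + product)
    return enhanced


print(enhance_with_differences([[1.0, 0.0]], [[0.5, 0.5]]))
```

The difference term highlights where the two texts disagree, while the product term highlights where they agree; both are passed on for further learning.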

Step S73: Calculate the text semantic matching result according to the feature information, the aggregated interaction feature information, and the differentiated interaction feature information.

In step S73, the feature-extracted vector is sequentially passed through the fully connected layer and the self-attention mechanism to obtain the semantic matching result of the text. Here, the semantic matching result is first calculated through the fully connected layer, and then the self-attention mechanism is used to calculate the semantic relationship quantification result with stronger text interactivity.
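The two-stage computation in step S73 (a fully connected layer, then self-attention) can be sketched as follows. Mean pooling and a sigmoid output that turn the attended vectors into a single matching score are assumptions added to complete the example, and all weights are hypothetical rather than trained.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(x, weights, bias):
    """One dense layer with ReLU: max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def self_attention(xs):
    """Scaled dot-product self-attention with identity projections."""
    d = len(xs[0])
    out = []
    for q in xs:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(wi * v[j] for wi, v in zip(w, xs)) for j in range(d)])
    return out


def matching_score(features, weights, bias, out_weights):
    """FC layer, then self-attention, then mean pooling and a sigmoid score in (0, 1)."""
    hidden = [fully_connected(f, weights, bias) for f in features]
    attended = self_attention(hidden)
    pooled = [sum(column) / len(column) for column in zip(*attended)]
    logit = sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical parameters: 3-dimensional input features -> 2 hidden units.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
features = [[1.0, 0.2, 0.0], [0.4, 0.9, 0.3]]
score = matching_score(features, W, b, out_weights=[1.2, -0.7])
print(round(score, 4))
```

A score near 1 would indicate a semantic match and a score near 0 a mismatch; the decision threshold would be chosen on the test data.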

In addition, step S73 can also be calculated using semantic matching similarity with disambiguation, which can be achieved based on the threshold value control of the distance between terms, so that some redundant information in the text can be removed.

Step S80: Delivering the result.

After the above steps are completed, the semantic matching result is delivered. Delivery can take a variety of built-in or external forms, such as outbound calls, SMS, email notifications, large-screen display, voice broadcast, text output, smart speakers, pop-up UI, apps, PADs and web pages, to meet the needs of result presentation and digital display.
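Delivering one result over several channels can be organized as a registry of channel handlers. In the sketch below, the `ResultDelivery` class, the channel names and the print-based handlers are hypothetical stand-ins for real SMS, email or voice integrations, which the document does not specify.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, str], None]  # (recipient, message) -> None


class ResultDelivery:
    """Routes a matching result to the delivery channels requested for it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, channel: str, handler: Handler) -> None:
        self._handlers[channel] = handler

    def deliver(self, recipient: str, message: str, channels: List[str]) -> Tuple[List[str], List[str]]:
        """Return (delivered, failed) channel names; unknown or raising channels count as failed."""
        delivered: List[str] = []
        failed: List[str] = []
        for channel in channels:
            handler = self._handlers.get(channel)
            if handler is None:
                failed.append(channel)
                continue
            try:
                handler(recipient, message)
                delivered.append(channel)
            except Exception:
                # One failing channel must not block delivery on the others.
                failed.append(channel)
        return delivered, failed


delivery = ResultDelivery()
delivery.register("text_output", lambda who, msg: print(f"[{who}] {msg}"))
delivery.register("voice_broadcast", lambda who, msg: print(f"(speaking to {who}) {msg}"))
print(delivery.deliver("user-1", "Milk is running low", ["text_output", "sms"]))
```

Returning the failed channels lets a caller retry or fall back to another channel without the delivery step needing to know about retry policy.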

Compared with the prior art, this embodiment has the following beneficial effects:

The text semantic matching method first annotates the text and then performs slot extraction. Annotated text data is easier to process by slot extraction, and text that cannot be annotated usually consists of short phrases, which are also easy to process by slot extraction. Slot extraction therefore produces three situations: unannotated text is matched successfully after slot extraction; annotated text is matched successfully after slot extraction; or annotated text fails to match after slot extraction, in which case the failed text is matched again through feature extraction by the deep fusion network model and subsequent calculation. As a result, the overall matching speed of the text is greatly improved, the matching accuracy is high, and the user experience is improved.
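The control flow described above (annotation first, slot extraction second, and the deep fusion network model only as a fallback) can be sketched as follows. The callables `can_annotate`, `annotate`, `slot_extract` and `model_match` are hypothetical placeholders for the components the method describes; in this sketch the fallback applies to any text whose slot extraction fails.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MatchOutcome:
    result: str
    annotated: bool   # whether the text was annotated before slot extraction
    via_model: bool   # True when slot extraction failed and the model was used


def match_text(
    text: str,
    can_annotate: Callable[[str], bool],
    annotate: Callable[[str], str],
    slot_extract: Callable[[str], Optional[str]],
    model_match: Callable[[str], str],
) -> MatchOutcome:
    annotated = can_annotate(text)
    prepared = annotate(text) if annotated else text
    result = slot_extract(prepared)          # fast path: slot extraction
    if result is not None:
        return MatchOutcome(result, annotated, via_model=False)
    # Slow path: only texts that failed slot extraction reach the model.
    return MatchOutcome(model_match(prepared), annotated, via_model=True)


slots = {"buy milk": "intent=purchase;item=milk"}
outcome = match_text(
    "buy milk",
    can_annotate=lambda t: len(t.split()) > 3,   # hypothetical: short phrases stay unannotated
    annotate=lambda t: t.lower(),
    slot_extract=slots.get,
    model_match=lambda t: "intent=unknown",
)
print(outcome)  # MatchOutcome(result='intent=purchase;item=milk', annotated=False, via_model=False)
```

The speed benefit comes from the early return: the expensive model call is skipped for every text that slot extraction can already resolve.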

In one embodiment, the present invention further provides a refrigeration equipment system, which includes a storage module 60 storing a computer program and a processing module 50. When executing the computer program, the processing module 50 can implement the steps of the above text semantic matching method, that is, the steps of any one of the technical solutions of the above text semantic matching method.

The refrigeration equipment system may also include the following modules, as shown in FIG. 6; the specific functions of each module are as follows:

A collection module, used for collecting and preprocessing multimodal data and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio and video data related to the refrigeration equipment scene;

A video transcription module, used to transcribe video data into text data;

An audio transcription module, used for transcribing audio data into text data by combining the spatiotemporal characteristics of speech and the contextual relationship characteristics in the scenario of the refrigeration device 100;

An acquisition module, used for acquiring historical text data, wherein the historical text data includes historical record data and historical interaction data;

An intelligent annotation module, used for aggregating the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into total text data, and for annotating the annotatable text in the total text data;

A slot extraction module, used for performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction;

A feature extraction module, used for extracting features, after matching fails, from the corresponding text data in the total text data through the deep fusion network model, wherein the deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model;

An aggregation difference module, used for calculating the aggregated interaction feature information and the differentiated interaction feature information between the interaction features;

A semantic matching module, used for calculating the semantic matching result;

A result delivery module, used for delivering the result.
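How the intelligent annotation module assembles the total text data can be sketched as follows. The transcription functions are injected as callables because the actual video and audio transcription models are described elsewhere; the `TextRecord` type, the source tags and the whitespace-only cleaning step are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TextRecord:
    source: str   # "text", "audio", "video" or "history"
    content: str


def build_total_text(
    texts: Iterable[str],
    audio_clips: Iterable[bytes],
    video_clips: Iterable[bytes],
    history: Iterable[str],
    transcribe_audio: Callable[[bytes], str],
    transcribe_video: Callable[[bytes], str],
) -> List[TextRecord]:
    records = [TextRecord("text", t) for t in texts]
    records += [TextRecord("audio", transcribe_audio(a)) for a in audio_clips]
    records += [TextRecord("video", transcribe_video(v)) for v in video_clips]
    records += [TextRecord("history", h) for h in history]
    # Assumed minimal cleaning: trim whitespace and drop empty transcripts.
    return [TextRecord(r.source, r.content.strip()) for r in records if r.content.strip()]


total = build_total_text(
    texts=["How long does milk keep?"],
    audio_clips=[b"\x00\x01"],
    video_clips=[b"\x02"],
    history=["prefers low-fat yogurt"],
    transcribe_audio=lambda clip: " set fridge to 4 degrees ",   # hypothetical stub
    transcribe_video=lambda clip: "",                             # hypothetical stub
)
for record in total:
    print(record.source, "|", record.content)
```

Keeping the source tag on each record lets later steps, such as annotation, treat transcribed text differently from typed text if needed.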

It should be noted that for details not disclosed in the refrigeration equipment system of the embodiment of the present invention, please refer to the details disclosed in the text semantic matching method of the embodiment of the present invention.

The refrigeration equipment system may also include the refrigeration equipment 100 and computing devices such as mobile phones, computers, notebooks, PDAs and cloud servers, and includes, but is not limited to, a processing module 50, a storage module 60, and a computer program stored in the storage module 60 and executable on the processing module 50, such as a program of the above text semantic matching method. When the processing module 50 executes the computer program, the steps of the above text semantic matching method embodiments are implemented, such as the steps shown in FIGS. 2 to 5.

The refrigeration equipment system may further include a signal transmission module and a communication bus 70. As shown in FIG. 7, the signal transmission module is used to send data to the processing module 50 or the server; for example, data is transmitted through the signal transmission module between the refrigeration equipment 100 and a mobile phone, or between the refrigeration equipment 100 and the server. The signal transmission module may transmit data over a wireless connection such as Bluetooth, Wi-Fi or ZigBee. The communication bus 70 connects the signal transmission module, the processing module 50 and the storage module 60, and may include a path for transmitting information among them.

The processing module 50 and the storage module 60 may be integrated into the refrigeration device 100, or may be part of a mobile phone, a local terminal device, or a cloud server.

The processing module 50 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor can be a microprocessor or any conventional processor. The processing module 50 is the control center of the refrigeration equipment system and connects the parts of the entire refrigeration equipment system through various interfaces and lines.

The storage module 60 can be used to store the computer program and/or modules. The processing module 50 realizes the various functions of the refrigeration equipment system by running or executing the computer program and/or modules stored in the storage module 60 and calling the data stored in the storage module 60. The storage module 60 may mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system, an application required by at least one function, and the like. In addition, the storage module 60 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the storage module 60 and executed by the processing module 50 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of implementing specific functions, and the instruction segments are used to describe the execution process of the computer program in the refrigeration equipment system.

Furthermore, an embodiment of the present invention provides a readable storage medium storing a computer program, which, when executed by the processing module 50, can implement the steps in the above-mentioned text semantic matching method, that is, implement the steps in any one of the technical solutions in the above-mentioned text semantic matching method.

If the modules integrated for the text semantic matching method are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when it is executed by the processing module 50, the steps of each of the above method embodiments can be implemented.

The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

It should be understood that, although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; the specification is written this way only for the sake of clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.

The detailed descriptions listed above are only specific descriptions of feasible embodiments of the present invention and are not intended to limit its scope of protection. Any equivalent embodiments or changes that do not depart from the technical spirit of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A text semantic matching method, characterized by comprising the following steps:

Annotating the annotatable text in the total text data;

Performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction, and determining the matching result;

If the matching succeeds, outputting the matching result;

If the matching fails, transmitting the text data for which matching failed to a deep fusion network model for feature extraction, and calculating the text semantic matching result based on the feature extraction result; wherein the deep fusion network model is a fusion of a text vectorization model and a multi-dimensional feature extraction model, the text vectorization model vectorizes the text data, and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and correlation features.

2. The text semantic matching method according to claim 1, characterized in that the step of performing result matching on both the unannotatable text and the annotated text in the total text data through slot extraction comprises:

Performing result matching on the unannotatable text and the annotated text in the total text data by using a rule engine;

When the rule engine detects a problem, automatically analyzing and repairing the rule definition through a quick repair module, and performing the result matching again through the rule engine.

3. The text semantic matching method according to claim 2, characterized by further comprising the step of:

When the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot match the result after the rules are repaired, determining that the slot extraction matching has failed; wherein the problem includes inaccurate rule definition, rule conflict, or inefficient rule execution.

4. The text semantic matching method according to claim 1, characterized in that, if the matching fails, the method further comprises:

Transmitting the text data for which matching failed to the deep fusion network model for feature extraction, and extracting interaction features;

Calculating aggregated interaction feature information between the interaction features;

Calculating differentiated interaction feature information between the interaction features;

Calculating the text semantic matching result based on the feature information, the aggregated interaction feature information, and the differentiated interaction feature information;

wherein the step of calculating the aggregated interaction feature information between the interaction features comprises:

Calculating the aggregated interaction feature information between the interaction features by attention-weighted summation;

and the step of calculating the differentiated interaction feature information between the interaction features comprises:

Calculating the differentiated interaction feature information between the interaction features with enhancement by an attention mechanism.

5. The text semantic matching method according to claim 1, characterized in that the step of annotating the annotatable text in the total text data comprises:

Subjecting the annotatable text to pre-annotation, formal annotation and annotation quality inspection in sequence; when the annotation quality inspection judges that the value of the formally annotated text is lower than a preset threshold, returning the text to the pre-annotation process for re-annotation.

6. The text semantic matching method according to claim 1, characterized in that the step of annotating the annotatable text in the total text data comprises:

Annotating the annotatable text in the total text data, and storing the annotated text separately as training data and test data;

The step of transmitting the text data for which matching failed to the deep fusion network model for feature extraction comprises:

For annotated text data that fails to match, training the deep fusion network model with the training data, and predicting the result with the deep fusion network model using the test data.

7. The text semantic matching method according to claim 1, characterized by further comprising the step of:

Performing data cleaning on the unannotatable text in the total text data;

The step of transmitting the text data for which matching failed to the deep fusion network model for feature extraction comprises:

For unannotatable text data that fails to match, training the deep fusion network model on the unannotatable text by an unsupervised learning algorithm.

8. The text semantic matching method according to claim 1, characterized in that the total text data is obtained by transcribing all multimodal data and/or multi-source heterogeneous data into text data and aggregating it;

The text semantic matching method further comprises the steps of:

Collecting and preprocessing multimodal data and/or multi-source heterogeneous data, wherein the multimodal data includes text, audio and video data, and the preprocessing includes cleaning, format conversion and storage of the multimodal data;

Transcribing video data into text data;

Transcribing audio data into text data;

Acquiring historical text data;

Aggregating the text data in the multimodal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into the total text data.

9. The text semantic matching method according to claim 8, characterized in that the step of transcribing the video data into text data comprises:

Separating the audio and the images in the video data to obtain audio data and image data;

Recognizing text information in the image data and transcribing it into text data;

Recognizing the image data based on spatiotemporal features and long-range dependency features and transcribing it into text data;

wherein the step of recognizing the image data based on spatiotemporal features and long-range dependency features and transcribing it into text data comprises:

Generating a student model for image recognition based on spatiotemporal and long-range dependency features through a fusion model of knowledge distillation and a diffusion model, and transcribing the image data into text data through the student model.

10. The text semantic matching method according to claim 8, characterized in that the step of transcribing the video data into text data comprises:

Acquiring key frame images in the video data through a diffusion network model;

Recognizing the key frame images to generate text data.

11. The text semantic matching method according to claim 8, characterized in that the step of transcribing the audio data into text data comprises:

Establishing, in combination with the spatiotemporal characteristics of speech and the contextual relationship features of the scenario in which the data is acquired, a deep recurrent convolutional network model based on the fused neural network MMCNN-RNN, CTC and Attention, and transcribing the audio data into text data through the model.

12. The text semantic matching method according to claim 8, characterized in that the historical text data includes historical record data and historical interaction data, wherein the historical record data includes the user's food preference data, interest data and comment data, and the historical interaction data includes interaction records obtained from the client or from the interaction terminal of the refrigeration equipment.

13. The text semantic matching method according to claim 1, characterized in that the text vectorization model encodes text at the character, word, phrase and sentence levels;

The multi-dimensional feature extraction model uses a multi-head attention mechanism to extract the interaction and association features of characters, words, phrases, and sentences, as well as contextual semantic information.

14. The text semantic matching method according to claim 1, characterized in that the step of calculating the text semantic matching result based on the feature extraction result comprises: passing the vector obtained by feature extraction sequentially through a fully connected layer and a self-attention mechanism to calculate the text semantic matching result;

The text semantic matching method further comprises the step of:

Delivering both the matching results of successful matches and the recalculated matching results of failed matches.

15. A refrigeration equipment system, characterized by comprising:

A storage module storing a computer program;

A processing module, which implements the steps of the text semantic matching method according to any one of claims 1 to 14 when executing the computer program.
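Claims 2 and 3 describe rule-engine matching with a quick repair module and a failure path when repair does not help. A minimal sketch follows under stated assumptions: rules use a hypothetical (keyword, slot value) format, only one of the listed problem types (rule conflict) is modeled, and the repair heuristic (keep the most specific, that is longest, fired keyword) is illustrative rather than the patented repair logic.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # hypothetical format: (keyword, slot value)


class RuleConflict(Exception):
    """Raised when rules that fire on the same text yield different slot values."""


def run_rules(text: str, rules: List[Rule]) -> Optional[str]:
    values = {value for keyword, value in rules if keyword in text}
    if len(values) > 1:
        raise RuleConflict(f"conflicting values: {sorted(values)}")
    return values.pop() if values else None


def quick_repair(text: str, rules: List[Rule]) -> Optional[List[Rule]]:
    """Assumed heuristic: keep the value of the longest fired keyword and drop
    fired rules that disagree with it. Returns None when it cannot decide."""
    fired = [r for r in rules if r[0] in text]
    if not fired:
        return None
    longest = max(len(k) for k, _ in fired)
    top_values = {v for k, v in fired if len(k) == longest}
    if len(top_values) > 1:
        return None  # ambiguous: the repair module cannot solve the problem
    keep_value = top_values.pop()
    return [r for r in rules if r not in fired or r[1] == keep_value]


def slot_match(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the slot value, or None when slot extraction matching fails."""
    try:
        return run_rules(text, rules)
    except RuleConflict:
        repaired = quick_repair(text, rules)
        if repaired is None:
            return None  # repair failed -> hand the text to the deep fusion model
        try:
            return run_rules(text, repaired)  # match again with the repaired rules
        except RuleConflict:
            return None  # still no match after repair -> failure


rules = [("milk", "item=milk"), ("soy milk", "item=soy_milk")]
print(slot_match("buy soy milk", rules))  # item=soy_milk
print(slot_match("buy milk", rules))      # item=milk
print(slot_match("buy eggs", rules))      # None
```

Returning `None` for every failure path keeps the caller simple: any `None` routes the text to the deep fusion network model described in claim 1.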