CN116521821A - Text semantic matching method and refrigeration equipment system - Google Patents
Info
- Publication number: CN116521821A (application CN202310247263.1A)
- Authority: CN (China)
- Prior art keywords: text, data, matching, interaction, text data
- Prior art date: 2023-03-15
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3344—Query execution using natural language analysis
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/22—Matching criteria, e.g. proximity measures
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/30—Semantic analysis
- G06N3/045—Combinations of networks
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses a text semantic matching method and a refrigeration equipment system. The method comprises the following steps: labeling the text that can be labeled in the total text data; performing result matching, through slot extraction, on both the text that cannot be labeled in the total text data and the labeled text; if the matching succeeds, outputting the matching result; if the matching fails, computing the matching result through a neural network. The matching uses a label-first, extract-second mechanism, and slot extraction covers three cases: unlabeled text that matches successfully after slot extraction; labeled text that matches successfully after slot extraction; and labeled text whose matching fails after slot extraction, which is then matched by computation through a deep fusion network model. The overall matching speed of the text is therefore greatly improved, the matching accuracy is high, and the user experience is improved.
Description
Technical Field
The invention relates to the technical field of refrigeration equipment, in particular to a text semantic matching method and a refrigeration equipment system.
Background
With the progress of artificial intelligence, people hope to introduce artificial intelligence into the field of refrigerators to make refrigerators more intelligent. Making a refrigerator intelligent involves a great deal of optimization for refrigerator scenarios, including optimization of the various ways users interact with the refrigerator through speech, text and video. In the course of this optimization, the inventors found that the prior art has the following problems:
the existing interaction responds slowly and is not sufficiently accurate, and cannot meet users' need for immediate, clear communication; users therefore clearly feel the awkwardness of human-machine dialogue rather than the naturalness of a conversation between people, and the user experience is poor.
Disclosure of Invention
To solve at least the above problems of the prior art, the invention aims to provide a text semantic matching method with fast interactive response and accurate feedback information, and a refrigeration equipment system.
In order to achieve the above object, an embodiment of the present invention provides a text semantic matching method, including the following steps:
labeling the text which can be labeled in the total text data;
performing result matching, through slot extraction, on both the text that cannot be labeled in the total text data and the labeled text, and judging the matching result;
if the matching is successful, outputting a matching result;
if the matching fails, transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and calculating the text semantic matching result according to the feature extraction result; the deep fusion network model is a fusion model of a text vectorization model and a multi-dimensional feature extraction model, wherein the text vectorization model vectorizes the text data and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and associated features.
As a further improvement of the present invention, the step of performing result matching on both the unlabeled text and the labeled text in the total text data by slot extraction includes:
performing result matching on the text which cannot be marked in the total text data and the marked text through a rule engine;
when the rule engine detects a problem, the rule definition is automatically analyzed and repaired through the quick repair module, and the result matching is performed again through the rule engine.
As a further improvement of the present invention, the method further comprises the steps of:
when the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot perform result matching after rule repair, the slot extraction matching fails; such problems include inaccurate rule definitions, rule conflicts, or inefficient rule execution.
As a further improvement of the present invention, the step performed if the matching fails further includes:
transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and extracting interaction features;
calculating aggregate interaction feature information between interaction features;
calculating differentiated interaction characteristic information among the interaction characteristics;
and calculating a matching result of the text semantics according to the feature information, the aggregated interaction feature information and the differentiated interaction feature information.
As a further improvement of the present invention, the step of calculating aggregated interaction characteristic information between interaction characteristics includes:
calculating aggregate interaction feature information between interaction features by attention weighted summation;
the step of calculating differentiated interaction characteristic information between interaction characteristics comprises the following steps:
and calculating differentiated interaction feature information among the interaction features through attention-mechanism enhancement.
As a further improvement of the present invention, the labeling of the text that can be labeled in the total text data includes:
and sequentially performing pre-labeling, formal labeling and labeling quality inspection on the text to be labeled, wherein when the labeling quality inspection judges that the labeling quality of the text is lower than a preset threshold, the text is returned to pre-labeling for re-labeling.
As a further improvement of the present invention, the labeling the text that can be labeled in the total text data includes:
labeling the text which can be labeled in the total text data, and respectively storing the labeled text as training data and test data;
the step of transmitting the text data corresponding to failure to the deep fusion network model for feature extraction comprises the following steps:
and, for the labeled text data whose matching failed, training the deep fusion network model with the training data, and predicting the result with the test data through the deep fusion network model.
As a further improvement of the present invention, the method further comprises the steps of:
performing data cleaning on text which cannot be marked in the total text data;
the step of transmitting the text data corresponding to failure to the deep fusion network model for feature extraction comprises the following steps:
and, for the unlabeled text data whose matching failed, training the deep fusion network model with the unlabeled text using an unsupervised learning algorithm.
As a further improvement of the invention, the total text data is formed by converting all multi-modal data and/or multi-source heterogeneous data into text data and aggregating the results.
As a further improvement of the present invention, the method further comprises the steps of:
collecting and preprocessing multi-modal data and/or multi-source heterogeneous data, wherein the multi-modal data comprises text, audio and video data, and the preprocessing comprises cleaning, format conversion and storage of the multi-modal data;
converting the video data into text data;
transcribing the audio data into text data;
acquiring historical text data;
summarizing the text data in the multi-modal data, the text data transcribed from the audio data, the text data transcribed from the video data, and the historical text data into the total text data.
As a further improvement of the present invention, the step of converting video data into text data includes:
separating audio and images in the video data to obtain audio data and image data;
identifying text information in the image data, and converting the text information into text data;
identifying the image data based on space-time and long-distance dependent features, and converting the image data into text data;
wherein the step of identifying the images based on space-time and long-distance dependent features and converting them into text data includes:
generating a student model from a fusion model of knowledge distillation and diffusion models that identifies images based on space-time and long-distance dependent features, and converting the image data into text data with the student model.
As a further improvement of the present invention, the step of converting video data into text data includes:
acquiring a key frame image in the video data through a diffusion network model;
and identifying the key frame image to generate text data.
As a further improvement of the present invention, the step of converting the audio data into text data includes:
and establishing a deep recurrent convolutional network model based on the fused neural networks MMCNN-RNN, CTC and Attention to transcribe the audio data into text data, combining the voice space-time characteristics and the contextual characteristics of the scene in which the data is acquired.
As a further improvement of the present invention, the history text data includes history record data including food preference data, interest data, and comment data of the user, and history interaction data including interaction records acquired from the client or the interaction side of the refrigeration apparatus.
As a further improvement of the invention, the text vectorization model encodes words, phrases, sentence-level text;
the multidimensional feature extraction model uses a multi-head attention mechanism to extract characters, words, sentence interactions, associated features and context semantic information respectively.
As a further improvement of the present invention, the step of calculating the text semantic matching result based on the feature extraction result further includes: passing the feature-extracted vectors sequentially through a fully connected layer and a self-attention mechanism to obtain the text semantic matching result.
As a further improvement of the present invention, the method further comprises the steps of:
and delivering both the matching result of a successful match and the matching result recalculated after a failed match.
To achieve one of the above objects, an embodiment of the present invention provides a refrigeration apparatus system, including:
a storage module storing a computer program;
and the processing module can realize the steps in the text semantic matching method when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects: the text semantic matching method uses a labeling and slot-extraction mechanism, so that some text data becomes easier to extract after being labeled, while unlabeled common phrases are already easy to extract. Slot extraction therefore covers three cases: unlabeled text that matches successfully after slot extraction; labeled text that matches successfully after slot extraction; and labeled text whose matching fails after slot extraction, which is then matched by computation through a deep fusion network model. The overall matching speed of the text is greatly improved, the matching accuracy is high, and the user experience is improved.
Drawings
FIG. 1 is a schematic diagram of a refrigeration appliance system according to an embodiment of the present invention;
FIG. 2 is a flow chart of a text semantic matching method according to an embodiment of the present invention;
FIG. 3 is a partial flow chart of a text semantic matching method according to an embodiment of the present invention;
FIG. 4 is a flow chart of slot extraction according to an embodiment of the present invention;
FIG. 5 is a data flow diagram of a text semantic matching method according to one embodiment of the present invention;
FIG. 6 is a schematic block diagram of an embodiment of the present invention;
FIG. 7 is a block diagram of a refrigeration appliance system according to an embodiment of the present invention;
100. refrigeration equipment; 10. interactive screen; 20. camera; 30. microphone; 40. speaker; 50. processing module; 60. storage module; 70. communication bus; 200. client.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments shown in the drawings. These embodiments are not intended to limit the invention and structural, methodological, or functional modifications of these embodiments that may be made by one of ordinary skill in the art are included within the scope of the invention.
The embodiment of the invention provides a text semantic matching method with high interactive response speed and accurate feedback information and a refrigeration equipment system.
The refrigeration equipment system may include refrigeration equipment 100 and a client 200 corresponding to the refrigeration equipment 100, where the refrigeration equipment 100 may be a refrigerator and the client 200 may be a mobile phone or an app on the mobile phone. Referring to FIG. 1, a wireless signal connection may be provided between the refrigeration equipment 100 and the client 200. The refrigeration equipment 100 is described below taking a refrigerator as an example.
As further shown in FIG. 1, the refrigeration equipment 100 may be a refrigerator with audio collection, video collection and a user interaction interface. A microphone 30 that collects audio, a camera 20 that captures video, a speaker 40 for voice interaction with the user, and an interactive screen 10 for text or graphical interaction with the user are arranged on the refrigerator, and the interactive screen 10 may be arranged on a door of the refrigerator. After a user opens the refrigerator door, the camera 20 records the user's operations to form video data, and the speaker 40 and microphone 30 together interact with the user in question-and-answer form.
Taking a mobile phone as an example of the client 200, a user can exchange text or voice with the refrigerator through the mobile phone, manage the food material information in the refrigerator, and control the running state of the refrigerator.
In addition, the refrigeration appliance system may further include other external devices, such as an external temperature sensor, a camera 20 or a microphone 30 and a speaker 40 of other devices, a smart box, etc., which may be connected to the refrigeration appliance 100 or the client 200 through wireless signals.
The data of the devices in the refrigeration equipment system form multi-source heterogeneous data, and the multi-source heterogeneous data collected by these multiple devices can be transmitted over wired connections, Wi-Fi, Bluetooth and the like. Various types of data, such as text, audio and video, constitute multi-modal data, which may be real-time online or offline data, or stored historical data.
The refrigeration equipment scene includes the user's interaction with the refrigeration equipment 100 and with the client 200 corresponding to the refrigeration equipment 100, such as food material interaction, instruction interaction, videos recorded while the user operates the refrigeration equipment 100, the user's control of the temperature and humidity inside the refrigeration equipment 100, the user's comments on food materials on the client 200, the user's preferences, and so on. Data generated by directly operating the refrigeration equipment 100 and data generated by the client 200 in relation to the refrigeration equipment 100 are both data under the refrigeration equipment scene.
This embodiment fully mines the semantic, grammatical and contextual information needed for natural language understanding by utilizing the multi-modal real-time and offline data generated in this scene and the massive historical text data accumulated by the refrigeration equipment 100, so that the text semantic matching result in the refrigeration equipment scene is more accurate.
The core idea of semantic matching is to transform text into semantic vector representations and to find other vectors that are very close or similar to these vectors, in order to determine the relevance between texts. Semantic matching helps to better understand and process natural language text, and can be applied to various scenes, such as text classification, knowledge graph construction and intelligent customer service.
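As an illustration of this vector-similarity idea only, a minimal sketch follows; the `encode` function mentioned in the comments is a placeholder assumption, not a component defined by this application.

```python
import numpy as np

def cosine_similarity(u, v):
    """Relevance between two semantic vectors: values near 1.0 mean very similar direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# encode() stands in for any sentence-embedding model mapping text to a vector.
# query = encode("is there any milk left in the fridge")
# candidate = encode("check remaining milk")
# cosine_similarity(query, candidate)  # a value close to 1.0 indicates a semantic match
```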
In general, semantic matching requires computing a relatively large number of similarities between texts, the computation is slow, and matching takes a long time. The invention greatly accelerates matching by labeling in advance and then extracting slots. Slot extraction covers three cases: (1) unlabeled text that matches successfully after slot extraction; (2) labeled text that matches successfully after slot extraction; (3) labeled text whose matching fails after slot extraction, which is then matched by computation through the deep fusion network model. That is, the matching speed of the first two types of text is improved, and the remaining text that is not matched successfully is also handled much faster than with existing semantic matching thanks to the neural-network computation; the matching accuracy of both the slot extraction and the neural network is also very high. The semantic matching speed is thus greatly improved while accuracy also improves. The matching method is described further below.
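For illustration, the following minimal Python sketch shows this three-case control flow; the toy rule table and the stub neural fallback are illustrative assumptions only and are not the rule engine or deep fusion network of this application.

```python
import re

# Toy slot-extraction rules standing in for the annotated slot patterns.
SLOT_RULES = {
    r"(freezer|fridge).*(temperature|temp)": "intent: adjust_temperature",
    r"(add|put).*(milk|egg)": "intent: update_food_list",
}

def slot_extract_match(text):
    """Cases (1)/(2): fast rule-based slot extraction over the (labeled) text."""
    for pattern, result in SLOT_RULES.items():
        if re.search(pattern, text.lower()):
            return result
    return None  # slot extraction failed -> case (3)

def neural_match(text):
    """Case (3) stand-in: this application uses a deep fusion network model here."""
    return "intent: semantic_match_by_network"

def match_text(text):
    return slot_extract_match(text) or neural_match(text)

print(match_text("please lower the freezer temperature"))  # fast slot path
print(match_text("what can I cook with what is inside"))   # neural fallback path
```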
A text semantic matching method according to an embodiment of the present invention is described below with reference to FIG. 2 and FIG. 5. Although the present application provides the method operation steps shown in the following embodiments or flowcharts, based on routine or non-inventive labor, the execution order of these steps is not limited to the order provided in the embodiments of the present application where the steps have no logically necessary causal relationship. The order of steps S20, S30 and S40 below may be adjusted arbitrarily, or the steps may be performed simultaneously, without distinguishing chronological order.
Step S10: multimodal and/or multi-source heterogeneous data is collected and preprocessed, wherein the multimodal data includes text, audio and video data related to a refrigeration appliance scene.
Wherein the preprocessing includes cleansing, format conversion and storage of the multimodal data. Format conversion involves parsing the data format.
Text is collected through the client 200 and/or the interactive screen 10 on the refrigerator. The client 200 may include a mobile phone, a tablet, a PC and the like. Besides the text data generated in the application program, text from customer service, the web, applets and public accounts may also be collected. Preprocessing of the text data may include stop-word removal, de-duplication and the like.
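A minimal sketch of this kind of text cleaning follows; the stop-word list is an illustrative assumption, not one specified by this application.

```python
# Illustrative stop-word list; a real deployment would use a full language-specific list.
STOP_WORDS = {"the", "a", "an", "please", "um", "uh"}

def clean_texts(utterances):
    seen, cleaned = set(), []
    for u in utterances:
        tokens = [t for t in u.lower().split() if t not in STOP_WORDS]
        norm = " ".join(tokens)
        if norm and norm not in seen:   # de-duplicate after normalisation
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

print(clean_texts(["Please open the fridge", "please open the fridge", "Add an egg"]))
# ['open fridge', 'add egg']
```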
The audio data can be collected by the microphone 30 on the mobile phone or on the refrigerator, or other sound collection units on other devices can be reached wirelessly, for example over Wi-Fi, to obtain their speech; the microphone 30 may be a single microphone 30 or a microphone 30 array.
The video data can be collected by the client 200 and/or the camera 20 on the refrigerator, for example through the mobile phone app, the camera 20, Bluetooth and so on; alternatively, a script can separate the voice from the video and then obtain valid voice and video data.
In these ways, the multi-modal data acquisition tasks across the various channels and terminals are completed, ensuring data integrity and multi-modal cognitive characteristics.
In the prior art, only a single type of data, text, is used, so other data are ignored in later learning, and recognition accuracy is low when a user interacts with the refrigerator in other forms. Here, the text sources include data of the various modalities generated by the user, so accuracy is higher when the data is further used for training and text semantic matching.
Step S20: the video data is transcribed into text data.
This step has the following two embodiments. In one embodiment, it includes the steps of:
separating audio and images in the video data to obtain audio data and image data;
identifying text information in the image data, and converting the text information into text data;
image data is recognized based on space-time and long-distance dependent features and is transcribed into text data.
Text recognized in the image data can be converted directly into text data, while recognition of non-text content in the images may, for example, recognize what the user says from the changes of the user's mouth movements across several images. In addition, considering that the sentences to be recognized may be complex if only the image features of the speaker are used, the invention combines sentence-length factors and contextual relevance in recognition; the sentence-length factors include the characteristics of different sentence lengths and different word compositions, and recognition based on space-time and long-distance dependent features is adopted so as to mine the rich semantic feature information of the sentence sequence.
In addition, to speed up the model's response, a student model is generated from a fusion model of knowledge distillation and diffusion models that recognizes images based on space-time and long-distance dependent features; the knowledge of the original large model is migrated into the student network, and the image data is transcribed into text data through the student model. The student model has discrete, short time steps and can be distilled down to half the number of steps of the teacher model.
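The fusion with a diffusion model described above is more involved; the sketch below only illustrates the basic knowledge-distillation objective that migrates a teacher model's behaviour into a smaller student network. The tensor shapes and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the student to reproduce the teacher's softened output distribution."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# e.g. logits over a 100-token caption vocabulary for a batch of 4 video frames
teacher_logits = torch.randn(4, 100)
student_logits = torch.randn(4, 100)
loss = distillation_loss(student_logits, teacher_logits)
```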
In another embodiment, key frame images in the video data may be acquired by a diffusion model;
and identifying the key frame image to generate text data.
Here, the content of the image is described in text form by recognizing the meaning of the image, completing the conversion of the image data into text data.
Step S30: in combination with the speech spatiotemporal characteristics and contextual characteristics of the refrigeration unit 100, the audio data is transcribed into text data.
The audio data of this step may be, on the one hand, directly collected audio data, and may also include audio data divided from video data.
This embodiment combines the space-time characteristics of voice data from a usage scene of the refrigeration equipment 100, such as a refrigerator, and establishes a deep recurrent convolutional network model based on the fused neural networks MMCNN-RNN, CTC and Attention to transcribe the audio data into text data through an end-to-end learning method, obtaining rich high-level speech feature information and improving the precision of the model's speech-to-text transcription.
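A minimal sketch in that spirit pairs a convolutional front end with a recurrent layer and a CTC output head; the layer sizes, and the omission of the attention branch, are simplifying assumptions rather than the architecture of this application.

```python
import torch
import torch.nn as nn

class ConvRNNCTC(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=5000):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),  # local spectro-temporal features
            nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True,
                          bidirectional=True)                      # longer-range context
        self.head = nn.Linear(2 * hidden, vocab_size + 1)          # +1 for the CTC blank token

    def forward(self, mel):                  # mel: (batch, n_mels, frames)
        x = self.conv(mel).transpose(1, 2)   # -> (batch, frames, hidden)
        x, _ = self.rnn(x)
        return self.head(x).log_softmax(-1)  # per-frame log-probs usable with nn.CTCLoss

log_probs = ConvRNNCTC()(torch.randn(2, 80, 120))  # two clips, 80 mel bins, 120 frames
```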
Step S40: and acquiring historical text data, wherein the historical text data comprises historical record data and historical interaction data.
The multi-modal data collected in step S10 are real-time data, while step S40 obtains historical data. By obtaining the historical text data, its information can be used directly and, in addition, complementary and associative relations can be formed with the real-time data, so that the semantic information of the text data is obtained in full.
The historical text data can be the many unlabeled texts accumulated on the refrigeration equipment 100 or the client 200. After the historical text data is obtained, the collected data can be subjected to unified processing such as cleaning and format conversion, so that it shares the text format of the data transcribed from real-time collection, ensuring comprehensive and specific data characteristics.
Further, the historical record data comprises the user's food preference data, interest data and comment data; the historical interaction data includes interaction records obtained from the client 200 or the interaction end of the refrigeration equipment 100. The historical text data also comprises data collected by the refrigeration equipment 100 and data on the client 200 corresponding to the refrigeration equipment 100.
Step S50: the total text data is preprocessed.
And summarizing the text data in the multi-modal data, the text data transcribed from the audio data, the text data transcribed from the video data and the historical text data into total text data.
The aggregation of the total text data shows that this embodiment uses multi-source heterogeneous data: text data derived from real-time and offline voice, video, images and text, as well as the user's historical comments, food preferences, food interests and the like.
The total text data may comprise two types of data. The first type is data that can be labeled in advance; the second type is data that cannot or need not be labeled, typically common phrases that are easy to extract in the subsequent step. Longer sentences generally need to be labeled in advance; labeling this type of sentence facilitates subsequent classification and recognition, and the labeled data makes result matching easier in the slot extraction of the subsequent step.
For the text data that can be annotated, the following steps can be taken:
step S51: labeling the text which can be labeled in the total text data, and respectively storing the labeled text as training data and test data.
Specifically, as shown in FIG. 5, the annotatable text may be sequentially subjected to pre-annotation, formal annotation, annotation quality inspection and data storage, where when the annotation quality inspection judges that the quality of the formally annotated text is lower than a preset threshold, the text is returned to pre-annotation for re-annotation. The annotated data may include training data and test data.
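A control-flow sketch of this annotation loop follows; the three stage functions, the 0.8 threshold and the retry cap are hypothetical placeholders added for illustration.

```python
def annotate_with_quality_check(items, pre_annotate, formal_annotate, quality_score,
                                threshold=0.8, max_rounds=3):
    """Pre-annotate, formally annotate, then send low-quality items back to pre-annotation."""
    accepted, rejected = [], []
    for item in items:
        for _ in range(max_rounds):
            labeled = formal_annotate(pre_annotate(item))
            if quality_score(labeled) >= threshold:
                accepted.append(labeled)   # passes annotation quality inspection
                break
        else:
            rejected.append(item)          # still below threshold after max_rounds attempts
    return accepted, rejected
```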
For text data that is not or does not need to be annotated, the following steps may be taken:
step S52: and cleaning the data of the text which cannot be marked in the total text data.
In step S52, the text data that cannot be marked is directly subjected to tasks such as data cleaning and format conversion, and then directly participates in the subsequent slot extraction without marking.
Step S60: and carrying out result matching on the text which cannot be marked in the total text data and the marked text through slot extraction, and judging the matching result, as shown in fig. 3.
If the matching is successful, jumping to the subsequent step S80;
if the match fails, the following step S70 is performed.
Further, as shown in FIG. 4, the slot extraction process performs result matching on both the text that cannot be labeled and the labeled text in the total text data through a rule engine;
when the rule engine detects a problem, the rule definition is automatically analyzed and repaired through the quick repair module, and the result matching is performed again through the rule engine.
A rule engine is a software tool that can define and execute various rules in a system, such as business rules, flow rules, data verification rules, and the like. Various problems may occur with the rule engine, such as inaccurate rule definition, rule conflicts, inefficient rule execution, etc., and when the rule engine detects a violation of a rule, a warning or error message may be generated.
Quick repair techniques can quickly identify and solve a number of problems. Quick repair techniques typically involve automated testing, code analysis, debugging tools, etc., which can quickly identify problems and provide solutions.
Therefore, this embodiment achieves quick rule repair by combining a rule engine with quick repair technology. When the rule engine finds a problem, the quick repair technology can automatically analyze and repair the rule definition, reducing the time and cost of manual repair, improving the reliability and efficiency of the system, and achieving real-time rule updating and repair; this avoids the large amount of time and effort otherwise needed to define and maintain a rule set of offline, non-updated rules.
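A control-flow sketch of this rule-engine/quick-repair interaction follows; `RuleProblem`, `run_rules` and `quick_repair` are hypothetical placeholders, not the actual components of this application.

```python
class RuleProblem(Exception):
    """Raised by the rule engine, e.g. for an inaccurate definition, a conflict, or slow execution."""

def slot_extract(text, rules, run_rules, quick_repair, max_repairs=1):
    """Run the rule engine; on a detected problem, auto-repair the rules and retry."""
    for _ in range(max_repairs + 1):
        try:
            return run_rules(text, rules)         # a match result, or None if no rule fires
        except RuleProblem as problem:
            rules = quick_repair(rules, problem)  # analyse and repair the rule definition
    return None                                   # repair did not help: slot extraction fails
```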
When the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot perform result matching after rule repair, the slot extraction matching fails.
Semantic matching is performed on the text of the type with the slot extraction failure.
Step S70: and calculating a semantic matching result through a neural network.
Step S70 specifically includes step S71, step S72, and step S73, as shown in fig. 3;
step S71: and extracting features of the total text data through a depth fusion network model, training the depth fusion network model by using the training data, and predicting a result by using the test data through the depth fusion network model.
The labeled data in step S51 can include training data and test data. Step S71 can perform a pre-training task on the constructed deep fusion model based on the training data and then predict results based on the test data, so as to obtain rich semantic feature information, thereby obtaining the best, effective model and ensuring optimal prediction results and higher accuracy of the information fed back to the user.
The text vectorization model of step S71 encodes word-, phrase- and sentence-level text; the multi-dimensional feature extraction model uses a multi-head attention mechanism to extract character-, word- and sentence-level interaction features, associated features and contextual semantic information, respectively.
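A minimal sketch of these two components follows; the use of an embedding table as the text vectorizer and the chosen dimensions are illustrative assumptions rather than the models of this application.

```python
import torch
import torch.nn as nn

class FusionFeatureExtractor(nn.Module):
    def __init__(self, vocab_size=30000, dim=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)                 # text vectorization
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ids_a, ids_b):           # token-id tensors: (batch, len_a), (batch, len_b)
        a, b = self.embed(ids_a), self.embed(ids_b)
        interact, _ = self.cross_attn(a, b, b)  # tokens of text A attend over text B
        return interact                         # interaction features: (batch, len_a, dim)

feats = FusionFeatureExtractor()(torch.randint(0, 30000, (2, 12)),
                                 torch.randint(0, 30000, (2, 9)))
```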
Step S72: transmitting the text data corresponding to the failure to the deep fusion network model for feature extraction, and extracting interaction features;
calculating aggregated interaction feature information between the interaction features by attention-weighted summation;
and calculating differentiated interaction feature information between the interaction features through attention-mechanism enhancement.
The aggregated interaction feature information aggregates the interaction information among different features, producing richer feature representations. This can improve the prediction performance of the model, especially when complex interaction relations exist between the features; the feature information is learned and aggregated through attention-weighted summation to obtain the richer feature representation.
The differentiated interaction feature information is used to construct new feature representations by calculating the differences between different features. This can capture important interaction relations among different features and thus improve the model's predictive power. The differentiated interaction feature information can be computed by using an attention mechanism to enhance the differences between the interaction features; it can then serve as input to a model, such as the subsequent neural network, for learning and aggregation to obtain a more accurate prediction result.
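One possible way to realise these two computations, assuming the interaction features form a (batch, length, dim) tensor and per-position attention scores are available; the exact formulation is an illustrative assumption.

```python
import torch

def aggregate_and_differentiate(interact, scores):
    """interact: (batch, len, dim); scores: (batch, len, 1) unnormalised attention scores."""
    weights = torch.softmax(scores, dim=1)                            # attention over positions
    aggregated = (weights * interact).sum(dim=1)                      # attention-weighted summation
    differentiated = weights * (interact - aggregated.unsqueeze(1))   # attention-enhanced differences
    return aggregated, differentiated

agg, diff = aggregate_and_differentiate(torch.randn(2, 12, 128), torch.randn(2, 12, 1))
```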
Step S73: and calculating a matching result of the text semantics according to the feature information, the aggregated interaction feature information and the differentiated interaction feature information.
In step S73, the text semantic matching result is obtained by passing the feature-extracted vectors sequentially through a fully connected layer and a self-attention mechanism: the fully connected layer computes the semantic matching result, and the self-attention computation yields a quantified semantic relationship with stronger text interactivity.
In addition, step S73 may also compute a disambiguated semantic matching similarity, controlled by a threshold on the distance between terms, so that some redundant information in the text can be removed.
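A sketch of such a scoring head, with a fully connected layer followed by self-attention and a sigmoid matching score; the dimensions and the mean pooling are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MatchHead(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.fc = nn.Linear(dim, dim)                                  # fully connected layer
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                    # feats: (batch, len, dim) fused features
        x = torch.relu(self.fc(feats))
        x, _ = self.self_attn(x, x, x)           # self-attention over the fused features
        return torch.sigmoid(self.score(x.mean(dim=1)))  # matching probability per text pair

prob = MatchHead()(torch.randn(2, 12, 128))      # e.g. output of the feature extraction above
```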
Step S80: delivering the result.
After the above steps are completed, the semantic matching result is delivered. The result can be delivered in various built-in or external forms, such as outbound calls, SMS push, e-mail notification, large-screen display, voice broadcast, text output, smart speakers, pop-up UI, the app, a tablet (PAD), the web and so on, meeting the requirements for result presentation and digital display.
Compared with the prior art, the embodiment has the following beneficial effects:
The text semantic matching method uses a labeling and slot-extraction mechanism, so that some text data becomes easier to extract after being labeled, while unlabeled common phrases are already easy to extract. Slot extraction therefore covers three cases: unlabeled text that matches successfully after slot extraction; labeled text that matches successfully after slot extraction; and labeled text whose matching fails after slot extraction, which is then matched by computation through the deep fusion network model. The overall matching speed of the text is greatly improved, the matching accuracy is high, and the user experience is improved.
In one embodiment, the present invention further provides a refrigeration equipment system, which includes a storage module 60 and a processing module 50, where the processing module 50 can implement the steps in the above-mentioned text semantic matching method when executing the computer program, that is, implement the steps in any one of the technical solutions of the above-mentioned text semantic matching method.
The refrigeration equipment system can also be shown in fig. 6, and comprises a plurality of modules, wherein the specific functions of each module are as follows:
the acquisition module is used for collecting and preprocessing multi-modal data and/or multi-source heterogeneous data, wherein the multi-modal data includes text, audio and video data related to the refrigeration equipment scene;
the video transcription module is used for transcribing the video data into text data;
an audio transcription module for transcribing audio data into text data in combination with the voice space-time characteristics and the context characteristics of the refrigeration equipment 100;
the acquisition module is used for obtaining historical text data, the historical text data including historical record data and historical interaction data;
the intelligent labeling module is used for summarizing text data in the multi-mode data, text data transcribed from the audio data, text data transcribed from the video data and the historical text data into total text data and labeling the text which can be labeled in the total text data;
the slot extraction module is used for carrying out result matching on the text which can not be marked in the total text data and the marked text through slot extraction;
the feature extraction module is used for extracting features of the total text data through a deep fusion model after the matching fails, wherein the deep fusion model is a fusion model of a text vectorization model and a multi-dimensional feature extraction model;
the aggregation difference module is used for calculating aggregation interaction feature information among the interaction features and calculating differentiated interaction feature information among the interaction features;
the semantic matching module is used for calculating a semantic matching result;
and the delivery module is used for delivering the results.
It should be noted that, for details not disclosed in the refrigeration equipment system of the embodiment of the present invention, please refer to details disclosed in the text semantic matching method of the embodiment of the present invention.
The refrigeration appliance system may also include computing devices such as refrigeration appliance 100, cell phone, computer, notebook, palm top, and cloud server, and include, but are not limited to, processing module 50, memory module 60, and computer programs stored in memory module 60 and executable on processing module 50, such as the text semantic matching method programs described above. The processing module 50, when executing the computer program, implements the steps of the respective text semantic matching method embodiments described above, such as the steps shown in fig. 2 to 5.
The refrigeration appliance system may also include a signal transmission module and a communication bus 70. As shown in fig. 7, the signal transmission module is configured to send data to the processing module 50 or the server, for example, between the refrigeration device 100 and the mobile phone, or between the refrigeration device 100 and the server, through the signal transmission module, where the signal transmission module may transmit data in a wireless connection manner, such as bluetooth, wifi, zigBee, etc., and the communication bus 70 is configured to establish a connection between the signal transmission module, the processing module 50, and the storage module 60, where the communication bus 70 may include a path for transmitting information between the signal transmission module, the processing module 50, and the storage module 60.
The processing module 50 and the storage module 60 may be part of the refrigeration device 100, part of a mobile phone, a local terminal device, or part of a cloud server.
The processing module 50 may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or, alternatively, any conventional processor. The processing module 50 is the control center of the refrigeration equipment system and uses various interfaces and lines to connect the various parts of the overall refrigeration equipment system.
The storage module 60 may be used to store the computer program and/or modules, and the processing module 50 performs the various functions of the refrigeration equipment system by running or executing the computer program and/or modules stored in the storage module 60 and invoking data stored in the storage module 60. The storage module 60 may mainly include a program storage area, which may store an operating system, application programs required for at least one function, and the like, and a data storage area. In addition, the storage module 60 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory module 60 and executed by the processing module 50 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program in a refrigeration appliance system.
Further, an embodiment of the present invention provides a readable storage medium storing a computer program, where the computer program can implement the steps in the above-mentioned text semantic matching method when executed by the processing module 50, that is, implement the steps in any one of the technical solutions of the above-mentioned text semantic matching method.
The modules integrated by the text semantic matching method can be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as independent products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and the computer program may implement the steps of each of the method embodiments described above when executed by the processing module 50.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random-access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted as appropriate according to the requirements of legislation and patent practice in each jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. This manner of description is adopted merely for clarity, and those skilled in the art should take the specification as a whole; the technical solutions in the embodiments may be combined as appropriate to form other embodiments that can be understood by those skilled in the art.
The above list of detailed descriptions is only specific to practical embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent embodiments or modifications that do not depart from the spirit of the present invention should be included in the scope of the present invention.
Claims (18)
1. The text semantic matching method is characterized by comprising the following steps of:
labeling the text which can be labeled in the total text data;
performing result matching, through slot extraction, on both the text that cannot be labeled in the total text data and the labeled text, and judging the matching result;
if the matching is successful, outputting a matching result;
if the matching fails, transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and calculating the text semantic matching result according to the feature extraction result; the deep fusion network model is a fusion model of a text vectorization model and a multi-dimensional feature extraction model, wherein the text vectorization model vectorizes the text data and the multi-dimensional feature extraction model extracts multi-dimensional interaction features and associated features.
2. The text semantic matching method according to claim 1, wherein the step of performing result matching on both unlabeled text and labeled text in the total text data by slot extraction comprises:
performing result matching on the text which cannot be marked in the total text data and the marked text through a rule engine;
when the rule engine detects a problem, the rule definition is automatically analyzed and repaired through the quick repair module, and the result matching is performed again through the rule engine.
3. The text semantic matching method according to claim 2, further comprising the step of:
when the quick repair module cannot solve the problem detected by the rule engine, or the rule engine still cannot perform result matching after rule repair, the slot extraction matching fails; such problems include inaccurate rule definitions, rule conflicts, or inefficient rule execution.
4. The text semantic matching method according to claim 1, wherein the step performed if the matching fails further comprises:
transmitting the text data corresponding to the failure to a deep fusion network model for feature extraction, and extracting interaction features;
calculating aggregate interaction feature information between interaction features;
calculating differentiated interaction characteristic information among the interaction characteristics;
and calculating a matching result of the text semantics according to the feature information, the aggregated interaction feature information and the differentiated interaction feature information.
5. The text semantic matching method according to claim 4, wherein the step of calculating aggregated interaction feature information between interaction features comprises:
calculating aggregate interaction feature information between interaction features by attention weighted summation;
the step of calculating differentiated interaction characteristic information between interaction characteristics comprises the following steps:
and calculating differentiated interaction feature information among the interaction features through attention-mechanism enhancement.
6. The text semantic matching method according to claim 1, wherein the labeling of the text that can be labeled in the total text data comprises:
and sequentially performing pre-labeling, formal labeling and labeling quality inspection on the text to be labeled, wherein when the labeling quality inspection judges that the labeling quality of the text is lower than a preset threshold, the text is returned to pre-labeling for re-labeling.
7. The text semantic matching method according to claim 1, wherein the step of labeling the text that can be labeled in the total text data comprises:
labeling the text which can be labeled in the total text data, and respectively storing the labeled text as training data and test data;
the step of transmitting the text data corresponding to failure to the deep fusion network model for feature extraction comprises the following steps:
and, for the labeled text data whose matching failed, training the deep fusion network model with the training data, and predicting the result with the test data through the deep fusion network model.
8. The text semantic matching method according to claim 1, further comprising the step of:
performing data cleaning on text which cannot be marked in the total text data;
the step of transmitting the text data corresponding to failure to the deep fusion network model for feature extraction comprises the following steps:
and, for the unlabeled text data whose matching failed, training the deep fusion network model with the unlabeled text using an unsupervised learning algorithm.
9. The text semantic matching method according to claim 1, wherein the total text data is formed by converting all multi-modal data and/or multi-source heterogeneous data into text data and aggregating the results.
10. The text semantic matching method according to claim 9, further comprising the step of:
collecting and preprocessing multi-modal data and/or multi-source heterogeneous data, wherein the multi-modal data comprises text, audio and video data, and the preprocessing comprises cleaning, format conversion and storage of the multi-modal data;
converting the video data into text data;
transcribing the audio data into text data;
acquiring historical text data;
aggregating the text data in the multi-modal data, the text data transcribed from the audio data, the text data converted from the video data, and the historical text data into the total text data.
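The aggregation of claims 9 and 10 amounts to converting every source to text and pooling the results; in the sketch below, `video_to_text`, `audio_to_text`, and `history_store` are hypothetical interfaces standing in for the converters described in claims 11-14.

```python
def build_total_text_data(multimodal, video_to_text, audio_to_text, history_store):
    """Collect, preprocess, and convert every source into text, then aggregate."""
    total = []
    total += [clean(t) for t in multimodal["text"]]           # native text data
    total += [audio_to_text(a) for a in multimodal["audio"]]  # transcribed audio
    total += [video_to_text(v) for v in multimodal["video"]]  # converted video
    total += history_store.load()                             # historical text data
    return total

def clean(text: str) -> str:
    # Minimal preprocessing stand-in: cleaning and format normalization.
    return " ".join(text.split())
```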
11. The text semantic matching method according to claim 10, wherein the step of converting video data into text data comprises:
separating audio and images in the video data to obtain audio data and image data;
identifying text information in the image data, and converting the text information into text data;
recognizing the image data based on spatio-temporal and long-distance dependency features, and converting the image data into text data;
wherein the step of recognizing the image data based on the spatio-temporal and long-distance dependency features and converting it into text data comprises:
generating a student model from a fusion model that combines knowledge distillation and diffusion models and recognizes images based on spatio-temporal and long-distance dependency features, and converting the image data into text data by using the student model.
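Claim 11's fusion of knowledge distillation and diffusion models is not detailed; as a rough illustration only, a student image-to-text model could be trained against a larger teacher with a standard temperature-scaled distillation loss, as sketched below (the loss form and temperature are assumptions, not the patent's method).

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, images, optimizer, temperature: float = 2.0):
    """One training step: the student mimics the teacher's per-token
    distribution over the text vocabulary for the given video frames."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)            # (batch, seq_len, vocab)
    student_logits = student(images)

    # Temperature-scaled KL divergence between teacher and student outputs.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```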
12. The text semantic matching method according to claim 10, wherein the step of converting video data into text data comprises:
acquiring a key frame image in the video data through a diffusion network model;
and identifying the key frame image to generate text data.
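Treating the diffusion network of claim 12 as an opaque frame-scoring model (its architecture is not given), key-frame selection followed by recognition might look like the sketch below; `score_frames`, `recognize`, and the top-k rule are hypothetical.

```python
import torch

def video_to_text(frames: torch.Tensor, score_frames, recognize, top_k: int = 5):
    """Select the top-k key frames by model score, then recognize them as text."""
    # frames: (num_frames, channels, height, width)
    scores = score_frames(frames)                  # (num_frames,) key-frame scores
    key_idx = torch.topk(scores, k=min(top_k, len(scores))).indices
    key_frames = frames[key_idx]
    return [recognize(frame) for frame in key_frames]
```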
13. The text semantic matching method according to claim 10, wherein the step of transcribing the audio data into text data comprises:
and establishing a deep recurrent convolutional network model based on the fused neural networks MMCNN-RNN, CTC, and Attention, so as to transcribe the audio data into text data by combining the spatio-temporal speech features and the contextual relation features of the scene in which the data was acquired.
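The MMCNN-RNN/CTC/Attention model of claim 13 is not described in detail. The skeleton below is a generic convolutional-recurrent CTC acoustic model, shown only to illustrate how such components typically fit together; the layer sizes are invented and the attention branch is omitted.

```python
import torch
import torch.nn as nn

class ConvRecurrentCTC(nn.Module):
    """Generic CNN + BiLSTM + CTC acoustic model (a stand-in, not MMCNN-RNN itself)."""
    def __init__(self, n_mels: int = 80, hidden: int = 256, vocab: int = 5000):
        super().__init__()
        self.conv = nn.Sequential(                       # spatio-temporal speech features
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.rnn = nn.LSTM(32 * n_mels, hidden, batch_first=True,
                           bidirectional=True)           # contextual relation features
        self.out = nn.Linear(2 * hidden, vocab + 1)      # +1 for the CTC blank token
        # Applied to the returned log-probs and target transcripts during training.
        self.ctc = nn.CTCLoss(blank=vocab, zero_infinity=True)

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, time, n_mels)
        x = self.conv(spectrogram)                        # (batch, 32, time, n_mels)
        b, c, t, m = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * m)    # (batch, time, 32*n_mels)
        x, _ = self.rnn(x)
        return self.out(x).log_softmax(-1)                # (batch, time, vocab+1)
```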
14. The text semantic matching method according to claim 10, wherein the historical text data comprises historical record data and historical interaction data, wherein the historical record data comprises food preference data, interest data and comment data of a user, and the historical interaction data comprises interaction records obtained from a client or an interaction end of a refrigeration device.
15. The text semantic matching method according to claim 1, wherein the text vectorization model encodes word-, phrase-, and sentence-level text;
the multi-dimensional feature extraction model uses a multi-head attention mechanism to respectively extract character-, word-, and sentence-level interaction features, association features, and contextual semantic information.
16. The text semantic matching method according to claim 1, wherein the step of calculating a text semantic matching result based on the feature extraction result comprises: passing the feature-extracted vector through a fully connected layer and a self-attention mechanism in sequence to obtain the text semantic matching result.
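To show how claims 15 and 16 could chain together, the toy module below applies multi-head attention over token features and then a fully connected layer followed by self-attention to produce a matching score; the mean pooling, sigmoid output, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MatchingHead(nn.Module):
    """Sketch: multi-head attention feature extraction (claim 15) feeding a
    fully connected layer plus self-attention scoring step (claim 16)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.extract = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Linear(dim, dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) vectorized character/word/sentence features
        feats, _ = self.extract(tokens, tokens, tokens)   # interaction/context features
        feats = torch.relu(self.fc(feats))                # fully connected layer
        feats, _ = self.self_attn(feats, feats, feats)    # self-attention mechanism
        return torch.sigmoid(self.score(feats.mean(1)))   # matching probability
```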
17. The text semantic matching method according to claim 1, further comprising the step of:
and delivering the matching result of a successful match and the recalculated matching result of a failed match.
18. A refrigeration appliance system comprising:
a storage module storing a computer program;
a processing module which, when executing the computer program, implements the steps of the text semantic matching method according to any one of claims 1 to 17.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310247263.1A CN116521821A (en) | 2023-03-15 | 2023-03-15 | Text semantic matching method and refrigeration equipment system |
PCT/CN2024/081468 WO2024188277A1 (en) | 2023-03-15 | 2024-03-13 | Text semantic matching method and refrigeration device system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310247263.1A CN116521821A (en) | 2023-03-15 | 2023-03-15 | Text semantic matching method and refrigeration equipment system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116521821A true CN116521821A (en) | 2023-08-01 |
Family
ID=87401935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310247263.1A Pending CN116521821A (en) | 2023-03-15 | 2023-03-15 | Text semantic matching method and refrigeration equipment system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116521821A (en) |
WO (1) | WO2024188277A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024188276A1 (en) * | 2023-03-15 | 2024-09-19 | 青岛海尔电冰箱有限公司 | Text classification method and refrigeration device system |
WO2024188277A1 (en) * | 2023-03-15 | 2024-09-19 | 青岛海尔电冰箱有限公司 | Text semantic matching method and refrigeration device system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9772995B2 (en) * | 2012-12-27 | 2017-09-26 | Abbyy Development Llc | Finding an appropriate meaning of an entry in a text |
CN111046674B (en) * | 2019-12-20 | 2024-05-31 | 科大讯飞股份有限公司 | Semantic understanding method and device, electronic equipment and storage medium |
CN112241631A (en) * | 2020-10-23 | 2021-01-19 | 平安科技(深圳)有限公司 | Text semantic recognition method and device, electronic equipment and storage medium |
CN115098765A (en) * | 2022-05-20 | 2022-09-23 | 青岛海尔电冰箱有限公司 | Information pushing method, device and equipment based on deep learning and storage medium |
CN116521821A (en) * | 2023-03-15 | 2023-08-01 | 青岛海尔电冰箱有限公司 | Text semantic matching method and refrigeration equipment system |
- 2023-03-15: CN application CN202310247263.1A (publication CN116521821A), status: pending
- 2024-03-13: WO application PCT/CN2024/081468 (publication WO2024188277A1), status: unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024188277A1 (en) | 2024-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110444198B (en) | Retrieval method, retrieval device, computer equipment and storage medium | |
WO2022095380A1 (en) | Ai-based virtual interaction model generation method and apparatus, computer device and storage medium | |
CN110795532A (en) | Voice information processing method and device, intelligent terminal and storage medium | |
WO2024188277A1 (en) | Text semantic matching method and refrigeration device system | |
WO2018045646A1 (en) | Artificial intelligence-based method and device for human-machine interaction | |
US20230394247A1 (en) | Human-machine collaborative conversation interaction system and method | |
CN104462600A (en) | Method and device for achieving automatic classification of calling reasons | |
WO2024193596A1 (en) | Natural language understanding method and refrigerator | |
CN111177186A (en) | Question retrieval-based single sentence intention identification method, device and system | |
CN112632244A (en) | Man-machine conversation optimization method and device, computer equipment and storage medium | |
CN111046148A (en) | Intelligent interaction system and intelligent customer service robot | |
CN111858875A (en) | Intelligent interaction method, device, equipment and storage medium | |
CN110890097A (en) | Voice processing method and device, computer storage medium and electronic equipment | |
CN113393841B (en) | Training method, device, equipment and storage medium of voice recognition model | |
CN113051895A (en) | Method, apparatus, electronic device, medium, and program product for speech recognition | |
WO2024188276A1 (en) | Text classification method and refrigeration device system | |
WO2020227968A1 (en) | Adversarial multi-binary neural network for multi-class classification | |
CN118210910A (en) | AI interaction session processing method and system based on intelligent training | |
WO2024093578A1 (en) | Voice recognition method and apparatus, and electronic device, storage medium and computer program product | |
CN117952081A (en) | User opinion analysis method, device and medium | |
CN118114679A (en) | Service dialogue quality control method, system, electronic equipment and storage medium | |
US20240037316A1 (en) | Automatically summarizing event-related data using artificial intelligence techniques | |
CN112466286A (en) | Data processing method and device and terminal equipment | |
CN117172258A (en) | Semantic analysis method and device and electronic equipment | |
CN113505293B (en) | Information pushing method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||