WO2024169529A1

WO2024169529A1 - Knowledge base construction method, data retrieval method and apparatus, and cloud device

Info

Publication number: WO2024169529A1
Application number: PCT/CN2024/073350
Authority: WO
Inventors: 李鹤
Original assignee: 杭州阿里云飞天信息技术有限公司
Priority date: 2023-02-13
Filing date: 2024-01-19
Publication date: 2024-08-22
Also published as: CN116340479A

Abstract

The present application provides a knowledge base construction method, a data retrieval method and apparatus, and a cloud device. The knowledge base construction method comprises: obtaining a plurality of pieces of original data, the modalities of the plurality of pieces of original data being at least two of: text, image or video; for original data in the plurality of pieces of original data, determining a key text of the original data, determining the key of a key-value pair according to the key text, and determining the value of the key-value pair according to the original data, wherein the key text is used for describing the content of the original data; according to the key-value pairs corresponding to the plurality of pieces of original data, constructing a question and answer knowledge base, thereby allowing for constructing a question and answer knowledge base supporting different modalities of answer retrieval.

Description

Knowledge base construction method, data retrieval method, device and cloud device

This application claims the priority of the Chinese patent application filed with the China Patent Office on February 13, 2023, with application number 202310166877.7 and application name “Knowledge base construction method, data retrieval method, device and cloud device”, all contents of which are incorporated by reference in this application.

Technical Field

The present application relates to the field of computer technology, and in particular to a knowledge base construction method, a data retrieval method, an apparatus and a cloud device.

Background Art

The question-and-answer knowledge base is applied in intelligent customer service scenarios, in which the corresponding answer text is retrieved from the question-and-answer knowledge base based on the user's inquiry text and returned to the user as a reply.

In the related art, a question-answering knowledge base is constructed by taking part of the text in the original text data as an index and taking the original text as the value of the index. However, the question-answering knowledge base obtained by this construction method is not suitable for the retrieval of answer data of different modalities, which limits the application scope of the question-answering knowledge base in intelligent customer service scenarios.

Summary of the invention

Multiple aspects of the present application provide a knowledge base construction method, data retrieval method, apparatus and cloud device to solve the problem that the question-and-answer knowledge base constructed by related technologies is not suitable for retrieval of answer data of different modalities.

A first aspect of an embodiment of the present application provides a method for constructing a knowledge base, comprising:

Acquire multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image, or video;

For original data in the plurality of original data, determine key text of the original data, determine a key of a key-value pair according to the key text, and determine a value of the key-value pair according to the original data, wherein the key text is used to describe the content of the original data;

Build a question-and-answer knowledge base based on the key-value pairs corresponding to multiple pieces of original data.

A second aspect of an embodiment of the present application provides a data retrieval method, comprising:

Receiving inquiry data sent by a terminal device, where the inquiry data is one of text, image or video;

Retrieving, in the question-answering knowledge base, answer data corresponding to the inquiry data, where the modality of the answer data is one of text, image or video, and the question-answering knowledge base is constructed according to the knowledge base construction method of the first aspect;

Send answer data to the terminal device.

A third aspect of the present application embodiment provides a data retrieval method, which is applied to a terminal device. The data retrieval method includes:

Send query data to the server;

Receive answer data for the inquiry data sent by the server, where the answer data is determined according to the data retrieval method of the second aspect.

A fourth aspect of the present application provides a knowledge base construction device, including:

An acquisition module, used for acquiring multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image or video;

a determination module, for determining key text of the original data among the plurality of original data, determining a key of a key-value pair according to the key text, and determining a value of the key-value pair according to the original data, wherein the key text is used to describe the content of the original data;

The construction module is used to construct a question-answering knowledge base based on the key-value pairs corresponding to multiple pieces of original data.

The fifth aspect of an embodiment of the present application provides a cloud device, including: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the knowledge base construction method of the first aspect, the data retrieval method of the second aspect and/or the third aspect are implemented.

The sixth aspect of the embodiments of the present application provides a computer-readable storage medium, in which computer execution instructions are stored. When the computer execution instructions are executed by a processor, they are used to implement the knowledge base construction method of the first aspect, the data retrieval method of the second aspect and/or the third aspect.

A seventh aspect of an embodiment of the present application provides a computer program product, the program product comprising: a computer program, the computer program is stored in a readable storage medium, at least one processor of an electronic device can read the computer program from the readable storage medium, and at least one processor executes the computer program so that the electronic device executes the knowledge base construction method of the first aspect, the data retrieval method of the second aspect and/or the third aspect.

The embodiments of the present application are applied in an intelligent customer service scenario. Multiple pieces of original data are obtained, the modalities of which are at least two of text, image or video. For original data in the multiple pieces of original data, the key text of the original data is determined, the key of a key-value pair is determined according to the key text, and the value of the key-value pair is determined according to the original data, where the key text is used to describe the content of the original data. A question-and-answer knowledge base is then constructed according to the key-value pairs corresponding to the multiple pieces of original data, so that the constructed knowledge base supports answer retrieval in different modalities.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described herein are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an improper limitation on the present application. In the drawings:

FIG. 1 is a diagram of an application scenario provided by an exemplary embodiment of the present application;

FIG. 2 is a flowchart of a method for constructing a knowledge base provided by an exemplary embodiment of the present application;

FIG. 3 is a schematic diagram of a question-answering knowledge base provided by an exemplary embodiment of the present application;

FIG. 4 is a schematic diagram of another question-answering knowledge base provided by an exemplary embodiment of the present application;

FIG. 5 is a flowchart of a data retrieval method provided by an exemplary embodiment of the present application;

FIG. 6 is a flowchart of another data retrieval method provided by an exemplary embodiment of the present application;

FIG. 7 is a structural block diagram of a knowledge base construction device provided by an exemplary embodiment of the present application;

FIG. 8 is a schematic diagram of the structure of a cloud device provided by an exemplary embodiment of the present application.

DETAILED DESCRIPTION

In order to make the purpose, technical solution and advantages of the present application clearer, the technical solution of the present application will be clearly and completely described below in combination with the specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only part of the embodiments of the present application, not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the scope of protection of the present application.

Intelligent customer service is an automatic question-and-answer product. In the field of e-commerce, given merchants’ rich product knowledge and the massive number of questions that users may ask, providing users with high-quality answers requires a question-and-answer knowledge base containing enough data. In related technologies, the index (the key in a key-value pair) is constructed by directly building a query index on a certain field. However, this method is not suitable for the multi-source knowledge scenario of intelligent customer service, because the multi-source knowledge is usually in different modalities and each modality includes different types of data. As the business develops, the amount of raw data also increases. It is therefore particularly important to design a question-and-answer knowledge base that supports the retrieval of answer data of different modalities and types, with a construction method that is scalable and can be applied to new raw data.

Based on the above background, the knowledge base construction method provided in the embodiment of the present application includes: obtaining multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image or video; for original data in the multiple pieces of original data, determining the key text of the original data, determining the key of a key-value pair according to the key text, and determining the value of the key-value pair according to the original data, where the key text is used to describe the content of the original data; and constructing a question-and-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data. The present application can thus construct a question-and-answer knowledge base that supports answer retrieval in different modalities, and the construction method is extensible: when new original data arrives, the same method can be used to obtain key-value pairs for the new original data and incorporate them into the question-and-answer knowledge base.

In this embodiment, the overall knowledge base construction method can be implemented by means of a cloud computing system. The server executing the knowledge base construction method can be a cloud server, so that various neural network models can be run using the advantages of cloud resources. As an alternative to the cloud, the knowledge base construction method can also be applied to server-side devices such as conventional servers or server arrays, which is not limited here.

Referring to FIG. 1, an application scenario diagram provided by the present application includes raw data of different modalities, such as text, images, or videos. The text data comes from different types of text, such as conversations, comments, and knowledge graphs. These raw data are processed to build a question-and-answer knowledge base. After the question-and-answer knowledge base is put online, upon receiving the user's inquiry text, the corresponding answer data can be found in the question-and-answer knowledge base and returned to the user as a reply to the inquiry text.

The technical solutions provided by various embodiments of the present application are described in detail below in conjunction with the accompanying drawings.

FIG. 2 is a flowchart of a method for constructing a knowledge base provided by an exemplary embodiment of the present application. The method is applied to a server and, as shown in FIG. 2, specifically includes the following steps:

S201, obtaining multiple pieces of original data.

The modalities of the multiple pieces of original data are at least two of text, image or video.

In the embodiment of the present application, it can be understood that the database includes multiple pieces of raw data from different data sources, and the data of each data source is in the text modality, image modality or video modality. The data sources of the text modality are of different types, such as a user comment data source, a person-to-person dialogue data source, a product attribute data source, a product knowledge graph data source, a product details page data source and a product homepage data source. The user comment data source includes user comment text on the quality of a certain product, the express delivery speed of the product or the service of the merchant. The person-to-person dialogue data source includes the dialogue text between the user and the merchant's human customer service. The product attribute data source includes text descriptions of product attributes (such as category, size and color). The product knowledge graph data source includes a knowledge graph in which nodes represent products and the connections between nodes represent the relationships between products. The product details page data source includes text content contained in images, which is used to describe the product; this text can be recognized by OCR (optical character recognition) technology and used as raw data. The product homepage data source likewise includes text content contained in images that describes the product, which can also be recognized by OCR technology and used as raw data.

In the embodiment of the present application, the image may include an image of a product object, or may include an image of text, such as the product details page, product homepage, etc. The video may include a product introduction video, a product usage video, or a product installation video.

In addition, each of the following counts as one piece of raw data: a comment text, a person-to-person conversation, a product attribute, a knowledge graph, all the text recognized in an image, an image, or a video.
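As an illustrative sketch only, a piece of raw data could be represented as a small record that carries its data source and modality. The names `Modality`, `RawData`, `source`, `payload` and `covers_multiple_modalities`, as well as the source labels and file path, are assumptions introduced here and do not appear in the application.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"


@dataclass(frozen=True)
class RawData:
    """One piece of raw data (hypothetical record layout)."""
    source: str         # assumed label, e.g. "user_comment" or "product_homepage"
    modality: Modality
    payload: str        # the text itself, or a path/URI to an image or video


def covers_multiple_modalities(items):
    """Check the condition of S201: at least two modalities are present."""
    return len({item.modality for item in items}) >= 2


batch = [
    RawData("user_comment", Modality.TEXT, "Fast delivery, good quality."),
    RawData("product_homepage", Modality.IMAGE, "images/projector_home.png"),
]
print(covers_multiple_modalities(batch))
```

Keeping the modality on each record means later steps can choose a text-specific or image/video-specific processing path without inspecting the payload.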

In the embodiments of the present application, the source of the original data is not limited.

S202, determining key text of the original data among the plurality of original data, determining a key of a key-value pair according to the key text, and determining a value of the key-value pair according to the original data.

In an embodiment of the present application, at least one key text can be determined for a piece of raw data. The key text is used to describe the content of the raw data. Determining the key text of the raw data includes: converting the raw data into natural language text; and extracting key text from the natural language text, where the key text includes at least one of the content theme, core viewpoint, keyword, and key entity of the raw data.

Wherein, if the original data is text, converting the original data into natural language text includes: processing the original data through a prompt (a natural language model) template to obtain natural language text. The prompt template is pre-trained and can convert the original data into natural language text. If the original data is an image or video (including multiple frames of images), the image or video can be input into a pre-trained multimodal recognition model for processing, and a natural language text describing the corresponding image or video can be obtained.

In the embodiment of the present application, extracting key text from the natural language text includes: inputting the natural language text into a unified model, which extracts the key text.

In the embodiment of the present application, the unified model includes multiple pre-trained sub-models, such as a matching sub-model, a classification sub-model or a sorting sub-model, and can thereby unify multiple NLP (natural language processing) tasks. The matching sub-model is used to realize the matching task, the classification sub-model is used to realize the classification task, and the sorting sub-model is used to realize the sorting task. The natural language text is input into the unified model, and at least one key text can be obtained through the processing of each sub-model.

For example, the unified model can extract the core content of the original data, such as content themes, core ideas, keywords and key entities. The content theme indicates, for example, whether the original data is about goods or quality. The core idea indicates, for example, whether the original data is positive or negative. Keywords are, for example, words in the original data that describe attributes of the content theme. Key entities are, for example, a location, a time, a product name or a product category.

In an optional embodiment, the key text can be used as the key of the key-value pair, and the natural language text or original data can be used as the value of the key-value pair to obtain the key-value pair, thereby constructing a question-answering knowledge base. Among them, one piece of original data corresponds to one piece of natural language text, and one piece of natural language text can correspond to one key, which is one key text in the key text of the original data or a combination of multiple key texts.

For example, referring to FIG. 3, the original data is X, the natural language text corresponding to the original data is N, and the key text corresponding to the original data includes topic A, core view B, keyword C, keyword D, key entity E and key entity F. The key-value pairs included in the constructed question-answering knowledge base are shown in FIG. 3.
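The mapping described above can be sketched as follows. This is a minimal illustration that assumes every key text (and, optionally, a combination of key texts) points to the same value; the function name `build_key_value_pairs`, the `combined_keys` parameter and the `" + "` separator are hypothetical, and the exact layout of the figure is not reproduced here.

```python
def build_key_value_pairs(key_texts, value, combined_keys=()):
    """Map each key text, and each requested combination, to one value.

    `value` may be the natural language text N or the original data X.
    """
    pairs = {key_text: value for key_text in key_texts}
    for combo in combined_keys:
        # A combination of several key texts also acts as a single key.
        pairs[" + ".join(combo)] = value
    return pairs


key_texts = [
    "topic A", "core view B", "keyword C",
    "keyword D", "key entity E", "key entity F",
]
pairs = build_key_value_pairs(
    key_texts, value="N", combined_keys=[("topic A", "keyword C")]
)
for key, value in pairs.items():
    print(key, "->", value)
```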

Furthermore, determining the value of the key-value pair based on the original data includes: segmenting the original data based on the key text to obtain data segments corresponding to the key text, where the modality of the data segments is the same as the modality of the original data; and determining the value of the key-value pair based on the data segments.

In addition, if the original data is text, segmenting the original data based on the key text to obtain data segments corresponding to the key text includes: using machine reading comprehension (MRC) technology to extract data segments describing the key text from the original data.

Specifically, the key text and the corresponding raw data are input into an understanding model based on machine reading comprehension technology, and the data segments describing the key text are output. The understanding model can remove redundant noise data from the raw data, thereby improving the overall quality of the data segments. In the embodiment of the present application, each key text has a corresponding data segment in the raw data.

For example, if the original data is "the parameters of this projector are body weight 1kg, zoom multiple is fixed focus, and light source type is LED light source", the key texts obtained are: projector, weight, zoom multiple, fixed focus, light source type, and LED light source. The data segment corresponding to "projector" can be the entire original data, the data segment corresponding to "weight" can be "body weight 1kg", the data segment corresponding to "zoom multiple" and/or "fixed focus" is "zoom multiple is fixed focus", and the data segment corresponding to "light source type" and/or "LED light source" is "light source type is LED light source".
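To make the idea of a per-key data segment concrete, the sketch below replaces the machine reading comprehension model with a plain clause-matching heuristic. This substitution is an assumption made purely for illustration: the function `extract_segment` and its splitting rule are hypothetical, and a trained understanding model would return tighter spans.

```python
import re


def extract_segment(raw_text, key_text):
    """Return the clause(s) of raw_text that mention key_text.

    Heuristic stand-in for an MRC model: split on commas (dropping a
    leading "and"), keep the clauses containing the key text, and fall
    back to the whole text when nothing matches.
    """
    clauses = [
        clause.strip()
        for clause in re.split(r",\s*(?:and\s+)?", raw_text)
        if clause.strip()
    ]
    hits = [c for c in clauses if key_text.lower() in c.lower()]
    return ", ".join(hits) if hits else raw_text


raw = (
    "the parameters of this projector are body weight 1kg, "
    "zoom multiple is fixed focus, and light source type is LED light source"
)
print(extract_segment(raw, "zoom multiple"))
print(extract_segment(raw, "LED light source"))
```

The heuristic does not reproduce every case in the example above: for "projector" it returns only the first clause rather than the entire original data, and for "weight" it returns the whole first clause rather than "body weight 1kg".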

In the embodiment of the present application, if the original data is an image, the entire image can be used as the value of the key-value pair, or the original data can be segmented and the local image obtained by segmentation used, as the data segment, as the value of the key-value pair. The segmentation can be performed through image recognition or image processing technology. For example, if the object contained in an image is a set of tableware and the key text is “spoon”, the partial image containing the spoon can be used as the data segment of “spoon”.

Furthermore, if the original data is a video, the entire video can be used as the value of the key-value pair, or a portion of the frame image in the video can be used as the value of the key-value pair, or a local image in the image in the video can be used as the value of the key-value pair. In the embodiment of the present application, the specific segmentation method of the image and video is not limited.

Further, determining the key of the key-value pair according to the key text includes: expanding the key text to obtain a first inquiry text; and determining the first inquiry text as the key of the key-value pair.

In an embodiment of the present application, one or more key texts can be expanded to obtain a first inquiry text. For example, for the original data X, the corresponding key texts include topic A, core view B, keyword C, keyword D, key entity E, and key entity F. Then topic A can be expanded to obtain the corresponding first inquiry text; topic A and core view B can be expanded to obtain the corresponding first inquiry text. Among them, one original data can correspond to multiple first inquiry texts. The first inquiry text is determined as the key of the key-value pair, which is the index.

In the present application, the key text may be expanded by inputting the key text into a pre-trained expansion model, so that the key text is expanded into a natural language inquiry text. For example, if topic A is "projector" and keyword C is "weight", the expanded first inquiry text may be "How much does the projector weigh?" It can be understood that the expanded first inquiry text includes the key text.
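The expansion step can be imitated with fixed templates, as in the hedged sketch below. The application uses a pre-trained expansion model; the template table `TEMPLATES`, the fallback `DEFAULT_TEMPLATE` and the function `expand_key_text` are hypothetical stand-ins that only show the input and output shape of the step.

```python
# Hypothetical keyword-specific templates; a trained expansion model
# would replace this lookup table in a real system.
TEMPLATES = {
    "weight": "How much does the {topic} weigh?",
}
DEFAULT_TEMPLATE = "What is the {keyword} of the {topic}?"


def expand_key_text(topic, keyword):
    """Expand a (topic, keyword) pair into a natural language inquiry text."""
    template = TEMPLATES.get(keyword.lower(), DEFAULT_TEMPLATE)
    return template.format(topic=topic, keyword=keyword)


print(expand_key_text("projector", "weight"))
print(expand_key_text("projector", "light source type"))
```

Since one piece of original data can yield several key texts, calling the function once per key text or key text combination gives the multiple first inquiry texts mentioned above.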

Furthermore, after expanding the key text to obtain the first inquiry text, the method further includes: encoding the first inquiry text to obtain an encoding vector; and determining the encoding vector as a key in the key-value pair.

In an embodiment of the present application, the first inquiry text is encoded using the encoder of a pre-trained BERT language model: the first inquiry text is input into the encoder, and the output is the encoding vector of the first inquiry text.

It can be understood that using the encoding vector as the key (i.e., the index) in the key-value pair can effectively compensate for the poor recall of a text index on semantically similar content.
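The following sketch shows only the shape of the encoding step: text in, fixed-length vector out, with vectors compared by cosine similarity. It deliberately uses a hashed bag-of-words encoder instead of a BERT encoder, so it does not capture semantics; the functions `encode` and `cosine` and the dimension of 64 are assumptions chosen so the example runs with the standard library alone.

```python
import hashlib
import math


def encode(text, dim=64):
    """Toy stand-in for a sentence encoder: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        token = token.strip("?.,!")
        if not token:
            continue
        # md5 gives a hash that is stable across processes, unlike hash().
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalized."""
    return sum(x * y for x, y in zip(a, b))


key_vector = encode("How much does the projector weigh?")
print(len(key_vector))
print(round(cosine(key_vector, encode("how much does the projector weigh")), 6))
```

In a real system the `encode` function would be replaced by the pre-trained encoder, and the resulting vector stored as a key alongside the inquiry text.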

S203, constructing a question-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data.

Furthermore, the method also includes: performing data mining on the original data to obtain target data; generating a second query text of the target data based on the target data; and determining the target data as the value of a key-value pair and the second query text as the key of the key-value pair, to construct the question-and-answer knowledge base.

In an embodiment of the present application, the data mined from the original data can be combined into key-value pairs and added to the question-answer knowledge base constructed as described above, thereby increasing the amount of data in the question-answer knowledge base. Exemplarily, user comment data is mined, and the target data obtained is comment data, and multiple comment data constitute the comment knowledge base. Product attribute data is mined, and the target data obtained is product data, and multiple product data constitute the product knowledge base. Other data is mined, and the target data obtained is general data, and multiple general data constitute the general knowledge base. For the target data in the comment knowledge base, product knowledge base, and general knowledge base, the key of the target data is determined, and the target data is used as the value, so that the key-value pair can be obtained and added to the question-answer knowledge base constructed in the above manner.

For example, referring to FIG. 4, which is a schematic diagram of a question-answering knowledge base obtained in the present application, the key of the question-answering knowledge base may be a text or an encoding vector, while the value corresponding to the key may be a text, an image or a video. For example, in FIG. 4, the encoding vector b is the encoding vector of the query text a, and the value corresponding to both is the text g. The encoding vector d is the encoding vector of the query text c, and the value corresponding to both is the image h. The encoding vector f is the encoding vector of the query text e, and the value corresponding to both is the video k.
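One possible in-memory layout for such a knowledge base is sketched below, under the assumption that a text key and its encoding vector are stored in two separate indexes that share the same value object. The classes `Value` and `QAKnowledgeBase`, their fields, and the two-dimensional toy vectors are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Value:
    """Answer payload; the modality may be "text", "image" or "video"."""
    modality: str
    payload: str


@dataclass
class QAKnowledgeBase:
    by_text: dict = field(default_factory=dict)    # query text -> Value
    by_vector: list = field(default_factory=list)  # (encoding vector, Value)

    def add(self, query_text, vector, value):
        """Register the same value under both the text key and the vector key."""
        self.by_text[query_text] = value
        self.by_vector.append((tuple(vector), value))


kb = QAKnowledgeBase()
kb.add("query text a", [1.0, 0.0], Value("text", "text g"))
kb.add("query text c", [0.0, 1.0], Value("image", "image h"))
kb.add("query text e", [0.6, 0.8], Value("video", "video k"))
print(kb.by_text["query text c"].modality)
```

Sharing one value object between both indexes keeps the two lookup paths consistent when an entry is added.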

The method for constructing a question-and-answer knowledge base provided in this application can solve the problem of unified retrieval of answer data of different modalities. At the same time, it has strong scalability and can better solve the problem that the methods for constructing a question-and-answer knowledge base in related technologies are not flexible enough. This application can quickly construct a question-and-answer knowledge base at the scale of 100 million entries, thereby improving the retrieval coverage of online inquiry data.

FIG. 5 is a flowchart of a data retrieval method provided by an exemplary embodiment of the present application. The method is applied to a server and specifically includes the following steps:

S501, receiving inquiry data sent by a terminal device.

The query data is one of text, image or video. If it is an image or video, the image or video can be converted into a query text, and the query text can describe the content of the image or video.

S502: Retrieve answer data corresponding to the query data in the question-answer knowledge base.

The modality of the answer data is one of text, image or video, and the question-answer knowledge base is constructed according to the above-mentioned knowledge base construction method.

Furthermore, retrieving, in the question-and-answer knowledge base, answer data corresponding to the query data includes: determining that the value in the key-value pair whose key is determined based on the query data is the answer data; and/or encoding the query data to obtain a query encoding vector; determining, in the question-and-answer knowledge base, a target encoding vector whose similarity with the query encoding vector is greater than a threshold; and determining that the value of the key-value pair with the target encoding vector as the key is the answer data.

Specifically, determining that the value in the key-value pair of the key determined based on the query data is the answer data includes: if the query data sent by the terminal device is text, the corresponding value can be retrieved as the answer data in the question and answer knowledge base using the query data as the key; if the corresponding value cannot be retrieved, the query data is encoded to obtain a coding vector, and the corresponding value is retrieved as the answer data in the question and answer knowledge base using the coding vector as the key.

Furthermore, if the query data is an image or video, the image or video can be extracted and processed to obtain the corresponding query text, and then the query text is used as a key or the encoding vector of the query text is used as a key to retrieve the corresponding value as the answer data.
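The two retrieval paths described above (exact key lookup first, then vector similarity against a threshold) can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the `encode` callback, the threshold value of 0.8 and the choice to return the single most similar entry are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_text, text_index, vector_index, encode, threshold=0.8):
    """Return the answer for query_text, or None if nothing qualifies.

    text_index:   dict mapping query text -> answer value
    vector_index: list of (encoding vector, answer value) pairs
    encode:       callable turning text into a vector (assumed to be supplied)
    """
    # Path 1: use the query text itself as the key.
    if query_text in text_index:
        return text_index[query_text]
    # Path 2: fall back to the encoding vector and keep the most similar
    # stored vector whose similarity is greater than the threshold.
    query_vector = encode(query_text)
    best_value, best_score = None, threshold
    for vector, value in vector_index:
        score = cosine(query_vector, vector)
        if score > best_score:
            best_value, best_score = value, score
    return best_value
```

For image or video query data, the same function would be called with the query text obtained from the image or video.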

S503, sending answer data to the terminal device.

In an embodiment of the present application, the server can retrieve answer data for the query data based on the knowledge base to provide high-quality answers to the user.

FIG. 6 is a flowchart of another data retrieval method provided by an exemplary embodiment of the present application. The method is applied to a terminal device and specifically includes the following steps:

S601, sending inquiry data to the server.

The query data is one of text, image or video.

S602, receiving answer data of the inquiry data sent by the server.

The answer data is determined according to the above-mentioned data retrieval method.

The specific implementation process of this embodiment refers to the above embodiment and will not be repeated here.

In the embodiment of the present application, referring to FIG. 7 , in addition to providing a method for constructing a knowledge base, a knowledge base construction device 70 is also provided. The knowledge base construction device 70 includes:

An acquisition module 71 is used to acquire multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image or video;

A determination module 72, for determining key text of the original data among the plurality of original data, and determining a key of a key-value pair according to the key text, and determining a value of the key-value pair according to the original data, wherein the key text is used to describe the content of the original data;

The construction module 73 is used to construct a question-answer knowledge base according to the key-value pairs corresponding to the multiple original data.

In an optional embodiment, the determination module 72 is specifically used to: convert the original data into natural language text; extract key text from the natural language text to obtain key text, and the key text includes: at least one of the content theme, core ideas, keywords, and key entities of the original data.

In an optional embodiment, when extracting key text from the natural language text, the determination module 72 is specifically used to: input the natural language text into a unified model, which extracts the key text.

In an optional embodiment, the determination module 72 is specifically used to: segment the original data based on the key text to obtain data segments corresponding to the key text, where the modality of the data segments is the same as the modality of the original data; and determine the value of the key-value pair based on the data segments.

In an optional embodiment, the original data is text, and the determination module 72 segments the original data based on the key text to obtain data segments corresponding to the key text, specifically for: using machine reading comprehension technology to extract data segments describing the key text from the original data.

In an optional embodiment, the determination module 72 is specifically used to: expand the key text to obtain a first query text; and determine that the first query text is a key of the key-value pair.

In an optional embodiment, the determination module 72 is further used to encode the first query text to obtain an encoding vector; and determine the encoding vector as a key in the key-value pair.
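The two key forms described above, a first query text and its encoding vector, can be sketched as follows. Both functions are hypothetical stand-ins: expand_key_text uses a fixed question template where the application leaves the expansion technique open, and encode uses a hashed bag-of-words vector in place of a trained text encoder. The dimension 64 is an arbitrary assumption.

```python
import hashlib
import math
import re
from typing import List

DIM = 64  # assumed vector size, chosen only for illustration


def expand_key_text(key_text: str) -> str:
    """Expand key text into a first query text with a fixed template (assumption)."""
    return f"What should I know about {key_text.strip()}?"


def encode(text: str, dim: int = DIM) -> List[float]:
    """Encode text as an L2-normalized hashed bag-of-words vector.

    This stands in for a learned encoder; it is deterministic, so the same
    query text always yields the same encoding vector.
    """
    vec = [0.0] * dim
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

If the encoding vector is used as a dictionary key in Python, it would need to be converted to a hashable form such as tuple(encode(query_text)).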

In an optional embodiment, the determination module 72 is also used to perform data mining on the original data to obtain target data; generate a second query text for the target data based on the target data; determine the target data as the value of the key-value pair and determine the second query text as the key of the key-value pair to construct a question-and-answer knowledge base.

In an embodiment of the present application, a data retrieval device (not shown) is also provided, and the data retrieval device includes:

A receiving module, used to receive query data sent by a terminal device, where the query data is one of text, image or video;

A retrieval module, used to retrieve answer data corresponding to the query data in the question-answer knowledge base, where the modality of the answer data is one of text, image or video, and the question-answer knowledge base is constructed according to the above-mentioned knowledge base construction method;

The sending module is used to send answer data to the terminal device.

In an optional embodiment, the retrieval module is specifically used to: determine a key based on the query data, and determine that the value of the key-value pair having that key is the answer data;

And/or, encode the query data to obtain a query encoding vector; in the question-answer knowledge base, determine a target encoding vector whose similarity with the query encoding vector is greater than a threshold; and, in the question-answer knowledge base, determine that the value of the key-value pair with the target encoding vector as the key is the answer data.
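The two retrieval paths can be sketched together as follows. The sketch assumes that the query data has already been reduced to a query text and a query encoding vector, that cosine similarity is the similarity measure, and that 0.8 is the threshold; none of these choices is fixed by the present application. The names cosine and retrieve, and the split into a text index and a vector index, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_text: str,
    query_vector: Sequence[float],
    text_index: Dict[str, str],
    vector_index: Dict[Vector, str],
    threshold: float = 0.8,  # assumed value
) -> List[str]:
    answers: List[str] = []
    # Path 1: the key determined from the query data matches a stored key.
    if query_text in text_index:
        answers.append(text_index[query_text])
    # Path 2: stored encoding vectors whose similarity exceeds the threshold.
    for key_vector, value in vector_index.items():
        if cosine(query_vector, key_vector) > threshold and value not in answers:
            answers.append(value)
    return answers
```

Because the values are stored independently of the keys, a text query in this sketch can return a text file, an image, or a video as the answer data.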

In an embodiment of the present application, another data retrieval device (not shown) is further provided and applied to a terminal device. The data retrieval device includes:

A sending module, used for sending query data to the server;

A receiving module, used to receive answer data for the query data sent by the server, where the answer data is determined by the above-mentioned data retrieval method.

In an embodiment of the present application, a knowledge base construction device applied to an intelligent customer service scenario is provided. The device obtains multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image or video; for the original data in the multiple pieces of original data, determines the key text of the original data, determines the key of the key-value pair according to the key text, and determines the value of the key-value pair according to the original data, where the key text is used to describe the content of the original data; and constructs a question-and-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data. In this way, a question-and-answer knowledge base that supports answer retrieval in different modalities can be constructed.

In addition, some of the processes described in the above embodiments and the accompanying drawings include multiple operations that appear in a specific order. It should be clearly understood, however, that these operations may be executed in an order different from the order in which they appear herein, or executed in parallel; the sequence numbers of the operations are only used to distinguish between different operations and do not themselves represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be executed in sequence or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent an order of precedence, nor do they require that the "first" and "second" be of different types.

FIG. 8 is a schematic diagram of the structure of a cloud device 80 provided by an exemplary embodiment of the present application. The cloud device 80 is used to run the above-mentioned knowledge base construction method or data retrieval method. As shown in FIG. 8, the cloud device includes: a memory 84 and a processor 85.

The memory 84 is used to store computer programs and can be configured to store various other information to support operations on the cloud device. The memory 84 can be an object storage service (OSS).

The memory 84 can be implemented by any type of volatile or nonvolatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

The processor 85 is coupled to the memory 84 and is used to execute the computer program in the memory 84 to: obtain multiple pieces of original data, where the modalities of the multiple pieces of original data are at least two of text, image, or video; for the original data in the multiple pieces of original data, determine key text of the original data, determine the key of the key-value pair according to the key text, and determine the value of the key-value pair according to the original data, where the key text is used to describe the content of the original data; and construct a question-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data.

Further optionally, when determining the key text of the original data, the processor 85 is specifically used to: convert the original data into natural language text; and extract the key text from the natural language text, where the key text includes at least one of the content theme, core ideas, keywords, and key entities of the original data.

Further optionally, when extracting the key text from the natural language text, the processor 85 is specifically used to: input the natural language text into a unified model, which extracts the key text.

Further optionally, when determining the value of the key-value pair based on the original data, the processor 85 is specifically used to: segment the original data based on the key text to obtain data segments corresponding to the key text, where the modality of the data segments is the same as the modality of the original data; and determine the value of the key-value pair based on the data segments.

Further optionally, the original data is text, and when the processor 85 segments the original data based on the key text to obtain data segments corresponding to the key text, it is specifically used to: use machine reading comprehension technology to extract data segments describing the key text from the original data.

Further optionally, after expanding the key text to obtain the first query text, the processor 85 is further used to: encode the first query text to obtain an encoding vector; and determine the encoding vector as a key in the key-value pair.

Further optionally, the processor 85 is also used to perform data mining on the original data to obtain target data; generate a second query text of the target data based on the target data; determine the target data as the value of the key-value pair and determine the second query text as the key of the key-value pair to construct a question and answer knowledge base.

In an optional embodiment, the processor 85 is coupled to the memory 84, and is used to execute the computer program in the memory 84, so as to: receive query data sent by a terminal device, the query data being one of text, image or video; retrieve answer data corresponding to the query data in a question and answer knowledge base, the modality of the answer data being one of text, image or video, and the question and answer knowledge base is constructed according to any of the above-mentioned knowledge base construction methods; and send the answer data to the terminal device.

Further optionally, when the processor 85 retrieves answer data corresponding to the query data in the question and answer knowledge base, it is specifically used to: determine a key based on the query data, and determine that the value of the key-value pair having that key is the answer data; and/or, encode the query data to obtain a query encoding vector; determine, in the question and answer knowledge base, a target encoding vector whose similarity with the query encoding vector is greater than a threshold; and determine, in the question and answer knowledge base, that the value of the key-value pair with the target encoding vector as the key is the answer data.

In an optional embodiment, the processor 85 is coupled to the memory 84 and is used to execute the computer program in the memory 84 to: send query data to the server; receive answer data of the query data sent by the server, and the answer data is determined according to the above-mentioned data retrieval method.

Furthermore, as shown in FIG. 8, the cloud device also includes other components such as a firewall 81, a load balancer 82, a communication component 86, and a power supply component 83. FIG. 8 only schematically shows some components, which does not mean that the cloud device only includes the components shown in FIG. 8.

The cloud device provided in the embodiment of the present application can construct a question-and-answer knowledge base that supports answer retrieval in different modalities.

Accordingly, the embodiment of the present application also provides a computer-readable storage medium storing a computer program. When the computer program/instructions are executed by a processor, the processor is caused to implement the steps in the above-mentioned method.

Accordingly, an embodiment of the present application also provides a computer program product, including a computer program/instruction. When the computer program/instruction is executed by a processor, the processor is caused to implement the steps in the method shown above.

The communication component of FIG. 8 above is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices. The device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, a mobile communication network such as 2G, 3G, 4G/LTE or 5G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related text from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

The power supply assembly of Figure 8 provides power to various components of the device in which the power supply assembly is located. The power supply assembly may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which the power supply assembly is located.

In the several embodiments provided in the present application, it should be understood that the disclosed systems and methods can be implemented in other ways. For example, the system embodiments described above are only schematic: the division of units is only a logical function division, and there may be other division methods in actual implementation. Multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.

The above-mentioned integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which can be a personal computer, a server, or a network device, etc.) or a processor to execute some steps of the methods of each embodiment of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Those skilled in the art can clearly understand that, for convenience and simplicity of description, only the division of the above functional modules is used as an example for illustration. In actual applications, the above functions can be assigned to different functional modules as needed, that is, the internal structure of the system can be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the system described above, reference can be made to the corresponding process in the aforementioned method embodiment, which will not be repeated here.

Those skilled in the art will readily appreciate other embodiments of the present application after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any modification, use or adaptation of the present application, which follows the general principles of the present application and includes common knowledge or customary techniques in the art that are not disclosed in the present application. The specification and examples are intended to be exemplary only, and the true scope and spirit of the present application are indicated by the following claims.

It should be understood that the present application is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present application is limited only by the appended claims.

Claims

1. A method for constructing a knowledge base, characterized by comprising:

acquiring multiple pieces of original data, wherein the modalities of the multiple pieces of original data are at least two of text, image, or video;

for original data among the multiple pieces of original data, determining key text of the original data, determining a key of a key-value pair according to the key text, and determining a value of the key-value pair according to the original data, wherein the key text is used to describe the content of the original data; and

constructing a question-and-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data.

2. The method for constructing a knowledge base according to claim 1, characterized in that determining the key text of the original data comprises:

converting the original data into natural language text; and

extracting the key text from the natural language text, wherein the key text includes at least one of the content theme, core ideas, keywords, and key entities of the original data.

3. The method for constructing a knowledge base according to claim 2, characterized in that extracting the key text from the natural language text comprises:

inputting the natural language text into a unified model to extract the key text.

4. The method for constructing a knowledge base according to any one of claims 1 to 3, characterized in that determining the value of the key-value pair according to the original data comprises:

segmenting the original data based on the key text to obtain data segments corresponding to the key text, wherein the modality of the data segments is the same as the modality of the original data; and

determining the value of the key-value pair according to the data segments.

5. The method for constructing a knowledge base according to claim 4, characterized in that the original data is text, and segmenting the original data based on the key text to obtain data segments corresponding to the key text comprises:

using machine reading comprehension technology to extract, from the original data, data segments describing the key text.

6. The method for constructing a knowledge base according to any one of claims 1 to 3, characterized in that determining the key of the key-value pair according to the key text comprises:

expanding the key text to obtain a first query text; and

determining the first query text as a key of the key-value pair.

7. The method for constructing a knowledge base according to claim 6, characterized in that, after expanding the key text to obtain the first query text, the method further comprises:

encoding the first query text to obtain an encoding vector; and

determining the encoding vector as a key of the key-value pair.

8. The method for constructing a knowledge base according to any one of claims 1 to 3, further comprising:

performing data mining on the original data to obtain target data;

generating a second query text for the target data according to the target data; and

determining the target data as the value of a key-value pair and the second query text as the key of that key-value pair, so as to construct the question-and-answer knowledge base.

9. A data retrieval method, characterized by comprising:

receiving query data sent by a terminal device, wherein the query data is one of text, image or video;

retrieving, in a question-and-answer knowledge base, answer data corresponding to the query data, wherein the modality of the answer data is one of text, image or video, and the question-and-answer knowledge base is constructed according to the method for constructing a knowledge base according to any one of claims 1 to 8; and

sending the answer data to the terminal device.

10. The data retrieval method according to claim 9, characterized in that retrieving answer data corresponding to the query data in the question-and-answer knowledge base comprises:

determining a key based on the query data, and determining that the value of the key-value pair having that key is the answer data;

and/or,

encoding the query data to obtain a query encoding vector;

determining, in the question-and-answer knowledge base, a target encoding vector whose similarity with the query encoding vector is greater than a threshold; and

determining, in the question-and-answer knowledge base, that the value of the key-value pair with the target encoding vector as the key is the answer data.

11. A data retrieval method, characterized in that it is applied to a terminal device and comprises:

sending query data to a server; and

receiving answer data for the query data sent by the server, wherein the answer data is determined according to the data retrieval method according to claim 9 or 10.

12. A device for constructing a knowledge base, characterized by comprising:

an acquisition module, used to acquire multiple pieces of original data, wherein the modalities of the multiple pieces of original data are at least two of text, image or video;

a determination module, used to determine, for original data among the multiple pieces of original data, key text of the original data, determine a key of a key-value pair according to the key text, and determine a value of the key-value pair according to the original data, wherein the key text is used to describe the content of the original data; and

a construction module, used to construct a question-and-answer knowledge base according to the key-value pairs corresponding to the multiple pieces of original data.

13. A cloud device, characterized in that it comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, it implements the knowledge base construction method according to any one of claims 1 to 8, and/or the data retrieval method according to any one of claims 9 to 11.