
CN112988980B - Target product query method and device, computer equipment and storage medium - Google Patents


Info

Publication number: CN112988980B
Authority: CN (China)
Prior art keywords: word, product, standard, input, matrix
Legal status: Active
Application number: CN202110514110.XA
Other languages: Chinese (zh)
Other versions: CN112988980A (en)
Inventors: 郭巍 (Guo Wei), 李其弢 (Li Qitao), 张婷 (Zhang Ting)
Current assignee: Taiping Finance Technology Services Shanghai Co ltd
Original assignee: Taiping Finance Technology Services Shanghai Co ltd
Events:
Application filed by Taiping Finance Technology Services Shanghai Co ltd
Priority to CN202110514110.XA
Publication of CN112988980A
Application granted
Publication of CN112988980B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a target product query method and device, a computer device, and a storage medium. The method comprises: receiving input text data; performing word segmentation on the text data to obtain an input word list; acquiring a pre-generated core lexicon that stores the standard terms corresponding to each product together with a weight for each standard term, the weight measuring how important the standard term is at different positions of the product; matching the input word list with the standard terms according to the weight of each standard term; and taking the product corresponding to a successfully matched standard term as the target product for the text data. The method improves query accuracy.

Description

Target product query method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for querying a target product, a computer device, and a storage medium.
Background
With the development of artificial intelligence, natural language processing techniques have emerged; for example, they can interpret a natural-language query and return the products that correspond to it.
Conventionally, text is extracted from the PDF documents of existing products and segmented into words, so that the products a user queries can be returned and displayed on the front end. This improves the interaction between an enterprise and its users and lets users learn about product information promptly.
However, current product query methods generally either compute a Levenshtein ratio from the Levenshtein distance or use keyword matching. Such algorithms rank results purely by degree of matching and ignore the actual characteristics of the product, which leads to low matching accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide a target product query method and apparatus, a computer device, and a storage medium capable of improving query accuracy.
A target product query method, comprising:
receiving input text data;
performing word segmentation on the text data to obtain an input word list;
acquiring a pre-generated core lexicon, wherein the core lexicon stores standard terms corresponding to products and a weight for each standard term, the weight measuring the importance of the standard term at different positions of the product;
matching the input word list with the standard terms according to the weight of each standard term;
and acquiring the product corresponding to a successfully matched standard term as the target product corresponding to the text data.
In one embodiment, matching the input word list with the standard terms according to the weight of each standard term comprises:
reading each word to be processed in the input word list, and obtaining an input matrix for the input word list from the words to be processed;
acquiring each standard term in the core lexicon and querying the weight corresponding to that standard term;
calculating, for each standard term, its inverse document frequency and its weighted term frequency at the corresponding positions of each product;
calculating a word-matrix element from the inverse document frequency, the weighted term frequency at the corresponding positions of the product, and the weight of the standard term;
and forming a word matrix from the word-matrix elements, and matching the input word list with the standard terms using the word matrix and the input matrix.
In one embodiment, the weighted term frequency is calculated by:
counting the number of times each standard term occurs at the corresponding positions of each product;
acquiring the number of weighted entries in each product;
and calculating, from these occurrence counts and the number of weighted entries in each product, the weighted term frequency of each standard term at the corresponding positions of that product.
In one embodiment, matching the input word list with the standard terms using the word matrix and the input matrix comprises:
performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix;
transforming the input matrix with the conversion matrix to obtain a vector for the input word list;
calculating a target similarity between each product's vector and the vector of the input word list;
and, when the target similarity is greater than or equal to a preset value, determining that the input word list matches the standard terms; otherwise, determining that they do not match.
In one embodiment, performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix comprises:
performing dimensionality reduction on the word matrix to obtain a correlation matrix between standard terms and word senses, a correlation matrix between products and topics, and a correlation matrix between word senses and topics;
obtaining a vector for each product from the product-topic correlation matrix;
and obtaining the conversion matrix for the input matrix from the term-word sense correlation matrix and the word sense-topic correlation matrix.
In one embodiment, the weights are generated by:
performing word segmentation on the text at preset positions in each product to obtain the standard terms;
recording, for each standard term, the positions at which it occurs in the product;
and calculating the weight of each standard term from its positions in the product.
In one embodiment, performing word segmentation on the text data to obtain the input word list comprises:
performing word segmentation on the text data to obtain initial segmented words;
filtering the initial segmented words with a preset stop-word lexicon;
and expanding the filtered words with a preset common-word lexicon to obtain the input word list, the common-word lexicon being a preset personalized dictionary.
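The filtering and expansion steps above can be sketched as follows. This is a minimal Python illustration, not the patented implementation; it assumes the common-word lexicon is a plain mapping from a word to a list of related standard words, and the function name `build_input_word_list` and the sample dictionaries are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_input_word_list(
    initial_words: Iterable[str],
    stop_words: Set[str],
    common_lexicon: Dict[str, List[str]],  # assumed shape: word -> related standard words
) -> List[str]:
    """Drop stop words, then expand each remaining word via the common-word lexicon."""
    input_words: List[str] = []
    seen: Set[str] = set()
    for word in initial_words:
        if word in stop_words:
            continue  # filtered out by the stop-word lexicon
        # keep the word itself first, followed by any expansions
        for candidate in [word, *common_lexicon.get(word, [])]:
            if candidate not in seen:  # preserve order, drop duplicates
                seen.add(candidate)
                input_words.append(candidate)
    return input_words


if __name__ == "__main__":
    words = build_input_word_list(
        ["我", "想", "买", "重疾险"],
        stop_words={"我", "想", "买"},
        common_lexicon={"重疾险": ["重大疾病", "保险"]},
    )
    print(words)  # ['重疾险', '重大疾病', '保险']
```

Keeping the original word ahead of its expansions preserves what the user actually typed while still letting normalized forms reach the matcher.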
In one embodiment, acquiring the product corresponding to the successfully matched standard term as the target product comprises:
acquiring the initial products corresponding to the successfully matched standard terms;
searching the core lexicon for products similar to the successfully matched initial products;
and calculating the target products from the initial products and the similar products.
In one embodiment, calculating the target products from the initial products and the similar products comprises:
calculating a first similarity between each similar product and the input word list, and a second similarity between the similar product and the initial product;
acquiring a similarity distribution weight;
and calculating the target products from the similarity distribution weight, the first similarity, and the second similarity.
In one embodiment, calculating the target products from the similarity distribution weight, the first similarity, and the second similarity comprises:
calculating a reference similarity, and the corresponding target products, from the similarity distribution weight, the first similarity, and the second similarity;
and sorting the target products by reference similarity.
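A sketch of the reference-similarity ranking described in the last three embodiments. The source does not give the combination formula; this example assumes a linear blend in which the similarity distribution weight `alpha` scales the first similarity and `1 - alpha` scales the second. All names here are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_similar_products(
    first_sim: Dict[str, float],   # similar product -> similarity to the input word list
    second_sim: Dict[str, float],  # similar product -> similarity to its initial product
    alpha: float,                  # similarity distribution weight (assumed in [0, 1])
) -> List[Tuple[str, float]]:
    """Blend the two similarities into a reference similarity and sort best first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    reference = {
        pid: alpha * first_sim[pid] + (1.0 - alpha) * second_sim.get(pid, 0.0)
        for pid in first_sim
    }
    return sorted(reference.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ranked = rank_similar_products({"P": 0.8, "Q": 0.2}, {"P": 0.4, "Q": 1.0}, alpha=0.75)
    print([pid for pid, _ in ranked])  # ['P', 'Q']
```

A larger `alpha` favours products that resemble the query itself; a smaller one favours products that resemble the already-matched initial product.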
A target product query device, comprising:
a receiving module configured to receive input text data;
a first word segmentation module configured to perform word segmentation on the text data to obtain an input word list;
a standard term acquisition module configured to acquire a pre-generated core lexicon, wherein the core lexicon stores standard terms corresponding to products and a weight for each standard term, the weight measuring the importance of the standard term at different positions of the product;
a matching module configured to match the input word list with the standard terms according to the weight of each standard term;
and a target product output module configured to acquire the product corresponding to a successfully matched standard term as the target product corresponding to the text data.
A computer device, comprising a memory storing a computer program and a processor that implements the steps of the above method when executing the computer program.
A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.
In the above target product query method and device, computer device, and storage medium, the input word list is matched with the standard terms using the weight of each standard term. Because these weights reflect the characteristics of the product, expressing how important each standard term is at different positions of the product, weighted matching makes the retrieved products correspond more closely to the input text data and thus improves query accuracy.
Drawings
FIG. 1 is a diagram of an application environment of a method for querying a target product in one embodiment;
FIG. 2 is a flow diagram that illustrates a method for querying a target product, according to one embodiment;
FIG. 3 is a schematic diagram of a query interface in one embodiment;
FIG. 4 is a flowchart illustrating a method for querying a target product according to another embodiment;
FIG. 5 is a block diagram showing the structure of a target product query device in one embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The target product query method provided by the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the core lexicon 104 over a network. The terminal 102 receives input text data, performs word segmentation on it to obtain an input word list, and matches the input word list with the standard terms in the core lexicon according to the weight of each standard term, thereby obtaining the target product corresponding to the text data. Each weight measures how important a standard term is at different positions of the product and thus reflects the characteristics of the product, so weighted matching makes the retrieved products correspond more closely to the input text data and improves query accuracy. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the core lexicon 104 may be implemented as an independent database or as a database cluster composed of multiple databases.
In one embodiment, as shown in FIG. 2, a target product query method is provided. Taking its application to the terminal in FIG. 1 as an example, the method comprises the following steps:
S202: Receive input text data.
Specifically, the user enters the text data on the terminal. For example, the terminal first displays a query page; the user types the name of the product to be queried into the query input box and clicks the "Query" button; the related insurance product data is returned through the corresponding interface, and the terminal jumps to a new page showing the product profile. The query interface may be as shown in FIG. 3, which may also display keywords from related popular searches.
S204: Perform word segmentation on the text data to obtain an input word list.
Specifically, the input word list is obtained by performing word segmentation on the text data. Any word segmentation algorithm may be used; preferably, the bidirectional maximum matching algorithm is used.
The terminal may form the input word list by first segmenting the text data and then semantically expanding the segmented words. In addition, the expanded words are normalized and standardized and stored as the input word list, so that more accurate results are obtained in the subsequent matching.
S206: Acquire a pre-generated core lexicon, wherein the core lexicon stores the standard terms corresponding to products and the weight of each standard term, the weight measuring the importance of the standard term at different positions of the product.
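Bidirectional maximum matching runs a greedy longest-match segmentation from the left and from the right and keeps the better of the two results. The sketch below uses a common tie-breaking heuristic (fewer words wins; on a tie, fewer single-character words wins, otherwise the backward result is kept). The patent does not state its tie-breaking rules, so treat them, the function names, and the plain-set dictionary as assumptions.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each step."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # fall back to a single character
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left segmentation: take the longest dictionary word ending at j."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_max_match(text: str, vocab: Set[str]) -> List[str]:
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer words is preferred
    single_fwd = sum(len(w) == 1 for w in fwd)
    single_bwd = sum(len(w) == 1 for w in bwd)
    return fwd if single_fwd < single_bwd else bwd


if __name__ == "__main__":
    vocab = {"研究", "研究生", "生命", "起源"}
    print(bidirectional_max_match("研究生命起源", vocab))  # ['研究', '生命', '起源']
```

Here the forward pass greedily takes 研究生 and strands 命 as a single character, while the backward pass recovers 研究 / 生命 / 起源, which the heuristic prefers.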
Specifically, the core lexicon is a lexicon generated in advance for the target products. For example, when the products are insurance products, the core lexicon is built from the product names and clauses of the insurance products; such a purpose-built lexicon helps improve the efficiency and accuracy of the search. When querying insurance products, the user's goal is to find a needed insurance product, so the result set is limited to insurance products. It is therefore necessary to build an insurance product core lexicon from the information in insurance product names and clauses, to serve as the result library for search matching.
To build the core lexicon, word segmentation is first performed to obtain the standard terms, and the weight of each standard term is then obtained from its position. The standard terms are obtained as follows: preset content of each product is extracted, for example the product name and the paragraphs and chapters of the product's insurance clauses that concern underwriting liability, disease definitions, and the like; word segmentation is then performed on this content to obtain the standard terms. For example, the bidirectional maximum matching algorithm (a string-matching method) is applied to the relevant paragraphs of existing insurance product names and clauses, combined with two maintained dictionaries, a common-word lexicon and a stop-word lexicon, to segment the text and remove stop words. To obtain the weights, the position of each standard term is recorded during segmentation, and the weight of the term is determined by where in the product it appears; for example, an initial weight may be preset for each position, and the weight of a standard term is then obtained from the positions at which it appears and the initial weights of those positions.
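The position-based weighting can be sketched as below. The text only says that an initial weight is preset per position and combined with where a term appears; the position labels, their values, and the choice of taking the maximum over a term's positions are all hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical preset initial weights per position of an insurance product document.
POSITION_WEIGHTS: Dict[str, float] = {
    "product_name": 3.0,
    "disease_name": 2.5,
    "underwriting_liability": 2.0,
    "disease_definition": 1.0,
}


def standard_term_weights(
    occurrences: Dict[str, Iterable[str]],  # standard term -> positions recorded during segmentation
    position_weights: Dict[str, float] = POSITION_WEIGHTS,
) -> Dict[str, float]:
    """Give each standard term the highest preset weight among the positions it occupies."""
    weights: Dict[str, float] = {}
    for term, positions in occurrences.items():
        known = [position_weights[p] for p in positions if p in position_weights]
        if known:  # terms seen only at unlisted positions receive no weight here
            weights[term] = max(known)
    return weights


if __name__ == "__main__":
    print(standard_term_weights({
        "重大疾病": ["product_name", "disease_definition"],
        "等待期": ["underwriting_liability"],
    }))  # {'重大疾病': 3.0, '等待期': 2.0}
```

Taking the maximum makes a term as important as its most prominent appearance; summing or averaging the position weights would be equally plausible readings of the text.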
S208: Match the input word list with the standard terms according to the weight of each standard term.
Specifically, in practical use, besides how often a search term appears in insurance product clauses, the position at which it appears also matters: a term that appears in the title of an insurance product or in a disease name is clearly more important than one that appears in a disease explanation. Therefore, to improve matching accuracy, the input word list may be matched with the standard terms using their weights. For example, a vector for the input word list is calculated first, a product vector is then calculated for each product according to the weights of the standard terms, and finally the vector of the input word list is matched against the product vectors. With this matching, if the same term appears in the product name of one product and in the clauses of another, the product whose name contains the term is selected as the final matched product.
S210: Acquire the product corresponding to the successfully matched standard term as the target product corresponding to the text data.
Specifically, because the matching fully accounts for the weights of the standard terms, each matched product obtains a weighted matching value, and the terminal determines the final matched products according to the magnitude of this value, for example by selecting the product with the largest value, the top several products, or the products whose value exceeds a preset value as target products. Optionally, the terminal may sort the selected products by weighted matching value so that the best match appears at the top, which is convenient for the user.
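The three selection rules just mentioned (largest value, top several, above a preset value) fit into one small helper. The function name and parameters are hypothetical; the sketch assumes the weighted matching values have already been computed per product.

```python
from typing import Dict, List, Optional, Tuple


def select_target_products(
    scores: Dict[str, float],           # product id -> weighted matching value
    top_n: Optional[int] = None,        # keep only the first n (top_n=1 -> largest only)
    min_score: Optional[float] = None,  # keep only values strictly above this preset value
) -> List[Tuple[str, float]]:
    """Sort products best first, then apply the optional threshold and truncation."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if min_score is not None:
        ranked = [(pid, s) for pid, s in ranked if s > min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


if __name__ == "__main__":
    scores = {"X": 0.9, "Y": 0.4, "Z": 0.7}
    print(select_target_products(scores, top_n=1))        # [('X', 0.9)]
    print(select_target_products(scores, min_score=0.5))  # [('X', 0.9), ('Z', 0.7)]
```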
Preferably, when the products are displayed, the text data entered by the user can be emphasized within them, for example by highlighting, which improves readability and helps the user quickly find the queried content.
In the above target product query method, the input word list is matched with the standard terms using the weight of each standard term. Because the weights reflect the characteristics of the product, expressing how important each standard term is at different positions of the product, weighted matching makes the retrieved products correspond more closely to the input text data and improves query accuracy.
In one embodiment, matching the input word list with the standard terms according to their weights comprises: reading each word to be processed in the input word list and obtaining an input matrix for the input word list from those words; acquiring each standard term in the core lexicon and querying its weight; calculating the inverse document frequency of each standard term and its weighted term frequency at the corresponding positions of each product; calculating a word-matrix element from the inverse document frequency, the weighted term frequency, and the weight of the standard term; and forming a word matrix from the word-matrix elements and matching the input word list with the standard terms using the word matrix and the input matrix.
Specifically, the input word list may contain one or more words to be processed, and the terminal may read them in parallel before processing to improve efficiency. The terminal then acquires the standard terms in the core lexicon and reads the weight of each standard term.
Inverse document frequency (IDF) measures the general importance of a word. The IDF of a given standard term is obtained by dividing the total number of documents by the number of documents that contain the term and taking the base-10 logarithm of the quotient.
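The IDF definition above translates directly into code. In this sketch each document (product) is represented as a collection of its standard terms; returning 0 for a term that appears in no document is an added assumption that avoids division by zero.

```python
import math
from typing import Collection, Sequence


def inverse_document_frequency(term: str, documents: Sequence[Collection[str]]) -> float:
    """log10(total number of documents / number of documents containing the term)."""
    containing = sum(1 for doc in documents if term in doc)
    if containing == 0:
        return 0.0  # assumption: a term absent from every document contributes nothing
    return math.log10(len(documents) / containing)


if __name__ == "__main__":
    docs = [{"重大疾病", "保险"}] + [{"保险"} for _ in range(9)]
    print(inverse_document_frequency("重大疾病", docs))  # 1.0
```

A term found in every document gets IDF 0, which is exactly the "general importance" intuition: ubiquitous words do not discriminate between products.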
Term frequency (TF) is the frequency with which a given word appears in a document. The weighted term frequency additionally takes into account the weight of each standard term in a product: the standard terms are counted with their weights applied, giving the weighted number of entries in the product, and the weighted term frequency is the ratio of the standard term's occurrences to this weighted number of entries. Preferably, the weighted term frequency is calculated by: counting how often each standard term occurs at the corresponding positions of each product; acquiring the number of weighted entries in each product; and calculating, from these counts and the number of weighted entries, the weighted term frequency of each standard term at the corresponding positions of each product.
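One possible reading of the weighted term frequency is sketched below: occurrences at each position are multiplied by that position's weight, and the result is divided by the weighted number of entries in the product. The text leaves the exact weighting open, so the data layout, the default weight of 1 for unlisted positions, and applying the weights to both numerator and denominator are assumptions.

```python
from typing import Dict, List


def weighted_term_frequency(
    term: str,
    product_positions: Dict[str, List[str]],  # position -> terms segmented at that position
    position_weights: Dict[str, float],
) -> float:
    """Weighted occurrences of `term` divided by the weighted entry count of one product."""
    weighted_hits = 0.0
    weighted_entries = 0.0
    for position, terms in product_positions.items():
        weight = position_weights.get(position, 1.0)  # assumed default for unlisted positions
        weighted_hits += weight * terms.count(term)
        weighted_entries += weight * len(terms)
    return weighted_hits / weighted_entries if weighted_entries else 0.0


if __name__ == "__main__":
    product = {
        "product_name": ["重大疾病", "保险"],
        "clause": ["重大疾病", "定义", "等待期", "保险"],
    }
    weights = {"product_name": 3.0, "clause": 1.0}
    print(weighted_term_frequency("重大疾病", product, weights))  # 0.4
```

With these numbers, 重大疾病 scores (3×1 + 1×1) / (3×2 + 1×4) = 0.4, whereas a plain TF would give 2/6; the name occurrence pulls the score up.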
The word matrix is a matrix over the products. Each word-matrix element is calculated from the inverse document frequency of the standard term, its weighted term frequency at the corresponding positions of the product, and its weight; once all elements have been calculated, the complete word matrix is obtained.
A practical example follows. In this embodiment, the input word list and the core lexicon are densified using SVD (singular value decomposition) to obtain a singular value matrix that is more convenient for similarity calculation. Cosine similarity is then used to match the vector of the input word list with the product vector of each product in the core lexicon, yielding a product list. In this process, improving how the word matrix is constructed in the SVD method further improves the accuracy of matching the input content to products.
Specifically, for ease of understanding, the original SVD algorithm is introduced first. Its core formula is

$$A_{m \times n} \approx U_{m \times k}\,\Sigma_{k \times k}\,V^{T}_{k \times n}$$

where A is the word matrix of the products, k is the assumed number of topics (generally smaller than the number of products), n is the number of products, and m is the number of words in the core lexicon. $U_{il}$ is the correlation between the i-th word and the l-th word sense, with $1 \le i \le m$ and $1 \le l \le k$. $V_{jp}$ is the correlation between the j-th product and the p-th topic, with $1 \le j \le n$ and $1 \le p \le k$. $\Sigma_{lp}$ is the correlation between the l-th word sense and the p-th topic, with $1 \le l, p \le k$; $\Sigma$ is also called the singular value matrix and is the core of the dimensionality reduction of A.
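The rank-k decomposition $A \approx U_k \Sigma_k V_k^{T}$ can be computed with NumPy's `numpy.linalg.svd` by keeping the k largest singular values. This is a generic latent-semantic-analysis sketch, not the patent's code; the helper name `truncated_svd` is hypothetical.

```python
import numpy as np


def truncated_svd(A, k: int):
    """Return U_k (m x k), Sigma_k (k x k) and V_k^T (k x n) with A ≈ U_k @ Sigma_k @ V_k^T."""
    A = np.asarray(A, dtype=float)
    if not 1 <= k <= min(A.shape):
        raise ValueError("k must be between 1 and min(m, n)")
    # full_matrices=False gives the compact SVD; singular values come sorted in descending order
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]


if __name__ == "__main__":
    # 4 words x 3 products; the third product duplicates the first, so the rank is 2
    A = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
    U_k, S_k, Vt_k = truncated_svd(A, k=2)
    print(U_k.shape, S_k.shape, Vt_k.shape)  # (4, 2) (2, 2) (2, 3)
    print(np.allclose(U_k @ S_k @ Vt_k, A))  # True
```

Because the example matrix has rank 2, keeping k = 2 topics reconstructs it exactly; for real word matrices a smaller k discards the weakest singular directions, which is where the dimensionality reduction comes from.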
A common way to construct matrix A is the TF-IDF (term frequency-inverse document frequency) method, which considers only the frequency dimension of words. Its construction formula is:

$$A_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_{i}$$

where $A_{ij}$, the element in row i and column j of A, is the TF-IDF coefficient of the i-th word of the core lexicon in the j-th product.
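For comparison with the weighted construction that follows, here is a plain TF-IDF word matrix: rows are core-lexicon words, columns are products, TF is a word's count divided by the product's length, and IDF is the base-10 formula defined earlier. Representing each product as a list of its segmented terms, and the name `tfidf_matrix`, are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def tfidf_matrix(vocabulary: Sequence[str], products: Sequence[List[str]]) -> List[List[float]]:
    """A[i][j] = TF of word i in product j multiplied by IDF of word i."""
    n = len(products)
    matrix: List[List[float]] = []
    for term in vocabulary:
        df = sum(1 for doc in products if term in doc)  # documents containing the term
        idf = math.log10(n / df) if df else 0.0
        row = [(doc.count(term) / len(doc) if doc else 0.0) * idf for doc in products]
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    A = tfidf_matrix(["a", "b"], [["a", "a", "b"], ["b"]])
    print(A[1])  # [0.0, 0.0]  ("b" occurs in every product, so its IDF is 0)
```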
In this embodiment, to match user input to search results more accurately, the positions at which words occur are taken into account when constructing the word matrix, and a dynamic weight is introduced on top of the TF-IDF construction, specifically:
[Formula not recoverable from the source: the word-matrix element $b_{ij}$ of the i-th standard term in the j-th product, computed from $\mathrm{IDF}_i$, the weighted term frequency of the term at the corresponding positions of product j, and the weight $w_i$]

where $w_i$ is the weight corresponding to the standard term (entry) $t_i$, $b_{ij}$ is a word-matrix element, and the new word matrix is $B = (b_{ij})_{m \times n}$.
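The source does not spell out how the three factors are combined into the element $b_{ij}$ of the new word matrix B (standard term i, product j). The sketch below assumes the simplest combination consistent with the text, the product $b_{ij} = w_i \cdot \mathrm{TF}^{w}_{ij} \cdot \mathrm{IDF}_i$ of the term weight, the weighted term frequency, and the IDF. Treat the multiplicative form and the name `weighted_word_matrix` as hypotheses, not the patented formula.

```python
from typing import List, Sequence


def weighted_word_matrix(
    term_weights: Sequence[float],           # w_i for each standard term (m values)
    weighted_tf: Sequence[Sequence[float]],  # m x n weighted term frequencies
    idf: Sequence[float],                    # IDF_i for each standard term (m values)
) -> List[List[float]]:
    """Build B with the assumed element formula b_ij = w_i * tf_w[i][j] * idf_i."""
    if not len(term_weights) == len(weighted_tf) == len(idf):
        raise ValueError("term_weights, weighted_tf and idf need one entry per term")
    return [
        [w * tf * d for tf in row]
        for w, row, d in zip(term_weights, weighted_tf, idf)
    ]


if __name__ == "__main__":
    B = weighted_word_matrix([2.0, 1.0], [[0.5, 0.0], [0.25, 0.5]], [1.0, 2.0])
    print(B)  # [[1.0, 0.0], [0.5, 1.0]]
```

The resulting B simply replaces the plain TF-IDF matrix A as the input to the SVD step.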
Specifically, taking product X, product Y, and product Z as examples, when a user inputs a keyword O in a search box, a word matrix B is constructed, and then an input word list and standard participles are matched according to the word matrix and the input matrix.
In the embodiment, the weighted word frequency is introduced, so that the important programs of the standard participles at different positions are fully considered, and the accuracy of product matching is improved.
In one embodiment, the method of matching the input word list and the standard participles through the word matrix and the input matrix, that is, the step of matching the input word list and the standard participles through the word matrix and the input matrix, includes: performing dimensionality reduction on the word matrix to obtain a vector corresponding to each product and a conversion matrix corresponding to the input matrix; processing the input matrix according to the conversion matrix to obtain a vector corresponding to the input word list; calculating the target similarity of the vector corresponding to the product and the vector corresponding to the input word list; and when the target similarity is more than or equal to a preset value, matching the input word list with the standard participles, otherwise, not matching the input word list with the standard participles.
Specifically, the terminal first reduces the dimensionality of the word matrix to obtain a vector for each product and a conversion matrix for the input matrix. Preferably, the dimensionality reduction decomposes the word matrix into a correlation matrix $U_{il}$ between standard participles and word senses, a correlation matrix $V_{jp}$ between products and topics, and a correlation matrix $\Sigma_{lp}$ between word senses and topics. The vector for each product is obtained from the product-topic correlation matrix $V_{jp}$, and the conversion matrix for the input word list is obtained from the participle-sense correlation matrix $U_{il}$ and the sense-topic correlation matrix $\Sigma_{lp}$.
Each column of the transpose of the product-topic correlation matrix $V_{jp}$ characterizes one product, so each column forms the product vector of that product. The vector of the input word list is obtained from the participle-sense correlation matrix $U_{il}$ and the sense-topic correlation matrix $\Sigma_{lp}$, specifically by the following formula:
[Formula image: vector of the input word list computed from the input matrix, $U_{il}$, and $\Sigma_{lp}$ (not reproduced)]
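Because the formula is not reproduced, the following sketches the standard latent semantic indexing (LSI) "folding-in" step, offered as an assumption about what the formula expresses: a truncated SVD factors the word matrix as $B \approx U_k \Sigma_k V_k^T$, the product vectors are the rows of $V_k$ (the columns of $V_k^T$), and an input vector $q$ is projected as $\hat{q} = \Sigma_k^{-1} U_k^T q$. The names `lsi_fit` and `fold_in` and the rank `k` are hypothetical.

```python
import numpy as np


def lsi_fit(B, k):
    """Truncated SVD of the word matrix B (rows = participles, cols = products).
    Returns U_k, the top-k singular values, and product vectors (one row per product)."""
    U, s, Vt = np.linalg.svd(np.asarray(B, dtype=float), full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    product_vectors = Vt_k.T  # column j of V_k^T is the vector of product j
    return U_k, s_k, product_vectors


def fold_in(query, U_k, s_k):
    """Project an input-word-list vector into topic space: q_hat = S_k^-1 U_k^T q."""
    return (U_k.T @ np.asarray(query, dtype=float)) / s_k
```

A quick sanity check of this assumed projection: folding in a column of B returns exactly that product's vector, so products and the input word list land in the same space and can be compared directly.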
Finally, once the product vectors and the input-word-list vector are obtained, the input word list is matched with the standard participles by calculating the target similarity between them, which yields the corresponding target product.
Specifically, in this embodiment the similarity between two texts may be calculated with cosine similarity, which compares two vectors by the cosine of the angle between them: the larger the value, the more similar the vectors generally are. The target product is then obtained from the calculated cosine similarities according to a preset rule.
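A minimal sketch of cosine-similarity matching against a preset threshold, as described above. The threshold value, the product names, and the function names are hypothetical; vectors of zero length are treated as having similarity 0.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def match_products(query_vec: Sequence[float],
                   product_vecs: Dict[str, Sequence[float]],
                   threshold: float) -> List[Tuple[str, float]]:
    """Keep products whose similarity >= threshold, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in product_vecs.items()]
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, `match_products([1, 0], {"X": [1, 0], "Y": [1, 1], "Z": [0, 1]}, 0.5)` keeps X (similarity 1.0) and Y (about 0.707) and drops Z.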
In the above embodiment, several dimensions are considered together when preprocessing for matching. In practice, besides how often a search term appears in the clauses of an insurance product, where it appears also matters: a word that appears in the title or in a disease name of an insurance product is clearly more important than one that appears in a disease explanation. This embodiment therefore considers both occurrence frequency and occurrence position when constructing the co-occurrence matrix, which helps improve the accuracy of search results.
In one embodiment, the weights are generated by: performing word segmentation on the text at preset positions in each product to obtain standard participles; counting the positions at which each standard participle appears in the product; and calculating the weight of each standard participle from those positions.
Specifically, the weights of the standard participles may be generated in advance, and the preset positions in a product may be determined beforehand. For an insurance product, the positions where a word can appear fall into three cases, defined as three dimensions: the product name, the disease names in the product clauses, and the disease explanations in the product clauses. The positions of a word are recorded in a three-dimensional vector $(a_1, a_2, a_3)$, called the dynamic weight of the word, where:
if the word appears in the product name, $a_1 = 3$, otherwise $a_1 = 0$;
if the word appears in a disease name in the product clauses, $a_2 = 3$, otherwise $a_2 = 0$;
if the word appears in a disease explanation in the product clauses, $a_3 = 1$, otherwise $a_3 = 0$.
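The dynamic-weight rule above reduces to a lookup per position. This sketch assumes each product is available as its three text sections, already segmented into word lists; the section keys and the function name are illustrative.

```python
from typing import Dict, List, Tuple

# The three positions and their scores, as defined for (a1, a2, a3).
POSITIONS = ("product_name", "disease_name", "disease_explanation")
SCORES = {"product_name": 3, "disease_name": 3, "disease_explanation": 1}


def dynamic_weight(word: str, sections: Dict[str, List[str]]) -> Tuple[int, int, int]:
    """Return (a1, a2, a3): the position's score if the word occurs there, else 0."""
    return tuple(SCORES[p] if word in sections.get(p, []) else 0 for p in POSITIONS)
```

Applied to every participle of a product, e.g. `{w: dynamic_weight(w, sections) for w in words}`, this yields a weighted core word library entry per word.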
When generating the weights, the terminal counts the positions at which each standard participle appears in the product and calculates the corresponding weight from those positions. Taking "product X" as an example, the terminal extracts the product's disease-explanation paragraphs such as "X1", the sections of the insurance clauses that explain insurance liability, and "X2", "X3", "X4", and "X5" in the clauses, and segments them with a bidirectional maximum matching word segmentation algorithm. After stop words are removed, the remaining 3 words form part of the core word library, shown below:
Core word library (excerpt):
[Table image: excerpt of the core word library for product X (not reproduced)]
For example, the word "P1" appears only in the product name of this product, so its dynamic weight is $(3, 0, 0)$. The word "P2" appears both in a disease name in the clauses and in a disease explanation, so its dynamic weight is $(0, 3, 1)$.
Similarly, a core word library with weights can be obtained:
[Table image: core word library with dynamic weights (not reproduced)]
The word segmentation here may be the same as the segmentation of the input text data described below; see that description for details.
In this embodiment, the weight of each standard participle is calculated with full account of its position and frequency, laying a foundation for accurate subsequent matching.
In one embodiment, performing word segmentation on the text data to obtain the input word list includes: segmenting the text data to obtain initial participles; filtering the initial participles through a preset stop-word library; and expanding the filtered participles through a preset common-word library to obtain the input word list, where the common-word library is a preset dictionary that has been customized.
Specifically, the preset stop-word library and common-word library are maintained and customized for the characteristics of insurance products, which helps later steps segment words more accurately and produce better segmentation results.
The common-word library and the stop-word library are two existing dictionaries. The stop-word library contains natural-language words that contribute little to semantic understanding, such as some conjunctions and pronouns; the library formed by the words remaining after stop words are removed is the common-word library. In this embodiment, to fit both libraries to the actual situation of the project, they are customized before use, mainly in three ways. (1) Internal manual maintenance: project staff manually add insurance- and disease-related words to the common-word library and the stop-word library. (2) Regular updates from external websites: information is periodically collected from sites such as the website of the banking and insurance regulatory authority and the group's official website, and new words are added to the two libraries. (3) Self-updating from search-result feedback: after users search, the libraries are refined and updated according to user feedback on the search results.
[Two images not reproduced]
In the above embodiment, the two libraries are first maintained in the first two ways, adding insurance-, medical-, and disease-related words to both. They are then updated in real time during use, which keeps the stop-word library and the common-word library accurate.
When segmenting the text data, the terminal uses a preset segmentation algorithm, for example a bidirectional maximum matching algorithm, to obtain initial participles; it then filters them through the preset stop-word library, i.e., removes stop words, to obtain a preliminary word list; finally, it expands the filtered participles through the preset common-word library to obtain the input word list.
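Bidirectional maximum matching is not spelled out in the text. The sketch below uses a common formulation, assumed here: run forward and backward maximum matching against the dictionary, prefer the result with fewer words, and on a tie prefer fewer single-character words (falling back to the backward result). The dictionary contents, stop words, and function names are illustrative.

```python
from typing import List, Set


def forward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right longest dictionary match (single chars always allowed)."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left longest dictionary match."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.insert(0, piece)
                j -= size
                break
    return words


def bidirectional_mm(text: str, vocab: Set[str]) -> List[str]:
    max_len = max(map(len, vocab), default=1)
    fwd, bwd = forward_mm(text, vocab, max_len), backward_mm(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd  # fewer words wins

    def singles(ws: List[str]) -> int:
        return sum(len(w) == 1 for w in ws)

    return fwd if singles(fwd) < singles(bwd) else bwd  # tie: backward result


def segment(text: str, vocab: Set[str], stop_words: Set[str]) -> List[str]:
    """Segment, then drop stop words, giving the preliminary word list."""
    return [w for w in bidirectional_mm(text, vocab) if w not in stop_words]
```

With the dictionary {研究, 研究生, 生命, 命, 起源}, forward matching splits 研究生命起源 ("research the origin of life") as 研究生 / 命 / 起源 while backward matching gives 研究 / 生命 / 起源; the tie on word count is broken in favour of the backward result, which has no single-character words.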
Specifically, based on the maintained common-word and stop-word libraries, the input content is segmented with the bidirectional maximum matching algorithm and stop words are removed, giving a preliminary list of the words the user entered. The segmented words are then expanded with a topic model, a technique used for semantic analysis and text mining in natural language processing: the topic model finds semantically similar words, which are added to the input words to form the final user input word list. This makes insurance product queries more accurate.
For example, if the user enters "heart disease", bidirectional maximum matching yields the segmentations "heart disease" and "heart / disease". Based on this result, the topic model finds words with similar meanings in the common-word library, such as myocarditis, myocardial infarction, coronary heart disease, stent, and hypertension (the core principle of the topic model here is the same as that of the SVD method and is not detailed further), and these words are added to form the final input word list.
[Table image: final input word list for "heart disease" (not reproduced)]
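The text states only that the topic model used for expansion shares its core principle with the SVD method. One way to sketch that, assumed here rather than taken from the source, is to derive word vectors from a truncated SVD of a word-by-document matrix and add every vocabulary word whose cosine similarity to an input word reaches a threshold. The toy matrix, the threshold 0.8, and the name `expand_terms` are hypothetical.

```python
import numpy as np


def expand_terms(query_terms, vocab, term_doc, k=2, threshold=0.8):
    """Append vocabulary words whose rank-k SVD word vectors have cosine
    similarity >= threshold with any query term (an assumed LSI-style expansion)."""
    U, s, _ = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    word_vecs = U[:, :k] * s[:k]                   # row i = latent vector of vocab[i]
    norms = np.linalg.norm(word_vecs, axis=1)
    index = {w: i for i, w in enumerate(vocab)}
    expanded = list(query_terms)
    for term in query_terms:
        if term not in index:
            continue                               # unknown words are kept but not expanded
        q = word_vecs[index[term]]
        sims = word_vecs @ q / (norms * np.linalg.norm(q) + 1e-12)
        for i, sim in enumerate(sims):
            if sim >= threshold and vocab[i] not in expanded:
                expanded.append(vocab[i])
    return expanded
```

In a toy corpus where "heart disease" and "myocarditis" co-occur in the same documents and "diabetes" and "insulin" co-occur in others, expanding ["heart disease"] adds "myocarditis" but not the diabetes terms.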
In the above embodiment, the common-word and stop-word libraries are customized for the features of insurance products, and a core word library of insurance products is built from the information in product names and clauses. User input is segmented and then expanded by word meaning to form the input word list. In the search scenario of this project, the disease names, product names, and other content that users type into the search box vary in format, are often imprecise, and are generally not standard names, so the system must interpret the user's vocabulary and complete and correct the input. To improve search accuracy, after the user's input is segmented, it is further expanded with the common-word library according to semantic similarity, adding words with similar meanings to form the input word list and thereby ensuring the accuracy of subsequent matching.
In one embodiment, acquiring the product corresponding to the successfully matched standard participles as the target product corresponding to the text data includes: acquiring the initial product corresponding to the successfully matched standard participles; querying the core word library for products similar to the successfully matched initial product; and calculating the target product from the initial product and the similar products.
Specifically, the initial product is obtained by matching keywords and their weights. Once it has been calculated, the search results can be supplemented with a content-based recommendation algorithm that widens the range of products to better meet user needs; for example, products similar to the initial product are calculated with an SVD-based recommendation-system algorithm.
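A minimal sketch of SVD-based similar-product lookup, assuming products are compared by the cosine similarity of their latent vectors (rows of $V_k \Sigma_k$) from a truncated SVD of the word matrix. The rank `k`, `top_n`, the function name, and the toy data are illustrative, and the sketch assumes no product has an all-zero latent vector.

```python
import numpy as np


def similar_products(B, names, target, k=2, top_n=1):
    """Rank other products by cosine similarity to `target` in a rank-k SVD space.
    B is the word matrix (rows = standard participles, columns = products)."""
    _, s, Vt = np.linalg.svd(np.asarray(B, dtype=float), full_matrices=False)
    vecs = Vt[:k, :].T * s[:k]                     # row j = latent vector of product j
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    t = names.index(target)
    sims = vecs @ vecs[t]                          # cosine similarity to the target
    order = [j for j in np.argsort(-sims) if j != t]
    return [(names[j], float(sims[j])) for j in order[:top_n]]
```

With a word matrix in which products X and Y share participles and Z uses different ones, `similar_products(B, ["X", "Y", "Z"], "X")` returns Y as the closest product.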
Preferably, calculating the target product from the initial product and the similar products includes: calculating a first similarity between each similar product and the input word list and a second similarity between the similar product and the initial product; acquiring similarity distribution weights; and calculating the target product from the similarity distribution weights, the first similarity, and the second similarity.
Specifically, the terminal first calculates the first similarity between products and the input word list, i.e., the similarity obtained above, and then uses the SVD recommendation-system algorithm to find products in the whole insurance product library that are highly similar to the matched products and adds them to the search result list. The terminal may also set similarity distribution weights that balance the similarity between the user input word list and a product (the first similarity, weighted at, for example, 80%) against the similarity between products (the second similarity, weighted at, for example, 20%), and then re-rank the results to adjust the display order of the search result list. Products with the best overall similarity are placed near the top of the list, so more relevant results are presented to the user and search accuracy improves further.
Preferably, calculating the target product from the similarity distribution weights, the first similarity, and the second similarity includes: calculating a reference similarity for each target product from the similarity distribution weights, the first similarity, and the second similarity; and ranking the target products by reference similarity.
Specifically, if a search for "diabetes" matches product "X", the SVD recommendation algorithm yields a similar product, product Y.
The terminal first adds product Y to the search results, i.e., to the target products, then takes a weighted average of the input-word-list-to-product similarity and the product-to-product similarity to obtain a comprehensive similarity for each target product, and displays the target products by comprehensive similarity, so that products with higher comprehensive similarity appear earlier in the search results.
In the above embodiment, a content-based recommendation algorithm is combined with search: recommendations of similar products supplement the search results and drive a secondary ranking. Because the core word library used for matching is relatively small, results obtained by search matching alone may not meet the user's real needs. A content-based recommendation algorithm is therefore introduced: after preliminary matching, it finds similar products and adds them to the search results. Considering both the similarity between each product and the input content and the similarity between products, the products in the search results are re-ranked and finally presented to the user as a list, which is the return value of the search.
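The secondary ranking can be sketched as a weighted sum using the example weights of 80% and 20% given above. Treating a product that is missing from one similarity table as having similarity 0 there is an assumption, as are the function name and the sample scores.

```python
from typing import Dict, List, Tuple


def rank_results(query_sim: Dict[str, float],
                 product_sim: Dict[str, float],
                 w_query: float = 0.8,
                 w_product: float = 0.2) -> List[Tuple[str, float]]:
    """query_sim: product -> first similarity (to the input word list).
    product_sim: product -> second similarity (to the matched initial product).
    Returns products sorted by the weighted reference similarity, best first."""
    names = set(query_sim) | set(product_sim)
    scores = {n: w_query * query_sim.get(n, 0.0) + w_product * product_sim.get(n, 0.0)
              for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For instance, with first similarities {X: 0.9, Y: 0.6} and second similarities {X: 1.0, Y: 0.95}, X scores 0.92 and Y scores 0.67, so X is listed first.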
Specifically, referring to fig. 4, fig. 4 is a flowchart of a target product query method in another embodiment, where in this embodiment, the target product query method mainly includes the following steps:
S402: Perform customized maintenance on the existing common-word library and stop-word library.
S404: Segment the names of the insurance products in the project and the relevant paragraphs of their insurance clauses using the maintained stop-word and common-word libraries, and build a core word library of insurance product clauses.
S406: Segment the input text data using the stop-word and common-word libraries, combined with semantic expansion, to form the input word list.
With reference to fig. 3, the user enters text data in the search box, and the terminal calculates the input word list from that text data.
S408: Use singular value decomposition (SVD) to decompose the keyword co-occurrence matrix formed from the input word list and the core word library, obtaining a singular value matrix.
S410: Calculate text similarity in the singular value matrix obtained in the previous step using cosine similarity, match the text data with related insurance products according to the similarity, and keep the matching results with higher similarity.
S412: Perform product similarity analysis on the insurance products recorded in the system with a content-based recommendation algorithm, find products closely associated with the insurance products obtained in the previous step, and add them to the search results.
S414: Combine the similarity between products and the input content with the similarity between products to perform a secondary ranking adjustment.
Specifically, the keyword query result interface shows related insurance products, including their names and brief descriptions; clicking a product displays its details. Taking the product's underwriting disease detail page as an example, the page includes the disease name, disease type and definition, disease details, and similar information, and text related to the user's input is highlighted in red, which improves the visual experience and helps the user quickly find what they searched for. The user can also click into a disease detail page to view the definition and detailed explanation of the disease and understand it in more depth. In addition, on the underwriting disease detail page, the user can enter a disease name in the search bar; the terminal checks whether the product covers that disease and, if so, displays the related disease information, optionally with the detailed disease type and definition. Similarly, the user can click to query and view the PDF file of the product's insurance clauses. This lets users search a product's underwriting details, makes those details easier to understand, and helps users learn the meaning and scope of the diseases the product covers.
This embodiment moves beyond the traditional one-to-one mode of keyword matching to many-to-many, multi-dimensional calculation: the query data entered by the user is preprocessed, the results are matched against multiple dimensions of insurance product data, and the matches are combined by weighted summation and ranked. This achieves a full understanding of the user's input, accurately matches the user's search goal to insurance product information, and helps the user understand the product's underwriting details.
It should be understood that although the steps in the flowcharts of fig. 2 and fig. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some steps in fig. 2 and fig. 4 may include multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least some sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, a target product query device is provided, including a receiving module 100, a first word segmentation module 200, a standard participle acquiring module 300, a matching module 400, and a target product output module 500, where:
a receiving module 100 for receiving input text data;
the first word segmentation module 200 is configured to perform word segmentation processing on the text data to obtain an input word list;
the standard participle acquiring module 300 is configured to acquire a pre-generated core word library, where standard participles corresponding to a product and weights of the standard participles are stored in the core word library, and the weights are used to measure importance degrees of the standard participles at different positions of the product;
a matching module 400, configured to match the input word list with the standard participles according to weights of the standard participles;
and the target product output module 500 is configured to obtain a product corresponding to the successfully matched standard word segmentation as a target product corresponding to the text data.
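To illustrate how modules 100 to 500 fit together, here is a hypothetical sketch in which each module is injected as a callable. The class and field names are invented for illustration, and the toy matching rule in the test stands in for the weighted matching described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# product -> {standard participle: weight}
CoreLibrary = Dict[str, Dict[str, float]]


@dataclass
class TargetProductQueryDevice:
    """Hypothetical wiring of the device modules; each module is a callable."""
    segment: Callable[[str], List[str]]                       # first word segmentation module 200
    load_core_library: Callable[[], CoreLibrary]              # standard participle acquiring module 300
    match: Callable[[List[str], CoreLibrary], List[str]]      # matching module 400

    def query(self, text: str) -> List[str]:
        """Receiving module 100 takes `text`; the returned list plays the role
        of the target product output module 500."""
        words = self.segment(text)
        library = self.load_core_library()
        return self.match(words, library)
```

Keeping each module behind a callable mirrors the statement that the modules may be implemented in software, hardware, or a combination: any implementation with the same inputs and outputs can be swapped in.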
In one embodiment, the matching module 400 includes:
the reading unit is used for reading each word to be processed in the input word list and obtaining an input matrix corresponding to the input word list according to the word to be processed;
the weight query unit is used for acquiring each standard participle in the core word bank and querying the weight corresponding to the standard participle;
the first calculating unit is used for calculating the inverse document frequency of each standard participle and its weighted word frequency at the corresponding position of the product;
the second calculating unit is used for calculating a word matrix element from the inverse document frequency, the weighted word frequency at the corresponding position of the product, and the weight of the standard participle;
and the matching unit is used for obtaining a word matrix according to the word matrix elements and matching the input word list with the standard word segmentation through the word matrix and the input matrix.
In one embodiment, the first calculating unit includes:
the counting subunit is used for counting the occurrence frequency of each standard word at the corresponding position of each product;
the entry number acquiring subunit is used for acquiring the number of the weighted entries in each product;
and the weighted word frequency calculating subunit is used for calculating the weighted word frequency of each standard participle at the corresponding position of each product from the number of occurrences and the number of weighted entries.
In one embodiment, the matching unit includes:
the dimension reduction subunit is used for carrying out dimension reduction processing on the word matrix to obtain a vector corresponding to each product and a conversion matrix corresponding to the input matrix;
the target similarity calculating subunit is used for processing the input matrix with the conversion matrix to obtain a vector for the input word list, and calculating the target similarity between each product vector and the input-word-list vector;
and the matching subunit is used for matching the input word list with the standard participle when the target similarity is greater than or equal to a preset value, and otherwise, not matching the input word list with the standard participle.
In one embodiment, the dimension reduction subunit includes:
the matrix decomposition grandchild unit is used for performing dimensionality reduction on the word matrix to obtain a correlation matrix between standard participles and word senses, a correlation matrix between products and topics, and a correlation matrix between word senses and topics;
the product vector generation grandchild unit is used for obtaining a vector corresponding to each product according to the correlation matrix of the products and the topics;
and the word list vector generation grandchild unit is used for obtaining a conversion matrix corresponding to the input matrix according to the correlation matrix of the standard participle and the word meaning and the correlation matrix of the word meaning and the theme.
In one embodiment, the target product query device may further include:
The second word segmentation module is used for carrying out word segmentation processing on the text at the preset position in each product to obtain standard words;
the position counting module is used for respectively counting the positions of the standard participles appearing in the product;
and the weight calculation module is used for calculating the weight corresponding to the standard participle according to the position of the standard participle in the product.
In one embodiment, the first segmentation module 200 includes:
the initial word segmentation unit is used for segmenting the text data to obtain initial words;
the filtering unit is used for filtering the initial participles through a preset stop word bank;
and the expansion unit is used for expanding the filtered initial segmentation words through a preset common word bank to obtain an input word list, wherein the common word bank is a preset dictionary subjected to personalized processing.
In one embodiment, the target product output module 500 includes:
the initial product acquisition unit is used for acquiring an initial product corresponding to the successfully matched standard word segmentation;
the product matching unit is used for inquiring similar products corresponding to the initial products successfully matched in the core word stock;
and the target product output unit is used for calculating to obtain a target product according to the initial product and the similar product.
In one embodiment, the product matching unit includes:
the third calculating unit is used for calculating the first similarity between the similar product and the input word list and the second similarity between the similar product and the initial product;
a distribution weight acquisition unit for acquiring a similarity distribution weight;
and the fourth calculating unit is used for calculating the target product according to the similarity distribution weight, the first similarity and the second similarity.
In one embodiment, the fourth calculating unit includes:
the reference similarity calculating subunit is used for calculating a reference similarity and the corresponding target product from the similarity distribution weights, the first similarity, and the second similarity;
and the ranking subunit is used for ranking the target products by reference similarity.
For the specific definition of the target product query device, refer to the definition of the target product query method above, which is not repeated here. The modules in the target product query device may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the corresponding operations.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a target product query method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: receiving input text data; performing word segmentation processing on the text data to obtain an input word list; acquiring a pre-generated core word library, wherein standard participles corresponding to products and weights of the standard participles are stored in the core word library, and the weights are used for measuring the importance degrees of the standard participles at different positions of the products; matching the input word list with the standard participles according to the weight of each standard participle; and acquiring a product corresponding to the successfully matched standard word segmentation as a target product corresponding to the text data.
In one embodiment, the matching of the input vocabulary and the standard participles according to the weights of the respective standard participles, as implemented by the processor when executing the computer program, comprises: reading each word to be processed in the input word list, and obtaining an input matrix corresponding to the input word list according to the word to be processed; acquiring each standard participle in a core word stock, and inquiring the weight corresponding to the standard participle; calculating the reverse file frequency corresponding to each standard word and the weighted word frequency at the corresponding position of the product; calculating according to the reverse file frequency, the weighted word frequency at the corresponding position of the product and the weight of the standard participle to obtain a word matrix element; and obtaining a word matrix according to the word matrix elements, and matching the input word list with the standard word segmentation through the word matrix and the input matrix.
In one embodiment, the weighted word frequency used when the processor executes the computer program is calculated as follows: counting the number of times each standard segmented word occurs at the corresponding position of each product; acquiring the number of weighted entries in each product; and calculating, from that number of occurrences and that number of entries, the weighted word frequency of each standard segmented word at the corresponding position of each product.
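The word-matrix construction described in the two embodiments above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the product catalogue, the position names `name` and `description`, the values in `POSITION_WEIGHTS` (one weight per position rather than per standard segmented word, for brevity), the reading of "number of weighted entries" as the total count of words at weighted positions, and the use of log(N / df) for the IDF are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Hypothetical catalogue: product id -> position (field) -> standard segmented words.
PRODUCTS = {
    "P1": {"name": ["health", "insurance"], "description": ["medical", "insurance", "hospital"]},
    "P2": {"name": ["car", "insurance"], "description": ["accident", "vehicle", "repair"]},
    "P3": {"name": ["travel", "plan"], "description": ["flight", "delay", "medical"]},
}

# Assumed position weights; the source gives no concrete values.
POSITION_WEIGHTS = {"name": 3.0, "description": 1.0}


def weighted_tf(product: dict, term: str, position: str) -> float:
    """Occurrences of `term` at `position`, divided by the number of weighted entries in the product."""
    weighted_entries = sum(len(words) for pos, words in product.items() if pos in POSITION_WEIGHTS)
    if weighted_entries == 0:
        return 0.0
    return Counter(product.get(position, []))[term] / weighted_entries


def idf(products: dict, term: str) -> float:
    """Inverse document frequency log(N / df), where df is the number of products containing `term`."""
    df = sum(1 for product in products.values() if any(term in words for words in product.values()))
    return math.log(len(products) / df) if df else 0.0


def word_matrix(products: dict) -> tuple:
    """Term-by-product matrix with elements idf(t) * sum over positions of weight(pos) * tf(t, p, pos)."""
    terms = sorted({w for product in products.values() for words in product.values() for w in words})
    product_ids = sorted(products)
    matrix = [
        [
            idf(products, term)
            * sum(weight * weighted_tf(products[pid], term, pos) for pos, weight in POSITION_WEIGHTS.items())
            for pid in product_ids
        ]
        for term in terms
    ]
    return terms, product_ids, matrix
```

For example, "insurance" occurs in two of the three products (IDF = log 1.5) and once in each of P1's two weighted positions out of five weighted entries, so its P1 element is log 1.5 × (3 × 0.2 + 1 × 0.2) = 0.8 log 1.5.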
In one embodiment, obtaining the word matrix from the word matrix elements and matching the input word list against the standard segmented words by means of the word matrix and the input matrix, as performed when the processor executes the computer program, comprises: performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix; transforming the input matrix with the conversion matrix to obtain a vector for the input word list, and calculating the target similarity between each product's vector and the input word list's vector; and, when the target similarity is greater than or equal to a preset value, determining that the input word list matches the standard segmented words, and otherwise determining that it does not.
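A sketch of the threshold test in the embodiment above, assuming cosine similarity as the target similarity (the source does not name the measure) and a hypothetical preset value of 0.8. The product vectors and the query vector are taken as already reduced to the same topic space; the function names are illustrative.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_products(query_vec: list, product_vecs: dict, threshold: float = 0.8) -> dict:
    """Return {product id: similarity} for products whose similarity reaches the preset threshold."""
    matches = {}
    for product_id, vec in product_vecs.items():
        similarity = cosine(query_vec, vec)
        if similarity >= threshold:  # "greater than or equal to a preset value"
            matches[product_id] = similarity
    return matches
```

With a query vector [1.0, 0.0], a product vector [1.0, 0.1] passes (similarity about 0.995) while [0.0, 1.0] does not.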
In one embodiment, performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix, as performed when the processor executes the computer program, includes: performing dimensionality reduction on the word matrix to obtain a standard-segmented-word/word-sense correlation matrix, a product/topic correlation matrix, and a word-sense/topic correlation matrix; obtaining the vector for each product from the product/topic correlation matrix; and obtaining the conversion matrix for the input matrix from the standard-segmented-word/word-sense correlation matrix and the word-sense/topic correlation matrix.
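One common way to obtain three matrices of the kind named above is a truncated singular value decomposition, as in latent semantic indexing (LSI); the source does not state which decomposition it uses, so treat this as an assumed reading. Under it, A ≈ U_k S_k V_k^T, where U_k plays the role of the standard-segmented-word/word-sense matrix, the rows of V_k are the product vectors, S_k is the word-sense/topic matrix, and the conversion matrix is S_k^-1 U_k^T (the usual LSI "folding-in" step). The function name `lsa`, the example matrix, and the choice of `k` are illustrative.

```python
import numpy as np


def lsa(word_matrix: np.ndarray, k: int) -> tuple:
    """Truncated SVD of a term-by-product matrix: A ~= U_k @ diag(s_k) @ Vt_k.

    Returns (product_vectors, conversion):
      product_vectors: one row per product (rows of V_k, the product/topic matrix)
      conversion: diag(1 / s_k) @ U_k.T, which folds a term-space input vector into topic space
    """
    U, s, Vt = np.linalg.svd(word_matrix, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    if np.any(s_k <= 1e-12):
        raise ValueError("k exceeds the numerical rank of the word matrix")
    product_vectors = Vt_k.T
    conversion = np.diag(1.0 / s_k) @ U_k.T
    return product_vectors, conversion


# Hypothetical 4-term x 3-product word matrix.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
product_vectors, conversion = lsa(A, k=2)
query_vector = conversion @ np.array([1.0, 0.0, 1.0, 0.0])  # input matrix for a 2-word query
```

Folding a product's own column of the word matrix through the conversion matrix reproduces that product's vector, which is why a folded-in input word list can be compared directly with the product vectors.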
In one embodiment, the weights used when the processor executes the computer program are generated by: performing word segmentation on the text at preset positions in each product to obtain the standard segmented words; counting, for each standard segmented word, the positions at which it occurs in the product; and calculating the weight of each standard segmented word from its positions in the product.
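The source does not give the formula that turns positions into weights. The sketch below assumes one simple rule purely for illustration: each preset position has a fixed importance (the hypothetical `POSITION_IMPORTANCE` table), and a standard segmented word's weight is the highest importance among the positions where it occurs. Whitespace splitting stands in for a real word segmenter, and the field names are invented.

```python
from collections import defaultdict

# Hypothetical importance of each preset position; the source gives no values.
POSITION_IMPORTANCE = {"name": 3.0, "tags": 2.0, "description": 1.0}


def build_weights(products: dict) -> dict:
    """Map each standard segmented word to the highest importance among the positions where it occurs."""
    positions_seen = defaultdict(set)
    for fields in products.values():
        for position, text in fields.items():
            if position not in POSITION_IMPORTANCE:
                continue  # only text at preset positions is segmented
            for word in text.split():  # stand-in for a real word segmenter
                positions_seen[word].add(position)
    return {word: max(POSITION_IMPORTANCE[p] for p in seen) for word, seen in positions_seen.items()}


weights = build_weights({
    "P1": {"name": "health insurance", "description": "covers hospital costs", "sku": "H-001"},
    "P2": {"tags": "insurance family"},
})
```

Here "insurance" gets 3.0 because it appears in a product name, "family" gets 2.0 from the tags, and the `sku` field is ignored because it is not a preset position.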
In one embodiment, performing word segmentation on the text data to obtain the input word list, as implemented when the processor executes the computer program, includes: segmenting the text data to obtain initial segmented words; filtering the initial segmented words through a preset stop-word list; and expanding the filtered initial segmented words through a preset common-word lexicon to obtain the input word list, the common-word lexicon being a preset, customized dictionary.
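A minimal sketch of the three preprocessing steps, assuming English text so that `str.split` can stand in for word segmentation (Chinese text would need a dedicated segmenter such as the jieba library). `STOP_WORDS` and `COMMON_WORDS` are hypothetical; the common-word lexicon is modelled here as a mapping from a word to additional words to add to the input word list, which is only one possible reading of "expanding".

```python
# Hypothetical stop-word list and common-word lexicon (customized dictionary).
STOP_WORDS = {"i", "want", "a", "the", "for"}
COMMON_WORDS = {"auto": ["car"], "medical": ["health"]}


def build_input_word_list(text: str) -> list:
    """Segment, drop stop words, then expand each remaining word via the common-word lexicon."""
    initial = text.lower().split()  # stand-in for word segmentation
    filtered = [word for word in initial if word not in STOP_WORDS]
    input_words = []
    for word in filtered:
        for candidate in [word, *COMMON_WORDS.get(word, [])]:
            if candidate not in input_words:  # keep order, avoid duplicates
                input_words.append(candidate)
    return input_words
```

For instance, "I want auto insurance for the family" becomes ["auto", "car", "insurance", "family"].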
In one embodiment, taking the product corresponding to a successfully matched standard segmented word as the target product corresponding to the text data, as implemented when the processor executes the computer program, includes: acquiring the initial product corresponding to the successfully matched standard segmented word; searching the core word bank for products similar to the successfully matched initial product; and calculating the target product from the initial product and the similar products.
In one embodiment, calculating the target product from the initial product and the similar products, as performed when the processor executes the computer program, includes: calculating a first similarity between each similar product and the input word list and a second similarity between each similar product and the initial product; acquiring a similarity distribution weight; and calculating the target product from the similarity distribution weight, the first similarity, and the second similarity.
In one embodiment, calculating the target product from the similarity distribution weight, the first similarity, and the second similarity, as performed when the processor executes the computer program, includes: calculating a reference similarity and the corresponding target product from the similarity distribution weight, the first similarity, and the second similarity; and ranking the target products by reference similarity.
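The similar-product re-ranking in the embodiments above can be sketched as follows. The source does not say how the similarity distribution weight combines the two similarities; this sketch assumes a linear blend, reference = alpha * first + (1 - alpha) * second, with a hypothetical alpha of 0.6. It keeps the initial products with their original match similarity and sorts everything by descending score. The function name and data layout are illustrative.

```python
def rank_targets(initial: dict, similar: dict, alpha: float = 0.6) -> list:
    """Rank initial and similar products by reference similarity.

    initial: initial product id -> its similarity with the input word list
    similar: initial product id -> list of (similar product id, first_sim, second_sim), where
             first_sim compares the similar product with the input word list and
             second_sim compares it with the initial product
    alpha:   assumed similarity distribution weight for first_sim
    """
    scores = dict(initial)
    for candidates in similar.values():
        for product_id, first_sim, second_sim in candidates:
            reference = alpha * first_sim + (1 - alpha) * second_sim
            # A product reachable in several ways keeps its best score.
            scores[product_id] = max(scores.get(product_id, 0.0), reference)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_targets({"P1": 0.9}, {"P1": [("P4", 0.5, 0.8), ("P5", 0.2, 0.4)]})
```

With these inputs P4 scores 0.6 × 0.5 + 0.4 × 0.8 = 0.62 and P5 scores 0.28, so the ranking is P1, P4, P5. A linear blend keeps the score in the same [0, 1] range as its inputs, which makes a single ranking across initial and similar products straightforward.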
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, performs the following steps: receiving input text data; performing word segmentation on the text data to obtain an input word list; acquiring a pre-generated core word bank, in which standard segmented words corresponding to products and the weights of those standard segmented words are stored, the weights measuring how important a standard segmented word is at different positions of a product; matching the input word list against the standard segmented words according to the weight of each standard segmented word; and taking the product corresponding to a successfully matched standard segmented word as the target product corresponding to the text data.
In one embodiment, matching the input word list against the standard segmented words according to the weight of each standard segmented word, as implemented when the computer program is executed by the processor, comprises: reading each word to be processed in the input word list and obtaining, from these words, an input matrix corresponding to the input word list; acquiring each standard segmented word in the core word bank and querying its weight; calculating the inverse document frequency (IDF) of each standard segmented word and its weighted word frequency at the corresponding position of the product; calculating a word matrix element from the IDF, the weighted word frequency at the corresponding position of the product, and the weight of the standard segmented word; and assembling the word matrix from the word matrix elements and matching the input word list against the standard segmented words by means of the word matrix and the input matrix.
In one embodiment, the weighted word frequency used when the computer program is executed by the processor is calculated as follows: counting the number of times each standard segmented word occurs at the corresponding position of each product; acquiring the number of weighted entries in each product; and calculating, from that number of occurrences and that number of entries, the weighted word frequency of each standard segmented word at the corresponding position of each product.
In one embodiment, obtaining the word matrix from the word matrix elements and matching the input word list against the standard segmented words by means of the word matrix and the input matrix, as implemented when the computer program is executed by the processor, comprises: performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix; transforming the input matrix with the conversion matrix to obtain a vector for the input word list, and calculating the target similarity between each product's vector and the input word list's vector; and, when the target similarity is greater than or equal to a preset value, determining that the input word list matches the standard segmented words, and otherwise determining that it does not.
In one embodiment, performing dimensionality reduction on the word matrix to obtain a vector for each product and a conversion matrix for the input matrix, as performed when the computer program is executed by the processor, includes: performing dimensionality reduction on the word matrix to obtain a standard-segmented-word/word-sense correlation matrix, a product/topic correlation matrix, and a word-sense/topic correlation matrix; obtaining the vector for each product from the product/topic correlation matrix; and obtaining the conversion matrix for the input matrix from the standard-segmented-word/word-sense correlation matrix and the word-sense/topic correlation matrix.
In one embodiment, the weights used when the computer program is executed by the processor are generated by: performing word segmentation on the text at preset positions in each product to obtain the standard segmented words; counting, for each standard segmented word, the positions at which it occurs in the product; and calculating the weight of each standard segmented word from its positions in the product.
In one embodiment, performing word segmentation on the text data to obtain the input word list, as implemented when the computer program is executed by the processor, includes: segmenting the text data to obtain initial segmented words; filtering the initial segmented words through a preset stop-word list; and expanding the filtered initial segmented words through a preset common-word lexicon to obtain the input word list, the common-word lexicon being a preset, customized dictionary.
In one embodiment, taking the product corresponding to a successfully matched standard segmented word as the target product corresponding to the text data, as implemented when the computer program is executed by the processor, includes: acquiring the initial product corresponding to the successfully matched standard segmented word; searching the core word bank for products similar to the successfully matched initial product; and calculating the target product from the initial product and the similar products.
In one embodiment, calculating the target product from the initial product and the similar products, as implemented when the computer program is executed by the processor, includes: calculating a first similarity between each similar product and the input word list and a second similarity between each similar product and the initial product; acquiring a similarity distribution weight; and calculating the target product from the similarity distribution weight, the first similarity, and the second similarity.
In one embodiment, calculating the target product from the similarity distribution weight, the first similarity, and the second similarity, as implemented when the computer program is executed by the processor, includes: calculating a reference similarity and the corresponding target product from the similarity distribution weight, the first similarity, and the second similarity; and ranking the target products by reference similarity.
Those skilled in the art will understand that all or part of the processes of the method embodiments above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments above. Any reference to memory, storage, a database, or another medium in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disks, flash memory, optical storage, and the like. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these features has been described; however, any combination of them should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (12)

1. A method for querying a target product, the method comprising:
receiving input text data;
performing word segmentation on the text data to obtain an input word list;
acquiring a pre-generated core word bank, wherein standard segmented words corresponding to a product and the weights of the standard segmented words are stored in the core word bank, and the weights measure the importance of the standard segmented words at different positions of the product;
matching the input word list against the standard segmented words according to the weight of each standard segmented word; and
obtaining the product corresponding to a successfully matched standard segmented word as the target product corresponding to the text data;
wherein matching the input word list against the standard segmented words according to the weight of each standard segmented word comprises:
reading each word to be processed in the input word list, and obtaining an input matrix corresponding to the input word list from the words to be processed;
acquiring each standard segmented word in the core word bank, and querying the weight corresponding to the standard segmented word;
calculating the inverse document frequency of each standard segmented word and its weighted word frequency at the corresponding position of the product;
calculating a word matrix element from the inverse document frequency, the weighted word frequency at the corresponding position of the product, and the weight of the standard segmented word; and
obtaining a word matrix from the word matrix elements, and matching the input word list against the standard segmented words by means of the word matrix and the input matrix.
2. The method of claim 1, wherein the weighted word frequency is calculated by:
counting the number of times each standard segmented word occurs at the corresponding position of each product;
acquiring the number of weighted entries in each product; and
calculating, from that number of occurrences and that number of entries, the weighted word frequency of each standard segmented word at the corresponding position of each product.
3. The method of claim 1, wherein matching the input word list against the standard segmented words by means of the word matrix and the input matrix comprises:
performing dimensionality reduction on the word matrix to obtain a vector corresponding to each product and a conversion matrix corresponding to the input matrix;
transforming the input matrix with the conversion matrix to obtain a vector corresponding to the input word list;
calculating the target similarity between the vector corresponding to the product and the vector corresponding to the input word list; and
when the target similarity is greater than or equal to a preset value, determining that the input word list matches the standard segmented word; otherwise, determining that it does not match.
4. The method of claim 3, wherein performing dimensionality reduction on the word matrix to obtain a vector corresponding to each product and a conversion matrix corresponding to the input matrix comprises:
performing dimensionality reduction on the word matrix to obtain a correlation matrix between standard segmented words and word senses, a correlation matrix between products and topics, and a correlation matrix between word senses and topics;
obtaining the vector corresponding to each product from the correlation matrix between products and topics; and
obtaining the conversion matrix corresponding to the input matrix from the correlation matrix between standard segmented words and word senses and the correlation matrix between word senses and topics.
5. The method according to any one of claims 1 to 4, wherein the weights are generated by:
performing word segmentation on the text at preset positions in each product to obtain the standard segmented words;
counting, for each standard segmented word, the positions at which it occurs in the product; and
calculating the weight corresponding to each standard segmented word from its positions in the product.
6. The method of claim 1, wherein performing word segmentation on the text data to obtain an input word list comprises:
segmenting the text data to obtain initial segmented words;
filtering the initial segmented words through a preset stop-word list; and
expanding the filtered initial segmented words through a preset common-word lexicon to obtain the input word list, wherein the common-word lexicon is a preset, customized dictionary.
7. The method according to claim 1, wherein obtaining the product corresponding to the successfully matched standard segmented word as the target product corresponding to the text data comprises:
acquiring the initial product corresponding to the successfully matched standard segmented word;
searching the core word bank for products similar to the successfully matched initial product; and
calculating the target product from the initial product and the similar products.
8. The method of claim 7, wherein calculating the target product from the initial product and the similar products comprises:
calculating a first similarity between each similar product and the input word list and a second similarity between each similar product and the initial product;
acquiring a similarity distribution weight; and
calculating the target product from the similarity distribution weight, the first similarity, and the second similarity.
9. The method of claim 8, wherein calculating the target product from the similarity distribution weight, the first similarity, and the second similarity comprises:
calculating a reference similarity and the corresponding target product from the similarity distribution weight, the first similarity, and the second similarity; and
ranking the target products by reference similarity.
10. An apparatus for querying a target product, the apparatus comprising:
a receiving module, configured to receive input text data;
a first word segmentation module, configured to perform word segmentation on the text data to obtain an input word list;
a standard segmented word acquisition module, configured to acquire a pre-generated core word bank, wherein standard segmented words corresponding to products and the weights of the standard segmented words are stored in the core word bank, and the weights measure the importance of the standard segmented words at different positions of the products;
a matching module, configured to match the input word list against the standard segmented words according to the weight of each standard segmented word; and
a target product output module, configured to acquire the product corresponding to a successfully matched standard segmented word as the target product corresponding to the text data;
wherein the matching module comprises:
a reading unit, configured to read each word to be processed in the input word list and obtain an input matrix corresponding to the input word list from the words to be processed;
a weight query unit, configured to acquire each standard segmented word in the core word bank and query the weight corresponding to the standard segmented word;
a first calculation unit, configured to calculate the inverse document frequency of each standard segmented word and its weighted word frequency at the corresponding position of the product;
a second calculation unit, configured to calculate a word matrix element from the inverse document frequency, the weighted word frequency at the corresponding position of the product, and the weight of the standard segmented word; and
a matching unit, configured to obtain a word matrix from the word matrix elements and match the input word list against the standard segmented words by means of the word matrix and the input matrix.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when executing the computer program.
12. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 10.
CN202110514110.XA 2021-05-12 2021-05-12 Target product query method and device, computer equipment and storage medium Active CN112988980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110514110.XA CN112988980B (en) 2021-05-12 2021-05-12 Target product query method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112988980A CN112988980A (en) 2021-06-18
CN112988980B true CN112988980B (en) 2021-07-30

Family

ID=76337549

Country Status (1)

Country Link
CN (1) CN112988980B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113590755B (en) * 2021-08-02 2024-10-29 北京小米移动软件有限公司 Word weight generation method and device, electronic equipment and storage medium
CN113627182B (en) * 2021-08-10 2024-07-26 深圳平安智汇企业信息管理有限公司 Data matching method, device, computer equipment and storage medium
CN113850643B (en) * 2021-09-18 2024-09-17 中国平安财产保险股份有限公司 Product recommendation method and device, electronic equipment and readable storage medium
CN114003740A (en) * 2021-11-02 2022-02-01 北京有竹居网络技术有限公司 Descriptor recognition method, device, medium and electronic equipment
CN115017361B (en) * 2022-05-25 2024-07-19 北京奇艺世纪科技有限公司 Video searching method and device, electronic equipment and storage medium
CN115693667B (en) * 2023-01-04 2023-03-21 佰聆数据股份有限公司 Method and device for automatically distributing power grid power supply nodes based on asymmetric grid structure information

Citations (7)

Publication number Priority date Publication date Assignee Title
CN102982153A (en) * 2012-11-29 2013-03-20 北京亿赞普网络技术有限公司 Information retrieval method and device
CN104424277A (en) * 2013-08-29 2015-03-18 深圳市腾讯计算机系统有限公司 Processing method and device for report information
US20200279000A1 (en) * 2019-02-28 2020-09-03 Fuji Xerox Co., Ltd. Information processing apparatus and non-transitory computer readable medium storing program
CN112149414A (en) * 2020-09-23 2020-12-29 腾讯科技(深圳)有限公司 Text similarity determination method, device, equipment and storage medium
CN112307184A (en) * 2020-10-30 2021-02-02 山东浪潮通软信息科技有限公司 Data query method, device and computer readable medium
CN112328752A (en) * 2021-01-04 2021-02-05 平安科技(深圳)有限公司 Course recommendation method and device based on search content, computer equipment and medium
CN112527967A (en) * 2020-12-17 2021-03-19 重庆金融资产交易所有限责任公司 Text matching method, device, terminal and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20210042363A1 (en) * 2019-08-09 2021-02-11 Capital One Services, Llc Search pattern suggestions for large datasets
CN111414452B (en) * 2020-02-29 2024-07-02 平安国际智慧城市科技股份有限公司 Search word matching method and device, electronic equipment and readable storage medium
CN111427998B (en) * 2020-03-19 2024-03-26 辽宁工业大学 Secure ciphertext query method for cloud data multi-keyword extension weight
CN111949855B (en) * 2020-07-31 2024-11-01 国网上海市电力公司 Knowledge graph-based engineering technology warp knowledge retrieval platform and method thereof
CN111898380A (en) * 2020-08-17 2020-11-06 上海熙满网络科技有限公司 Text matching method and device, electronic equipment and storage medium

Non-Patent Citations (1)

Title
Optimizing local news search with the TF-IDF algorithm; 史航; 《软件导刊》 (Software Guide); 2011-11-30; Vol. 10, No. 11; pp. 59-60 *

Similar Documents

Publication Publication Date Title
CN112988980B (en) Target product query method and device, computer equipment and storage medium
Alami et al. Unsupervised neural networks for automatic Arabic text summarization using document clustering and topic modeling
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
US7451124B2 (en) Method of analyzing documents
US20220261427A1 (en) Methods and system for semantic search in large databases
US8756229B2 (en) System and methods for units-based numeric information retrieval
US20120323968A1 (en) Learning Discriminative Projections for Text Similarity Measures
Soyusiawaty et al. Book data content similarity detector with cosine similarity (case study on digilib. uad. ac. id)
CN106708929B (en) Video program searching method and device
CN107844493B (en) File association method and system
CN108875065B (en) Indonesia news webpage recommendation method based on content
Liu et al. Efficient cross-modal retrieval via flexible supervised collective matrix factorization hashing
Lin et al. A simple but effective method for Indonesian automatic text summarisation
CN115794995A (en) Target answer obtaining method and related device, electronic equipment and storage medium
CN106570196B (en) Video program searching method and device
CN114330335B (en) Keyword extraction method, device, equipment and storage medium
Wijewickrema et al. Selecting a text similarity measure for a content-based recommender system: A comparison in two corpora
US20140181097A1 (en) Providing organized content
CN115630144A (en) Document searching method and device and related equipment
Zhang et al. Mutual-reinforcement document summarization using embedded graph based sentence clustering for storytelling
CN113505196B (en) Text retrieval method and device based on parts of speech, electronic equipment and storage medium
CN111859066A (en) Query recommendation method and device for operation and maintenance work order
CN117435685A (en) Document retrieval method, document retrieval device, computer equipment, storage medium and product
CN110688559A (en) Retrieval method and device
JP2012104051A (en) Document index creating device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant