WO2021207936A1 - 文本匹配方法、装置、电子设备及存储介质 - Google Patents
文本匹配方法、装置、电子设备及存储介质 Download PDFInfo
- Publication number
- WO2021207936A1 WO2021207936A1 PCT/CN2020/084762 CN2020084762W WO2021207936A1 WO 2021207936 A1 WO2021207936 A1 WO 2021207936A1 CN 2020084762 W CN2020084762 W CN 2020084762W WO 2021207936 A1 WO2021207936 A1 WO 2021207936A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- index
- keywords
- matched
- index table
- text
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 89
- 238000012545 processing Methods 0.000 claims description 31
- 238000010586 diagram Methods 0.000 description 17
- 230000008569 process Effects 0.000 description 14
- 230000014509 gene expression Effects 0.000 description 13
- 238000012552 review Methods 0.000 description 6
- 230000006870 function Effects 0.000 description 5
- 238000003058 natural language processing Methods 0.000 description 5
- 238000004891 communication Methods 0.000 description 2
- 230000008878 coupling Effects 0.000 description 2
- 238000010168 coupling process Methods 0.000 description 2
- 238000005859 coupling reaction Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
Definitions
- This application relates to the technical field of text processing, and more specifically, to a text matching method, device, electronic device, and storage medium.
- Natural language processing is an important research direction in the field of computer science and artificial intelligence.
- keyword matching is applied in various aspects, such as content review and intelligent customer service.
- the matching of massive keywords needs to ensure the processing efficiency of the matching process to meet the needs of users.
- this application proposes a text matching method, device, electronic device, and storage medium.
- an embodiment of the present application provides a text matching method, the method includes: obtaining a text to be matched; matching the keywords in the text to be matched according to a dictionary tree of keywords that is built in advance; When the keyword in the text to be matched is matched, the target position of the matched keyword in the preset index table is queried, and the preset index table includes at least one index corresponding to the matching condition, and the matching condition includes one Or multiple keywords; adding the matched keywords to the target location in a temporary index table, where the temporary index table is a blank index table with the same structure as the preset index table; according to the The added keywords in the temporary index table determine whether the text to be matched meets the matching condition.
- an embodiment of the present application provides a text matching device, the device includes: a text acquisition module, a keyword matching module, a location query module, a keyword adding module, and a condition determination module, wherein the text acquisition module Used to obtain the text to be matched; the keyword matching module is used to match the keywords in the text to be matched according to a pre-built dictionary tree of keywords; the position query module is used to match the When searching for keywords in the text to be matched, query the target position of the matched keywords in a preset index table, the preset index table includes at least one index corresponding to a matching condition, and the matching condition includes one or more key Words; the keyword adding module is used to add the matched keywords to the target location in a temporary index table, the temporary index table being a blank index table with the same structure as the preset index table The condition determining module is used to determine whether the text to be matched meets the matching condition according to the keywords that have been added in the temporary index table.
- an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more application programs, wherein the one or more application programs are stored in the memory and Is configured to be executed by the one or more processors, and the one or more programs are configured to execute the text matching method provided in the first aspect described above.
- an embodiment of the present application provides a computer-readable storage medium.
- the computer-readable storage medium stores program code, and the program code can be invoked by a processor to execute the text provided in the first aspect. Matching method.
- the solution provided by this application is to obtain the text to be matched, according to the dictionary tree of the pre-built keywords, to match the keywords in the text to be matched, and when the keywords in the text to be matched are matched, query the matched keywords
- the preset index table includes at least one index corresponding to the matching condition, and the matching condition includes one or more keywords, and then the matched keywords are added to the temporary index table.
- the target location, the temporary index table is a blank index table with the same structure as the preset index table, and then according to the added keywords in the temporary index table, it is determined whether the text to be matched meets the matching conditions, thereby realizing a dictionary tree combining keywords , And the index table formed by the matching conditions to match the text to be matched to determine whether the text to be matched meets the matching conditions, reduce the complexity of text matching, and improve the efficiency of text matching.
- Fig. 1 shows a schematic diagram of a dictionary tree provided by an embodiment of the present application.
- Fig. 2 shows a flowchart of a text matching method according to an embodiment of the present application.
- Fig. 3 shows a schematic diagram of a dictionary tree provided by an embodiment of the present application.
- FIG. 4 shows a schematic diagram of a preset index table provided by an embodiment of the present application.
- Fig. 5 shows a schematic diagram of a temporary index table provided by an embodiment of the present application.
- Fig. 6 shows a flowchart of a text matching method according to another embodiment of the present application.
- FIG. 7 shows a state diagram of the temporary index table provided by another embodiment of the present application.
- Fig. 8 shows another state diagram of the temporary index table provided by another embodiment of the present application.
- FIG. 9 shows still another state diagram of the temporary index table provided by another embodiment of the present application.
- Fig. 10 shows a flowchart of a text matching method according to another embodiment of the present application.
- FIG. 11 shows a schematic diagram of a preset index table provided by another embodiment of the present application.
- FIG. 12 shows a time-consuming schematic diagram of a text matching method provided by another embodiment of the present application.
- Fig. 13 shows a block diagram of a text matching device according to an embodiment of the present application.
- FIG. 14 is a block diagram of an electronic device for executing the text matching method according to an embodiment of the present application according to an embodiment of the present application.
- Fig. 15 is a storage unit for storing or carrying the program code for implementing the text matching method according to the embodiment of the present application according to an embodiment of the present application.
- NLP Natural Language Processing
- keyword matching is performed on the customer’s question text to assist in decision-making scenarios.
- the capacity of the keyword library has rapidly expanded.
- the number of keywords exceeds 50,000, it can be called a massive keyword, which means that there is often a need to match the massive keywords in the text.
- massive keyword matching it is necessary to ensure the low time-consuming and flexibility of keyword matching at the same time, which is technically challenging.
- keywords are usually matched through regular expressions and dictionary trees.
- the time complexity of the regular expression matching the keyword database is: O(m*n), that is, for each keyword combination, you need to use every The regular expression corresponding to each keyword combination matches the text to be matched.
- Dictionary tree also known as word search tree, Trie tree, is a tree structure and a variant of hash tree.
- the dictionary tree can store keywords as a tree structure, and each node of the tree is a character.
- keywords include: apple, application, regression, and return to China.
- the constructed dictionary tree is shown in Figure 1.
- Trie algorithm it is necessary to traverse the text to be matched once. Traverse along the way and find the exact same string as in the dictionary tree, and get the flag bit of the last node, then the keyword is matched. If the text length of the text to be matched is n, the time complexity of the dictionary tree matching is O(n), that is, the dictionary tree needs to be used to traverse the text to be matched to complete the matching of the keywords in the text to be matched .
- the inventor found that in the way of matching keywords through regular expressions, the matching time complexity is O(m*n), and the matching time is large in the scenario of massive keywords.
- the time complexity regardless of the number of keywords, is O(n).
- the flexibility is low and it is only suitable for single word matching. For example, in the above example, words such as apple and application can only be matched individually, and two or more multi-word combinations cannot be completed (such as apple...application, "... ”Represents any character). In actual business, its use has limitations.
- the inventor proposed the text matching method, device, electronic device, and storage medium provided by the embodiments of the present application, which can implement a dictionary tree combining keywords and an index table formed by matching conditions, and match the text to be matched to Determine whether the text to be matched meets the matching conditions, reduce the complexity of text matching, and improve the efficiency of text matching.
- the specific text matching method will be described in detail in the subsequent embodiments.
- FIG. 2 shows a schematic flowchart of a text matching method provided by an embodiment of the present application.
- the text matching method is applied to the text matching device 400 shown in FIG. 13 and the mobile terminal 100 equipped with the text matching device 400 (FIG. 14 ).
- the following will take a mobile terminal as an example to illustrate the specific process of this embodiment.
- the electronic device applied in this embodiment may be a server, a PC computer, a notebook computer, etc., which is not limited here.
- the process shown in FIG. 2 will be described in detail below, and the text matching method may specifically include the following steps:
- Step S110 Obtain the text to be matched.
- the electronic device can obtain the text to be matched, that is, the text that needs to be determined whether it meets the matching condition.
- the text to be processed can be text extracted from web content, text extracted from chat content, text obtained by converting speech to text, text extracted from text images, etc., specific text Can not be used as a limit.
- the electronic device can obtain the text to be matched online, and the network obtains the text to be matched from other devices.
- the electronic device can be a server, and can receive text content uploaded by other users, and match the conditions of keyword matching on the text content to complete the content review; another example
- the matching conditions of keywords can be matched on the question text sent by the user, and questions and answers can be conducted according to the matching results.
- the specific way of obtaining the text to be matched online may not be a limitation.
- the electronic device may also obtain the text to be matched from a locally stored file, that is, the text to be processed is pre-stored locally in the electronic device.
- the teacher can store the text of the homework to be reviewed in an electronic device, and when it needs to be reviewed, read the text of the homework and perform keyword matching to complete the corresponding review work.
- the scene in which the electronic device obtains the text to be matched locally may not be a limitation.
- the length of the text and the number of characters in the text to be matched may not be limited.
- the text to be matched may be a paragraph or a sentence.
- Step S120 Match the keywords in the text to be matched according to the pre-built dictionary tree of keywords.
- a dictionary tree with keywords can be pre-built, and the dictionary tree can be stored in the electronic device.
- the keyword may be a keyword included in the matching condition, and the matching condition may be one or more.
- Each matching condition can be a single keyword or a combination of multiple keywords.
- the dictionary tree of keywords can include all keywords included in the matching conditions.
- the matching condition can be "Keyword 1"; for another example, the matching condition can be "Keyword 1...Keyword 2", and "" is any character.
- the above matching conditions are only examples, and the specific number of keywords in the matching conditions may not be limited.
- the dictionary tree of keywords may be created by an electronic device, and the dictionary tree of keywords may be stored locally, so that when the text to be matched needs to be processed, the dictionary tree is read locally.
- the dictionary tree of keywords can also be created by other devices (such as servers, etc.), and then sent to the electronic device by the other device, and then the electronic device stores it, so that it can be processed when the text to be matched is needed.
- the dictionary tree of keywords can also be created by other devices (such as servers, etc.), and then sent to the electronic device by the other device, and then the electronic device stores it, so that it can be processed when the text to be matched is needed.
- processing read the dictionary tree from the local.
- the electronic device may also obtain a dictionary tree of keywords from other devices (such as a server) in real time every time the text to be matched needs to be processed, and then cache the obtained dictionary tree for use in The processing of matching text this time.
- other devices such as a server
- FIG. 3 shows a schematic diagram of the dictionary tree provided by an embodiment of the present application.
- the matching conditions include: “court”, “court...play”, “national team...captain”, “court” ...Captain”, all keywords corresponding to the matching conditions can include: “court”, “national team”, “captain” and “playing”. Therefore, it can be set up to include “court”, “national team”, and “captain” And the dictionary tree of "playing ball”.
- the electronic device may match the keywords in the text to be matched according to the dictionary tree of keywords established above. Specifically, the electronic device uses the dictionary tree matching algorithm to traverse the text to be matched using the dictionary tree. If the traversal along the way finds the exact same string as in the dictionary tree and obtains the flag bit of the last node, the keyword is matched.
- Step S130 When a keyword in the text to be matched is matched, query the target position of the matched keyword in a preset index table, and the preset index table includes at least one index corresponding to the matching condition.
- the matching condition includes one or more keywords.
- a preset index table corresponding to the matching condition may also be created in advance, wherein the preset index table may include an index corresponding to each matching condition, and each index may include its corresponding index.
- the keywords in the matching conditions may be created in advance, wherein the preset index table may include an index corresponding to each matching condition, and each index may include its corresponding index.
- the index may be row content or column content in a preset index table.
- the keywords in a column of the preset index table can constitute the matching condition corresponding to the index corresponding to the column; when the index is the row content in the preset index table, Then the keywords in a row of the preset index table can constitute the matching condition corresponding to the index corresponding to the row.
- the preset index table may also include other information, for example, it may also include the position of the keyword in each index in the preset index table. , The number of keywords in each index (that is, the number of keywords in the matching condition corresponding to the index), the index identification of each index, etc.
- FIG. 4 shows a schematic diagram of a preset index table provided by an embodiment of the present application.
- the matching conditions include: “court”, “court...playing”, “national team...captain...playing", For “court... captain”, “court... national team”, the index mark can be included in the preset index table.
- the index is the row content in the preset index table, and the preset index table includes each The index identifier of the index, the position identifier of the keyword in each index, and the number of keywords in each index, for example, the matching condition "court...playing", the corresponding index identifier in the preset index table is 2, row
- the content includes the keywords "court” and “playing", and the number of corresponding keywords is 2.
- the size (rows and columns) in the preset index table can be determined according to the number of matching conditions and the maximum number of keywords in the matching conditions. For example, as in the above example, if the matching conditions include 5, the preset index table includes 5 rows of content, and the maximum number of keywords is 3, then the preset index table includes 3 columns of content.
- the electronic device when the electronic device matches the keyword in the text to be matched, it can query the target position of the matched keyword in the preset index table according to the preset index table, so that subsequent matching can be performed.
- the keywords are added to the temporary index table. For example, in the above example, when the keyword "court” is matched, the preset index table can be queried, and the query can find that "court" is located in the first position of index 1, index 2, index 4, and index 5, namely Location ID 1.
- the electronic device can query the target position of each keyword in the preset index table after searching all the keywords in the text to be matched; the electronic device can also query each keyword every time a keyword is found. Perform a query on the target position of the keyword in the preset index table.
- Step S140 Add the matched keyword to the target location in a temporary index table, where the temporary index table is a blank index table with the same structure as the preset index table.
- the electronic device may add the matched keyword to the target position in the temporary index table, so as to be based on the temporary index For the added keywords in the table, determine whether the text to be matched meets the matching conditions.
- the temporary index table corresponds to the preset index table.
- the temporary index table may be a blank index table with the same structure as the preset index table, wherein the number of rows and columns of the temporary index table and the preset index table may be the same, and Other parameter information included in the preset index table may also be the same.
- Figure 5 is a temporary index table corresponding to the preset index table shown in Figure 4. The structure and information included in the two are the same. The difference from the preset index table is that the temporary index table The position of each keyword is empty.
- Step S150 Determine whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- the electronic device after the electronic device adds the matched keywords to the temporary index table, it can determine whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- whether the number of keywords in each index (row or column) in the temporary index table is the same as the number of keywords recorded in the index can be used to determine whether the matching condition corresponding to the index is satisfied. In other words, if they are the same, it means that the keyword of the index is included in the text to be matched, that is, the matching condition corresponding to the index is satisfied. Of course, you can also directly compare whether the keywords in the index in the temporary index table are the same as the keywords in the preset index table. If they are the same, it also means that the keyword of the index is included in the text to be matched, that is, the corresponding index is satisfied. Match conditions.
- the electronic device After determining whether the matching condition corresponding to the index is satisfied according to the keywords added in each index temporarily, the electronic device can obtain the matching condition that the text to be matched meets, so as to obtain the result of matching the matching condition of the text to be matched .
- the keywords in the text to be matched are determined through the dictionary tree, and then the matched keywords are added to the temporary index table according to the preset index table established in advance, and then only need to be based on
- the keywords added in the temporary index table can be used to determine whether the text to be matched meets the corresponding matching conditions. It is not necessary for each keyword combination to use the regular expression corresponding to each keyword combination to match the text to be matched. Effectively save the time of keyword matching and improve the efficiency of keyword matching.
- FIG. 6 shows a schematic flowchart of a text matching method provided by another embodiment of the present application. This method is applied to the above-mentioned electronic equipment, and the process shown in FIG. 6 will be described in detail below.
- the text matching method may specifically include the following steps:
- Step S210 Obtain the text to be matched.
- Step S220 Match the keywords in the text to be matched according to the pre-built dictionary tree of keywords.
- step S210 and step S220 can refer to the content in the foregoing embodiment, which will not be repeated here.
- Step S230 When a keyword in the text to be matched is matched according to the dictionary tree, query the target position of the currently matched keyword in a preset index table, and the preset index table includes at least one The index corresponding to the matching condition, where the matching condition includes one or more keywords.
- the electronic device may also query the target position of the currently matched keyword in the preset index table every time a keyword is found, so as to add the currently matched keyword to the temporary index surface.
- the target location may be one location or multiple locations, that is, when the keyword is in the index corresponding to multiple matching conditions, the target location is multiple locations, and when the keyword is in one match When the condition corresponds to the index, the target position is 1 position.
- Step S240 Add the matched keyword to the target location in a temporary index table, where the temporary index table is a blank index table with the same structure as the preset index table.
- the electronic device may add the currently matched keyword to the temporary index table according to the target position.
- Step S250 Determine whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- the matching conditions of the text to be matched can be determined according to the added keywords in the current temporary index table.
- the preset index table is shown in Figure 4, the temporary index table is shown in Figure 5, and the text to be matched is "coach, I want to go to the stadium to watch the national team play", then the first keyword matched is the stadium. Therefore, as shown in Figure 7, after the query finds that the position of the "court" in the index identifiers 1, 2, 4, and 5 is identified as 1, it can be added to the indexes of the index identifiers 1, 2, 4, and 5, respectively. The first position. At this time, you can determine the matching conditions that the text to be matched meets based on the keywords that have been added in the current temporary index table.
- Step S260 Repeat the steps of matching the keywords in the text to be matched according to the dictionary tree based on the pre-established keywords, until determining all the keywords according to the added keywords in the temporary index table.
- each keyword in the text to be matched is matched, after the keyword is added to the preset index table, it is determined whether the matching condition is satisfied. There may be other keywords in the text to be matched, so you can continue to perform keyword matching to add the matched keywords to the temporary index table, and then proceed according to the added keywords in the temporary index table. Determine whether the text to be matched meets the corresponding matching conditions.
- the electronic device may repeatedly perform step S230 to step S250.
- the dictionary tree When the dictionary tree is used to traverse the text to be matched, all keywords are determined, and the last matched keyword is added to the temporary index table, and based on the added keywords in the temporary index table, determine whether the text to be matched is After the corresponding matching condition is satisfied, the process can be ended, so that the result of matching the text to be matched with the matching condition can be obtained.
- the text to be matched is "coach, I want to go to the stadium to watch the national team play", and the second keyword matched is "national team". Therefore, as shown in Figure 8, in It is found that the position identification of "national team" in the index of index identification 3 is 1, and after the position identification of index identification 5 is 2, the “national team” can be added to the index of index identification 2 respectively. The first position, and the second position in the index of index identification 5. At this time, you can determine the matching conditions that the text to be matched meets based on the keywords that have been added in the current temporary index table.
- the number of keywords in the index of the current index ID 1, 2, 4, and 5 are 1, 1, 1, 1, and 2, respectively.
- the number of keywords in the index corresponding to index ID 1 in the preset index table is the same. Therefore, it is currently determined that the text to be matched meets the matching condition corresponding to the index corresponding to index ID 1, that is, "court", and index ID 5 corresponds to
- the number of keywords in the index is the same as the number of keywords in the index corresponding to index mark 5 in the preset index table. Therefore, it can be determined that the text to be matched meets the matching condition corresponding to the index corresponding to index mark 5, that is, "court ...national team".
- the third keyword matched is "playing". Therefore, as shown in Figure 9, the position of "playing" in the index of index identifier 2 is identified as 2, and in the index of index identifier 3. After the position of is marked as 3, "playing" can be added to the second position in the index of index mark 3 and the third position in the index of index mark 3 respectively. At this time, you can determine the matching conditions that the text to be matched meets based on the keywords that have been added in the current temporary index table. For example, you can compare whether the current number of keywords in each index is with the total number of keywords in the index recorded The same, as in the above example, the number of keywords in the index of the current index identification 1, 2, 4, and 5 are 1, 2, 2, 1, and 2, respectively.
- the keyword in the index corresponding to the index identification 1 can be determined
- the number of is the same as the number of keywords in the index corresponding to index ID 1 in the preset index table
- the number of keywords in the index corresponding to index ID 2 is the same as the number of keywords in the index corresponding to index ID 2 in the preset index table
- the number of keywords in the index corresponding to index identifier 5 is the same as the number of keywords in the index corresponding to index identifier 5 in the preset index table, so it is currently determined that the text to be matched meets the matching corresponding to the index corresponding to index identifier 1.
- the condition, namely "court”, satisfies the matching condition corresponding to the index corresponding to index mark 2, namely "court...playing", and satisfies the matching condition corresponding to the index corresponding to index mark 5, namely "court...national team”.
- the electronic device may also perform a corresponding index corresponding to the matching condition from the temporary index table after determining the matching condition that the text to be matched meets each time. Processing, to avoid that after adding the next keyword, the keyword added in the index will be judged, and the processing time will be wasted. Therefore, before repeating steps S230 to S250, the text matching method may further include: if it is determined that the text to be matched meets the matching condition corresponding to the specified index, performing preset processing on the specified index in the temporary index table, and preset Processing is used to indicate that the matching condition corresponding to the specified index has been matched.
- the specified index is any index. After performing preset processing on the specified index, when the electronic device subsequently processes the next matched keyword, that is, after adding the next keyword, it will no longer judge the keywords added in the index, so It further saves the time of text matching and improves the efficiency of keyword matching.
- the electronic device performs preset processing on the specified index in the temporary index table, which may be to delete the row or column corresponding to the specified index from the temporary index table; as another implementation manner, the electronic device The preset processing of the designated index in the index table may be to mark the row or column corresponding to the designated index in the temporary index table.
- the specific preset processing is not limited. The preset processing only needs to make the electronic device not process the specified index in the temporary index table when processing the next matched keyword. Improve the efficiency of keyword matching.
- the electronic device uses the dictionary tree to traverse the text to be matched, it determines all keywords, and adds the last matched keyword to the temporary index table, and according to the added value in the temporary index table Keywords: After determining all the matching conditions that the text to be matched meets, you can also output all the matching conditions that the text to be matched meets, so that the user can understand the result of matching the matching conditions of the text to be matched this time, that is, the keyword matching result. For example, all the matching conditions that are met can be displayed, or voice playback can be performed, which is not limited here.
- the keywords in the text to be matched are determined through the dictionary tree, and then according to the preset index table established in advance, after each keyword is matched, the keyword is added to the temporary Index table, and then only need to add keywords in the temporary index table to determine whether the text to be matched meets the corresponding matching conditions, and then repeat the execution to add the keywords that are matched each time to the temporary index table, and then determine Whether the text to be matched meets the corresponding matching conditions, the matching conditions that the entire text to be matched meets can be determined at last. It is not necessary for each keyword combination to use the regular expression corresponding to each keyword combination to match the text to be matched. Effectively save the time of keyword matching and improve the efficiency of keyword matching.
- FIG. 10 shows a schematic flowchart of a text matching method provided by another embodiment of the present application. This method is applied to the above-mentioned electronic device. The following will describe the process shown in FIG. 10 in detail.
- the text matching method may specifically include the following steps:
- Step S310 Obtain keywords in each matching condition.
- the electronic device may obtain the keywords in the matching condition to construct a dictionary tree in advance.
- the electronic device may obtain keywords in each matching condition.
- the electronic device may obtain the matching condition from other devices, or obtain the matching condition input by the user.
- the electronic device can directly obtain the keywords in the matching condition; when the obtained matching condition is in text form, the electronic device can segment the matching condition and determine Find the keywords in the matching condition. For example, if the matching condition is: playing on the court, the keywords can be determined as "court" and "playing".
- Step S320 Construct a dictionary tree according to the keywords.
- the electronic device can construct a dictionary tree. Specifically, the electronic device can determine the first node of each branch in the dictionary tree according to the first word of the keyword, and then connect other words in the keyword as a node to the first node in turn, thereby constructing a keyword Dictionary tree. After the electronic device constructs the dictionary tree of the keywords, it can store the dictionary tree.
- Step S330 According to the keywords in each matching condition, an index is generated and added to the initial index table to obtain the preset index table.
- the electronic device may also construct a preset index table in advance according to the keywords in each matching condition.
- the electronic device may add the keywords in each matching condition to a row or column in the initial index table to obtain a preset index table, where the row or column of each matching condition corresponds to the matching condition. index.
- the initial index table may be a blank index table, and the number of rows and columns in the initial table may be unlimited, but the minimum requirements required by the matching conditions should be met.
- the electronic device after the electronic device adds the keyword in each matching condition to a row or column in the initial index table, it can also delete the position where the keyword does not exist in the initial index table, so as to obtain the final preset index surface.
- the rows and columns in the preset index table only include the positions where keywords exist, so that the preset index table is concise, which not only saves space, but also improves the addition of electronic equipment in the temporary index table.
- the keywords to determine whether the text to be matched meets the matching conditions.
- the electronic device may also record the position of the keyword of each index in the preset index table and the number of keywords of each index. For example, in the preset index table shown in FIG. 11, each index The location identifiers of the keywords and the number of keywords are recorded in the preset index table, so that the electronic device can use the preset index table to process the matched keywords more quickly.
- Step S340 Obtain the text to be matched.
- Step S350 Match the keywords in the text to be matched according to the pre-built dictionary tree of keywords.
- Step S360 When a keyword in the text to be matched is matched, query the target position of the matched keyword in a preset index table, where the preset index table includes at least one index corresponding to the matching condition, and The matching condition includes one or more keywords.
- Step S370 Add the matched keyword to the target position in a temporary index table, where the temporary index table is a blank index table with the same structure as the preset index table.
- Step S380 Determine whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- the electronic device may query all keywords in the text to be matched, query the target position of each keyword in the preset index table, and then perform steps S370 and S380 to determine the to-be-matched The matching condition that the text satisfies. As described in the previous embodiment, the electronic device may also perform steps S370 and S380 each time a keyword is matched, until the last matched keyword is added to the preset index table, and the matching condition is determined .
- the electronic device determining whether the text to be matched meets the matching condition according to the added keywords in the temporary index table may include:
- the added keywords in each index of the temporary index table are compared with the keywords of the corresponding index in the preset index table; according to the comparison result, it is determined whether the text to be matched meets the matching condition.
- the electronic device can obtain the number of keywords added in each index of the temporary index table, and then compare the number of keywords added in each index with the keywords of the corresponding index in the preset index table. The number is compared; if the number of keywords added in the target index is the same as the number of keywords in the target index in the preset index table, it is determined that the text to be matched meets the matching conditions corresponding to the target index, and the target index is a temporary index table Any index in. If the number of added keywords in no index is the same as the number of keywords in the same index in the preset index table, it is determined that the text to be matched does not satisfy any matching condition.
- the structure of the temporary index table is the same as the structure of the preset index table, and each matched keyword is added at the same position as the keyword in the preset index table, the key to be added in any index
- the number of words is the same as the number of keywords in the index in the preset index table, it means that the keyword of the index is included in the text to be matched, that is, the matching condition corresponding to the index is satisfied.
- the electronic device can also compare whether the keywords that have been added in each index are the same as the keywords of the corresponding index in the preset index table; if the keywords that have been added in the target index are the same as those in the preset index table If the keywords of the target index are the same, it is determined that the text to be matched meets the matching condition corresponding to the target index, and the target index is any index in the temporary index table. If there is no keyword that has been added in any index that is the same as the keyword of the same index in the preset index table, it is determined that the text to be matched does not satisfy any matching condition. It is understandable that if the keyword added in any index is the same as the keyword in the index in the preset index table, it means that the keyword of the index is included in the text to be matched, that is, the matching condition corresponding to the index is satisfied.
- the preset index table may not have positions other than the positions of the keywords in the index, for example, in the preset index table as shown in FIG. 10, rows in the preset index table Only the positions where keywords exist are included in the sum column. Therefore, the electronic device can determine whether there is a space in each index, that is, according to the keywords that have been added in the temporary index table, determine whether there is a space in the row or column corresponding to each index in the temporary index table; if the target index corresponds to There is no vacant position in the row or column, it means that the row or column corresponding to the target index in the temporary index table is filled.
- the keyword of the target index is included in the text to be matched, so that it can be determined that the text to be matched satisfies
- the matching condition corresponding to the target index the target index is any index in the temporary index table.
- the text matching method provided in the embodiments of the present application takes less time than the regular expression matching method, and is close to the dictionary tree algorithm.
- FIG. 1 the number of combined keywords is m and the length of the matched text is n
- the time complexity of the regular expression (re) matching keyword library is still O(m*n)
- the text matching method provided in the embodiment of the application The time complexity of is O(n), but the Trie algorithm cannot directly match compound words.
- the time complexity of the co-Trie algorithm is O(n)+O(mp), where p is the average word frequency in the lexicon. For a general scenario, p is much smaller than m.
- the text matching method provided by the embodiment of the present application takes less time than the regular expression matching method, and is close to the dictionary tree algorithm.
- FIG. 12 shows an experimental result diagram of a time-consuming comparison experiment between the text matching method and the regular expression matching method provided by the embodiment of the present application, where the string length of the matched text is 200. It can be found from FIG. 12 that as the number of keywords increases, the time-consuming of the text matching method provided in the embodiments of this application is basically unchanged, while the time-consuming of the regular expression matching method increases with the increase of the number of keywords. In addition, in the matching of a large number of keywords, the time-consuming of the text matching method provided in the embodiment of the present application is significantly lower than that of the regular expression matching method.
- FIG. 13 shows a structural block diagram of a text matching apparatus 400 provided by an embodiment of the present application.
- the text matching device 400 uses the aforementioned electronic equipment.
- the text matching device 400 includes: a text acquisition module 410, a keyword matching module 420, a location query module 430, a keyword adding module 440, and a condition determining module 450.
- the text acquisition module 410 is used to obtain the text to be matched;
- the keyword matching module 420 is used to match the keywords in the text to be matched according to a pre-built dictionary tree of keywords;
- the position The query module 430 is configured to query the target position of the matched keyword in a preset index table when the keyword in the text to be matched is matched, and the preset index table includes at least one index corresponding to the matching condition,
- the matching condition includes one or more keywords;
- the keyword adding module 440 is configured to add the matched keywords to the target position in the temporary index table, and the temporary index table is the same as the A blank index table with the same structure of the preset index table;
- the condition determining module 450 is configured to determine whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- the preset index table includes multiple indexes corresponding to matching conditions.
- the condition determination module 450 may include: a comparison unit and a first determination unit. Wherein, the comparing unit is used to compare the added keywords in each index of the temporary index table with the keywords of the corresponding index in the preset index table; the first determining unit is used to determine according to the comparison result Whether the to-be-matched text meets the matching condition.
- the comparison unit may include: a quantity acquisition subunit and a quantity comparison subunit.
- the quantity obtaining subunit is used to obtain the quantity of the added keywords in each index of the temporary index table;
- the quantity comparison subunit is used to compare the quantity of the added keywords in each index with the The number of keywords in the corresponding index in the preset index table is compared.
- the first determining unit may be specifically configured to: if the number of added keywords in the target index is the same as the number of keywords in the target index in the preset index table, determine the to-be-matched The text satisfies the matching condition corresponding to the target index, and the target index is any index in the temporary index table.
- the comparison unit may include: a keyword comparison subunit.
- the keyword comparison subunit is used to compare whether the added keyword in each index is the same as the keyword of the corresponding index in the preset index table.
- the first determining unit may be specifically configured to: if the keyword added in the target index is the same as the keyword of the target index in the preset index table, determine that the text to be matched satisfies all According to the matching condition corresponding to the target index, the target index is any index in the temporary index table.
- the preset index table includes a plurality of indexes corresponding to matching conditions, and there is no other than the position of the keyword in the index in the preset index table and the temporary index table.
- the condition determination module 450 may include: a position determination unit and a second determination unit. Wherein, the position determining unit is used to determine whether there is a free position in the row or column corresponding to each index in the temporary index table according to the keywords that have been added in the temporary index table; the second determining unit is used to determine if the target index If there is no free position in the corresponding row or column, it is determined that the text to be matched meets the matching condition corresponding to the target index, and the target index is any index in the temporary index table.
- the location query module 430 may be specifically configured to: when a keyword in the text to be matched is matched according to the dictionary tree, query the currently matched keyword in the preset index table target location.
- the text matching device 400 may further include: a repeated execution module.
- the repeat execution module is configured to: repeat the step of matching the keywords in the text to be matched according to the dictionary tree based on the pre-established keywords to the keywords that have been added in the temporary index table. , The step of determining whether the text to be matched satisfies the matching condition, until all the keywords in the text to be matched are added to the temporary index table, according to the added key in the temporary index table Word, determine whether the text to be matched meets the matching condition.
- the text matching device 400 may further include: an index processing module.
- the index processing module is configured to repeatedly execute the step of matching the keywords in the text to be matched according to the dictionary tree based on the pre-established keywords to the steps that have been added in the temporary index table according to the Keywords, before the step of determining whether the text to be matched meets the matching condition, if it is determined that the text to be matched meets the matching condition corresponding to the specified index, the specified index in the temporary index table is pre-prepared Assuming processing, the preset processing is used to indicate that the matching condition corresponding to the specified index has been matched.
- the index processing module may be specifically configured to delete the row or column corresponding to the specified index from the temporary index table.
- the index processing module may also be specifically configured to mark the row or column corresponding to the specified index in the temporary index table.
- the location query module 430 may also be specifically configured to: after matching all the keywords in the text to be matched, query for each keyword in the preset index table. Target location in.
- the text matching device 400 may further include: an output module.
- the output module is configured to output all the matching conditions satisfied by the text to be matched after determining whether the text to be matched meets the matching condition according to the added keywords in the temporary index table.
- the text matching device 400 may further include: a keyword acquisition module and an index generation module.
- the keyword obtaining module is used to obtain the keywords in each matching condition when the keywords in the text to be matched are matched, and the keywords matched by the query are before the target position in the preset index table;
- the module is used to generate an index according to the keywords in each matching condition and add it to the initial index table to obtain the preset index table.
- the index generation module may be specifically configured to: add keywords in each matching condition to a row or a column in the initial index table to obtain the preset index table, wherein each matching condition The row or column is the index corresponding to the matching condition.
- index generation module may also be used to delete positions where no keywords exist in the initial index table.
- the text matching device 400 may further include: an index recording module.
- the index recording module is used to generate an index according to the keywords in each matching condition and add it to the initial index table, and after obtaining the preset index table, record the key of each index in the preset index table The position of the word and the number of keywords in each index.
- the text matching device 400 may further include: a keyword acquisition module and a dictionary tree construction module.
- the keyword acquisition module is used to construct a dictionary tree based on the keywords before matching the keywords in the text to be matched according to the dictionary tree based on the pre-built keywords.
- the coupling between the modules may be electrical, mechanical or other forms of coupling.
- each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
- the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules.
- the solution provided by this application obtains the text to be matched, and matches the keywords in the text to be matched according to the dictionary tree of the pre-built keywords.
- the query The matched keyword is at the target position in the preset index table, the preset index table includes at least one index corresponding to the matching condition, and the matching condition includes one or more keywords, and then the matched keywords are added to the temporary
- the temporary index table is a blank index table with the same structure as the preset index table, and then according to the added keywords in the temporary index table, it is determined whether the text to be matched meets the matching conditions, thereby realizing the combination
- the dictionary tree of keywords and the index table formed by the matching conditions match the text to be matched to determine whether the text to be matched meets the matching conditions, reduce the complexity of text matching, and improve the efficiency of text matching.
- the electronic device 100 may be an electronic device capable of running application programs, such as a server, a PC computer, a notebook computer, and a mobile terminal.
- the electronic device 100 in this application may include one or more of the following components: a processor 110, a memory 120, and one or more application programs.
- One or more application programs may be stored in the memory 120 and configured to be Or multiple processors 110 execute, and one or more programs are configured to execute the method described in the foregoing method embodiment.
- the processor 110 may include one or more processing cores.
- the processor 110 uses various interfaces and lines to connect various parts of the entire electronic device 100, and executes by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and calling data stored in the memory 120.
- Various functions and processing data of the electronic device 100 may adopt at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
- DSP Digital Signal Processing
- FPGA Field-Programmable Gate Array
- PDA Programmable Logic Array
- the processor 110 may be integrated with one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like.
- the CPU mainly processes the operating system, user interface, and application programs; the GPU is used for rendering and drawing of display content; the modem is used for processing wireless communication. It is understandable that the above-mentioned modem may not be integrated into the processor 110, but may be implemented by a communication chip alone.
- the memory 120 may include random access memory (RAM) or read-only memory (Read-Only Memory).
- the memory 120 may be used to store instructions, programs, codes, code sets or instruction sets.
- the memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system and instructions for implementing at least one function (such as touch function, sound playback function, image playback function, etc.) , Instructions used to implement the following various method embodiments, etc.
- the storage data area can also store data (such as phone book, audio and video data, chat record data) created by the electronic device 100 during use.
- FIG. 15 shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application.
- the computer-readable medium 800 stores program code, and the program code can be invoked by a processor to execute the method described in the foregoing method embodiment.
- the computer-readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
- the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium.
- the computer-readable storage medium 800 has storage space for the program code 810 for executing any method steps in the above-mentioned methods. These program codes can be read from or written into one or more computer program products.
- the program code 810 may be compressed in a suitable form, for example.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
一种文本匹配方法、装置、电子设备及存储介质,该文本匹配方法包括:获取待匹配文本(S110);根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配(S120);当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词(S130);将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表(S140);根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件(S150)。
Description
本申请涉及文本处理技术领域,更具体地,涉及一种文本匹配方法、装置、电子设备及存储介质。
自然语言处理是计算机科学领域与人工智能领域中的重要研究方向,在自然语言处理领域中,关键词匹配被应用在各个方面,例如内容审核、智能客服等。关键词的匹配处理中,对于海量关键词的匹配需要保证匹配处理的处理效率,以满足用户需求。
发明内容
鉴于上述问题,本申请提出了一种文本匹配方法、装置、电子设备及存储介质。
第一方面,本申请实施例提供了一种文本匹配方法,所述方法包括:获取待匹配文本;根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配;当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词;将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表;根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
第二方面,本申请实施例提供了一种文本匹配装置,所述装置包括:文本获取模块、关键词匹配模块、位置查询模块、关键词添加模块以及条件确定模块,其中,所述文本获取模块用于获取待匹配文本;所述关键词匹配模块用于根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配;所述位置查询模块用于当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词;所述关键词添加模块用于将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表;所述条件确定模块用于根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
第三方面,本申请实施例提供了一种电子设备,包括:一个或多个处理器;存储器;一个或多个应用程序,其中所述一个或多个应用程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个程序配置用于执行上述第一方面提供的文本匹配方法。
第四方面,本申请实施例提供了一种计算机可读取存储介质,所述计算机可读取存储介质中存储有程序代码,所述程序代码可被处理器调用执行上述第一方面提供的文本匹配方法。
本申请提供的方案,通过获取待匹配文本,根据预先建立的关键词的字典树,对待匹配文本中的关键词进行匹配,当匹配到待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,该预设索引表包括至少一个匹配条件对应的索引,匹配条件中包括一个或多个关键词,再将匹配到的关键词添加至临时索引表中的该目标位置,该临时索引表为与预设索引表的结构相同的空白索引表,然后根据临时索引表中已添加的关键词,确定待匹配文本是否满足匹配条件,从而实现结合关键词的字 典树,以及匹配条件形成的索引表,对待匹配文本进行匹配,以确定待匹配文本是否满足匹配条件,降低文本匹配的复杂度,从而提升文本匹配的效率。
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1示出了本申请实施例提供的字典树的一种示意图。
图2示出了根据本申请一个实施例的文本匹配方法流程图。
图3示出了本申请一个实施例提供的字典树的一种示意图。
图4示出了本申请一个实施例提供的预设索引表的一种示意图。
图5示出了本申请一个实施例提供的临时索引表的一种示意图。
图6示出了根据本申请另一个实施例的文本匹配方法流程图。
图7示出了本申请另一个实施例提供的临时索引表的一种状态图。
图8示出了本申请另一个实施例提供的临时索引表的另一种状态图。
图9示出了本申请另一个实施例提供的临时索引表的再一种状态图。
图10示出了根据本申请又一个实施例的文本匹配方法流程图。
图11示出了本申请又一个实施例提供的预设索引表的示意图。
图12示出了本申请又一个实施例提供的文本匹配方法的耗时示意图。
图13示出了根据本申请一个实施例的文本匹配装置的一种框图。
图14是本申请实施例的用于执行根据本申请实施例的文本匹配方法的电子设备的框图。
图15是本申请实施例的用于保存或者携带实现根据本申请实施例的文本匹配方法的程序代码的存储单元。
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。
在自然语言处理(NLP,Natural Language Processing)领域中,关键词匹配的任务应用较为广泛。例如,在违规内容审核中,对用户发表的评论或文章进行违规关键词匹配,防范大量违规文本的传播;在智能客服中,对客户的提问文本进行关键词匹配,辅助决策场景问答。随着业务的增长,关键词库容量迅速扩大,当关键词数量超过5万条时,可称为海量关键词,也就是说经常存在需要对文本中的海量关键词进行匹配。在海量关键词匹配任务中,需要同时保证关键词匹配的低耗时和灵活性,技术上具有挑战性。
在传统的关键词匹配中,通常通过正则表达式和字典树对关键词进行匹配。
其中,正则表达式使用单个字符串来描述、匹配一系列符合某个句法规则的字符串。由于匹配句法的丰富度,正则表达式匹配具有极大的灵活性。当存在海量关键词时,需要循环调用每个关键词的匹配句法。示例说明:如有以下关键词组合:下载...游戏,其中“...”表示任意字符。被匹配的文本:“我打算下载A游戏”,可以利用规则句法为:“下载\S+游戏”进行匹配。若待匹配文本的文本长度为n,关键词库词数量为m,则正则表达式匹配关键词库的时间复杂度为:O(m*n),即对于每个关键词组合,需要利用每个关键词组合对应的正则表达式对待匹配文本进行匹配。
字典树又称单词查找树,Trie树,是一种树形结构,是一种哈希树的变种。字典树可以将关键词存储为树结构,树的每个节点为一个字符。例如,关键词包括:苹果、应用、回归和回国,则构建的字典树如图1所示,在利用字典树匹配关键词的算法(Trie 算法)中,需要遍历一次所需匹配的文本,如果遍历沿路发现与字典树中完全相同的字符串,且获得最后一个节点的标志位,则匹配到该关键词。若待匹配文本的文本长度为n,则字典树匹配的时间复杂度为O(n),也就是说,需要利用字典树对对待匹配文本进行一遍的遍历,完成对待匹配文本中关键词的匹配。
发明人经过长期研究发现,通过正则表达式对关键词进行匹配的方式中,其匹配时间复杂度为O(m*n),在海量关键词的场景下,匹配耗时大。通过字典树对关键词进行匹配的方式中,其时间复杂度,无论关键词数量是多少,均为O(n)。但灵活性低,仅适用于单个词匹配,例如,在以上实例中,苹果、应用等词只能单独匹配,无法完成两个及两个以上多词组合(如苹果…应用,“...”表示任意字符),在实际业务中,其使用具有局限性。
针对上述问题,发明人提出了本申请实施例提供的文本匹配方法、装置、电子设备以及存储介质,可以实现结合关键词的字典树,以及匹配条件形成的索引表,对待匹配文本进行匹配,以确定待匹配文本是否满足匹配条件,降低文本匹配的复杂度,从而提升文本匹配的效率。其中,具体的文本匹配方法在后续的实施例中进行详细的说明。
请参阅图2,图2示出了本申请一个实施例提供的文本匹配方法的流程示意图。在具体的实施例中,所述文本匹配方法应用于如图13所示的文本匹配装置400以及配置有所述文本匹配装置400的移动终端100(图14)。下面将以移动终端为例,说明本实施例的具体流程,当然,可以理解的,本实施例所应用的电子设备可以为服务器、PC电脑、笔记本电脑等,在此不做限定。下面将针对图2所示的流程进行详细的阐述,所述文本匹配方法具体可以包括以下步骤:
步骤S110:获取待匹配文本。
在本申请实施例中,电子设备可以获取待匹配文本,即需要进行确定是否满足匹配条件的文本。待处理文本可以为从网页内容中提取的文本,也可以为聊天内容中提取的文本,也可以为语音转文字而获得的文本,还可以为从文本图像中提取出的文本等,具体的文本可以不作为限定。
在一些实施方式中,电子设备可以通过在线的方式获取待匹配文本,网络从其他设备获取待匹配文本。例如,在线上的发布内容审核的场景中,电子设备可以为服务器,并且可以接收其他用户上传的文本内容,并对文本内容,进行关键词的匹配的条件的匹配,以完成内容审核;又例如,在智能客服的场景中,可以对用户发送的提问文本进行关键词的匹配条件的匹配,并根据匹配结果进行问答。当然,具体在线获取待匹配文本的方式可以不作为限定。
在一些实施方式中,电子设备也可以从本地存储的文件获取待匹配文本,也就是说,待处理的文本被预先存储于电子设备本地。例如,在教师审核作业的场景中,教师可以将待审核的作业文本存储于电子设备,待需要审核时,读取出作业文本,进行关键词的匹配,以完成相应的审核工作等。当然,电子设备从本地获取待匹配文本的场景可以不作为限定。
在一些实施方式中,待匹配文本中的文本长度及文字数量可以不作为限定,例如,待匹配文本可以为一段话,也可以为一句话等。
步骤S120:根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配。
在本申请实施例中,可以预先建立有关键词的字典树,并且将字典树存储于电子设备。其中,关键词可以为匹配条件中所包括的关键词,匹配条件可以为一个,也可以为多个。每个匹配条件可以为单个的关键词,也可以为多个关键词的组合。可以理解的,关键词的字典树中可以包括所有匹配条件中包括的关键词。例如,匹配条件可以为“关键词1”;又例如,匹配条件可以为“关键词1…关键词2”,“…”为任意字 符。当然,以上匹配条件仅为举例,匹配条件中关键词的具体数量可以不作为限定。
在一些实施方式中,关键词的字典树可以由电子设备创建后,并将关键词的字典树存储于本地,从而在需要对待匹配文本进行处理时,从本地读取字典树。
在另一些实施方式中,关键词的字典树也可以由其他设备(例如服务器等)创建后,再有该其他设备发送至电子设备,然后电子设备将其进行存储,以在需要对待匹配文本进行处理时,从本地读取字典树。
在又一些实施方式中,电子设备也可以每次在需要对待匹配文本进行处理时,从其他设备(例如服务器等)实时获取关键词的字典树,然后将获得的字典树进行缓存,以用于本次对待匹配文本的处理。
示例性的,请参阅图3,图3示出了本申请实施例提供的字典树的一种示意图,匹配条件包括:“球场”、“球场…打球”、“国家队…队长”、“球场…队长”,则匹配条件对应的所有关键词可以包括:“球场”、“国家队”、“队长”和“打球”,因此,可以建立包括有“球场”、“国家队”、“队长”和“打球”的字典树。
在一些实施方式中,电子设备在获取到待匹配文本之后,可以根据以上建立的关键词的字典树,对待匹配文本中存在的关键词进行匹配。具体地,电子设备利用字典树匹配算法,利用字典树遍历待匹配文本,如果遍历沿路发现与字典树中完全相同的字符串,且获得最后一个节点的标志位,则匹配到该关键词。
步骤S130:当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词。
在本申请实施例中,还可以预先创建有匹配条件对应的预设索引表,其中,该预设索引表中可以包括有每个匹配条件对应的索引,每个索引可以中可以包括有其对应的匹配条件中的关键词。
在一些实施方式中,索引可以为预设索引表中的行内容、或者列内容。当索引为预设索引表中的列内容时,则预设索引表的一列中的关键词可以构成该列对应的索引所对应的匹配条件;当索引为预设索引表中的行内容时,则预设索引表的一行中的关键词可以构成该行对应的索引所对应的匹配条件。
在一些实施方式中,预设索引表中除了包括每个匹配条件对应的索引的内容之外,还可以包括其他信息,例如,还可以包括每个索引中关键词在预设索引表中的位置、每个索引中关键词的数量(即该索引对应的匹配条件中关键词的数量)、每个索引的索引标识等。
示例性的,请参阅图4,图4示出了本申请实施例提供的预设索引表的示意图,匹配条件包括:“球场”、“球场…打球”、“国家队…队长…打球”、“球场…队长”,“球场…国家队”,预设索引表中可以包括有索引标识,在预设索引表中,索引为预设索引表中的行内容,预设索引表中包括每个索引的索引标识,每个索引中关键词的位置标识,以及每个索引中关键词的数量,例如,匹配条件“球场…打球”,在该预设索引表中对应的索引标识为2,行内容中包括关键词“球场”和“打球”,其对应的关键词数量为2。
在一些实施方式中,预设索引表中的大小(行和列)可以根据匹配条件的数量,以及匹配条件中关键词的最大数量确定。例如,如以上示例中,匹配条件包括5个,则预设索引表包括5行内容,并且关键词的最大数量为3,则预设索引表包括3列内容。
在本申请实施例中,当电子设备匹配到所述待匹配文本中的关键词时,可以根据预设索引表,查询匹配到的关键词在预设索引表中的目标位置,以便后续将匹配到的关键词添加至临时索引表中。例如,以上示例中,在匹配到关键词“球场”时,可以通过查询预设索引表,而查询到“球场”位于索引1、索引2、索引4和索引5中的第一个位置,即位置标识1。
在一些实施方式中,电子设备可以在查询到待匹配文本中的所有关键词之后,查询每个关键词在预设索引表中的目标位置;电子设备也可以在每查到一个关键词时,执行查询该关键词在预设索引表中的目标位置。
步骤S140:将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表。
在本申请实施例中,电子设备在查询获得匹配到的关键词在预设索引表中的目标位置之后,则可以将匹配到的关键词添加至临时索引表中的目标位置,以便根据临时索引表中已添加的关键词,确定待匹配文本是否满足匹配条件。该临时索引表与预设索引表相对应,临时索引表可以为与预设索引表的结构相同的空白索引表,其中,临时索引表与预设索引表的行数、列数可以相同,并且预设索引表中包括的其他参数信息也可以相同。例如,请参阅图5,图5是与图4所示的预设索引表相对应的临时索引表,两者的结构以及包括的信息相同,与预设索引表不同的是,临时索引表中各个关键词的位置为空。
步骤S150:根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在本申请实施例中,电子设备在将匹配到的关键词添加至临时索引表之后,则可以根据临时索引表中已添加的关键词,确定待匹配文本是否满足匹配条件。
在一些实施方式中,可以根据临时索引表中每个索引中(行或者列)中关键词的数量,是否与记录该索引的关键词的数量相同,从而确定是否满足该索引对应的匹配条件,也就是说,如果相同,则表示待匹配文本中包括有该索引的关键词,即满足该索引对应的匹配条件。当然,也可以直接比较临时索引表中索引中的关键词是否与预设索引表中的关键词是否相同,如果相同,同样表示待匹配文本中包括有该索引的关键词,即满足该索引对应的匹配条件。
在根据临时每个索引中添加的关键词,进行是否满足该索引对应的匹配条件的确定之后,电子设备可以得出待匹配文本满足的匹配条件,从而获得对待匹配文本进行匹配条件的匹配的结果。
通过本申请实施例提供的文本匹配方法,通过字典树来确定待匹配文本中的关键词,然后根据预先建立的预设索引表,将匹配出的关键词添加至临时索引表,然后只需要根据临时索引表中添加的关键词,即可确定出待匹配文本是否满足相应的匹配条件,无需对于每个关键词组合,需要利用每个关键词组合对应的正则表达式对待匹配文本进行匹配,从而有效节省了关键词匹配的时间,提升了关键词匹配的效率。
请参阅图6,图6示出了本申请另一个实施例提供的文本匹配方法的流程示意图。该方法应用于上述电子设备,下面将针对图6所示的流程进行详细的阐述,所述文本匹配方法具体可以包括以下步骤:
步骤S210:获取待匹配文本。
步骤S220:根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配。
在本申请实施例中,步骤S210及步骤S220可以参阅前述实施例中的内容,在此不再赘述。
步骤S230:当根据所述字典树,匹配到所述待匹配文本中的一个关键词时,查询当前匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词。
在本申请实施例中,电子设备也可以在每查到一个关键词时,执行查询当前匹配到的关键词在预设索引表中的目标位置,以便将当前匹配到的关键词添加至临时索引表。可以理解的,目标位置可能为一个位置,也可能为多个位置,即当该关键词处于多个匹配条件对应的索引中时,则目标位置为多个位置,当该关键词处于1个匹配条 件对应的索引中时,则目标位置为1个位置。
步骤S240:将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表。
在本申请实施例中,电子设备在确定出当前匹配到的关键词在预设索引表中的目标位置,则可以根据目标位置,将当前匹配到的关键词添加至临时索引表。
步骤S250:根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在本申请实施例中,在将当前匹配到的关键词添加至临时索引表之后,则可以根据当前临时索引表中已添加的关键词,确定待匹配文本所匹配的匹配条件。
例如,预设索引表为图4所示,临时索引表为图5所示,待匹配文本为“教练,我想去球场看国家队打球”,则匹配到的第一个关键词为球场,因此,如图7所示,在查询到“球场”在索引标识1、2、4和5中的位置标识为1之后,则可以分别添加到索引标识1、2、4和5的索引中的第一个位置。此时可以根据当前临时索引表中已添加的关键词,确定待匹配文本所满足的匹配条件,例如,可以比较每个索引中目前的关键词的数量是否与记录的该索引中关键词的总数相同,如以上示例中,当前索引标识1、2、4和5的索引中关键词的数量分别为1、1、0、1、1,此时只有索引标识1对应的索引中关键词的数量与预设索引表中索引标识1对应的索引中关键词的数量相同,因此当前确定出待匹配文本满足索引标识1对应的索引所对应的匹配条件,即“球场”。
步骤S260:重复执行所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配的步骤,至所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件的步骤,直至所述待匹配文本中所有存在的关键词全部被添加至所述临时索引表之后,根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在本申请实施例中,由于是每匹配到待匹配文本中的一个关键词时,将关键词添加到预设索引表之后,进行是否满足匹配条件的确定。而待匹配文本中可能还存在其他的关键词,因此可以继续去执行关键词的匹配,以将匹配到的关键词添加至临时索引表中,再进行根据临时索引表中已添加的关键词,确定待匹配文本是否满足相应的匹配条件。从而,电子设备可以重复执行步骤S230至步骤S250。当利用字典树遍历完待匹配文本之后,即确定出所有关键词,并且将最后一个匹配到的关键词添加至临时索引表,并根据临时索引表中已添加的关键词,确定待匹配文本是否满足相应的匹配条件之后,则可以结束流程,从而可以获得待匹配文本与匹配条件进行匹配的结果。
示例性的,如以上示例中,待匹配文本为“教练,我想去球场看国家队打球”,则匹配到的第二个关键词为“国家队”,因此,如图8所示,在查询到“国家队”在索引标识3的索引中的位置标识为1,在索引标识5的索引中的位置标识为2之后,则可以将“国家队”分别添加到索引标识2的索引中的第一个位置,以及索引标识5的索引中的第二个位置。此时可以根据当前临时索引表中已添加的关键词,确定待匹配文本所满足的匹配条件,例如,可以比较每个索引中目前的关键词的数量是否与记录的该索引中关键词的总数相同,如以上示例中,当前索引标识1、2、4和5的索引中关键词的数量分别为1、1、1、1、2,此时有索引标识1对应的索引中关键词的数量与预设索引表中索引标识1对应的索引中关键词的数量相同,因此当前确定出待匹配文本满足索引标识1对应的索引所对应的匹配条件,即“球场”,并且索引标识5对应的索引中关键词的数量与预设索引表中索引标识5对应的索引中关键词的数量相同,因此当前还可以确定出待匹配文本满足索引标识5对应的索引所对应的匹配条件,即“球场…国家队”。
同理,匹配到的第三个关键词为“打球”,因此,如图9所示,在查询到“打球” 在索引标识2的索引中的位置标识为2,在索引标识3的索引中的位置标识为3之后,则可以将“打球”分别添加到索引标识3的索引中的第二个位置,以及索引标识3的索引中的第三个位置。此时可以根据当前临时索引表中已添加的关键词,确定待匹配文本所满足的匹配条件,例如,可以比较每个索引中目前的关键词的数量是否与记录的该索引中关键词的总数相同,如以上示例中,当前索引标识1、2、4和5的索引中关键词的数量分别为1、2、2、1、2,此时可以确定出索引标识1对应的索引中关键词的数量与预设索引表中索引标识1对应的索引中关键词的数量相同,索引标识2对应的索引中关键词的数量与预设索引表中索引标识2对应的索引中关键词的数量相同,以及索引标识5对应的索引中关键词的数量与预设索引表中索引标识5对应的索引中关键词的数量相同,因此当前确定出待匹配文本满足索引标识1对应的索引所对应的匹配条件,即“球场”,满足索引标识2对应的索引所对应的匹配条件,即“球场…打球”,以及满足索引标识5对应的索引所对应的匹配条件,即“球场…国家队”。
在一些实施方式中,电子设备在重复执行步骤S230至步骤S250的过程中,还可以在每次确定出待匹配文本满足的匹配条件之后,将该匹配条件对应的索引从临时索引表中进行相应处理,以避免在添加下一个关键词之后,还会对该索引中添加的关键词进行判断,而浪费处理的时间。因此,在重复执行步骤S230至步骤S250之前,该文本匹配方法还可以包括:如果确定出待匹配文本满足指定索引对应的匹配条件,则对临时索引表中的指定索引进行预设处理,预设处理用于指示指定索引对应的匹配条件已被匹配。其中,该指定索引为任一索引。通过对指定索引进行预设处理后,则电子设备在后续对下一个匹配到的关键词进行处理时,即添加下一个关键词之后,不会再对该索引中添加的关键词进行判断,从而进一步地节省了文本匹配的时间,提升了关键词匹配的效率。
作为一种实施方式,电子设备对临时索引表中的指定索引进行预设处理,可以为,将指定索引对应的行或列从临时索引表中删除;作为另一种实施方式,电子设备对临时索引表中的指定索引进行预设处理,可以为,在临时索引表中对指定索引对应的行或列进行标记。当然,具体的预设处理可以不作为限定,预设处理仅需要使得电子设备在对下一个匹配到的关键词进行处理时,不会对临时索引表中该指定索引再进行处理即可,以提升关键词匹配的效率。
在一些实施方式中,电子设备在利用字典树遍历完待匹配文本之后,即确定出所有关键词,并且将最后一个匹配到的关键词添加至临时索引表,并根据临时索引表中已添加的关键词,确定待匹配文本满足的所有匹配条件之后,还可以将待匹配文本满足的所有匹配条件进行输出,以便用户了解到本次对待匹配文本进行匹配条件的匹配的结果,即关键词匹配的结果。例如,可以将满足的所有匹配条件进行显示,或者进行语音播放,在此不做限定。
通过本申请实施例提供的文本匹配方法,通过字典树来确定待匹配文本中的关键词,然后根据预先建立的预设索引表,在每匹配出一个关键词后,将该关键词添加至临时索引表,然后只需要根据临时索引表中添加的关键词,即可确定出待匹配文本是否满足相应的匹配条件,再重复的执行将每次匹配到的关键词添加至临时索引表,然后确定待匹配文本是否满足相应的匹配条件,最后可以确定出整个待匹配文本满足的匹配条件,无需对于每个关键词组合,需要利用每个关键词组合对应的正则表达式对待匹配文本进行匹配,从而有效节省了关键词匹配的时间,提升了关键词匹配的效率。
请参阅图10,图10示出了本申请又一个实施例提供的文本匹配方法的流程示意图。该方法应用于上述电子设备,下面将针对图10所示的流程进行详细的阐述,所述文本匹配方法具体可以包括以下步骤:
步骤S310:获取每个匹配条件中的关键词。
在本申请实施例中,电子设备可以获取匹配条件中的关键词,以预先构建字典树。 其中,匹配条件可以为多个,电子设备可以获取每个匹配条件中的关键词。
在一些实施方式中,电子设备可以通过从其他设备获取匹配条件,或者获取用户输入的匹配条件。当获得到的匹配条件为仅包括关键词的匹配条件时,电子设备可以直接获取到匹配条件中的关键词;当获得的匹配条件为文字形式时,电子设备可以对匹配条件进行分词后,确定出匹配条件中的关键词,例如匹配条件为:在球场打球,则关键词可以确定为“球场”“打球”。
步骤S320:根据所述关键词,构建字典树。
在本申请实施例中,电子设备在获得到每个匹配条件中的关键词之后,则可以构建字典树。具体地,电子设备可以根据关键词的首个文字,确定字典树中的各个分支的首个节点,然后再依次将关键词中的其他文字作为节点连接至首个节点,从而构建出关键词的字典树。电子设备在构建出关键词的字典树之后,则可以将字典树进行存储。
步骤S330:根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表。
在本申请实施例中,电子设备还可以预先根据每个匹配条件中的关键词,对预设索引表进行构建。
在一些实施方式中,电子设备可以将每个匹配条件中的关键词,添加至初始索引表中一行或者一列,获得预设索引表,其中,每个匹配条件所在行或列作为匹配条件对应的索引。可以理解的,初始索引表可以为空白的索引表,并且初始表中的行数和列数可以不限,但是应当满足匹配条件所需的最低需求。
进一步地,电子设备在将每个匹配条件中的关键词,添加至初始索引表中一行或者一列之后,还可以将初始索引表中不存在关键词的位置进行删除,从而获得最终的预设索引表。如图11所示,预设索引表中的行和列中仅包括存在关键词的位置,从而使预设索引表简洁话,不仅能够节省空间,还能提升电子设备在临时索引表中已添加的关键词,确定待匹配文本是否满足匹配条件时的效率。
在一些实施方式中,电子设备还可以记录预设索引表中每个索引的关键词的位置以及每个索引的关键词的数量,例如,图11所示的预设索引表中,每个索引的关键词的位置标识以及关键词的数量被记录于预设索引表中,从而使得电子设备在使用预设索引表对匹配的关键词进行处理时能够更加的快捷。
步骤S340:获取待匹配文本。
步骤S350:根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配。
步骤S360:当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词。
步骤S370:将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表。
步骤S380:根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在本申请实施例中,电子设备可以是查询到待匹配文本中的所有关键词之后,查询每个关键词在预设索引表中的目标位置,然后进行步骤S370和S380,从而确定出待匹配文本所满足的匹配条件。如上个实施例所述的内容,电子设备也可以每匹配到一个关键词时,进行步骤S370和S380,直到最后一个匹配到的关键词被添加至预设索引表中,并进行匹配条件的确定。
在一些实施方式中,电子设备根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件,可以包括:
将临时索引表的每个索引中已添加的关键词与预设索引表中对应的索引的关键词进行比较;根据比较结果,确定待匹配文本是否满足匹配条件。
作为一种方式,电子设备可以获取临时索引表的每个索引中已添加的关键词的数量,然后将每个索引中已添加的关键词的数量与预设索引表中对应的索引的关键词数量进行比较;如果目标索引中已添加的关键词的数量与预设索引表中目标索引的关键词数量相同,则确定待匹配文本满足所述目标索引对应的匹配条件,目标索引为临时索引表中的任一索引。如果不存在任一索引中已添加的关键词的数量与预设索引表中相同索引的关键词数量相同,则确定待匹配文本不满足任一匹配条件。可以理解的,由于临时索引表的结构与预设索引表的结构相同,并且每个匹配出的关键词添加的位置与预设索引表中关键词的位置相同,因此在任一索引中添加的关键词的数量与预设索引表中该索引中关键词的数量相同时,则表示待匹配文本中包括有该索引的关键词,即满足该索引对应的匹配条件。
作为另一种方式,电子设备也可以比较每个索引中已添加的关键词与预设索引表中对应的索引的关键词是否相同;如果目标索引中已添加的关键词与预设索引表中目标索引的关键词相同,则确定待匹配文本满足目标索引对应的匹配条件,目标索引为临时索引表中的任一索引。如果不存在任一索引中已添加的关键词与预设索引表中相同索引的关键词相同,则确定待匹配文本不满足任一匹配条件。可以理解的,如果任一索引中添加的关键词与预设索引表中该索引中关键词相同时,则表示待匹配文本中包括有该索引的关键词,即满足该索引对应的匹配条件。
在一些实施方式中,由于预设索引表可以中可以不存在除索引中的关键词的位置以外的其余位置,例如,如图10所示的预设索引表中,预设索引表中的行和列中仅包括存在关键词的位置。因此,电子设备可以确定每个索引中是否存在空位,即根据临时索引表中已添加的关键词,确定临时索引表中每个索引对应的行或列中是否存在空余位置;如果目标索引对应的行或列中不存在空余位置,则表示临时索引表中该目标索引对应的行或列中均被填满,因此待匹配文本中包括有该目标索引的关键词,从而可以确定待匹配文本满足目标索引对应的匹配条件,目标索引为临时索引表中的任一索引。通过利用空位的方式确定待匹配文本是否满足相应索引对应的匹配条件,由于只需要判断空余位置是否为0,因此可以进一步地提升文本的关键词匹配的效率。
下面通过示例性的说明本申请实施例提供的文本匹配方法所带来的匹配效率的提升。若含有组合关键词数量为m,被匹配的文本长度为n,则正则表达式(re)匹配关键词库的时间复杂度仍为O(m*n),本申请实施例提供的文本匹配方法的时间复杂度为O(n),但Trie算法不能直接匹配组合词,co-Trie算法的时间复杂度为O(n)+O(mp),其中p为词库中平均词频。对于一般场景,p远小于m,理论上,本申请实施例提供的文本匹配方法耗时上低于正则表达式匹配的方法,并接近字典树算法。图12示出了本申请实施例提供的文本匹配方法与正则表达式匹配的方法的耗时对比实验的实验结果图,其中,被匹配的文本的字符串长度为200。从图12中可以发现,随着关键词数量的增加,本申请实施例提供的文本匹配方法的耗时基本不变,而正则表达式匹配的方法的耗时,随着关键词数量的增加而增加,在海量关键词的匹配中,本申请实施例提供的文本匹配方法的耗时明显低于正则表达式匹配的方法。
请参阅图13,其示出了本申请实施例提供的一种文本匹配装置400的结构框图。该文本匹配装置400应用上述的电子设备,该文本匹配装置400包括:文本获取模块410、关键词匹配模块420、位置查询模块430、关键词添加模块440以及条件确定模块450。其中,所述文本获取模块410用于获取待匹配文本;所述关键词匹配模块420用于根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配;所述位置查询模块430用于当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所 述匹配条件包括一个或多个关键词;所述关键词添加模块440用于将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表;所述条件确定模块450用于根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在一些实施方式中,所述预设索引表包括多个匹配条件对应的索引。条件确定模块450可以包括:比较单元以及第一确定单元。其中,比较单元用于将所述临时索引表的每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词进行比较;第一确定单元用于根据比较结果,确定所述待匹配文本是否满足所述匹配条件。
作为一种实施方式,比较单元可以包括:数量获取子单元以及数量比较子单元。其中,数量获取子单元用于获取所述临时索引表的每个索引中已添加的关键词的数量;数量比较子单元用于将所述每个索引中已添加的关键词的数量与所述预设索引表中对应的索引的关键词数量进行比较。
在该实施方式中,第一确定单元可以具体用于:如果目标索引中已添加的关键词的数量与所述预设索引表中所述目标索引的关键词数量相同,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
作为另一种实施方式,比较单元可以包括:关键词比较子单元。关键词比较子单元用于比较所述每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词是否相同。
在该实施方式中,第一确定单元可以具体用于:如果目标索引中已添加的关键词与所述预设索引表中所述目标索引的关键词相同,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
在另一些实施方式中,所述预设索引表包括多个匹配条件对应的索引,所述预设索引表以及所述临时索引表中不存在除所述索引中的关键词的位置以外的其余位置。条件确定模块450可以包括:位置确定单元以及第二确定单元。其中,位置确定单元用于根据所述临时索引表中已添加的关键词,确定所述临时索引表中每个索引对应的行或列中是否存在空余位置;第二确定单元用于如果目标索引对应的行或列中不存在空余位置,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
在一些实施方式中,位置查询模块430可以具体用于:当根据所述字典树,匹配到所述待匹配文本中的一个关键词时,查询当前匹配到的关键词在预设索引表中的目标位置。
在该实施方式中,该文本匹配装置400还可以包括:重复执行模块。重复执行模块用于:重复执行所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配的步骤,至所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件的步骤,直至所述待匹配文本中所有存在的关键词全部被添加至所述临时索引表之后,根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
在该实施方式中,该文本匹配装置400还可以包括:索引处理模块。索引处理模块用于在所述重复执行所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配的步骤,至所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件的步骤之前,如果确定出所述待匹配文本满足指定索引对应的匹配条件,则对所述临时索引表中的所述指定索引进行预设处理,所述预设处理用于指示所述指定索引对应的匹配条件已被匹配。
作为一种方式,索引处理模块可以具体用于:将所述指定索引对应的行或列从所述临时索引表中删除。
作为另一种方式,索引处理模块也可以具体用于:在所述临时索引表中对所述指 定索引对应的行或列进行标记。
在一些实施方式中,位置查询模块430也可以具体用于:当匹配出所述待匹配文本中存在的所有关键词后,查询所述所有关键词中每个关键词在所述预设索引表中的目标位置。
在一些实施方式中,该文本匹配装置400还可以包括:输出模块。输出模块用于在根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件之后,输出所述待匹配文本满足的所有匹配条件。
在一些实施方式中,该文本匹配装置400还可以包括:关键词获取模块以及索引生成模块。关键词获取模块用于在当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置之前,获取每个匹配条件中的关键词;索引生成模块用于根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表。
在该实施方式中,索引生成模块可以具体用于:将每个匹配条件中的关键词,添加至所述初始索引表中一行或者一列,获得所述预设索引表,其中,每个匹配条件所在行或列作为匹配条件对应的索引。
进一步地,索引生成模块还可以用于:将所述初始索引表中不存在关键词的位置进行删除。
在一些实施方式中,该文本匹配装置400还可以包括:索引记录模块。索引记录模块用于在所述根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表之后,记录所述预设索引表中每个索引的关键词的位置以及每个索引的关键词的数量。
在一些实施方式中,该文本匹配装置400还可以包括:关键词获取模块以及字典树构建模块。关键词获取模块用于在所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配之前,字典树构建模块用于根据所述关键词,构建字典树。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述装置和模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,模块相互之间的耦合可以是电性,机械或其它形式的耦合。
另外,在本申请各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
综上所述,本申请提供的方案,通过获取待匹配文本,根据预先建立的关键词的字典树,对待匹配文本中的关键词进行匹配,当匹配到待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,该预设索引表包括至少一个匹配条件对应的索引,匹配条件中包括一个或多个关键词,再将匹配到的关键词添加至临时索引表中的该目标位置,该临时索引表为与预设索引表的结构相同的空白索引表,然后根据临时索引表中已添加的关键词,确定待匹配文本是否满足匹配条件,从而实现结合关键词的字典树,以及匹配条件形成的索引表,对待匹配文本进行匹配,以确定待匹配文本是否满足匹配条件,降低文本匹配的复杂度,从而提升文本匹配的效率。
请参考图14,其示出了本申请实施例提供的一种电子设备的结构框图。该电子设备100可以是服务器、PC电脑、笔记本电脑、移动终端等能够运行应用程序的电子设备。本申请中的电子设备100可以包括一个或多个如下部件:处理器110、存储器120以及一个或多个应用程序,其中一个或多个应用程序可以被存储在存储器120中并被配置为由一个或多个处理器110执行,一个或多个程序配置用于执行如前述方法实施例所描述的方法。
处理器110可以包括一个或者多个处理核。处理器110利用各种接口和线路连接整个电子设备100内的各个部分,通过运行或执行存储在存储器120内的指令、程序、代码集或指令集,以及调用存储在存储器120内的数据,执行电子设备100的各种功能和处理数据。可选地,处理器110可以采用数字信号处理(Digital Signal Processing,DSP)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)、可编程逻辑阵列(Programmable Logic Array,PLA)中的至少一种硬件形式来实现。处理器110可集成中央处理器(Central Processing Unit,CPU)、图像处理器(Graphics Processing Unit,GPU)和调制解调器等中的一种或几种的组合。其中,CPU主要处理操作系统、用户界面和应用程序等;GPU用于负责显示内容的渲染和绘制;调制解调器用于处理无线通信。可以理解的是,上述调制解调器也可以不集成到处理器110中,单独通过一块通信芯片进行实现。
存储器120可以包括随机存储器(Random Access Memory,RAM),也可以包括只读存储器(Read-Only Memory)。存储器120可用于存储指令、程序、代码、代码集或指令集。存储器120可包括存储程序区和存储数据区,其中,存储程序区可存储用于实现操作系统的指令、用于实现至少一个功能的指令(比如触控功能、声音播放功能、图像播放功能等)、用于实现下述各个方法实施例的指令等。存储数据区还可以存储电子设备100在使用中所创建的数据(比如电话本、音视频数据、聊天记录数据)等。
请参考图15,其示出了本申请实施例提供的一种计算机可读存储介质的结构框图。该计算机可读介质800中存储有程序代码,所述程序代码可被处理器调用执行上述方法实施例中所描述的方法。
计算机可读存储介质800可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。可选地,计算机可读存储介质800包括非易失性计算机可读介质(non-transitory computer-readable storage medium)。计算机可读存储介质800具有执行上述方法中的任何方法步骤的程序代码810的存储空间。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。程序代码810可以例如以适当形式进行压缩。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不驱使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。
Claims (20)
- 一种文本匹配方法,其特征在于,所述方法包括:获取待匹配文本;根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配;当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词;将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表;根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
- 根据权利要求1所述的方法,其特征在于,所述预设索引表包括多个匹配条件对应的索引,所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件,包括:将所述临时索引表的每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词进行比较;根据比较结果,确定所述待匹配文本是否满足所述匹配条件。
- 根据权利要求2所述的方法,其特征在于,所述将所述临时索引表的每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词进行比较,包括:获取所述临时索引表的每个索引中已添加的关键词的数量;将所述每个索引中已添加的关键词的数量与所述预设索引表中对应的索引的关键词数量进行比较;所述根据比较结果,确定所述待匹配文本是否满足所述匹配条件,包括:如果目标索引中已添加的关键词的数量与所述预设索引表中所述目标索引的关键词数量相同,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
- 根据权利要求2所述的方法,其特征在于,所述将所述每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词进行比较,包括:比较所述每个索引中已添加的关键词与所述预设索引表中对应的索引的关键词是否相同;所述根据比较结果,确定所述待匹配文本是否满足所述匹配条件,包括:如果目标索引中已添加的关键词与所述预设索引表中所述目标索引的关键词相同,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
- 根据权利要求1所述的方法,其特征在于,所述预设索引表包括多个匹配条件对应的索引,所述预设索引表以及所述临时索引表中不存在除所述索引中的关键词的位置以外的其余位置,所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件,包括:根据所述临时索引表中已添加的关键词,确定所述临时索引表中每个索引对应的行或列中是否存在空余位置;如果目标索引对应的行或列中不存在空余位置,则确定所述待匹配文本满足所述目标索引对应的匹配条件,所述目标索引为所述临时索引表中的任一索引。
- 根据权利要求1-5任一项所述的方法,其特征在于,所述当匹配到所述待匹配文 本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,包括:当根据所述字典树,匹配到所述待匹配文本中的一个关键词时,查询当前匹配到的关键词在预设索引表中的目标位置。
- 根据权利要求6所述的方法,其特征在于,在所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件之后,所述方法还包括:重复执行所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配的步骤,至所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件的步骤,直至所述待匹配文本中所有存在的关键词全部被添加至所述临时索引表之后,根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
- 根据权利要求7所述的方法,其特征在于,在所述重复执行所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配的步骤,至所述根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件的步骤之前,所述方法还包括:如果确定出所述待匹配文本满足指定索引对应的匹配条件,则对所述临时索引表中的所述指定索引进行预设处理,所述预设处理用于指示所述指定索引对应的匹配条件已被匹配。
- 根据权利要求8所述的方法,其特征在于,所述对所述临时索引表中的所述指定索引进行预设处理,包括:将所述指定索引对应的行或列从所述临时索引表中删除。
- 根据权利要求8所述的方法,其特征在于,所述对所述临时索引表中的所述指定索引进行预设处理,包括:在所述临时索引表中对所述指定索引对应的行或列进行标记。
- 根据权利要求1-5任一项所述的方法,其特征在于,所述当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,包括:当匹配出所述待匹配文本中存在的所有关键词后,查询所述所有关键词中每个关键词在所述预设索引表中的目标位置。
- 根据权利要求1-11任一项所述的方法,其特征在于,在根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件之后,所述方法还包括:输出所述待匹配文本满足的所有匹配条件。
- 根据权利要求1-12任一项所述的方法,其特征在于,在当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置之前,所述方法还包括:获取每个匹配条件中的关键词;根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表。
- 根据权利要求13所述的方法,其特征在于,所述根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表,包括:将每个匹配条件中的关键词,添加至所述初始索引表中一行或者一列,获得所述预设索引表,其中,每个匹配条件所在行或列作为匹配条件对应的索引。
- 根据权利要求14所述的方法,其特征在于,在所述将每个匹配条件中的关键词,添加至所述初始索引表中一行或者一列之后,所述方法还包括:将所述初始索引表中不存在关键词的位置进行删除。
- 根据权利要求13-15任一项所述的方法,其特征在于,在所述根据每个匹配条件中的关键词,生成一个索引并添加至初始索引表,获得所述预设索引表之后,所述方法还包括:记录所述预设索引表中每个索引的关键词的位置以及每个索引的关键词的数量。
- 根据权利要求1-16任一项所述的方法,其特征在于,在所述根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配之前,所述方法还包括:获取每个匹配条件中的关键词;根据所述关键词,构建字典树。
- 一种文本匹配装置,其特征在于,所述装置包括:文本获取模块、关键词匹配模块、位置查询模块、关键词添加模块以及条件确定模块,其中,所述文本获取模块用于获取待匹配文本;所述关键词匹配模块用于根据预先建立的关键词的字典树,对所述待匹配文本中的关键词进行匹配;所述位置查询模块用于当匹配到所述待匹配文本中的关键词时,查询匹配到的关键词在预设索引表中的目标位置,所述预设索引表包括至少一个匹配条件对应的索引,所述匹配条件包括一个或多个关键词;所述关键词添加模块用于将所述匹配到的关键词添加至临时索引表中的所述目标位置,所述临时索引表为与所述预设索引表的结构相同的空白索引表;所述条件确定模块用于根据所述临时索引表中已添加的关键词,确定所述待匹配文本是否满足所述匹配条件。
- 一种电子设备,其特征在于,包括:一个或多个处理器;存储器;一个或多个应用程序,其中所述一个或多个应用程序被存储在所述存储器中并被配置为由所述一个或多个处理器执行,所述一个或多个程序配置用于执行如权利要求1-17任一项所述的方法。
- 一种计算机可读取存储介质,其特征在于,所述计算机可读取存储介质中存储有程序代码,所述程序代码可被处理器调用执行如权利要求1-17任一项所述的方法。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/084762 WO2021207936A1 (zh) | 2020-04-14 | 2020-04-14 | 文本匹配方法、装置、电子设备及存储介质 |
CN202080094722.1A CN115004173A (zh) | 2020-04-14 | 2020-04-14 | 文本匹配方法、装置、电子设备及存储介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/084762 WO2021207936A1 (zh) | 2020-04-14 | 2020-04-14 | 文本匹配方法、装置、电子设备及存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021207936A1 true WO2021207936A1 (zh) | 2021-10-21 |
Family
ID=78083901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/084762 WO2021207936A1 (zh) | 2020-04-14 | 2020-04-14 | 文本匹配方法、装置、电子设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115004173A (zh) |
WO (1) | WO2021207936A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144553A1 (en) * | 2000-04-06 | 2005-06-30 | International Business Machines Corporation | Longest prefix match (LPM) algorithm implementation for a network processor |
CN103914498A (zh) * | 2013-03-18 | 2014-07-09 | 百度在线网络技术(北京)有限公司 | 一种地图搜索的搜索建议方法和装置 |
CN104035980A (zh) * | 2014-05-26 | 2014-09-10 | 王和平 | 一种面向结构化医药信息的检索方法和系统 |
CN104346331A (zh) * | 2013-07-23 | 2015-02-11 | 北大方正集团有限公司 | Xml数据库的检索方法及系统 |
CN105654201A (zh) * | 2015-12-30 | 2016-06-08 | 上海珍岛信息技术有限公司 | 一种广告流量预测方法及装置 |
CN110019647A (zh) * | 2017-10-25 | 2019-07-16 | 华为技术有限公司 | 一种关键词搜索方法、装置和搜索引擎 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EA200500336A1 (ru) * | 2005-02-04 | 2006-08-25 | Александр Иванович Силаев | Интерактивная игровая система, способ интерактивной интеллектуальной игры |
CN102346783B (zh) * | 2011-11-09 | 2014-09-17 | 华为技术有限公司 | 数据检索方法及装置 |
CN103902594A (zh) * | 2012-12-28 | 2014-07-02 | 重庆凯泽科技有限公司 | 一种基于子镜头倒排索引的匹配方法 |
CN105550298B (zh) * | 2015-12-11 | 2019-12-10 | 北京搜狗科技发展有限公司 | 一种关键词模糊匹配的方法及装置 |
-
2020
- 2020-04-14 WO PCT/CN2020/084762 patent/WO2021207936A1/zh active Application Filing
- 2020-04-14 CN CN202080094722.1A patent/CN115004173A/zh active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144553A1 (en) * | 2000-04-06 | 2005-06-30 | International Business Machines Corporation | Longest prefix match (LPM) algorithm implementation for a network processor |
CN103914498A (zh) * | 2013-03-18 | 2014-07-09 | 百度在线网络技术(北京)有限公司 | 一种地图搜索的搜索建议方法和装置 |
CN104346331A (zh) * | 2013-07-23 | 2015-02-11 | 北大方正集团有限公司 | Xml数据库的检索方法及系统 |
CN104035980A (zh) * | 2014-05-26 | 2014-09-10 | 王和平 | 一种面向结构化医药信息的检索方法和系统 |
CN105654201A (zh) * | 2015-12-30 | 2016-06-08 | 上海珍岛信息技术有限公司 | 一种广告流量预测方法及装置 |
CN110019647A (zh) * | 2017-10-25 | 2019-07-16 | 华为技术有限公司 | 一种关键词搜索方法、装置和搜索引擎 |
Also Published As
Publication number | Publication date |
---|---|
CN115004173A (zh) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6634515B2 (ja) | 自動質問応答システムにおける質問クラスタリング処理方法及び装置 | |
WO2020140386A1 (zh) | 基于TextCNN知识抽取方法、装置、计算机设备及存储介质 | |
CN109670163B (zh) | 信息识别方法、信息推荐方法、模板构建方法及计算设备 | |
US10971133B2 (en) | Voice synthesis method, device and apparatus, as well as non-volatile storage medium | |
US20170116521A1 (en) | Tag processing method and device | |
CN110930980A (zh) | 一种中英文混合语音的声学识别模型、方法及系统 | |
JP6932360B2 (ja) | オブジェクト検索方法、装置およびサーバ | |
WO2021189195A1 (zh) | 数据查询方法、装置、服务器及存储介质 | |
CN108920649A (zh) | 一种信息推荐方法、装置、设备和介质 | |
US11573961B2 (en) | Delta graph traversing system | |
CN105843882A (zh) | 一种信息匹配方法及装置 | |
CN113901214B (zh) | 表格信息的提取方法、装置、电子设备及存储介质 | |
CN108334491B (zh) | 文本分析方法、装置、计算设备及存储介质 | |
CN109190116B (zh) | 语义解析方法、系统、电子设备及存储介质 | |
WO2021207936A1 (zh) | 文本匹配方法、装置、电子设备及存储介质 | |
CN112101023B (zh) | 文本处理方法、装置以及电子设备 | |
JP2001092841A (ja) | クラスター分析処理方法およびクラスター分析プログラムを記録した記録媒体 | |
CN115640420A (zh) | 基于es的音频信息索引库建立检索方法、设备及存储介质 | |
CN107728806A (zh) | 输入法候选词展示方法、装置、计算机装置及存储介质 | |
CN114676138A (zh) | 数据处理方法、电子设备及可读存储介质 | |
CN113297273B (zh) | 查询元数据的方法、装置和电子设备 | |
WO2022204845A1 (zh) | 实体热度生成方法、装置、存储介质及电子设备 | |
CN112966505B (zh) | 一种从文本语料中提取持续性热点短语的方法、装置及存储介质 | |
CN115004169A (zh) | 用户选取方法、装置、服务器及存储介质 | |
CN117725077A (zh) | 标识搜索方法、装置、计算机设备、存储介质和程序产品 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20931155 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15/03/2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20931155 Country of ref document: EP Kind code of ref document: A1 |