
CN111898376B - Name data processing method and device, storage medium and computer equipment - Google Patents


Info

Publication number
CN111898376B
Authority
CN
China
Prior art keywords
name
standard
word
target
word segmentation
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010618976.0A
Other languages
Chinese (zh)
Other versions
CN111898376A (en)
Inventor
张朝胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lazas Network Technology Shanghai Co Ltd
Original Assignee
Lazas Network Technology Shanghai Co Ltd
Application filed by Lazas Network Technology Shanghai Co Ltd
Priority to CN202010618976.0A
Publication of CN111898376A
Application granted
Publication of CN111898376B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a name data processing method and device, a storage medium and computer equipment. The method comprises: obtaining a standard name index table that contains standard name word segments and the standard names corresponding to them; performing word segmentation on a target name to be processed to obtain the target name word segments; and determining the standard name corresponding to the target name based on the target name word segments and the standard name word segments in the index table. By associating the word segments of the target name with those of the standard names, the application finds the standard name that matches the target name and thereby standardizes it. When the order of the word segments in the target name differs from their order in a standard name, the influence of that order on standardization can be ignored, which improves the accuracy of name standardization.

Description

Name data processing method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a name data processing method, a device, a storage medium, and a computer apparatus.
Background
With the continuous progress of science, technology and society, the take-out catering industry has developed rapidly. Dishes are an important indicator in this industry, and research on dish-related data is becoming increasingly important. Because dish names are entered by merchants and may carry special meanings, dishes are described in extremely varied ways: different stores, or the same store at different times, name the same dish differently, which makes extracting dish names harder. For example, store A describes a dish as "the most authentic braised meat", store B uses a variant of the same phrase, and store C describes it as "special price-braised meat (must-order)".
Current methods for standardizing dish names usually calculate the similarity (for example, the Hamming distance) between a custom dish name and each standard dish name, and standardize the custom name as the standard name with the highest similarity. However, dish names often contain the same characters in different positions with the same meaning. For example, "shredded potatoes fried with green peppers" and "shredded potatoes green peppers" are actually the same dish, yet their similarity is low, so the traditional method standardizes dish names with low accuracy. This seriously affects downstream work on the take-out platform, such as recommendation, search, ranking lists and knowledge graph construction.
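As a hypothetical illustration of the order problem described above, the Python sketch below compares two five-character Chinese names that contain the same characters in a different order. The names 青椒土豆丝 and 土豆丝青椒 are our own example, not taken from the patent, and the Hamming distance is used only because the text cites it as a typical prior-art measure.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical names: "green pepper shredded potato" vs. "shredded potato green pepper".
a, b = "青椒土豆丝", "土豆丝青椒"
distance = hamming_distance(a, b)
similarity = 1 - distance / len(a)  # position-wise similarity in [0, 1]
print(distance, similarity)  # 5 0.0
```

Every position differs even though both strings contain exactly the same characters, so a position-based measure rates the two names as completely dissimilar.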
Therefore, improving the accuracy of dish name standardization has become a problem that every take-out platform needs to solve.
Disclosure of Invention
In view of the above, the present application provides a name data processing method and device, a storage medium and computer equipment, which help improve the accuracy of name standardization.
According to an aspect of the present application, there is provided a name data processing method, the method comprising:
obtaining a standard name index table, wherein the standard name index table comprises standard name segmentation words and standard names corresponding to the standard name segmentation words;
performing word segmentation on a target name to be processed to obtain a target name word segmentation corresponding to the target name;
and determining a standard name corresponding to the target name based on the target name segmentation and the standard name segmentation in the standard name index table.
Specifically, performing word segmentation on the target name to be processed to obtain the target name word segments corresponding to the target name specifically includes:
calculating the length of the target name;
if the length of the target name is greater than or equal to a maximum word segmentation length threshold, word segmentation is carried out on the target name according to each word segmentation length in the range from the maximum word segmentation length threshold to the minimum word segmentation length threshold, and the word segmentation of the target name is obtained;
if the length of the target name is smaller than the maximum word segmentation length threshold, word segmentation is carried out on the target name according to each word segmentation length ranging from the length of the target name to the minimum word segmentation length threshold, and the word segmentation of the target name is obtained.
Specifically, before calculating the length of the target name, the method further includes:
and filtering the target name based on preset characters to remove the preset characters contained in the target name.
Specifically, before the standard name index table is obtained, the method further includes:
acquiring at least one standard name, and respectively performing word segmentation on each standard name to obtain standard name word segmentation;
and establishing the standard name index table based on different standard name word segmentation and at least one standard name corresponding to the standard name word segmentation.
Specifically, the determining, based on the target name word segmentation and the standard name word segmentation in the standard name index table, a standard name corresponding to the target name specifically includes:
querying, among the standard name word segments contained in the standard name index table, a matching word segment corresponding to the target name word segments;
acquiring at least one first standard name corresponding to the matched segmentation word based on the corresponding relation between the standard name segmentation word and the standard name in the standard name index table;
And extracting a second standard name corresponding to the target name from the first standard names according to the similarity between the target name and the first standard names.
Specifically, querying, among the standard name word segments contained in the standard name index table, the matching word segment corresponding to the target name word segments specifically includes:
extracting the longest of the target name word segments as the word segment to be matched;
if the standard name index table contains the standard name word corresponding to the word to be matched, determining the standard name word as the matched word;
otherwise, continuously extracting a new word to be matched with the maximum length from the rest target name words until the matched word matched with the target name word is found out.
Specifically, the extracting, according to the similarity between the target name and the first standard name, a second standard name corresponding to the target name from the first standard name specifically includes:
calculating the similarity between the target name and each first standard name, where similarity = (number of characters common to the target name and the first standard name) / (length of the shorter of the target name and the first standard name);
And extracting the second standard name with highest similarity with the target name from the first standard names.
According to another aspect of the present application, there is provided a name data processing apparatus including:
The index table acquisition module is used for acquiring a standard name index table, wherein the standard name index table comprises standard name segmentation words and standard names corresponding to the standard name segmentation words;
the target name word segmentation module is used for carrying out word segmentation on a target name to be processed to obtain target name word segmentation corresponding to the target name;
And the target name processing module is used for determining a standard name corresponding to the target name based on the target name word segmentation and the standard name word segmentation in the standard name index table.
Specifically, the target name word segmentation module specifically includes:
a length calculation unit for calculating the length of the target name;
The first word segmentation unit is used for segmenting the target name according to each word segmentation length in the range from the maximum word segmentation length threshold to the minimum word segmentation length threshold if the length of the target name is greater than or equal to the maximum word segmentation length threshold, so as to obtain the target name word segmentation;
and the second word segmentation unit is used for segmenting the target name according to each word segmentation length ranging from the length of the target name to the minimum word segmentation length threshold value if the length of the target name is smaller than the maximum word segmentation length threshold value, so as to obtain the target name word segmentation.
Specifically, the device further comprises:
And the target name filtering module is used for filtering the target name based on preset characters before the length of the target name is calculated, so as to remove the preset characters contained in the target name.
Specifically, the device further comprises:
the standard name word segmentation module is used for acquiring at least one standard name before acquiring a standard name index table, and respectively segmenting each standard name to obtain the standard name word segmentation;
the index table establishing module is used for establishing the standard name index table based on different standard name word segmentation and at least one standard name corresponding to the standard name word segmentation.
Specifically, the target name processing module specifically includes:
The matching word segmentation query unit is used for querying matching word segmentation corresponding to the target name word segmentation in the standard name word segmentation contained in the standard name index table;
The first processing unit is used for acquiring at least one first standard name corresponding to the matched segmentation word based on the corresponding relation between the standard name segmentation word and the standard name in the standard name index table;
and the second processing unit is used for extracting a second standard name corresponding to the target name from the first standard names according to the similarity between the target name and the first standard names.
Specifically, the matching word segmentation query unit specifically includes:
The to-be-matched word segment extraction subunit is used for extracting the longest of the target name word segments as the word segment to be matched;
the first matched word segment determining subunit is configured to determine the standard name word segment as the matched word segment if the standard name index table contains the standard name word segment corresponding to the word segment to be matched;
And the second matched word segmentation determining subunit is used for continuously extracting new word segments to be matched with the maximum length from the rest target name word segments until the matched word segments matched with the target name word segments are found out.
Specifically, the second processing unit specifically includes:
A similarity calculating subunit configured to calculate the similarity between the target name and each first standard name, where similarity = (number of characters common to the target name and the first standard name) / (length of the shorter of the two);
and the target name processing subunit is used for extracting the second standard name with highest similarity with the target name from the first standard names.
According to still another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described name data processing method.
According to still another aspect of the present application, there is provided a computer apparatus including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, the processor implementing the above-mentioned name data processing method when executing the program.
By means of the above technical solution, the name data processing method and device, storage medium and computer equipment provided by the application perform word segmentation on the target name to be processed, associate the resulting target name word segments with the standard name word segments contained in a pre-established standard name index table, and finally determine the standard name corresponding to the target name. In the prior art, the similarity between the target name and each standard name is calculated and the standard name is chosen based on that similarity. Compared with this, when the order of the word segments in the target name differs from their order in the standard name, the present method can ignore the influence of that order on processing the target name, obtaining a standard name that better matches the actual meaning of the target name and improving the accuracy of name processing.
The foregoing description is only an overview of the present application. It is provided so that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the application become more readily apparent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic flow chart of a name data processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another name data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a name data processing device according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another name data processing device according to an embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
In this embodiment, a name data processing method is provided, as shown in fig. 1, and the method includes:
Step 101, obtaining a standard name index table, wherein the standard name index table comprises standard name word segments and the corresponding standard names;
Step 102, performing word segmentation on the target name to be processed to obtain the target name word segments corresponding to the target name;
Step 103, determining the standard name corresponding to the target name based on the target name word segments and the standard name word segments in the standard name index table.
The embodiment of the application can be used to standardize the ordinary dish names used by take-out platform merchants or other individuals into the corresponding standard dish names. It can also be applied to other scenarios: for example, it can standardize drink names into standard drink names. Nor is it limited to take-out platforms; it can equally standardize ordinary commodity names into standard commodity names on any other platform.
The embodiment of the application is explained using the standardization of ordinary dish names into standard dish names as an example. Before ordinary dish names are standardized, a standard name index table is established in advance. The index table contains preset standard names, for example "carrot shredded pork pot rice", "braised pork" and "curry chicken", together with the standard name word segments corresponding to each standard name. The standard name word segments are obtained by performing word segmentation on the standard names, and the segmentation method is not limited here. For example, a standard name can be segmented according to a preset lexicon, giving the segments "carrot", "shredded pork" and "pot rice" for "carrot shredded pork pot rice". Alternatively, it can be segmented with an N-gram segmentation algorithm, which yields in turn the 5-gram segments of "carrot shredded pork pot rice" (such as "carrot shredded pork", "Bo Rousi pot" and "shredded pork pot rice"), then its 4-gram segments, 3-gram segments and so on. The standard name index table is then established from the correspondence between the standard name word segments and the standard names.
The standard name index table is then used to standardize the target name to be processed. Specifically, after the target name is obtained, word segmentation is performed on it to obtain the target name word segments, using the same segmentation method that was applied to the standard names. The standard name word segments matching the target name word segments are then looked up in the index table, which determines the standard name corresponding to the target name and thereby standardizes it. For example, if the word segments of target name A include "carrot", and the index table contains the standard name word segment "carrot" with the corresponding standard name "shredded pork rice", it can be determined that the standard names corresponding to target name A include "shredded pork rice", thereby standardizing target name A.
By applying the technical solution of this embodiment, word segmentation is performed on the target name to be processed, the resulting word segments are associated with the standard name word segments contained in the pre-established standard name index table, and the standard name corresponding to the target name is finally determined. Compared with the prior-art approach of calculating the similarity between the target name and each standard name and choosing the standard name by that similarity, the method of this embodiment can ignore the influence of word segment order on standardizing the target name when the order in the target name differs from the order in the standard name. It therefore obtains standard names that better match the actual meaning of the target names and improves the accuracy of name standardization.
Further, as a refinement and extension of the foregoing embodiment, in order to fully describe the implementation procedure of this embodiment, another name data processing method is provided, as shown in fig. 2, where the method includes:
Step 201, a standard name index table is obtained, wherein the standard name index table comprises standard name segmentation words and corresponding standard names.
In the embodiment of the application, establishing the standard name index table specifically comprises: acquiring at least one standard name and performing word segmentation on each standard name to obtain the standard name word segments; and establishing the standard name index table based on the distinct standard name word segments and the at least one standard name corresponding to each of them.
First, the standard dish names are segmented into N-grams of different types (N denotes the word segment length), where N ranges from the maximum word segmentation length threshold down to the minimum word segmentation length threshold. Taking the dish "carrot shredded pork pot rice" as an example, with a maximum word segmentation length threshold of 5 and a minimum of 2, the segmentation result is shown in Table 1:
TABLE 1
N-gram type | Segmentation result
5gram | Shredded carrot, shredded radish, shredded pork pot, Bo Rousi pot, shredded pork pot and rice
4gram | Carrot meat, shredded radish meat, Bo Rousi pot, shredded meat pot, shredded pot rice and rice pot
3gram | Carrot, radish, shredded pork pot, shredded pot, pot rice and pot rice
2gram | Carrot, radish, shredded pork, pot and rice
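Character-level N-gram segmentation of the kind shown in Table 1 can be sketched in Python as below. The Chinese string 胡萝卜肉丝煲仔饭 is our assumed form of "carrot shredded pork pot rice" (the translation does not give the original characters), and the function name `ngrams_by_length` is hypothetical.

```python
def ngrams_by_length(name: str, max_n: int = 5, min_n: int = 2) -> dict[int, list[str]]:
    """Return the character N-grams of `name` for N = max_n down to min_n, one row per N."""
    rows = {}
    for n in range(max_n, min_n - 1, -1):
        # Slide a window of n characters across the name; names shorter than n give no n-grams.
        rows[n] = [name[i:i + n] for i in range(len(name) - n + 1)]
    return rows


# Hypothetical Chinese form of the dish "carrot shredded pork pot rice".
for n, grams in ngrams_by_length("胡萝卜肉丝煲仔饭").items():
    print(f"{n}gram", grams)
```

For an eight-character name this yields 4, 5, 6 and 7 segments for N = 5, 4, 3 and 2, one row per N-gram type as in Table 1.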
Next, the segmentation results are mapped to the standard dish name. Taking the dish "carrot shredded pork pot rice" as an example, the mapping is shown in Table 2:
TABLE 2
Finally, when the standard names of different dishes share the same word segment, the entries for that segment must be merged: every standard dish name that produces the segment is listed under the same index word. Table 3 illustrates this for two index words, "shredded pork with carrot" and "carrot"; all other index words correspond to dish names in the index table in the same way as in Table 2.
TABLE 3
N-gram type | Index word | Standard dish names
5gram | Shredded pork with carrot | Steamed rice with shredded carrot and shredded carrot, and fried rice with shredded carrot
3gram | Carrot | Steamed rice with shredded carrot and stewed mutton with shredded carrot
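The index construction of Tables 2 and 3, in which every standard name that produces a segment is merged under that segment, amounts to an inverted index. A minimal Python sketch, assuming character N-grams with N from 5 down to 2; the three Chinese standard names and the helper names are hypothetical examples, not data from the patent.

```python
from collections import defaultdict


def char_ngrams(name: str, max_n: int = 5, min_n: int = 2):
    """Yield the character N-grams of `name`, longest first."""
    for n in range(max_n, min_n - 1, -1):
        for i in range(len(name) - n + 1):
            yield name[i:i + n]


def build_index(standard_names, max_n: int = 5, min_n: int = 2) -> dict[str, set[str]]:
    """Map each standard name word segment (index word) to all standard names containing it."""
    index = defaultdict(set)
    for standard in standard_names:
        for segment in char_ngrams(standard, max_n, min_n):
            index[segment].add(standard)  # the set merges names under a shared index word
    return dict(index)


# Hypothetical standard names: carrot shredded pork pot rice,
# carrot shredded pork fried rice, stewed mutton with carrot.
index = build_index(["胡萝卜肉丝煲仔饭", "胡萝卜肉丝炒饭", "胡萝卜炖羊肉"])
print(sorted(index["胡萝卜肉丝"]))
print(sorted(index["胡萝卜"]))
```

Using a set means a standard name is listed only once under an index word, even if the segment occurs twice in that name.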
Step 202, filtering the target name based on the preset character to remove the preset character contained in the target name.
In the above embodiment, the preset characters may include personalized characters, metering characters, labeling characters and combination characters, all of which should be filtered out before standardization. Personalized characters may include: …, etc.; metering characters (including their combinations with numbers) may include: milliliters, liters, jin, grams, kilograms, liang, slices, bottles, bags, pouches, boxes, cups, pieces, tins, individual items, strings, degrees, pounds, portions, and the like; labeling characters may include punctuation and bracket marks such as ",", "(", ")", "{", "}", "<", ">", "[", "]", "【" and "】". Filtering out the preset characters makes the subsequent word segmentation of the target name easier and prevents these characters from affecting name standardization.
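A minimal Python sketch of this filtering step, under stated assumptions: the unit list and the labeling-character set below are our own hypothetical picks loosely following the categories above, a metering word is removed only when it directly follows a number (so the same character inside a dish name survives), and personalized and combination characters are not modeled.

```python
import re

# Hypothetical unit list (milliliters, kilograms, liters, jin, grams, liang, slices, bottles, ...).
UNITS = ["毫升", "千克", "升", "斤", "克", "两", "片", "瓶", "袋", "包", "盒", "杯", "份", "罐", "个", "串", "度", "磅"]
# A number followed by a unit, e.g. "500克" or "1.5升".
QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:" + "|".join(UNITS) + ")")
# Hypothetical set of labeling (punctuation and bracket) characters.
LABEL_CHARS = set(",，()（）[]【】{}<>《》")


def filter_name(name: str) -> str:
    """Remove number+unit expressions, then labeling characters and whitespace."""
    name = QUANTITY.sub("", name)
    return "".join(ch for ch in name if ch not in LABEL_CHARS and not ch.isspace())


print(filter_name("【特价】红烧肉(500克)"))  # 特价红烧肉
```

Note that the word 特价 ("special price") is kept, because it is not one of the characters this sketch treats as preset.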
Step 203, calculating the length of the target name.
Step 204, if the length of the target name is greater than or equal to the maximum word segmentation length threshold, word segmentation is performed on the target name according to each word segmentation length ranging from the maximum word segmentation length threshold to the minimum word segmentation length threshold, and the target name word segmentation is obtained.
Step 205, if the length of the target name is smaller than the maximum word segmentation length threshold, word segmentation is performed on the target name according to each word segmentation length ranging from the length of the target name to the minimum word segmentation length threshold, so as to obtain the word segmentation of the target name.
Steps 203 to 205 provide a method for segmenting the target name: the length of the target name is obtained first, and the word segment lengths are then determined from it, so that the target name is segmented into word segments of the corresponding lengths. If the length of the target name is greater than or equal to the maximum word segmentation length threshold, N ranges from the maximum threshold down to the minimum threshold; if it is less than the maximum threshold, N ranges from the length of the target name down to the minimum threshold. For example, if the target name has 7 characters, the maximum word segmentation length threshold is 5 and the minimum is 2, then N takes the values 5, 4, 3 and 2. The specific segmentation method is the same as that used for the standard names and is not repeated here.
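The length rule of steps 204 and 205 can be sketched in Python as follows, assuming the thresholds 5 and 2 from the example; the function names are hypothetical.

```python
def segment_lengths(name_len: int, max_n: int = 5, min_n: int = 2) -> list[int]:
    """N values used for a target name of length name_len (steps 204 and 205)."""
    upper = max_n if name_len >= max_n else name_len
    return list(range(upper, min_n - 1, -1))


def segment_target(name: str, max_n: int = 5, min_n: int = 2) -> list[str]:
    """Character N-grams of the target name, longest N first."""
    return [name[i:i + n]
            for n in segment_lengths(len(name), max_n, min_n)
            for i in range(len(name) - n + 1)]


print(segment_lengths(7))        # [5, 4, 3, 2]
print(segment_lengths(3))        # [3, 2]
print(segment_target("红烧肉"))  # ['红烧肉', '红烧', '烧肉']
```

A name shorter than the minimum threshold yields no N values at all, so in this sketch it produces no segments and cannot be matched.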
Step 206, querying, among the standard name word segments contained in the standard name index table, the matching word segment corresponding to the target name word segments.
In the above embodiment, after the target name word segments are obtained, the index table is searched for index words (i.e., standard name word segments) identical to the target name word segments, and each index word found is used as the matching word segment for the corresponding target name word segment.
In order to improve the matching efficiency, step 206 of the embodiment of the present application may specifically include:
Step 2061, extracting the longest of the target name word segments as the word segment to be matched;
Step 2062, if the standard name index table contains a standard name word segment identical to the word segment to be matched, determining that standard name word segment as the matching word segment;
Step 2063, otherwise, continuing to extract the longest of the remaining target name word segments as a new word segment to be matched, until a matching word segment for the target name word segments is found.
In steps 2061 to 2063, the longest target name word segment is extracted as the word segment to be matched, and the index table is searched for an identical index word. If one exists, it is used as the matching word segment; if not, the longest of the remaining target name word segments is extracted and the search continues. For example, if the longest target name word segments have 5 characters, a 5-character segment is extracted as the word segment to be matched and the 5-gram index words in the index table are searched for an identical one. If it is found, it is directly determined as the matching word segment; if not, 4-character segments are extracted from the target name word segments and the search continues, and so on until a matching word segment is found.
In this way, the index table is searched in descending order of character length, so the longest matching word segment for the target name is found. This helps in subsequently finding, based on that matching word segment, the standard name that best matches the target name.
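A minimal Python sketch of the longest-first lookup in steps 2061 to 2063, assuming the target name word segments are character n-grams and the index table's segments are available as a set. The function name `find_longest_match`, the sample dish names, and the left-to-right tie-break among equal-length segments are illustrative assumptions, not details given in the text.

```python
from typing import Iterable, Optional, Set


def find_longest_match(target_segments: Iterable[str],
                       index_segments: Set[str]) -> Optional[str]:
    """Return the longest target segment that also appears in the index.

    Segments are tried from longest to shortest (steps 2061 to 2063).
    Equal-length segments are tried in input order (an assumption).
    """
    # sorted() is stable even with reverse=True, so ties keep input order.
    for candidate in sorted(target_segments, key=len, reverse=True):
        if candidate in index_segments:
            return candidate  # first hit is a longest match
    return None  # no segment of any length is in the index


# Hypothetical data: 香辣鸡腿饭 = spicy drumstick rice, 鸡腿 = drumstick,
# 拌面 = mixed noodles, 鸡腿拌面 = drumstick mixed noodles.
index = {"鸡腿", "拌面", "鸡腿拌面"}
target = "香辣鸡腿饭"
# 3-grams followed by 2-grams of the target name.
segments = [target[i:i + n] for n in (3, 2) for i in range(len(target) - n + 1)]
```

With this index no 3-character segment matches, so the search falls back to the 2-character segment `鸡腿`; adding `鸡腿饭` to the index would make the 3-character segment win instead.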
Step 207, obtain at least one first standard name corresponding to the matching word segment based on the correspondence between standard name word segments and standard names in the standard name index table.
In the above embodiment, each target name is mounted onto the index table through its matching word segment; that is, the standard names corresponding to the matching word segment are looked up in the index table. Table 4 shows the mounting results for several target dishes. As shown in Table 4, for example, the first standard names corresponding to the matching word segment "drumstick" of a target dish name include drumstick mixed noodles, drumstick mixed noodles C, drumstick mixed noodles, dry-picked drumstick mixed noodles, and strong drumstick mixed noodles.
Table 4
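Once the index table maps each standard name word segment to its standard names, step 207 reduces to a dictionary lookup. The sketch below assumes that layout; the hand-written index `STANDARD_NAME_INDEX`, its dish names, and the function name `first_standard_names` are hypothetical.

```python
from typing import Dict, List

# Hypothetical standard name index table: segment -> standard names.
# In the described method it is built by segmenting every standard name;
# here a few entries are written out by hand for illustration.
# 鸡腿拌面 = drumstick mixed noodles, 鸡腿饭 = drumstick rice.
STANDARD_NAME_INDEX: Dict[str, List[str]] = {
    "鸡腿": ["鸡腿拌面", "干捞鸡腿拌面", "鸡腿饭"],
    "拌面": ["鸡腿拌面", "干捞鸡腿拌面"],
}


def first_standard_names(matched_segment: str,
                         index: Dict[str, List[str]]) -> List[str]:
    """Step 207: return every standard name mounted under the matched segment.

    Returns a copy so callers cannot mutate the index by accident.
    """
    return list(index.get(matched_segment, []))
```

Only the names listed under the matched segment become candidates for step 208; every other standard name is skipped.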
Step 208, extract the second standard name corresponding to the target name from the first standard names according to the similarity between the target name and each first standard name.
In the above embodiment, if only one first standard name corresponds to the matching word segment, that first standard name can be directly determined as the standardized name of the target name, i.e., the second standard name. If there are multiple first standard names, the standardized name of the target name (the second standard name) must be selected from among them.
Step 208 may specifically include:
Step 2081, calculate the similarity between the target name and each first standard name, where similarity = number of common characters of the target name and the first standard name / character length of the shorter of the two;
Step 2082, extract, from the first standard names, the second standard name with the highest similarity to the target name.
Steps 2081 and 2082 provide a name normalization method better suited to the characteristics of the Chinese language. It is well known that the order of Chinese characters does not necessarily affect reading: if some characters in this very sentence are swapped, an ordinary reader still understands both versions as having the same meaning. The embodiment of the present application therefore calculates the similarity between the target name and a first standard name from the ratio of common characters. Compared with the approach commonly used in the prior art of computing similarity from the Hamming distance (which measures how different two strings are by comparing the characters at the same positions), this is more reasonable in the scenario of the present embodiment. For example, two names for shredded potato that contain the same characters in a different order refer to the same dish; a Hamming-distance calculation gives them an unreasonably low similarity, whereas the common-character ratio used in the present application gives them a high similarity.
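The contrast drawn above can be checked with a small sketch. It assumes a Hamming-based similarity of 1 minus the fraction of differing positions, and counts common characters as a multiset intersection (the text does not say how repeated characters are counted). The reordered name pair is hypothetical: 酸辣土豆丝 is hot-and-sour shredded potato, and 土豆丝酸辣 is the same characters reordered.

```python
from collections import Counter


def hamming_similarity(a: str, b: str) -> float:
    """1 - (differing positions / length); defined only for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    if not a:
        return 1.0
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return 1 - differing / len(a)


def common_char_similarity(a: str, b: str) -> float:
    """Common characters / length of the shorter string; order is ignored."""
    if not a or not b:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared character.
    common = sum((Counter(a) & Counter(b)).values())
    return common / min(len(a), len(b))


original = "酸辣土豆丝"
reordered = "土豆丝酸辣"
```

Every position differs, so the Hamming-based score is 0.0, while all five characters are shared and the common-character ratio is 1.0.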
Specifically, for the target name X and any first standard name Y, similarity(X, Y) = number of common characters of X and Y / MIN(length(X), length(Y)). For example, if the target name X contains 8 characters, the first standard name Y contains 10 characters, and X and Y have 6 characters in common, then the similarity of X and Y = 6/8 = 0.75. After the similarity between the target name and each first standard name is calculated in this way, the second standard name with the highest similarity is selected from the first standard names as the standardized name of the target name. Calculating the similarity between a target dish name and a standard dish name from the common-character ratio is considerably more efficient than other similarity calculations while still maintaining a reasonable degree of accuracy.
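A sketch of steps 2081 and 2082 in Python, reproducing the 6/8 = 0.75 worked example with placeholder strings. Counting common characters as a multiset intersection and keeping the earliest candidate on ties are assumptions; the function names and dish names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def similarity(x: str, y: str) -> float:
    """Number of common characters of x and y / MIN(length(x), length(y))."""
    if not x or not y:
        return 0.0
    # Assumption: repeated characters are counted as a multiset intersection.
    common = sum((Counter(x) & Counter(y)).values())
    return common / min(len(x), len(y))


def second_standard_name(target: str, candidates: List[str]) -> Optional[str]:
    """Step 2082: return the first standard name with the highest similarity."""
    if not candidates:
        return None
    # A single candidate is returned directly; max() keeps the earliest on ties.
    return max(candidates, key=lambda name: similarity(target, name))


# Worked example: X has 8 characters, Y has 10, and 6 are shared.
x, y = "abcdefgh", "abcdefxyzw"
```

One consequence of dividing by the shorter length: a short standard name whose characters all appear in the target scores 1.0, so for the target 香辣鸡腿拌面 (spicy drumstick mixed noodles) the candidate 鸡腿拌面 (4/4) beats the longer 干捞鸡腿拌面 (4/6).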
Compared with the prior art, the technical solution of the embodiment of the present application determines the first standard names from the target name word segments, so only standard names that share characters with the target name take part in the similarity calculation. The similarity between the target name and every standard name no longer needs to be computed, which reduces the number of comparisons.
Further, as a specific implementation of the method of fig. 1, an embodiment of the present application provides a name data processing device. As shown in fig. 3, the device includes an index table acquisition module 31, a target name word segmentation module 32, and a target name processing module 33.
The index table acquisition module 31 is configured to obtain a standard name index table, which contains standard name word segments and the standard names corresponding to them;
the target name word segmentation module 32 is configured to perform word segmentation on a target name to be processed to obtain the target name word segments corresponding to the target name;
the target name processing module 33 is configured to determine the standard name corresponding to the target name based on the target name word segments and the standard name word segments in the standard name index table.
In a specific application scenario, as shown in fig. 4, the target name word segmentation module 32 specifically includes a length calculation unit 321, a first word segmentation unit 322, and a second word segmentation unit 323.
The length calculation unit 321 is configured to calculate the length of the target name;
the first word segmentation unit 322 is configured to, if the length of the target name is greater than or equal to the maximum word segmentation length threshold, segment the target name at each word segmentation length in the range from the maximum word segmentation length threshold to the minimum word segmentation length threshold to obtain the target name word segments;
the second word segmentation unit 323 is configured to, if the length of the target name is less than the maximum word segmentation length threshold, segment the target name at each word segmentation length in the range from the length of the target name to the minimum word segmentation length threshold to obtain the target name word segments.
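Units 321 to 323 describe character n-gram segmentation over a range of lengths. A minimal sketch, assuming thresholds of 5 and 2 (the text mentions 5-grams in an example but does not fix either threshold); `segment_name` is a hypothetical name.

```python
from typing import List


def segment_name(name: str, max_len: int = 5, min_len: int = 2) -> List[str]:
    """Character n-gram segmentation following units 321 to 323.

    max_len and min_len stand in for the maximum and minimum word
    segmentation length thresholds; 5 and 2 are illustrative values only.
    """
    length = len(name)  # length calculation unit 321
    # Unit 322 when length >= max_len, otherwise unit 323.
    upper = max_len if length >= max_len else length
    segments: List[str] = []
    for n in range(upper, min_len - 1, -1):  # longest n-grams first
        segments.extend(name[i:i + n] for i in range(length - n + 1))
    return segments
```

For example, `segment_name("鸡腿拌面")` (drumstick mixed noodles, 4 characters, below the assumed maximum of 5) yields the 4-, 3- and 2-character segments in that order; a name shorter than the minimum threshold yields no segments.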
In a specific application scenario, as shown in fig. 4, the device further includes a target name filtering module 34.
The target name filtering module 34 is configured to filter the target name based on preset characters before the length of the target name is calculated, so as to remove the preset characters contained in the target name.
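Module 34 removes preset characters before the length is measured. The text does not list the preset characters, so the set below (spaces, brackets, and a few punctuation marks in half-width and full-width forms) is purely an assumption, as is the name `filter_name`.

```python
# Hypothetical preset characters; the method does not enumerate them.
PRESET_CHARS = frozenset(" ()（）[]【】,，、/+-")


def filter_name(name: str, preset: frozenset = PRESET_CHARS) -> str:
    """Remove every preset character so it does not affect length or n-grams."""
    return "".join(ch for ch in name if ch not in preset)
```

Filtering first means that a bracketed portion marker such as （大份） (large portion) no longer contributes bracket characters to the segments.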
In a specific application scenario, as shown in fig. 4, the device further includes a standard name word segmentation module 35 and an index table establishment module 36.
The standard name word segmentation module 35 is configured to obtain at least one standard name before the standard name index table is obtained, and to segment each standard name to obtain standard name word segments;
the index table establishment module 36 is configured to establish the standard name index table based on the distinct standard name word segments and the at least one standard name corresponding to each of them.
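Modules 35 and 36 together produce an inverted index from segments to standard names. A sketch under the same illustrative thresholds (5 and 2), with hypothetical helper names `ngrams` and `build_index`; the dish names are examples only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def ngrams(name: str, max_len: int = 5, min_len: int = 2) -> List[str]:
    """Hypothetical n-gram segmenter (threshold values are illustrative)."""
    upper = min(max_len, len(name))
    return [name[i:i + n]
            for n in range(upper, min_len - 1, -1)
            for i in range(len(name) - n + 1)]


def build_index(standard_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each distinct segment to the standard names that contain it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in standard_names:
        # dict.fromkeys drops repeated segments while keeping their order,
        # so a name is listed at most once under each segment.
        for seg in dict.fromkeys(ngrams(name)):
            index[seg].append(name)
    return dict(index)


# 鸡腿拌面 = drumstick mixed noodles, 鸡腿饭 = drumstick rice.
idx = build_index(["鸡腿拌面", "鸡腿饭"])
```

The shared segment 鸡腿 (drumstick) lists both names, while segments unique to one name list only that name.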
In a specific application scenario, as shown in fig. 4, the target name processing module 33 specifically includes a matching word segment query unit 331, a first processing unit 332, and a second processing unit 333.
The matching word segment query unit 331 is configured to query, among the standard name word segments contained in the standard name index table, the matching word segment corresponding to the target name word segments;
the first processing unit 332 is configured to obtain at least one first standard name corresponding to the matching word segment based on the correspondence between standard name word segments and standard names in the standard name index table;
the second processing unit 333 is configured to extract, from the first standard names, the second standard name corresponding to the target name according to the similarity between the target name and the first standard names.
In a specific application scenario, the matching word segment query unit 331 specifically includes (not shown in the figure) a to-be-matched word segment extraction subunit 3311, a first matching word segment determination subunit 3312, and a second matching word segment determination subunit 3313.
The to-be-matched word segment extraction subunit 3311 is configured to extract the word segment to be matched with the maximum length from the target name word segments;
the first matching word segment determination subunit 3312 is configured to, if the standard name index table contains a standard name word segment corresponding to the word segment to be matched, determine that standard name word segment as the matching word segment;
the second matching word segment determination subunit 3313 is configured to, if not, continue extracting a new word segment to be matched with the maximum length from the remaining target name word segments until a matching word segment is found.
In a specific application scenario, the second processing unit 333 specifically includes (not shown in the figure) a similarity calculation subunit 3331 and a target name processing subunit 3332.
The similarity calculation subunit 3331 is configured to calculate the similarity between the target name and each first standard name, where similarity = number of common characters of the target name and the first standard name / character length of the shorter of the two;
the target name processing subunit 3332 is configured to extract, from the first standard names, the second standard name with the highest similarity to the target name.
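The modules and units above can be wired into one pipeline. The sketch below is a compact, hypothetical composition rather than the patented implementation: the thresholds, preset characters, class name `NameNormalizer` and method names are all assumptions, and each comment names the module or unit it stands in for.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional

# Illustrative assumptions: the text gives neither threshold values
# nor the list of preset characters.
MAX_LEN, MIN_LEN = 5, 2
PRESET_CHARS = frozenset(" ()（）[]【】,，")


def clean(name: str) -> str:
    # Target name filtering module 34.
    return "".join(ch for ch in name if ch not in PRESET_CHARS)


def segment(name: str) -> List[str]:
    # Word segmentation modules 32 and 35: n-grams, longest first.
    upper = min(MAX_LEN, len(name))
    return [name[i:i + n]
            for n in range(upper, MIN_LEN - 1, -1)
            for i in range(len(name) - n + 1)]


def similarity(x: str, y: str) -> float:
    # Common characters / length of the shorter name.
    if not x or not y:
        return 0.0
    return sum((Counter(x) & Counter(y)).values()) / min(len(x), len(y))


class NameNormalizer:
    def __init__(self, standard_names: Iterable[str]) -> None:
        # Index table establishment module 36.
        index: Dict[str, List[str]] = defaultdict(list)
        for name in standard_names:
            for seg in dict.fromkeys(segment(clean(name))):
                index[seg].append(name)
        self.index = dict(index)

    def normalize(self, target: str) -> Optional[str]:
        name = clean(target)
        # Matching word segment query unit 331: segments arrive longest
        # first, so the first hit is a longest matching segment.
        matched = next((s for s in segment(name) if s in self.index), None)
        if matched is None:
            return None
        candidates = self.index[matched]  # first processing unit 332
        # Second processing unit 333: highest similarity wins.
        return max(candidates, key=lambda c: similarity(name, clean(c)))


normalizer = NameNormalizer(["鸡腿拌面", "干捞鸡腿拌面", "鸡腿饭", "酸辣土豆丝"])
```

With these sample names, 香辣鸡腿拌面（大份） (spicy drumstick mixed noodles, large portion) matches the 4-character segment 鸡腿拌面 and normalizes to 鸡腿拌面, while a name sharing no segment with the index returns `None`.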
It should be noted that, for other descriptions of the functional units of the name data processing device provided by the embodiment of the present application, reference may be made to the corresponding descriptions of fig. 1 and fig. 2, which are not repeated here.
Based on the methods shown in fig. 1 and fig. 2, the embodiment of the present application correspondingly further provides a storage medium storing a computer program that, when executed by a processor, implements the name data processing method shown in fig. 1 and fig. 2.
Based on this understanding, the technical solution of the present application may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.
Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 3 and fig. 4, and to achieve the above objects, the embodiment of the present application further provides a computer device, such as a personal computer, a server, or a network device. The computer device includes a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the name data processing method shown in fig. 1 and fig. 2.
Optionally, the computer device may further include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a Bluetooth or Wi-Fi interface).
Those skilled in the art will understand that the computer device structure provided in this embodiment does not limit the computer device, which may include more or fewer components, combine certain components, or use a different arrangement of components.
The storage medium may also contain an operating system and a network communication module. The operating system is a program that manages and maintains the hardware and software resources of the computer device and supports the running of the information processing program and other software and/or programs. The network communication module enables communication among the components within the storage medium and with other hardware and software in the physical device.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. A target name to be processed is segmented to obtain the corresponding target name word segments, which are associated with the standard name word segments in a pre-established standard name index table, and the standard name corresponding to the target name is finally determined. Compared with the prior-art approach of calculating the similarity between the target name and every standard name and then determining the standard name from that similarity, the method of the embodiment can disregard the effect of word segment order when the order of word segments in the target name differs from that in the standard name. It therefore obtains standard names that better match the actual meaning of the target name and improves the accuracy of name standardization.
Those skilled in the art will understand that the drawings are merely schematic illustrations of preferred implementation scenarios, and the modules or flows in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the modules of a device in an implementation scenario may be distributed in that device as described for the scenario, or may, with corresponding changes, be located in one or more devices different from the one in the scenario. The modules of an implementation scenario may be combined into one module or further split into multiple sub-modules.
The sequence numbers used in the present application are for description only and do not indicate the relative merits of the implementation scenarios. The above disclosure shows only some specific embodiments of the present application; the present application is not limited thereto, and modifications may be made by those skilled in the art without departing from its scope.

Claims (14)

1. A name data processing method, the method comprising:
Obtaining a standard name index table, wherein the standard name index table comprises standard name word segments and the standard names corresponding to them, the standard names are standard dish names, and the standard name word segments are obtained by segmenting the standard names with an N-gram word segmentation algorithm;
Determining word segmentation lengths according to the N-gram word segmentation algorithm, and performing word segmentation on a target name to be processed at each word segmentation length to obtain the target name word segments corresponding to the target name, wherein the target name is a dish name;
Determining the matching word segment with the largest length among the target name word segments that match the standard name word segments, and acquiring at least one first standard name corresponding to the matching word segment based on the correspondence between the standard name word segments and the standard names; and extracting a second standard name corresponding to the target name from the first standard names according to the similarity between the target name and the first standard names.
2. The method of claim 1, wherein performing word segmentation on the target name to be processed to obtain the target name word segments corresponding to the target name specifically comprises:
calculating the length of the target name;
if the length of the target name is greater than or equal to a maximum word segmentation length threshold, segmenting the target name at each word segmentation length in the range from the maximum word segmentation length threshold to a minimum word segmentation length threshold to obtain the target name word segments;
if the length of the target name is less than the maximum word segmentation length threshold, segmenting the target name at each word segmentation length in the range from the length of the target name to the minimum word segmentation length threshold to obtain the target name word segments.
3. The method of claim 2, wherein prior to said calculating the length of the target name, the method further comprises:
filtering the target name based on preset characters to remove the preset characters contained in the target name.
4. The method of claim 3, wherein prior to the obtaining the standard name index table, the method further comprises:
acquiring at least one standard name, and segmenting each standard name to obtain standard name word segments;
and establishing the standard name index table based on the distinct standard name word segments and the at least one standard name corresponding to each standard name word segment.
5. The method according to claim 1, wherein determining the matching word segment with the largest length among the target name word segments that match the standard name word segments specifically comprises:
extracting the word segment to be matched with the maximum length from the target name word segments;
if the standard name index table contains a standard name word segment corresponding to the word segment to be matched, determining that standard name word segment as the matching word segment;
otherwise, continuing to extract a new word segment to be matched with the maximum length from the remaining target name word segments until a matching word segment is found.
6. The method according to claim 5, wherein extracting the second standard name corresponding to the target name from the first standard names according to the similarity between the target name and the first standard names specifically comprises:
calculating the similarity between the target name and each first standard name, wherein similarity = number of common characters of the target name and the first standard name / character length of the shorter of the two;
extracting, from the first standard names, the second standard name with the highest similarity to the target name.
7. A name data processing device, the device comprising:
The index table acquisition module is used for acquiring a standard name index table, wherein the standard name index table comprises standard name word segments and the standard names corresponding to them, the standard names are standard dish names, and the standard name word segments are obtained by segmenting the standard names with an N-gram word segmentation algorithm;
the target name word segmentation module is used for determining word segmentation lengths according to the N-gram word segmentation algorithm, and performing word segmentation on a target name to be processed at each word segmentation length to obtain the target name word segments corresponding to the target name, wherein the target name is a dish name;
the target name processing module is used for determining the matching word segment with the largest length among the target name word segments that match the standard name word segments, acquiring at least one first standard name corresponding to the matching word segment based on the correspondence between the standard name word segments and the standard names, and extracting a second standard name corresponding to the target name from the first standard names according to the similarity between the target name and the first standard names.
8. The device according to claim 7, wherein the target name word segmentation module specifically comprises:
a length calculation unit for calculating the length of the target name;
a first word segmentation unit for, if the length of the target name is greater than or equal to the maximum word segmentation length threshold, segmenting the target name at each word segmentation length in the range from the maximum word segmentation length threshold to the minimum word segmentation length threshold to obtain the target name word segments;
and a second word segmentation unit for, if the length of the target name is less than the maximum word segmentation length threshold, segmenting the target name at each word segmentation length in the range from the length of the target name to the minimum word segmentation length threshold to obtain the target name word segments.
9. The apparatus of claim 8, wherein the apparatus further comprises:
a target name filtering module, used for filtering the target name based on preset characters before the length of the target name is acquired, so as to remove the preset characters contained in the target name.
10. The apparatus of claim 9, wherein the apparatus further comprises:
a standard name word segmentation module, used for acquiring at least one standard name before the standard name index table is acquired, and segmenting each standard name to obtain the standard name word segments;
and an index table establishment module, used for establishing the standard name index table based on the distinct standard name word segments and the at least one standard name corresponding to each standard name word segment.
11. The apparatus of claim 7, wherein the target name processing module specifically comprises:
a to-be-matched word segment extraction subunit, used for extracting the word segment to be matched with the maximum length from the target name word segments;
a first matching word segment determination subunit, used for determining the standard name word segment corresponding to the word segment to be matched as the matching word segment if the standard name index table contains that standard name word segment;
and a second matching word segment determination subunit, used for otherwise continuing to extract a new word segment to be matched with the maximum length from the remaining target name word segments until a matching word segment is found.
12. The apparatus of claim 11, wherein the target name processing module specifically comprises:
a similarity calculation subunit, configured to calculate the similarity between the target name and each first standard name, wherein similarity = number of common characters of the target name and the first standard name / character length of the shorter of the two;
and a target name processing subunit, used for extracting, from the first standard names, the second standard name with the highest similarity to the target name.
13. A storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the name data processing method of any one of claims 1 to 6.
14. A computer device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the name data processing method of any of claims 1 to 6 when executing the program.
CN202010618976.0A 2020-07-01 2020-07-01 Name data processing method and device, storage medium and computer equipment Active CN111898376B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010618976.0A CN111898376B (en) 2020-07-01 2020-07-01 Name data processing method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN111898376A CN111898376A (en) 2020-11-06
CN111898376B true CN111898376B (en) 2024-04-26

Family

ID=73191197

Country Status (1)

Country Link
CN (1) CN111898376B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651696A (en) * 2016-11-16 2017-05-10 福建天泉教育科技有限公司 Approximate question push method and system
CN108021553A (en) * 2017-09-30 2018-05-11 北京颐圣智能科技有限公司 Word treatment method, device and the computer equipment of disease term
CN108304378A (en) * 2018-01-12 2018-07-20 深圳壹账通智能科技有限公司 Text similarity computing method, apparatus, computer equipment and storage medium
CN109712612A (en) * 2018-12-28 2019-05-03 广东亿迅科技有限公司 A kind of voice keyword detection method and device
CN109785919A (en) * 2018-11-30 2019-05-21 平安科技(深圳)有限公司 Noun matching process, device, equipment and computer readable storage medium
CN110895961A (en) * 2019-10-29 2020-03-20 泰康保险集团股份有限公司 Text matching method and device in medical data
CN111325032A (en) * 2020-02-21 2020-06-23 中国建设银行股份有限公司 5G + intelligent banking institution name standardization method and device

Similar Documents

Publication Publication Date Title
US9430568B2 (en) Method and system for querying information
CN108628833B (en) Method and device for determining summary of original content and method and device for recommending original content
US9471644B2 (en) Method and system for scoring texts
CN111400507B (en) Entity matching method and device
CN111104526A (en) Financial label extraction method and system based on keyword semantics
CN110110577B (en) Method and device for identifying dish name, storage medium and electronic device
CN107067293A (en) Merchant category method, device and electronic equipment
US20130144875A1 (en) Set expansion processing device, set expansion processing method, program and non-transitory memory medium
CN112633000A (en) Method and device for associating entities in text, electronic equipment and storage medium
CN106649276A (en) Identification method and device for core product word in title
CN111737473A (en) Text classification method, device and equipment
CN108304381B (en) Entity edge establishing method, device and equipment based on artificial intelligence and storage medium
CN113836316B (en) Processing method, training method, device, equipment and medium for ternary group data
CN111898376B (en) Name data processing method and device, storage medium and computer equipment
CN111429200B (en) Content association method and device, storage medium and computer equipment
CN105095385B (en) A kind of output method and device of retrieval result
CN104657343B (en) Recognize the method and device of transliteration name
CN115033797A (en) Content search method and device, storage medium and computer equipment
CN109960752A (en) Querying method, device, computer equipment and storage medium in application program
US20180005300A1 (en) Information presentation device, information presentation method, and computer program product
CN105095322A (en) Personnel name unit dictionary expansion method, personnel name language recognition method, personnel name unit dictionary expansion device and personnel name language recognition device
CN109447719B (en) Target promoted commodity automatic determination method, device, medium and electronic equipment
CN115438141A (en) Information retrieval method based on knowledge graph model
CN107818144A (en) A kind of method that multi-data source data are integrated based on Solr
CN111444345A (en) Dish name classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant