CN112988753A - Data searching method and device - Google Patents
- Publication number
- CN112988753A CN202110349226.2A
- Authority
- CN
- China
- Prior art keywords
- data
- index
- word
- search
- segmentation processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Evolutionary Computation (AREA)
- Quality & Reliability (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data searching method and device, and relates to the technical field of big data. One embodiment of the method comprises: receiving data to be searched, and performing word segmentation processing on the data to be searched; matching the word segmentation processing result with a data index stored in a search engine to determine a target index; wherein, the data index is constructed according to the RNN model and the search field; and acquiring target data from the distributed storage device according to the target index. According to the embodiment, the accuracy of data search is improved, the applicable scenes are expanded, and the user experience is improved.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a data searching method and device.
Background
Existing search systems mainly rely on NLP (Natural Language Processing) algorithms whose basic logic is dictionary matching: the Chinese text to be segmented is split and adjusted according to certain rules and then matched against the words in a dictionary. If the match succeeds, the text is segmented according to the dictionary; if it fails, the split is adjusted or reselected and matching is attempted again, and the cycle repeats.
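As an illustration of the dictionary-matching idea described above, the following minimal sketch uses forward maximum matching, one common dictionary-based segmentation scheme. The patent does not name a specific algorithm, so the function name, the `max_len` parameter and the sample dictionary are assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation: at each position take the longest dictionary word.

    Illustrative only; the source does not say which matching rule is used.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, then shrink it (the "adjust and retry" loop).
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical sample dictionary (fund, insurance, product, insurance product).
sample_dictionary = {"基金", "保险", "产品", "保险产品"}
print(forward_max_match("基金保险产品", sample_dictionary))
```

Because the match is greedy, "保险产品" is kept as one word rather than split into "保险" and "产品"; this sensitivity to the dictionary contents is the behaviour the Background section refers to.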
The prior art has at least the following problems:
existing data search methods do not take the search field into account, so search accuracy is low, search quality varies widely between fields, the applicable scenarios are narrow, and the user experience is poor.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data search method and apparatus that treat the search field as one of the factors considered when constructing the data index, thereby improving the accuracy of data search, expanding the applicable scenarios, and improving the user experience.
To achieve the above object, according to a first aspect of embodiments of the present invention, there is provided a data search method including:
receiving data to be searched, and performing word segmentation processing on the data to be searched;
matching the word segmentation processing result with a data index stored in a search engine to determine a target index; wherein, the data index is constructed according to the RNN model and the search field;
and acquiring target data from the distributed storage device according to the target index.
Further, the step of constructing the data index according to the RNN model and the search domain includes:
acquiring a plurality of storage data, and respectively classifying and extracting the plurality of storage data by adopting an RNN (Recurrent Neural Network) model and a search field to determine dictionary words corresponding to each storage data;
and constructing a data index according to the dictionary words and the storage positions of the storage data corresponding to the dictionary words.
Further, classifying and extracting the plurality of stored data by using the RNN model and the search field to determine a dictionary word corresponding to each stored data further includes:
carrying out data cleaning and character conversion processing on a plurality of stored data;
and according to the search field and the RNN model, sequentially performing word extraction and word classification on the plurality of storage data after data cleaning to determine dictionary words corresponding to each storage data.
Further, the step of sequentially performing word classification on the plurality of stored data after data cleaning further includes:
carrying out word classification on the plurality of stored data in sequence by combining word characteristics; wherein the word characteristics include at least one of the following: the search field to which the word belongs, the word context, and the word functional characteristics.
Further, matching the word segmentation processing result with a data index stored in a search engine, further comprising:
determining word characteristics of the words corresponding to the word segmentation processing result, the word characteristics including at least one of the search field to which the word belongs, the word context, and the word functional characteristics; and
determining a candidate data index according to the word characteristics corresponding to the word segmentation processing result, and matching the word segmentation processing result with the candidate data index.
Further, after the step of classifying and extracting the plurality of stored data respectively by using the RNN model and the search domain to determine the dictionary word corresponding to each stored data, the method further includes:
determining word frequency corresponding to the dictionary words;
dictionary words are updated according to word frequency.
Further, the method further comprises determining data sources corresponding to the plurality of stored data, and the step of constructing a data index according to the RNN model and the search field further comprises:
aiming at the storage data of different data sources, respectively constructing data indexes corresponding to the storage data of different data sources according to the RNN model and the search field; and
updating the data index in response to a stored data updating request sent by the data source.
Further, the method further comprises:
responding to service requirements sent by different data sources, and determining an application number corresponding to each data source according to the service requirements;
and constructing a mapping table for indicating the corresponding relation between the data source and the application number.
Further, after the step of receiving data to be searched, the method further comprises:
determining an application number corresponding to the data to be searched;
determining a data source set corresponding to the data to be searched according to the application number and the mapping table;
and matching the data to be searched with the data indexes corresponding to the data source set.
Further, the step of matching the word segmentation processing result with the data index stored in the search engine further comprises:
respectively extracting the characteristics of the word segmentation processing result and the data index to obtain a word segmentation processing result characteristic vector and a data index characteristic vector;
and calculating the similarity between the word segmentation processing result feature vector and the data index feature vector to realize the matching between the word segmentation processing result and the data index.
Further, before the step of performing word segmentation processing on the data to be searched, the method further comprises:
performing natural language processing on the data to be searched to adjust the data to be searched.
According to a second aspect of the embodiments of the present invention, there is provided a data search apparatus including:
the word segmentation processing module is used for receiving data to be searched and carrying out word segmentation processing on the data to be searched;
the matching module is used for matching the word segmentation processing result with a data index stored in a search engine so as to determine a target index; wherein, the data index is constructed according to the RNN model and the search field;
and the target data acquisition module is used for acquiring target data from the distributed storage device according to the target index.
Further, the apparatus further comprises a data index building module configured to:
acquiring a plurality of storage data, and respectively classifying and extracting the plurality of storage data by adopting an RNN (Recurrent Neural Network) model and a search field to determine dictionary words corresponding to each storage data;
and constructing a data index according to the dictionary words and the storage positions of the storage data corresponding to the dictionary words.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the data search methods described above.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing any one of the data search methods described above.
One embodiment of the above invention has the following advantages or benefits. The technical means adopted are: receiving data to be searched and performing word segmentation processing on it; matching the word segmentation processing result with a data index stored in a search engine to determine a target index, wherein the data index is constructed according to the RNN model and the search field; and acquiring target data from the distributed storage device according to the target index. These means overcome the technical problems of existing data search methods (low search accuracy, large differences in search quality between fields, narrow applicable scenarios, and poor user experience), and thereby achieve the technical effects of taking the search field into account when constructing the data index, improving the accuracy of data search, expanding the applicable scenarios, and improving the user experience.
Further effects of the above non-conventional optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a data search method provided according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of a data search method provided according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of the main modules of a data search apparatus provided according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a data search method provided according to a first embodiment of the present invention; as shown in fig. 1, the data search method provided in the embodiment of the present invention mainly includes:
step S101, receiving data to be searched, and performing word segmentation processing on the data to be searched.
Specifically, according to the embodiment of the present invention, the data to be searched can be segmented using techniques such as a knowledge graph and segmentation rules. The word segmentation step can be executed with an existing word segmentation algorithm. Segmenting the data to be searched allows the subsequent search to be carried out on the word segmentation processing result, which improves data search efficiency.
Further, according to the embodiment of the present invention, before the step of performing word segmentation processing on the data to be searched, the method further includes:
performing natural language processing on the data to be searched to adjust the data to be searched.
Specifically, according to the embodiment of the present invention, after the data to be searched is received, adjustment operations such as adding, deleting and modifying can be performed on it through text processing and search sentence understanding (specifically including language detection, word segmentation, labeling, intention recognition and the like), so as to improve word segmentation efficiency and the word segmentation effect.
According to a specific implementation manner of the embodiment of the present invention, the search field to which the data to be searched belongs may also be determined, and the word segmentation processing result may be adjusted according to the technical terms, professional vocabulary and the like of that search field.
Step S102, matching the word segmentation processing result with a data index stored in a search engine to determine a target index; wherein the data index is constructed according to the RNN model and the search domain.
Specifically, according to an embodiment of the present invention, the step of constructing the data index according to the RNN model and the search field includes:
acquiring a plurality of storage data, and respectively classifying and extracting the plurality of storage data by adopting an RNN (Recurrent Neural Network) model and a search field to determine dictionary words corresponding to each storage data;
and constructing a data index according to the dictionary words and the storage positions of the storage data corresponding to the dictionary words.
Through the above arrangement, the stored data in the distributed database are classified and extracted based on the different search fields in combination with an RNN (Recurrent Neural Network) model, and the data index is then constructed. This avoids the long search times that result when every data index has to be queried during a subsequent data search. It should be noted that, according to the embodiment of the present invention, each distributed database may instead classify and extract its own locally stored data to construct a data index, after which the search system aggregates the data indexes constructed by the individual distributed databases.
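The index construction described above can be sketched as an inverted index that maps each dictionary word, together with its search field, to the storage positions of the data containing it. This is a minimal sketch under stated assumptions: the RNN-based classification is replaced by a hypothetical lookup function (`toy_extractor`), and the key layout `(field, word)` and the location strings are illustrative choices, not taken from the patent.

```python
from collections import defaultdict


def build_data_index(records, extract_dictionary_words):
    """Build {(search_field, dictionary_word): set of storage locations}.

    records: iterable of (storage_location, text) pairs.
    extract_dictionary_words: stand-in for the RNN-based classification and
    extraction step; returns (search_field, word) pairs for one text.
    """
    index = defaultdict(set)
    for location, text in records:
        for field, word in extract_dictionary_words(text):
            index[(field, word)].add(location)
    return index


# Hypothetical stand-in for the RNN model: a fixed word-to-field lookup.
FIELD_OF = {"fund": "finance", "insurance": "finance", "router": "networking"}


def toy_extractor(text):
    return [(FIELD_OF[w], w) for w in text.lower().split() if w in FIELD_OF]


records = [
    ("db1/doc7", "Fund and insurance overview"),
    ("db2/doc3", "Router setup"),
    ("db2/doc9", "Insurance claims"),
]
index = build_data_index(records, toy_extractor)
print(sorted(index[("finance", "insurance")]))
```

Keeping the search field in the key means a later lookup can be restricted to one field without scanning the whole index.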
Further, according to an embodiment of the present invention, the classifying and extracting the plurality of stored data by using the RNN model and the search field to determine the dictionary word corresponding to each stored data further includes:
carrying out data cleaning and character conversion processing on a plurality of stored data;
and according to the search field and the RNN model, sequentially performing word extraction and word classification on the plurality of storage data after data cleaning to determine dictionary words corresponding to each storage data.
Specifically, according to the embodiment of the present invention, before the step of classifying and extracting the plurality of pieces of stored data, the stored data is converted into a character stream (i.e., the character conversion processing is performed), and HTML tags and the like in the stored data are cleaned out. This arrangement helps improve the word segmentation effect.
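A minimal sketch of this preprocessing, assuming the stored data arrives as UTF-8 bytes or as a string: the bytes are decoded into a character stream, and HTML tags are stripped with a simple regular expression. The function name and the regex-based tag removal are assumptions for illustration; a production system would more likely use a real HTML parser.

```python
import html
import re

# Crude tag pattern, adequate for a sketch but not for malformed markup.
TAG_RE = re.compile(r"<[^>]+>")


def clean_stored_data(raw, encoding="utf-8"):
    # Character conversion: turn stored bytes into a character stream.
    text = raw.decode(encoding, errors="replace") if isinstance(raw, bytes) else raw
    # Data cleaning: drop HTML tags, decode entities, collapse whitespace.
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return " ".join(text.split())


print(clean_stored_data(b"<p>Fund &amp; <b>insurance</b> products</p>"))
# prints "Fund & insurance products"
```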
Preferably, according to an embodiment of the present invention, the step of sequentially performing word classification on the plurality of pieces of stored data after the data cleaning further includes:
carrying out word classification on the plurality of stored data in sequence by combining word characteristics; wherein the word characteristics include at least one of the following: the search field to which the word belongs, the word context, and the word functional characteristics.
Through the arrangement, in the process of carrying out word classification on the stored data in the distributed storage device, word characteristics such as search fields to which the words belong, word contexts, word functional characteristics and the like are taken as classification bases, so that the subsequent determined word dictionary and the data index determined according to the word dictionary correspond to the search fields, the accuracy of data search is further improved, the application scene is expanded, and the user experience is improved.
Illustratively, according to an embodiment of the present invention, the matching the word segmentation processing result with the data index stored in the search engine further includes:
determining word characteristics of the words corresponding to the word segmentation processing result, the word characteristics including at least one of the search field to which the word belongs, the word context, and the word functional characteristics; and
determining a candidate data index according to the word characteristics corresponding to the word segmentation processing result, and matching the word segmentation processing result with the candidate data index.
Through the arrangement, after the received data to be searched is subjected to word segmentation processing, word characteristics of the word segmentation corresponding to the data to be searched are determined, and then the candidate data index is determined from the data index according to at least one of the search field, the word context and the word function characteristics of the word in the word characteristics, so that the number of the data indexes needing to be matched is reduced, the data search efficiency is improved, and meanwhile, the accuracy of the data search is improved by searching according to the search field.
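The candidate-narrowing step can be sketched as a filter over the data index before matching. The sketch assumes, purely for illustration, that the index is a mapping of `(search_field, word)` to storage locations and that the search field of the query has already been determined; neither detail is specified by the patent.

```python
def match_with_candidates(tokens, token_field, data_index):
    """Match segmented query words against index entries of one search field.

    tokens: words from the word segmentation processing result.
    token_field: search field determined for the query (assumed given).
    data_index: {(search_field, word): set of storage locations} (assumed layout).
    """
    # Narrow the full index to candidate entries in the query's search field.
    candidates = {key: locs for key, locs in data_index.items() if key[0] == token_field}
    # Match only against the candidates.
    hits = {}
    for word in tokens:
        locations = candidates.get((token_field, word))
        if locations:
            hits[(token_field, word)] = locations
    return hits


data_index = {
    ("finance", "bank"): {"db1/doc2"},
    ("geography", "bank"): {"db3/doc5"},
    ("finance", "fund"): {"db1/doc7"},
}
print(match_with_candidates(["bank", "fees"], "finance", data_index))
```

Here the word "bank" exists in two fields, but only the finance entry is considered, which is how the field restriction both shrinks the set of indexes to match and avoids cross-field false hits.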
Optionally, according to an embodiment of the present invention, after the step of classifying and extracting the plurality of stored data respectively by using the RNN model and the search field to determine the dictionary word corresponding to each stored data, the method further includes:
determining word frequency corresponding to the dictionary words;
dictionary words are updated according to word frequency.
Specifically, the word frequency may be the historical search frequency of the word, or the frequency with which the word occurs in the stored data. With the above arrangement, dictionary words with a higher word frequency are used as the basis for constructing the data index, which significantly improves data search efficiency.
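One way to read "updating dictionary words according to word frequency" is to count how often each dictionary word is observed and keep only those at or above a threshold. The patent does not state the update rule, so the pruning rule, the `min_count` threshold and the function name below are assumptions.

```python
from collections import Counter


def update_dictionary(dictionary_words, observed_words, min_count=2):
    """Keep dictionary words whose observed frequency reaches min_count.

    observed_words may be historical search terms or words occurring in the
    stored data; which source is used is an assumption left to the caller.
    """
    freq = Counter(w for w in observed_words if w in dictionary_words)
    kept = {w for w in dictionary_words if freq[w] >= min_count}
    return kept, freq


kept, freq = update_dictionary(
    {"fund", "insurance", "annuity"},
    ["fund", "fund", "insurance", "fund", "insurance", "loan"],
)
print(sorted(kept), freq["fund"])
```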
Further, according to an embodiment of the present invention, the method further includes determining the data sources corresponding to the plurality of storage data, and the constructing a data index according to the RNN model and the search field further includes:
aiming at the storage data of different data sources, respectively constructing data indexes corresponding to the storage data of different data sources according to the RNN model and the search field; and
updating the data index in response to a stored data updating request sent by the data source.
Specifically, according to the embodiment of the present invention, the stored data come from different databases of the distributed storage system, and different databases may set different access permissions for different visitors. Constructing data indexes separately for the stored data of different data sources makes it possible, when the word segmentation processing result is later matched with the data indexes, to adjust the number of data indexes to be matched according to the permission requirements of the different data sources, thereby improving search efficiency and the user experience.
Optionally, according to an embodiment of the present invention, the method further includes:
responding to service requirements sent by different data sources, and determining an application number corresponding to each data source according to the service requirements;
and constructing a mapping table for indicating the corresponding relation between the data source and the application number.
Specifically, the service requirement indicates an access right of the database corresponding to the data source to the application corresponding to each application number.
Preferably, according to an embodiment of the present invention, after the step of receiving data to be searched, the method further includes:
determining an application number corresponding to the data to be searched;
determining a data source set corresponding to the data to be searched according to the application number and the mapping table;
and matching the data to be searched with the data indexes corresponding to the data source set.
Through the setting, the application number corresponding to the data to be retrieved is determined, so that the number of the data indexes needing to be matched is adjusted according to authority requirements of different data sources (determined according to the application number and the mapping table) when the word segmentation processing result is matched with the data indexes, the searching efficiency is improved, and the user experience is improved.
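The mapping table and its use at query time can be sketched as follows. The sketch assumes that each data source's service requirement lists the application numbers allowed to search it, and that one data index is kept per data source; the names `build_mapping_table`, `indexes_for_request` and the sample identifiers are hypothetical.

```python
def build_mapping_table(service_requirements):
    """Invert {data_source: application numbers} into {application number: data sources}."""
    table = {}
    for source, app_numbers in service_requirements.items():
        for app in app_numbers:
            table.setdefault(app, set()).add(source)
    return table


def indexes_for_request(app_number, mapping_table, indexes_by_source):
    """Return only the per-source data indexes this application may be matched against."""
    sources = mapping_table.get(app_number, set())
    return {s: indexes_by_source[s] for s in sources if s in indexes_by_source}


# Hypothetical service requirements sent by two data sources.
requirements = {"crm_db": ["app-01", "app-02"], "ledger_db": ["app-02"]}
table = build_mapping_table(requirements)
indexes_by_source = {"crm_db": {"fund": {"crm_db/doc1"}}, "ledger_db": {"fund": {"ledger_db/doc4"}}}
print(sorted(indexes_for_request("app-01", table, indexes_by_source)))
```

An unknown application number yields an empty set of indexes, so a request without a registered mapping is matched against nothing rather than against everything.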
Illustratively, according to an embodiment of the present invention, the step of matching the segmentation processing result with the data index stored in the search engine further includes:
respectively extracting the characteristics of the word segmentation processing result and the data index to obtain a word segmentation processing result characteristic vector and a data index characteristic vector;
and calculating the similarity between the word segmentation processing result feature vector and the data index feature vector to realize the matching between the word segmentation processing result and the data index.
Specifically, the metric used to calculate the similarity may be the Euclidean distance, the cosine of the included angle (cosine similarity), or the like. According to a specific implementation of the embodiment of the present invention, any matching method in the prior art may also be adopted.
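The two metrics named above can be computed as in the following sketch. How the feature vectors are extracted is not described in the patent, so the vectors here are made-up numbers and `best_match` is a hypothetical helper that picks the index entry with the highest cosine similarity.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Define similarity with a zero vector as 0.0 to avoid division by zero.
    return dot / norm if norm else 0.0


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query_vec, index_vecs):
    """Return the name of the index entry most similar to the query vector."""
    return max(index_vecs, key=lambda name: cosine_similarity(query_vec, index_vecs[name]))


# Made-up feature vectors for illustration.
index_vecs = {"fund": [0.9, 0.1], "router": [0.0, 1.0]}
print(best_match([1.0, 0.2], index_vecs))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Cosine similarity ignores vector length and compares direction only, whereas Euclidean distance is sensitive to magnitude; which is preferable depends on how the feature vectors are normalised.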
And step S103, acquiring target data from the distributed storage device according to the target index.
Specifically, according to the embodiment of the present invention, the number of the target indexes may be one or more, and after the target index is determined, corresponding search data (i.e., target data) is directly obtained according to the distributed database corresponding to the target index.
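A minimal sketch of this retrieval step, assuming that each index entry stores locations of the form `database/document` and that the distributed storage can be modelled as nested dictionaries. Both the location format and the in-memory "databases" are assumptions for illustration only.

```python
def fetch_target_data(target_index_keys, data_index, databases):
    """Resolve one or more target indexes to the stored data they point at.

    data_index: {index key: set of "db_name/doc_id" locations} (assumed layout).
    databases: {db_name: {doc_id: stored data}}, a stand-in for distributed storage.
    """
    results = []
    seen = set()
    for key in target_index_keys:
        for location in sorted(data_index.get(key, ())):
            if location in seen:
                continue  # the same data may be reachable through several target indexes
            seen.add(location)
            db_name, doc_id = location.split("/", 1)
            results.append(databases[db_name][doc_id])
    return results


data_index = {
    ("finance", "fund"): {"db1/doc7"},
    ("finance", "insurance"): {"db1/doc7", "db2/doc9"},
}
databases = {
    "db1": {"doc7": "Fund and insurance overview"},
    "db2": {"doc9": "Insurance claims"},
}
print(fetch_target_data([("finance", "fund"), ("finance", "insurance")], data_index, databases))
```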
According to the technical scheme of the embodiment of the invention, the technical means adopted are: receiving data to be searched and performing word segmentation processing on it; matching the word segmentation processing result with a data index stored in a search engine to determine a target index, wherein the data index is constructed according to the RNN model and the search field; and acquiring target data from the distributed storage device according to the target index. These means overcome the technical problems of existing data search methods (low search accuracy, large differences in search quality between fields, narrow applicable scenarios, and poor user experience), and thereby achieve the technical effects of taking the search field into account when constructing the data index, improving the accuracy of data search, expanding the applicable scenarios, and improving the user experience.
Fig. 2 is a schematic diagram of a main flow of a data search method provided according to a second embodiment of the present invention; as shown in fig. 2, the data search method provided in the embodiment of the present invention mainly includes:
step S201, acquiring a plurality of storage data, and performing data cleaning and character conversion processing on the plurality of storage data.
Specifically, according to the embodiment of the present invention, the stored data is converted into a character stream (i.e., the character conversion processing is performed), and HTML tags and the like in the stored data are cleaned out. This arrangement helps improve the word segmentation effect.
And step S202, according to the search field and the RNN model, performing word extraction and word classification on the plurality of storage data after data cleaning in sequence to determine dictionary words corresponding to each storage data.
Specifically, according to the embodiment of the invention, the RNN model is used as the word segmentation algorithm. An RNN models sequences well and preserves sequence information. The RNN feeds the segmented words obtained by segmentation into the recurrent unit one after another to generate a fixed-size vector representing the sequence, from which the final dictionary words are obtained. Because the RNN has good learning ability, it recognizes ambiguous words and unknown words (words which do not appear in stored data) better, which further improves data search accuracy.
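To make the "fixed-size vector representing the sequence" concrete, the following sketch runs a plain (Elman) recurrent unit over a sequence of word vectors and returns the final hidden state. It is not the patent's model: the weights are arbitrary untrained numbers, the word vectors are one-hot placeholders, and the step from this vector to dictionary words is not shown because the source does not describe it.

```python
import math


def rnn_encode(token_vectors, w_in, w_rec, bias):
    """Feed word vectors into a recurrent unit one by one; return the final hidden state.

    w_in: hidden_size x input_size weights, w_rec: hidden_size x hidden_size weights.
    The output length equals len(bias) regardless of the sequence length.
    """
    hidden = [0.0] * len(bias)
    for x in token_vectors:
        # h_t = tanh(W_in x_t + W_rec h_{t-1} + b); the comprehension reads the old state.
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


# Arbitrary illustrative weights (hidden size 2, input size 3); not trained.
W_IN = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W_REC = [[0.1, 0.4], [-0.3, 0.2]]
BIAS = [0.0, 0.1]

word_a, word_b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(len(rnn_encode([word_a, word_b], W_IN, W_REC, BIAS)))
print(len(rnn_encode([word_a, word_b, word_a], W_IN, W_REC, BIAS)))
```

Both calls produce a vector of length 2 even though the inputs have different lengths, and swapping the order of the words changes the result, which is the sense in which the recurrent unit keeps sequence information.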
Through the arrangement, based on different search fields, the stored data in the distributed database are classified and extracted by combining the RNN model, so that dictionary words corresponding to each stored data are determined, and a data index is conveniently constructed subsequently according to the dictionary words. The problem that the searching time is long due to the fact that all data indexes need to be inquired when data searching is carried out subsequently is solved.
Preferably, according to an embodiment of the present invention, the step of sequentially performing word classification on the plurality of pieces of stored data after the data cleaning further includes:
carrying out word classification on the plurality of stored data in sequence by combining word characteristics; wherein the word characteristics include at least one of the following: the search field to which the word belongs, the word context, and the word functional characteristics.
Through the arrangement, in the process of carrying out word classification on the stored data in the distributed storage device, word characteristics such as search fields to which the words belong, word contexts, word functional characteristics and the like are taken as classification bases, so that the subsequent determined word dictionary and the data index determined according to the word dictionary correspond to the search fields, the accuracy of data search is further improved, the application scene is expanded, and the user experience is improved.
Taking the financial field as an example, according to a specific implementation of the embodiment of the present invention, words related to funds and insurance are classified into the financial products category.
Optionally, according to an embodiment of the present invention, after the step of determining the dictionary word corresponding to each piece of stored data, the method further includes:
determining word frequency corresponding to the dictionary words;
dictionary words are updated according to word frequency.
Specifically, the word frequency may be the historical search frequency of the word, or the frequency with which the word occurs in the stored data. With the above arrangement, dictionary words with a higher word frequency are used as the basis for constructing the data index, which significantly improves data search efficiency.
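One way to update dictionary words by word frequency is sketched below. The function name `update_dictionary_words` and the `min_frequency` threshold are assumptions; the source only states that dictionary words are updated according to word frequency, not how.

```python
from collections import Counter


def update_dictionary_words(dictionary_words, observed_words, min_frequency=2):
    """Keep dictionary words seen at least min_frequency times, most frequent first.

    observed_words may be historical search terms or words occurring in the
    stored data, matching the two frequency sources named in the text.
    """
    frequency = Counter(w for w in observed_words if w in dictionary_words)
    kept = [w for w in dictionary_words if frequency[w] >= min_frequency]
    return sorted(kept, key=lambda w: (-frequency[w], w))


dictionary_words = {"fund", "insurance", "bond"}
observed = ["fund", "fund", "insurance", "fund", "insurance", "bond", "weather"]
print(update_dictionary_words(dictionary_words, observed))  # ['fund', 'insurance']
```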
Step S203, constructing a data index according to the dictionary words and the storage locations of the stored data corresponding to the dictionary words.
It should be noted that, according to the embodiment of the present invention, each distributed database may also classify and extract its own locally stored data to construct a data index, after which the search system aggregates the data indexes constructed by the respective distributed databases.
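Step S203 and the aggregation variant can be pictured as an inverted index from dictionary words to storage locations. The sketch below is an assumption about the index layout, which the source does not define; `build_data_index`, `merge_indexes`, and the location strings are hypothetical.

```python
from collections import defaultdict


def build_data_index(records):
    """Map each dictionary word to the storage locations that contain it.

    records is an iterable of (storage_location, dictionary_words) pairs.
    """
    index = defaultdict(set)
    for location, words in records:
        for word in words:
            index[word].add(location)
    return index


def merge_indexes(indexes):
    """Aggregate indexes built locally by several distributed databases."""
    merged = defaultdict(set)
    for index in indexes:
        for word, locations in index.items():
            merged[word] |= locations
    return merged


index_a = build_data_index([("db_a/doc1", ["fund", "insurance"]),
                            ("db_a/doc2", ["fund"])])
index_b = build_data_index([("db_b/doc7", ["insurance"])])
merged = merge_indexes([index_a, index_b])
print(sorted(merged["insurance"]))  # ['db_a/doc1', 'db_b/doc7']
```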
Further, according to an embodiment of the present invention, the method further includes determining the data sources corresponding to the plurality of stored data, and the step of constructing a data index further includes:
for the stored data of different data sources, respectively constructing the data indexes corresponding to the stored data of the different data sources according to the RNN model and the search field; and
updating the data index in response to a stored data update request sent by a data source.
Specifically, according to the embodiment of the present invention, the stored data comes from different databases of the distributed storage system, and the different databases may set different access permissions for different visitors. With the above arrangement, data indexes are constructed separately for the stored data of different data sources, which makes it possible, when subsequently matching the word segmentation processing result with the data indexes, to adjust the number of data indexes to be matched according to the permission requirements of the different data sources, thereby improving search efficiency and user experience. According to a specific implementation of the embodiment of the present invention, the data sources connected to the search system may be adjusted according to actual requirements, for example by adding a new data source or deleting an existing one.
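A per-source index that can be updated on request, and from which data sources can be added or removed, might look like the sketch below. The class `PerSourceIndex` and its method names are hypothetical, and the update semantics (replace all postings for a location) are an assumption.

```python
class PerSourceIndex:
    """Keeps one word -> locations index per data source."""

    def __init__(self):
        self._indexes = {}  # data source -> {word: set of storage locations}

    def apply_update(self, source, location, words):
        """Handle a stored data update request: re-index one location."""
        index = self._indexes.setdefault(source, {})
        for locations in index.values():
            locations.discard(location)           # drop stale postings
        for word in words:
            index.setdefault(word, set()).add(location)
        for word in [w for w, locs in index.items() if not locs]:
            del index[word]                       # remove emptied entries

    def remove_source(self, source):
        """Disconnect a data source from the search system."""
        self._indexes.pop(source, None)

    def lookup(self, source, word):
        """Return a copy of the locations indexed for word in one source."""
        return set(self._indexes.get(source, {}).get(word, set()))


indexes = PerSourceIndex()
indexes.apply_update("funds_db", "doc1", ["fund", "bond"])
indexes.apply_update("funds_db", "doc1", ["fund"])  # doc1 no longer mentions "bond"
print(indexes.lookup("funds_db", "fund"), indexes.lookup("funds_db", "bond"))
```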
Optionally, according to an embodiment of the present invention, the method further includes:
responding to service requirements sent by different data sources, and determining an application number corresponding to each data source according to the service requirements;
and constructing a mapping table for indicating the corresponding relation between the data source and the application number.
Specifically, the service requirement indicates an access right of the database corresponding to the data source to the application corresponding to each application number.
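The mapping table can be sketched as a dictionary from data source to the application numbers allowed to search it. The record layout (`data_source`, `allowed_applications`) and the function name `build_mapping_table` are assumptions made for illustration; the source does not describe the format of a service requirement.

```python
def build_mapping_table(service_requirements):
    """Build data source -> set of application numbers from service requirements."""
    mapping_table = {}
    for requirement in service_requirements:
        source = requirement["data_source"]
        allowed = requirement["allowed_applications"]
        mapping_table.setdefault(source, set()).update(allowed)
    return mapping_table


requirements = [
    {"data_source": "funds_db", "allowed_applications": [101, 102]},
    {"data_source": "claims_db", "allowed_applications": [102]},
    {"data_source": "funds_db", "allowed_applications": [103]},
]
mapping_table = build_mapping_table(requirements)
print(mapping_table["claims_db"])  # {102}
```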
Step S204, receiving the data to be searched, and performing word segmentation processing on the data to be searched.
Specifically, according to the embodiment of the present invention, suppose the received data to be searched is the text "how is the weather today". The word segmenter receives the byte stream of the text, splits it into individual word segments, and outputs a stream of word segments, for example the three words [today, weather, how]. This word segmentation process in effect generates a word dictionary corresponding to the data to be searched, and word frequency information may also be taken into account during word segmentation.
Further, according to the embodiment of the present invention, after receiving the data to be searched, the method further includes performing data cleaning and character conversion processing on the data to be retrieved.
Specifically, character conversion processing refers to converting characters (e.g., converting upper case to lower case); data cleaning refers to removing stop words (stop words are the most common words in a natural language, e.g. "o, no", etc., and add little value when analyzing text data and constructing NLP models). Deleting the stop words from the data to be searched improves the word segmentation effect.
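The query-side preprocessing (lower-casing, segmentation, stop-word removal) can be sketched as below. The stop-word list and the regular-expression tokenizer are assumptions that suit English text only; Chinese input would need a dedicated word segmenter, which the source does not name.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "is", "a", "an", "of"})


def preprocess_query(query):
    """Lower-case the query, split it into words, and drop stop words."""
    lowered = query.lower()                # character conversion
    tokens = re.findall(r"\w+", lowered)   # naive word segmentation
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess_query("How is the Weather TODAY!"))  # ['how', 'weather', 'today']
```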
Preferably, according to an embodiment of the present invention, after the step of receiving data to be searched, the method further includes:
determining an application number corresponding to data to be retrieved;
determining a data source set corresponding to the data to be retrieved according to the application number and the mapping table;
and matching the data to be searched with the data indexes corresponding to the data source set.
Through the setting, the application number corresponding to the data to be retrieved is determined, so that the number of the data indexes needing to be matched is adjusted according to authority requirements of different data sources (determined according to the application number and the mapping table) when the word segmentation processing result is matched with the data indexes, the searching efficiency is improved, and the user experience is improved.
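The permission-aware narrowing described above can be sketched as two steps: look up the data sources an application number may access, then match only against those sources' indexes. All names and the sample data are hypothetical, and the mapping table is assumed to map each data source to a set of application numbers.

```python
def data_sources_for(application_number, mapping_table):
    """Return the data sources that the given application may search."""
    return {source for source, apps in mapping_table.items()
            if application_number in apps}


def match_in_permitted_sources(tokens, indexes_by_source, permitted_sources):
    """Match tokens only against the indexes of permitted data sources."""
    matches = {}
    for source in permitted_sources:
        index = indexes_by_source.get(source, {})
        for token in tokens:
            for location in index.get(token, ()):
                matches.setdefault(token, set()).add((source, location))
    return matches


mapping_table = {"funds_db": {101, 102}, "claims_db": {102}}
indexes_by_source = {
    "funds_db": {"fund": {"doc1"}},
    "claims_db": {"fund": {"doc9"}, "claim": {"doc3"}},
}
permitted = data_sources_for(101, mapping_table)
matches = match_in_permitted_sources(["fund", "claim"], indexes_by_source, permitted)
print(matches)  # {'fund': {('funds_db', 'doc1')}}
```

Application 101 has no access to `claims_db`, so that index is never consulted, which is the source of the efficiency gain the text describes.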
Step S205, determining at least one of the search field to which the words corresponding to the word segmentation processing result belong, the word context, and the word functional characteristics; determining candidate data indexes according to the word characteristics corresponding to the word segmentation processing result; and matching the word segmentation processing result with the candidate data indexes to determine a target index.
Through the arrangement, after the received data to be searched is subjected to word segmentation processing, word characteristics of the word segmentation corresponding to the data to be searched are determined, and then the candidate data index is determined from the data index according to at least one of the search field, the word context and the word function characteristics of the word in the word characteristics, so that the number of the data indexes needing to be matched is reduced, the data search efficiency is improved, and meanwhile, the accuracy of the data search is improved by searching according to the search field.
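Candidate selection by search field can be sketched as follows. The lookup table `field_of_word` stands in for whatever classifier assigns a word to a search field, and the layout of `indexes_by_field` is an assumption; only the search-field characteristic is shown, not word context or functional characteristics.

```python
def select_candidate_indexes(tokens, field_of_word, indexes_by_field):
    """Keep only the data indexes whose search field matches some token."""
    fields = {field_of_word[t] for t in tokens if t in field_of_word}
    return {f: indexes_by_field[f] for f in fields if f in indexes_by_field}


def match_candidates(tokens, candidates):
    """Match tokens against the candidate indexes to obtain the target index."""
    target = {}
    for field, index in candidates.items():
        for token in tokens:
            if token in index:
                target[(field, token)] = index[token]
    return target


field_of_word = {"fund": "finance", "premium": "finance", "rain": "weather"}
indexes_by_field = {
    "finance": {"fund": {"funds_db/doc1"}, "premium": {"claims_db/doc3"}},
    "weather": {"rain": {"weather_db/doc8"}},
}
tokens = ["fund", "fees"]
candidates = select_candidate_indexes(tokens, field_of_word, indexes_by_field)
target_index = match_candidates(tokens, candidates)
print(target_index)  # {('finance', 'fund'): {'funds_db/doc1'}}
```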
Illustratively, according to an embodiment of the present invention, the step of matching the segmentation processing result with the data index stored in the search engine further includes:
respectively extracting the characteristics of the word segmentation processing result and the data index to obtain a word segmentation processing result characteristic vector and a data index characteristic vector;
and calculating the similarity between the word segmentation processing result feature vector and the data index feature vector to realize the matching between the word segmentation processing result and the data index.
Specifically, the similarity measure may be the Euclidean distance, the cosine of the included angle (cosine similarity), or the like. According to a specific implementation of the embodiment of the present invention, any existing matching method may also be adopted.
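The two similarity measures named above are standard and can be computed as in the sketch below. The feature vectors and the helper `best_match` are made up for illustration; the source does not say how the feature vectors are extracted.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(query_vector, index_vectors):
    """Return the name of the index whose vector is most similar to the query."""
    return max(index_vectors,
               key=lambda name: cosine_similarity(query_vector, index_vectors[name]))


index_vectors = {"finance": [0.9, 0.2], "weather": [0.1, 1.0]}
print(best_match([1.0, 0.1], index_vectors))  # finance
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Note the opposite orientations: a larger cosine similarity means a closer match, whereas a smaller Euclidean distance does.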
Step S206, acquiring target data from the distributed storage device according to the target index.
Specifically, according to the embodiment of the present invention, the number of the target indexes may be one or more, and after the target index is determined, corresponding search data (i.e., target data) is directly obtained according to the distributed database corresponding to the target index.
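Retrieving target data for one or more target indexes can be sketched with in-memory dictionaries standing in for the distributed databases. Representing a target index as a `(database_name, record_key)` pair is an assumption, as are all names in the example.

```python
def fetch_target_data(target_indexes, databases):
    """Look up each target index in the database it points to.

    target_indexes is a list of (database_name, record_key) pairs;
    entries that cannot be resolved are skipped.
    """
    results = []
    for database_name, record_key in target_indexes:
        records = databases.get(database_name, {})
        if record_key in records:
            results.append(records[record_key])
    return results


databases = {
    "funds_db": {"doc1": "Fund prospectus"},
    "claims_db": {"doc3": "Claim form"},
}
target_indexes = [("funds_db", "doc1"), ("claims_db", "doc3"), ("claims_db", "missing")]
target_data = fetch_target_data(target_indexes, databases)
print(target_data)  # ['Fund prospectus', 'Claim form']
```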
According to the technical scheme of the embodiment of the present invention, data to be searched is received and word segmentation processing is performed on it; the word segmentation processing result is matched with the data indexes stored in a search engine to determine a target index, wherein the data indexes are constructed according to the RNN model and the search field; and target data is acquired from the distributed storage device according to the target index. These technical means overcome the technical problems of existing data search methods, namely low search accuracy, large differences in search effect between fields, narrow applicable scenarios, and poor user experience. By taking the search field into account when constructing the data index, they achieve the technical effects of improving data search accuracy, expanding the applicable scenarios, and improving user experience.
FIG. 3 is a schematic diagram of the main modules of a data search apparatus provided according to an embodiment of the present invention; as shown in fig. 3, the data search apparatus 300 according to the embodiment of the present invention mainly includes:
A word segmentation processing module 301, configured to receive data to be searched and perform word segmentation processing on the data to be searched.
Specifically, according to the embodiment of the present invention, the data to be searched can be segmented using techniques such as knowledge graphs and word segmentation rules. The word segmentation step may be performed with an existing word segmentation algorithm. Segmenting the data to be searched allows the subsequent search to operate on the word segmentation processing result, which improves data search efficiency.
Further, according to the embodiment of the present invention, the data search apparatus 300 further includes a to-be-retrieved data adjusting module, configured for, before the step of performing word segmentation processing on the data to be searched:
performing natural language processing on the data to be searched so as to adjust the data to be searched.
Specifically, according to the embodiment of the present invention, after receiving the data to be searched, the adjustment operations such as adding, deleting, modifying, and the like can be performed on the data to be retrieved through text processing and search sentence understanding (specifically including modes such as language detection, word segmentation, labeling, and intention recognition), so as to improve the word segmentation processing efficiency and the word segmentation effect.
According to a specific implementation manner of the embodiment of the present invention, a search field to which the data to be retrieved belongs may also be determined, and the word segmentation processing result may be adjusted according to a technical term, a professional vocabulary, and the like corresponding to the search field.
A matching module 302, configured to match the word segmentation processing result with a data index stored in a search engine to determine a target index; wherein the data index is constructed according to the RNN model and the search domain.
Specifically, according to the embodiment of the present invention, the data search apparatus 300 further includes a data index building module, configured to:
acquiring a plurality of storage data, and respectively classifying and extracting the plurality of storage data by adopting an RNN (recurrent neural network) model and a search field to determine dictionary words corresponding to each storage data;
and constructing a data index according to the dictionary words and the storage positions of the storage data corresponding to the dictionary words.
With the above arrangement, the stored data in the distributed databases is classified and extracted based on the different search fields in combination with the RNN model, and the data index is then constructed. This avoids the long search time that results when all data indexes have to be queried during a subsequent data search. It should be noted that, according to the embodiment of the present invention, each distributed database may also classify and extract its own locally stored data to construct a data index, after which the search system aggregates the data indexes constructed by the respective distributed databases.
Further, according to an embodiment of the present invention, the data index building module is further configured to:
carrying out data cleaning and character conversion processing on a plurality of stored data;
and according to the search field and the RNN model, sequentially performing word extraction and word classification on the plurality of storage data after data cleaning to determine dictionary words corresponding to each storage data.
Specifically, according to the embodiment of the present invention, before the step of classifying and extracting the plurality of stored data, the stored data is converted into a character stream (i.e., character conversion processing is performed), and HTML tags and the like are removed from the stored data. This arrangement improves the word segmentation effect.
Preferably, according to an embodiment of the present invention, the data index building module is further configured to:
performing word classification on the plurality of stored data in sequence in combination with word characteristics; wherein the word characteristics include at least one of the following: the search field to which the word belongs, the word context, and the word functional characteristics.
Through the arrangement, in the process of carrying out word classification on the stored data in the distributed storage device, word characteristics such as search fields to which the words belong, word contexts, word functional characteristics and the like are taken as classification bases, so that the subsequent determined word dictionary and the data index determined according to the word dictionary correspond to the search fields, the accuracy of data search is further improved, the application scene is expanded, and the user experience is improved.
Illustratively, according to an embodiment of the present invention, the matching module 302 is further configured to:
determining at least one of the search field to which the words corresponding to the word segmentation processing result belong, the word context, and the word functional characteristics; and
determining candidate data indexes according to the word characteristics corresponding to the word segmentation processing result, and matching the word segmentation processing result with the candidate data indexes.
Through the arrangement, after the received data to be searched is subjected to word segmentation processing, word characteristics of the word segmentation corresponding to the data to be searched are determined, and then the candidate data index is determined from the data index according to at least one of the search field, the word context and the word function characteristics of the word in the word characteristics, so that the number of the data indexes needing to be matched is reduced, the data search efficiency is improved, and meanwhile, the accuracy of the data search is improved by searching according to the search field.
Optionally, according to an embodiment of the present invention, the data search apparatus 300 further includes a dictionary word updating module, after the step of respectively classifying and extracting the plurality of stored data by using the RNN model and the search field to determine the dictionary word corresponding to each stored data, configured to:
determining word frequency corresponding to the dictionary words;
dictionary words are updated according to word frequency.
Specifically, the word frequency may be the historical search frequency of the word, or the frequency with which the word occurs in the stored data. With the above arrangement, dictionary words with a higher word frequency are used as the basis for constructing the data index, which significantly improves data search efficiency.
Further, according to an embodiment of the present invention, the data index building module is further configured for:
determining the data sources corresponding to the plurality of stored data and, for the stored data of different data sources, respectively constructing the corresponding data indexes according to the RNN model and the search field; and
updating the data index in response to a stored data update request sent by a data source.
Specifically, according to the embodiment of the present invention, the stored data comes from different databases of the distributed storage system, and the different databases may set different access permissions for different visitors. With the above arrangement, data indexes are constructed separately for the stored data of different data sources, which makes it possible, when subsequently matching the word segmentation processing result with the data indexes, to adjust the number of data indexes to be matched according to the permission requirements of the different data sources, thereby improving search efficiency and user experience.
Optionally, according to an embodiment of the present invention, the data search apparatus 300 further includes a mapping table constructing module, configured to:
responding to service requirements sent by different data sources, and determining an application number corresponding to each data source according to the service requirements;
and constructing a mapping table for indicating the corresponding relation between the data source and the application number.
Specifically, the service requirement indicates an access right of the database corresponding to the data source to the application corresponding to each application number.
Preferably, after the step of receiving the data to be searched, the matching module 302 is further configured to:
determining an application number corresponding to data to be retrieved;
determining a data source set corresponding to the data to be retrieved according to the application number and the mapping table;
and matching the data to be searched with the data indexes corresponding to the data source set.
Through the setting, the application number corresponding to the data to be retrieved is determined, so that the number of the data indexes needing to be matched is adjusted according to authority requirements of different data sources (determined according to the application number and the mapping table) when the word segmentation processing result is matched with the data indexes, the searching efficiency is improved, and the user experience is improved.
Illustratively, according to an embodiment of the present invention, the matching module 302 is further configured to:
respectively extracting the characteristics of the word segmentation processing result and the data index to obtain a word segmentation processing result characteristic vector and a data index characteristic vector;
and calculating the similarity between the word segmentation processing result feature vector and the data index feature vector to realize the matching between the word segmentation processing result and the data index.
Specifically, the similarity measure may be the Euclidean distance, the cosine of the included angle (cosine similarity), or the like. According to a specific implementation of the embodiment of the present invention, any existing matching method may also be adopted.
A target data acquisition module 303, configured to acquire target data from the distributed storage device according to the target index.
Specifically, according to the embodiment of the present invention, the number of the target indexes may be one or more, and after the target index is determined, corresponding search data (i.e., target data) is directly obtained according to the distributed database corresponding to the target index.
According to the technical scheme of the embodiment of the present invention, data to be searched is received and word segmentation processing is performed on it; the word segmentation processing result is matched with the data indexes stored in a search engine to determine a target index, wherein the data indexes are constructed according to the RNN model and the search field; and target data is acquired from the distributed storage device according to the target index. These technical means overcome the technical problems of existing data search methods, namely low search accuracy, large differences in search effect between fields, narrow applicable scenarios, and poor user experience. By taking the search field into account when constructing the data index, they achieve the technical effects of improving data search accuracy, expanding the applicable scenarios, and improving user experience.
Fig. 4 shows an exemplary system architecture 400 to which the data search method or the data search apparatus of the embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405 (this architecture is merely an example, and the components included in a particular architecture may be adapted to the specific application). The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have installed thereon various communication client applications, such as a shopping-type application, a web browser application, a search-type application, an instant messaging tool, a data search-type client, data search-type software, etc. (by way of example only).
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server that provides various services, for example a server that performs data search or data processing for users of the terminal devices 401, 402, 403 (by way of example only). The server may analyze and otherwise process the received data to be searched, and feed the processing result (e.g., the target index or the target data, by way of example only) back to the terminal device.
It should be noted that the data search method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the data search apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use with a terminal device or server implementing an embodiment of the invention is shown. The terminal device or the server shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read from it can be installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a word segmentation processing module, a matching module, and a target data acquisition module. The names of the modules do not limit the modules themselves in some cases, for example, the word segmentation processing module may also be described as a "module for receiving data to be searched and performing word segmentation processing on the data to be searched".
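As a software realization of the three modules named above, the following sketch wires a word segmentation module, a matching module, and a target data acquisition module into one apparatus. It is a deliberately simplified assumption: the class names, the whitespace segmenter, the exact-match lookup, and the in-memory storage are all hypothetical stand-ins for the components described in the embodiments.

```python
class WordSegmentationModule:
    """Stand-in for module 301; lower-cases and splits on whitespace."""

    def process(self, query):
        return query.lower().split()


class MatchingModule:
    """Stand-in for module 302; exact lookup of each word in the data index."""

    def __init__(self, data_index):
        self.data_index = data_index  # word -> set of storage locations

    def match(self, tokens):
        locations = set()
        for token in tokens:
            locations.update(self.data_index.get(token, ()))
        return sorted(locations)


class TargetDataAcquisitionModule:
    """Stand-in for module 303; a dict plays the distributed storage device."""

    def __init__(self, storage):
        self.storage = storage

    def acquire(self, target_index):
        return [self.storage[loc] for loc in target_index if loc in self.storage]


class DataSearchApparatus:
    """Chains the three modules in the order the method steps describe."""

    def __init__(self, segmenter, matcher, acquirer):
        self.segmenter = segmenter
        self.matcher = matcher
        self.acquirer = acquirer

    def search(self, query):
        tokens = self.segmenter.process(query)
        target_index = self.matcher.match(tokens)
        return self.acquirer.acquire(target_index)


apparatus = DataSearchApparatus(
    WordSegmentationModule(),
    MatchingModule({"fund": {"doc1"}, "insurance": {"doc1", "doc2"}}),
    TargetDataAcquisitionModule({"doc1": "Fund and insurance overview",
                                 "doc2": "Insurance terms"}),
)
results = apparatus.search("Insurance fund")
print(results)  # ['Fund and insurance overview', 'Insurance terms']
```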
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: receiving data to be searched, and performing word segmentation processing on the data to be searched; matching the word segmentation processing result with a data index stored in a search engine to determine a target index, wherein the data index is constructed according to the RNN model and the search field; and acquiring target data from the distributed storage device according to the target index.
According to the technical scheme of the embodiment of the present invention, data to be searched is received and word segmentation processing is performed on it; the word segmentation processing result is matched with the data indexes stored in a search engine to determine a target index, wherein the data indexes are constructed according to the RNN model and the search field; and target data is acquired from the distributed storage device according to the target index. These technical means overcome the technical problems of existing data search methods, namely low search accuracy, large differences in search effect between fields, narrow applicable scenarios, and poor user experience. By taking the search field into account when constructing the data index, they achieve the technical effects of improving data search accuracy, expanding the applicable scenarios, and improving user experience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (15)
1. A data searching method, comprising:
receiving data to be searched, and performing word segmentation processing on the data to be searched;
matching the word segmentation processing result with a data index stored in a search engine to determine a target index; wherein the data index is constructed according to an RNN model and a search domain;
and acquiring target data from the distributed storage device according to the target index.
2. The data searching method of claim 1, wherein the step of constructing the data index according to the RNN model and the search domain comprises:
acquiring a plurality of stored data, and respectively classifying and extracting the plurality of stored data by using the RNN (recurrent neural network) model and the search domain to determine dictionary words corresponding to each stored data;
and constructing the data index according to the dictionary words and the storage locations of the stored data corresponding to the dictionary words.
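As a rough illustration of the index construction in claim 2, the following Python sketch maps each dictionary word to the storage locations of the stored data that contain it. It rests on stated assumptions: `DOMAIN_VOCAB`, `extract_dictionary_words`, and `build_data_index` are hypothetical names, and a simple vocabulary lookup replaces the RNN-based classification and extraction, which the claim does not specify in enough detail to reproduce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical domain vocabulary; it stands in for the RNN-based
# classification and extraction step, which is not reproduced here.
DOMAIN_VOCAB: Set[str] = {"loan", "deposit", "mortgage"}


def extract_dictionary_words(text: str, vocab: Set[str]) -> List[str]:
    """Placeholder extractor: keep each vocabulary token once, in order."""
    seen: List[str] = []
    for token in text.lower().split():
        if token in vocab and token not in seen:
            seen.append(token)
    return seen


def build_data_index(
    records: Iterable[Tuple[str, str]], vocab: Set[str]
) -> Dict[str, List[str]]:
    """Map each dictionary word to the storage locations of matching data.

    `records` yields (storage_location, text) pairs.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for location, text in records:
        for word in extract_dictionary_words(text, vocab):
            index[word].append(location)
    return dict(index)
```

The result is an inverted index: looking up a dictionary word yields every storage location whose data produced that word.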
3. The data searching method of claim 2, wherein the classifying and extracting the plurality of stored data by using the RNN model and the search domain to determine the dictionary word corresponding to each stored data further comprises:
performing data cleaning and character conversion processing on the plurality of stored data;
and according to the search domain and the RNN model, sequentially performing word extraction and word classification on the plurality of stored data after data cleaning to determine dictionary words corresponding to each stored data.
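The data cleaning and character conversion step of claim 3 is not specified further in the claim. A minimal sketch, assuming that cleaning means removing HTML-like markup and redundant whitespace and that character conversion means Unicode NFKC normalization (for example full-width to half-width characters) plus lowercasing, might look like this in Python; `clean_and_convert` is a hypothetical name.

```python
import re
import unicodedata


def clean_and_convert(text: str) -> str:
    """Strip markup and extra whitespace, then normalize character forms."""
    # Data cleaning (assumed): drop HTML-like tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # Character conversion (assumed): NFKC folds full-width forms to ASCII.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace left behind by the steps above.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()
```

Normalizing before extraction means that visually different spellings of the same word reach the classifier in a single form.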
4. The data searching method of claim 3, wherein the step of sequentially performing word classification on the plurality of stored data after the data cleaning further comprises:
sequentially performing word classification on the plurality of stored data in combination with word characteristics; wherein the word characteristics comprise at least one of the following: the search domain to which a word belongs, the word context, and the functional characteristics of the word.
5. The data searching method of claim 4, wherein the matching the word segmentation processing result with a data index stored in a search engine further comprises:
determining at least one of the search domain, the word context, and the word functional characteristics of the words corresponding to the word segmentation processing result; and
determining a candidate data index according to the word characteristics corresponding to the word segmentation processing result, and matching the word segmentation processing result with the candidate data index.
6. The data searching method of claim 2, wherein after the step of classifying and extracting the plurality of stored data respectively by using the RNN model and the search domain to determine a dictionary word corresponding to each stored data, the method further comprises:
determining word frequency corresponding to the dictionary words;
updating the dictionary words according to the word frequency.
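Claim 6 does not state how the word frequency drives the update. One plausible reading, shown here only as an assumption, is to prune dictionary words that occur too rarely in the stored data. The function name `update_dictionary` and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def update_dictionary(
    dictionary: Set[str], corpus_tokens: Iterable[str], min_count: int = 2
) -> Set[str]:
    """Keep only dictionary words seen at least `min_count` times."""
    # Count occurrences of dictionary words only; other tokens are ignored.
    counts = Counter(t for t in corpus_tokens if t in dictionary)
    # Counter returns 0 for words that never occurred.
    return {word for word in dictionary if counts[word] >= min_count}
```

Other update policies, such as ranking words by frequency or adding frequent new words, would fit the claim language equally well.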
7. The data searching method of claim 2, further comprising determining data sources corresponding to the plurality of stored data, wherein constructing the data index according to the RNN model and the search domain further comprises:
for the stored data of different data sources, respectively constructing, according to the RNN model and the search domain, data indexes corresponding to the stored data of the different data sources; and
updating the data index in response to a stored data update request sent by a data source.
8. The data searching method of claim 7, further comprising:
responding to service requirements sent by different data sources, and determining an application number corresponding to each data source according to the service requirements;
and constructing a mapping table for indicating the corresponding relation between the data source and the application number.
9. The data searching method of claim 8, wherein after the step of receiving data to be searched, the method further comprises:
determining an application number corresponding to the data to be searched;
determining a data source set corresponding to the data to be searched according to the application number and the mapping table;
and matching the data to be searched with the data index corresponding to the data source set.
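The mapping table of claims 8 and 9 restricts a search to the data sources associated with the caller's application number. The Python sketch below illustrates that lookup under stated assumptions: the application numbers, source names, index contents, and the function `search_for_app` are all hypothetical, and exact token lookup stands in for the matching step.

```python
from typing import Dict, List, Set

# Hypothetical mapping table: application number -> data sources it may search.
APP_TO_SOURCES: Dict[str, Set[str]] = {
    "app-001": {"crm", "knowledge_base"},
    "app-002": {"knowledge_base"},
}

# Hypothetical per-source indexes: data source -> {dictionary word: location}.
SOURCE_INDEXES: Dict[str, Dict[str, str]] = {
    "crm": {"customer": "crm/rec-9"},
    "knowledge_base": {"loan": "kb/doc-3"},
}


def search_for_app(app_number: str, tokens: List[str]) -> List[str]:
    """Match tokens only against indexes of sources mapped to the app."""
    sources = APP_TO_SOURCES.get(app_number, set())
    hits: List[str] = []
    for source in sorted(sources):  # sorted for deterministic output
        index = SOURCE_INDEXES.get(source, {})
        hits.extend(index[t] for t in tokens if t in index)
    return hits
```

With these sample tables, `app-002` can only reach the knowledge base, so a query containing both `customer` and `loan` returns just the knowledge-base location.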
10. The data searching method of claim 1, wherein the step of matching the word segmentation processing result with a data index stored in a search engine further comprises:
respectively extracting the characteristics of the word segmentation processing result and the data index to obtain a word segmentation processing result characteristic vector and a data index characteristic vector;
and calculating the similarity between the word segmentation processing result feature vector and the data index feature vector so as to realize the matching between the word segmentation processing result and the data index.
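Claim 10 names neither the feature extractor nor the similarity measure. As a minimal sketch under assumptions, the Python code below uses bag-of-words counts as feature vectors and cosine similarity as the measure; `to_vector`, `cosine_similarity`, and `best_match` are hypothetical names, and a production system would more likely use learned embeddings.

```python
import math
from collections import Counter
from typing import Dict, List


def to_vector(tokens: List[str]) -> Dict[str, int]:
    """Bag-of-words feature vector (an assumed, simple feature extractor)."""
    return dict(Counter(tokens))


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_tokens: List[str], index_entries: Dict[str, List[str]]) -> str:
    """Return the key of the index entry most similar to the query."""
    query_vec = to_vector(query_tokens)
    return max(
        index_entries,
        key=lambda k: cosine_similarity(query_vec, to_vector(index_entries[k])),
    )
```

Similarity-based matching lets a query hit an index entry even when only some of its words overlap, which exact lookup cannot do.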
11. The data searching method of claim 1, wherein before the step of performing word segmentation processing on the data to be searched, the method further comprises:
performing natural language processing on the data to be searched so as to adjust the data to be searched.
12. A data search apparatus, comprising:
the word segmentation processing module is used for receiving data to be searched and carrying out word segmentation processing on the data to be searched;
the matching module is used for matching the word segmentation processing result with a data index stored in a search engine so as to determine a target index; wherein the data index is constructed according to an RNN model and a search domain;
and the target data acquisition module is used for acquiring target data from the distributed storage device according to the target index.
13. The apparatus of claim 12, further comprising a data index building module configured to:
acquiring a plurality of stored data, and respectively classifying and extracting the plurality of stored data by using the RNN (recurrent neural network) model and the search domain to determine dictionary words corresponding to each stored data;
and constructing the data index according to the dictionary words and the storage locations of the stored data corresponding to the dictionary words.
14. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs which,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.
15. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110349226.2A CN112988753B (en) | 2021-03-31 | 2021-03-31 | Data searching method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110349226.2A CN112988753B (en) | 2021-03-31 | 2021-03-31 | Data searching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112988753A (en) | 2021-06-18 |
CN112988753B (en) | 2022-10-11 |
Family
ID=76339202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110349226.2A Active CN112988753B (en) | 2021-03-31 | 2021-03-31 | Data searching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112988753B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982034A (en) * | 2011-09-05 | 2013-03-20 | 腾讯科技(深圳)有限公司 | Internet website information search method and search system |
CN104537101A (en) * | 2015-01-12 | 2015-04-22 | 杏树林信息技术(北京)有限公司 | Medical information search engine system and search method |
CN106777250A (en) * | 2016-12-27 | 2017-05-31 | 努比亚技术有限公司 | A kind of word segmentation result system of selection and device |
CN107818130A (en) * | 2017-09-15 | 2018-03-20 | 深圳市电陶思创科技有限公司 | The method for building up and system of a kind of search engine |
CN108055348A (en) * | 2017-12-26 | 2018-05-18 | 广东欧珀移动通信有限公司 | Method for adjusting data transmission priority and related equipment |
CN108804642A (en) * | 2018-06-05 | 2018-11-13 | 中国平安人寿保险股份有限公司 | Search method, device, computer equipment and storage medium |
CN109948149A (en) * | 2019-02-28 | 2019-06-28 | 腾讯科技(深圳)有限公司 | A kind of file classification method and device |
CN111488426A (en) * | 2020-04-17 | 2020-08-04 | 支付宝(杭州)信息技术有限公司 | Query intention determining method and device and processing equipment |
CN111831833A (en) * | 2020-07-27 | 2020-10-27 | 人民卫生电子音像出版社有限公司 | Knowledge graph construction method and device |
CN112256822A (en) * | 2020-10-21 | 2021-01-22 | 平安科技(深圳)有限公司 | Text search method, apparatus, computer equipment and storage medium |
CN112307753A (en) * | 2020-12-29 | 2021-02-02 | 启业云大数据(南京)有限公司 | Word segmentation method supporting large word stock, computer readable storage medium and system |
CN112380420A (en) * | 2020-11-11 | 2021-02-19 | Vidaa美国公司 | Searching method and display device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779032A (en) * | 2021-09-14 | 2021-12-10 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on recurrent neural network |
CN113779032B (en) * | 2021-09-14 | 2024-03-12 | 广州汇通国信科技有限公司 | Search engine index construction method and device based on cyclic neural network |
CN114564928A (en) * | 2022-02-25 | 2022-05-31 | 北京圣博润高新技术股份有限公司 | File management method, device, equipment and storage medium for office system |
CN114564928B (en) * | 2022-02-25 | 2024-02-27 | 北京圣博润高新技术股份有限公司 | File management method, device, equipment and storage medium for office system |
CN116226362A (en) * | 2023-05-06 | 2023-06-06 | 湖南德雅曼达科技有限公司 | Word segmentation method for improving accuracy of searching hospital names |
Also Published As
Publication number | Publication date |
---|---|
CN112988753B (en) | 2022-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11062089B2 (en) | Method and apparatus for generating information | |
CN107679039B (en) | Method and device for determining statement intention | |
CN107491534B (en) | Information processing method and device | |
CN108628830B (en) | Semantic recognition method and device | |
CN107301170B (en) | Method and device for segmenting sentences based on artificial intelligence | |
CN114840671B (en) | Dialogue generation method, model training method, device, equipment and medium | |
CN112988753B (en) | Data searching method and device | |
CN112860919B (en) | Data labeling method, device, equipment and storage medium based on generation model | |
CN110046254B (en) | Method and apparatus for generating a model | |
CN107203504B (en) | Character string replacing method and device | |
CN113128209B (en) | Method and device for generating word stock | |
WO2023024975A1 (en) | Text processing method and apparatus, and electronic device | |
CN112148841B (en) | Object classification and classification model construction method and device | |
CN114861889A (en) | Deep learning model training method, target object detection method and device | |
CN111368697A (en) | Information identification method and device | |
CN115909376A (en) | Text recognition method, text recognition model training device and storage medium | |
CN113742485A (en) | A method and apparatus for processing text | |
CN107766498B (en) | Method and apparatus for generating information | |
CN111538817A (en) | Man-machine interaction method and device | |
CN110737820B (en) | Method and apparatus for generating event information | |
CN115168622A (en) | Language model training method and device, electronic equipment and storage medium | |
CN113688268A (en) | Picture information extraction method and device, computer equipment and storage medium | |
CN111368693A (en) | Identification method and device for identity card information | |
CN116484826B (en) | Operation ticket generation method, device, equipment and storage medium | |
CN112784596A (en) | Method and device for identifying sensitive words |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220926. Address after: 25 Financial Street, Xicheng District, Beijing 100033. Applicant after: CHINA CONSTRUCTION BANK Corp. Address before: 12/F, 15/F, No. 99, Yincheng Road, Shanghai Pilot Free Trade Zone, 200120. Applicant before: Jianxin Financial Science and Technology Co.,Ltd. |