CN113051460A - Elasticsearch-based data retrieval method and system, electronic device and storage medium - Google Patents

Info

Publication number
CN113051460A
Authority
CN
China
Prior art keywords
information
retrieval
index
elasticsearch
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110336591.XA
Other languages
Chinese (zh)
Inventor
张裴裴
王雪峰
骆飞
李青龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Starlight Information Technology Co ltd
Original Assignee
Beijing Smart Starlight Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Starlight Information Technology Co ltd filed Critical Beijing Smart Starlight Information Technology Co ltd
Priority to CN202110336591.XA priority Critical patent/CN113051460A/en
Publication of CN113051460A publication Critical patent/CN113051460A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an Elasticsearch-based data retrieval method, system, electronic device and storage medium. The method includes: obtaining, through an acquisition system, the information classification and acquisition time corresponding to Internet information; determining, according to the information classification and acquisition time, the index name corresponding to each piece of Internet data in the Elasticsearch cluster; storing the Internet data into the index corresponding to that index name in the Elasticsearch cluster; obtaining information to be retrieved, which includes retrieval keywords, the information classification of the retrieval information, and a retrieval time range; generating a retrieval statement from the information to be retrieved according to the Elasticsearch query syntax; obtaining the index retrieval range corresponding to the retrieval statement; and searching the Elasticsearch cluster within that index retrieval range to obtain the retrieval result. Because Internet information is stored in the indexes of the Elasticsearch cluster according to information classification and acquisition time, retrieval can target specified indexes, which realizes multi-dimensional full-text retrieval and improves retrieval efficiency.

Figure 202110336591

Description

Elasticsearch-based data retrieval method and system, electronic device and storage medium
Technical Field
The invention relates to the field of Internet data processing, and in particular to an Elasticsearch-based data retrieval method and system, an electronic device and a storage medium.
Background
Current information retrieval is mainly full-text retrieval using keywords, and mainstream search engines on the market can only search the text information related to web pages. This makes retrieval inconvenient and has the disadvantages of a large retrieval range, long retrieval time and low retrieval efficiency.
Disclosure of Invention
In view of this, embodiments of the present invention provide an Elasticsearch-based data retrieval method, system, electronic device and storage medium, so as to overcome the low retrieval efficiency of the prior art.
Therefore, the embodiment of the invention provides the following technical scheme:
According to a first aspect, an embodiment of the present invention provides an Elasticsearch-based data retrieval method, including: acquiring Internet information, wherein the Internet information comprises a plurality of pieces of Internet data; obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data, wherein the information classification characterizes the source location of the Internet data; performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of Internet data in the Elasticsearch cluster; storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster; acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; generating a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch; looking up, in the information index adaptation module, the index retrieval range corresponding to the retrieval statement; and searching the Elasticsearch cluster within the index retrieval range to obtain a retrieval result.
Optionally, after the step of retrieving in the Elasticsearch cluster according to the index retrieval range to obtain the retrieval result, the method further includes: and displaying the retrieval result.
Optionally, the step of displaying the retrieval result includes: acquiring display requirement information; marking the retrieval result according to the display requirement information to obtain the marked retrieval result; and displaying the marked retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, after the step of storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster, the method further includes: determining an index deletion time according to service requirements; and deleting, at a preset deletion period and according to the index deletion time, the indexes with the earliest index times.
Optionally, before the step of performing adaptation through the information index adaptation module according to the information classification and acquisition time, the method further includes: establishing indexes in the Elasticsearch cluster in advance; and mapping the indexes one-to-one to the information classification and acquisition time.
According to a second aspect, an embodiment of the present invention provides an Elasticsearch-based data retrieval system, including: a first acquisition module, used for acquiring Internet information, wherein the Internet information comprises a plurality of pieces of Internet data; a first processing module, used for obtaining, through the acquisition system, the information classification and acquisition time corresponding to each piece of Internet data, wherein the information classification characterizes the source location of the Internet data; a second processing module, used for performing adaptation through the information index adaptation module according to the information classification and acquisition time and determining the index name corresponding to each piece of Internet data in the Elasticsearch cluster; a third processing module, used for storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster; a second acquisition module, used for acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; a fourth processing module, used for generating a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch; a fifth processing module, used for looking up, in the information index adaptation module, the index retrieval range corresponding to the retrieval statement; and a sixth processing module, used for searching the Elasticsearch cluster within the index retrieval range to obtain a retrieval result.
Optionally, the system further comprises: a seventh processing module, used for displaying the retrieval result.
Optionally, the seventh processing module includes: a first acquisition unit, used for acquiring the display requirement information; and a first processing unit, used for marking the retrieval result according to the display requirement information to obtain the marked retrieval result, and for displaying the marked retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, the system further comprises: an eighth processing module, used for determining the index deletion time according to service requirements; and a ninth processing module, used for deleting, at the preset deletion period and according to the index deletion time, the indexes with the earliest index times.
Optionally, the system further comprises: a tenth processing module, used for establishing indexes in the Elasticsearch cluster in advance; and an eleventh processing module, used for mapping the indexes one-to-one to the information classification and acquisition time.
According to a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the at least one processor to perform the Elasticsearch-based data retrieval method described in any of the above first aspects.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the method for retrieving data based on Elasticsearch described in any of the first aspect.
The technical scheme of the embodiment of the invention has the following advantages:
The embodiment of the invention provides an Elasticsearch-based data retrieval method, system, electronic device and storage medium, wherein the method comprises the following steps: acquiring Internet information, wherein the Internet information comprises a plurality of pieces of Internet data; obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data, wherein the information classification characterizes the source location of the Internet data; performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of Internet data in the Elasticsearch cluster; storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster; acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; generating a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch; looking up, in the information index adaptation module, the index retrieval range corresponding to the retrieval statement; and searching the Elasticsearch cluster within the index retrieval range to obtain a retrieval result.
In these steps, the Internet information is stored in the corresponding indexes of the Elasticsearch cluster according to its information classification and acquisition time, and subsequent retrieval can target indexes by information classification and acquisition time, so that multi-dimensional full-text retrieval across information classification and acquisition time is realized and retrieval efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a specific example of an Elasticsearch-based data retrieval method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a specific example of the Elasticsearch cluster indexes of the Elasticsearch-based data retrieval method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a specific example of an Elasticsearch-based data retrieval system according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides an Elasticsearch-based data retrieval method. As shown in fig. 1, the method includes steps S1-S8.
Step S1: internet information is acquired, and the Internet information comprises a plurality of Internet data.
In this embodiment, the Internet information includes a plurality of pieces of Internet data and can be provided by the acquisition system, which is responsible for information collection; specifically, the acquisition system obtains the Internet information through cooperating crawlers. This is only a schematic illustration in this embodiment and is not limiting. The acquired Internet information specifically comprises the Internet data together with the information classification and acquisition time corresponding to each piece of Internet data.
Step S2: obtaining, through the acquisition system, the information classification and acquisition time corresponding to each piece of Internet data, wherein the information classification characterizes the source location of the Internet data.
In this embodiment, the information classification characterizes the source location of the Internet data, and the acquisition time is the time at which the acquisition system collects the information. Specifically, the source location may be different websites such as WeChat, microblog, Baidu, headline, and surf, and different websites may correspond to different information classifications. In this embodiment, the information classification mainly includes: network media (web), microblog (weibo), WeChat (weixin), forum (forum), Baidu Tieba (baidu), headlines (toutiao), print media (printmedia), video (video), etc. This is only a schematic description in this embodiment and is not limiting; in other embodiments, the information classification may also include other classifications, set as appropriate.
Specifically, the acquisition system obtains the information classification by configuring a specific classification for each group of cooperating crawlers. For example, if one group of crawlers is responsible for collecting microblog data, an info_flag identifying microblog is added to the information; if another group is responsible for collecting Baidu Tieba data, an info_flag identifying Baidu Tieba is added to the information; and so on, so that different data sources carry different identifiers. The acquisition time is the time at which the crawler collects the information: ctime = currentTime.
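The tagging described above might look like the following minimal sketch. It is an illustration only, not the patent's implementation: the function name `tag_record`, the sample record and the YYYYMMDD timestamp format are assumptions, and only the field names `info_flag` and `ctime` (as used in this paragraph) come from the text.

```python
from datetime import datetime


def tag_record(record, info_flag, now=None):
    """Return a copy of a crawled record with its source tag and collection time.

    info_flag identifies the data source of the crawler group (e.g. "weibo").
    ctime follows this paragraph's naming for the collection time; the
    YYYYMMDD format is an assumption chosen to match day-level index names.
    """
    now = now or datetime.now()
    tagged = dict(record)  # copy so the caller's record is not mutated
    tagged["info_flag"] = info_flag
    tagged["ctime"] = now.strftime("%Y%m%d")
    return tagged


# Hypothetical record collected by the microblog crawler group.
weibo_item = tag_record({"title": "example post"}, "weibo", datetime(2020, 12, 19))
```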
Step S3: performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of Internet data in the Elasticsearch cluster.
In this embodiment, the information index adaptation module is mainly responsible for adapting the collected information to the indexes in the Elasticsearch cluster. The module maps the information classification and acquisition time to indexes in the Elasticsearch cluster. Its main purpose is to manage index names uniformly so that collected data can be stored in the corresponding indexes of the Elasticsearch cluster; it also decouples the retrieval process from the index naming.
Specifically, indexes are generated in the Elasticsearch cluster in advance, and each index name is determined by the index's information classification and acquisition time; the specific format may be <index acquisition time>_<index information classification>. In this embodiment, the index acquisition time is accurate to the day; in other embodiments it may use another granularity, for example one week or half a month, set according to actual needs. This is only a schematic description in this embodiment and is not limiting. A specific example of the indexes established in the Elasticsearch cluster is shown in fig. 2.
For example, if the index information classification is microblog and the index acquisition time is the day 20210315, then all data collected from microblog on 20210315 is stored in the index (folder) named "20210315_microblog".
The acquisition system pushes the information to a message queue, and the information index adaptation module reads the information from the message queue. The information exists in the message queue in JSON format and is referred to below as data. In the data, info_flag is the information classification identifier, gtime is the information acquisition time, and ctime is the information publishing time. The module maps the information classification and acquisition time to an index name as follows: data.get("gtime") + "_" + data.get("info_flag"). For example, for information classified as microblog with an acquisition time of 20201219, the index name is 20201219_weibo.
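The mapping expression above can be tried out with a short sketch. The helper name `index_name_for` and the sample message are assumptions for illustration; the expression itself and the field names `gtime` and `info_flag` are taken from the text.

```python
import json


def index_name_for(data):
    """Map a message to its index name: <gtime>_<info_flag>."""
    return data.get("gtime") + "_" + data.get("info_flag")


# A hypothetical JSON message as it might be read from the message queue.
message = json.loads('{"gtime": "20201219", "info_flag": "weibo", "title": "example"}')
name = index_name_for(message)
```

With the sample message, `name` is `20201219_weibo`, matching the example in the text.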
Step S4: storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster according to the index name.
In this embodiment, an index storage space is established for each index name in the Elasticsearch cluster, so that the collected data is stored according to the index name, and in the subsequent retrieval process, the retrieval range can also be determined according to the retrieval statement. After the index name corresponding to each internet data is obtained, the internet information can be respectively stored in the Elasticsearch cluster according to the index name.
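One possible way to store records by index name is through the Elasticsearch bulk endpoint, which accepts newline-delimited JSON in which each document is preceded by an action line naming its target index. The sketch below only builds that request body; the patent does not say which storage API is used, so the use of the bulk endpoint, the helper name `build_bulk_body` and the sample records are assumptions.

```python
import json


def build_bulk_body(records):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each record is routed to the index <gtime>_<info_flag>.
    """
    lines = []
    for record in records:
        index_name = record["gtime"] + "_" + record["info_flag"]
        # Action line: index the following document into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself.
        lines.append(json.dumps(record, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    {"gtime": "20201219", "info_flag": "weibo", "title": "post A"},
    {"gtime": "20201219", "info_flag": "weixin", "title": "article B"},
])
```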
Step S5: acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range.
In this embodiment, the information to be retrieved is determined according to the retrieval requirement, and may specifically include the retrieval keywords, the information classification of the retrieval information, and the retrieval time range.
Step S6: generating a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch.
In this embodiment, the information and index adapter module is called with the specific retrieval keywords, the information classification of the retrieval information and the retrieval time range; a specific retrieval statement is generated according to the query syntax of Elasticsearch, and the Elasticsearch cluster is then used for the search.
The retrieval keyword is the keyword that the user wants to search for. For example, if the user wants to retrieve information containing the keyword "two parties" on the microblog platform within the time range 20210321 to 20210322, the names of the indexes to be searched are 20210321_weibo and 20210322_weibo, and the retrieval statement is:
{"query": {"bool": {"filter": {"bool": {"must_not": {"term": {"data_type": 3}}, "must": [{"range": {"public_time": {"gte": "1615824000000", "lte": "1616428740000"}}}, {"bool": {"should": {…}}}]}}, "must": {"query": "two parties", "fields": ["title", "content"]}}}}
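A retrieval statement of this general shape can be assembled programmatically. The sketch below is a simplified illustration rather than the patent's statement: it uses a `bool` query with a `range` filter on the time field and a `multi_match` clause over `title` and `content`. The choice of `multi_match`, the function name `build_query` and the default time field name (copied from the statement as printed) are assumptions.

```python
import json


def build_query(keyword, start_ms, end_ms,
                fields=("title", "content"), time_field="public_time"):
    """Build a keyword query restricted to a time range (epoch milliseconds)."""
    return {
        "query": {
            "bool": {
                # Full-text match of the keyword against the given fields.
                "must": {
                    "multi_match": {"query": keyword, "fields": list(fields)}
                },
                # Non-scoring restriction to the requested time range.
                "filter": {
                    "range": {time_field: {"gte": start_ms, "lte": end_ms}}
                },
            }
        }
    }


query = build_query("two parties", 1615824000000, 1616428740000)
statement = json.dumps(query, ensure_ascii=False)
```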
Step S7: looking up, in the information index adaptation module, the index retrieval range corresponding to the retrieval statement.
In this embodiment, the retrieval statement includes the information classification and the time range of the information to be retrieved, so the index retrieval range in the Elasticsearch cluster corresponding to the information to be retrieved can be determined.
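Determining the index retrieval range amounts to enumerating one index name per day and per information classification. A minimal sketch, assuming day-level indexes named `<date>_<info_flag>`; the function name `index_retrieval_range` is hypothetical.

```python
from datetime import date, timedelta


def index_retrieval_range(info_flags, start, end):
    """List the index names covering the given classifications and day range (inclusive)."""
    names = []
    day = start
    while day <= end:
        for flag in info_flags:
            names.append(day.strftime("%Y%m%d") + "_" + flag)
        day += timedelta(days=1)
    return names


names = index_retrieval_range(["weibo"], date(2021, 3, 21), date(2021, 3, 22))
```

For the microblog example with the range 20210321 to 20210322, `names` is `["20210321_weibo", "20210322_weibo"]`.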
Step S8: searching the Elasticsearch cluster within the index retrieval range to obtain a retrieval result.
In this embodiment, the index names are determined from the index retrieval range; the Elasticsearch cluster is then searched by index name to locate the collected data stored in the corresponding indexes, and that data is searched to obtain the retrieval result.
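Elasticsearch search requests can target several indexes at once by listing them, comma-separated, in the request path. The helper below (a hypothetical name, `search_path`) shows how the index names from the retrieval range might be turned into such a path; sending the request with a particular client is left out.

```python
def search_path(index_names):
    """Build the request path for searching only the listed indexes."""
    if not index_names:
        raise ValueError("at least one index name is required")
    return "/" + ",".join(index_names) + "/_search"


path = search_path(["20210321_weibo", "20210322_weibo"])
```

Restricting the path to the named indexes is what keeps the search from scanning data of other classifications or dates.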
In these steps, the Internet information is stored in the corresponding indexes of the Elasticsearch cluster according to its information classification and acquisition time, and subsequent retrieval can target indexes by information classification and acquisition time, so that multi-dimensional full-text retrieval across information classification and acquisition time is realized and retrieval efficiency is improved.
As an exemplary embodiment, after step S8 of searching the Elasticsearch cluster within the index retrieval range to obtain the retrieval result, the method further includes step S9.
Step S9: displaying the retrieval result.
In the present embodiment, step S9 includes steps S91-S93.
Step S91: acquiring display requirement information.
In this embodiment, the display requirement information is determined according to the user's retrieval requirement. Specifically, the display requirement information comprises a keyword color and preset attribute extraction information. This is only a schematic illustration in this embodiment and is not limiting; it may be configured as required in practical applications.
Here, the keywords are the retrieval keywords input by the user. The preset attributes are key attributes that reflect the service characteristics of the service system; for example, in the public opinion industry, the information publishing time, author profiles, information forwarding chains and the like are all key attributes, and the service system processes information according to its own service characteristics.
Step S92: marking the retrieval result according to the display requirement information to obtain the marked retrieval result.
Specifically, the retrieval result is marked according to the display requirement information; for example, if the keyword color in the display requirement information is set to red, the keywords in the retrieval result are marked in red.
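The red marking of keywords could be done, for instance, by wrapping each occurrence in colored markup before display. The sketch assumes an HTML front end, which the patent does not specify; the function name `mark_keyword` is hypothetical.

```python
import html
import re


def mark_keyword(text, keyword, color="red"):
    """Wrap each occurrence of keyword in a colored span, escaping the text first."""
    escaped_text = html.escape(text)
    if not keyword:
        return escaped_text  # nothing to mark
    # Escape the keyword the same way as the text so the two still match.
    pattern = re.compile(re.escape(html.escape(keyword)))
    return pattern.sub(
        lambda m: '<span style="color:' + color + '">' + m.group(0) + "</span>",
        escaped_text,
    )


marked = mark_keyword("news about two parties today", "two parties")
```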
Step S93: displaying the marked retrieval result.
Specifically, the marked retrieval result is displayed to the user, so that the user can see the retrieval result more intuitively.
In these steps, the retrieval result is marked according to the display requirement information and the marked retrieval result is displayed, which makes the retrieval result more intuitive.
As an exemplary embodiment, after step S4 of storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster, the method further includes steps S10-S11.
Step S10: determining the index deletion time according to service requirements.
In this embodiment, the service requirements include a requirement on how far back retrieval must reach, and the index deletion time may be determined from that requirement. For example, if retrieval only needs to cover roughly the last 5 years or the last 10 years, data older than about five or ten years can be deleted to reduce storage space.
Specifically, the index deletion time may be one day, one week, one month, or the like, and may be determined according to the service requirements.
Step S11: deleting, at a preset deletion period and according to the index deletion time, the indexes with the earliest index times.
In this embodiment, the preset deletion period may be set according to actual needs; specifically, it may be one day, one week, one month, and the like. This is only a schematic description in this embodiment and is not limiting.
For example, if the index deletion time is one week and the preset deletion period is one week, then every week the collected data of the week with the earliest index time is deleted.
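Because the date is part of each index name, the indexes to delete can be selected from the names alone. The sketch below reads the index deletion time as a retention window in days, which is one possible interpretation of the text; the function name `expired_indexes` and the sample names are assumptions.

```python
from datetime import date, datetime, timedelta


def expired_indexes(index_names, today, keep_days):
    """Return the index names whose date prefix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        day_part = name.split("_", 1)[0]
        try:
            index_day = datetime.strptime(day_part, "%Y%m%d").date()
        except ValueError:
            continue  # skip indexes that do not follow the <date>_<flag> naming
        if index_day < cutoff:
            expired.append(name)
    return expired


to_delete = expired_indexes(
    ["20210301_weibo", "20210310_weibo", "20210315_weixin", ".kibana"],
    today=date(2021, 3, 15),
    keep_days=7,
)
```

Each name in `to_delete` would then be removed as a whole index, for example with a delete-index request, rather than by deleting documents one condition at a time.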
In this embodiment, the information classification is effectively a fixed dimension, while a new index is generated every day as time passes. A scheduled script creates the indexes to be used on the following day at 1:00 a.m. each day. Earlier indexes can also be deleted as a whole according to actual business requirements, which avoids the performance degradation that an Elasticsearch cluster suffers when data is deleted by condition.
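The daily pre-creation step can be sketched as a function that a scheduled job would call to obtain the index names for the following day. The list of classification identifiers is taken from the classifications named earlier in the description; the function name `next_day_index_names` is hypothetical, and the scheduling itself (for example a cron entry at 1:00 a.m.) is not shown.

```python
from datetime import date, timedelta

# Classification identifiers listed in the description.
INFO_FLAGS = ["web", "weibo", "weixin", "forum", "baidu", "toutiao", "printmedia", "video"]


def next_day_index_names(today, info_flags=INFO_FLAGS):
    """Return the names of the indexes to create for the day after today."""
    tomorrow = today + timedelta(days=1)
    prefix = tomorrow.strftime("%Y%m%d")
    return [prefix + "_" + flag for flag in info_flags]


names_to_create = next_day_index_names(date(2020, 12, 18), ["weibo", "weixin"])
```

Each returned name would then be created in the cluster ahead of time, for example with a create-index request.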
In these steps, indexes with earlier times are deleted regularly according to actual service requirements, which makes it practical to manage and store massive amounts of data.
As an exemplary embodiment, before the adaptation performed in step S3 by the information index adaptation module according to the information classification and acquisition time, the method further includes steps S12-S13.
Step S12: indexes are built in advance in the Elasticsearch cluster.
In this embodiment, an index storage space is established for each index name in the Elasticsearch cluster, so that the collected data is stored according to the index name.
Step S13: mapping the indexes one-to-one to the information classification and acquisition time.
In the present embodiment, a specific example of the mapping process is as follows.
For example (Figure BDA0002997941720000111):
For example, if the information classification is weibo and the acquisition time is 20201219, then the name of the index is 20201219_weibo; for another example, if the information classification is weixin, the index name is 20201219_weixin.
In these steps, indexes are established in the Elasticsearch cluster in advance and mapped to the information classification and acquisition time, so that the collected data can be stored in the Elasticsearch cluster.
In this embodiment, the Elasticsearch index name is generated from the information classification and acquisition time of the Internet information, and the data is stored into the corresponding index. During retrieval, specified indexes can be searched according to information classification and acquisition time. When deleting, the index for a given classification and date can be removed in full in a single operation. Multi-dimensional full-text retrieval across information classification, acquisition time and the like is thereby realized, and massive data can be managed efficiently and quickly.
A detailed description is given below with a specific example.
a. Information acquisition system
This system mainly provides the basic Internet information for this embodiment and tags each piece of information with a classification identifier, i.e., the information classification. It interacts with the rest of the embodiment through a message queue: the acquisition system pushes information to the message queue, and the processing system reads information from the message queue.
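The interaction through the message queue can be imitated in a single process with a standard in-memory queue. This is only a stand-in for whatever message broker the deployment actually uses, which the patent does not name; the function names `push` and `read_all` and the sample records are assumptions.

```python
import json
import queue

# In-process stand-in for the message queue between the two systems.
message_queue = queue.Queue()


def push(record):
    """Acquisition side: serialize a record to JSON and push it to the queue."""
    message_queue.put(json.dumps(record, ensure_ascii=False))


def read_all():
    """Processing side: read and decode every message currently in the queue."""
    records = []
    while True:
        try:
            raw = message_queue.get_nowait()
        except queue.Empty:
            break
        records.append(json.loads(raw))
    return records


push({"gtime": "20201219", "info_flag": "weibo", "title": "post A"})
push({"gtime": "20201219", "info_flag": "weixin", "title": "article B"})
received = read_all()
```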
b. An information processing system (processing system) mainly comprises the following sub-modules
1) Information and index adapter module
This module is mainly responsible for the adaptation of information to the indexes in the Elasticsearch cluster. The module can map information classification and acquisition time with indexes in the Elasticissearch cluster, and the main purpose is to uniformly manage index names and play a decoupling role.
2) Elasticsearch cluster index management module
This module is mainly responsible for managing the indexes of the Elasticsearch cluster. It can call the information and index adapter module to generate indexes in the Elasticsearch cluster in advance, with index names of the form acquisition time_information classification (the acquisition time is accurate to the day). For example, if the information classification is weibo and the acquisition time is 20201219, the index name is 20201219_weibo; if the information classification is weixin, the index name is 20201219_weixin; and so on. The module can also periodically delete older indexes according to actual service requirements, so as to manage massive data.
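The periodic deletion of older indexes can be sketched as a pure selection step that decides which index names have aged out. The retention length `keep_days` and the function name `expired_indices` are assumptions; the embodiment only says that deletion follows actual service requirements. Actually removing a selected index would then be a delete-index request against the cluster, which is not shown here.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today: date, keep_days: int) -> list:
    """Return the index names whose date prefix is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        day_part = name.split("_", 1)[0]
        try:
            day = datetime.strptime(day_part, "%Y%m%d").date()
        except ValueError:
            continue  # not a dated index created by this scheme; leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["20201101_weibo", "20201219_weibo", "20201219_weixin", ".kibana"]
print(expired_indices(names, today=date(2020, 12, 20), keep_days=30))
```

Since each index holds exactly one classification for one day, dropping an index removes that whole slice of data without a document-level delete query.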
3) Information warehousing management module
This module is mainly responsible for storing information into the Elasticsearch cluster. When the information processing system receives data pushed by the acquisition system, it calls the information and index adapter module to adapt the information to an index, and then stores the data into the corresponding index.
c. Information retrieval system (retrieval system for short)
After the above systems or modules have classified and stored the information, the retrieval system provides a standard external interface to serve the business systems. According to the specific keywords to be retrieved, the information classification and the time range, the retrieval system calls the information and index adapter module, generates a specific retrieval statement according to the query syntax of Elasticsearch, and then performs the retrieval using the client of the Elasticsearch cluster.
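As an illustration of storing each record into its adapted index, the sketch below builds a request body in the newline-delimited JSON format accepted by the Elasticsearch bulk API (an action line naming the target index, followed by the document line). The record layout and the `content` field name are assumptions, and sending the body to a cluster is deliberately left out.

```python
import json


def to_bulk_body(records) -> str:
    """Build an NDJSON bulk body that routes each record to its own index."""
    lines = []
    for record in records:
        # Adaptation step: derive the index name from day and classification.
        index_name = f"{record['acquired_on']}_{record['classification']}"
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({"content": record["content"]}, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk format ends with a newline


records = [
    {"classification": "weibo", "acquired_on": "20201219", "content": "post text"},
    {"classification": "weixin", "acquired_on": "20201219", "content": "article text"},
]
body = to_bulk_body(records)
print(body)
```

A single bulk body can therefore mix records for several indexes, which suits a queue that delivers items of different classifications interleaved.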
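The generation of a retrieval statement can be sketched as building a search path and a query body. Here the index names are taken as input (as they would be returned by the adapter), multiple indexes are joined with commas in the request path, and the keyword is matched with a standard `match` query. The `content` field name and the function name `build_search_request` are assumptions, not details from the embodiment.

```python
import json


def build_search_request(keyword: str, index_names, field: str = "content"):
    """Return (path, body) for a keyword search restricted to the given indexes."""
    path = "/" + ",".join(index_names) + "/_search"
    body = {"query": {"match": {field: keyword}}}
    return path, body


path, body = build_search_request("vaccine", ["20201218_weibo", "20201219_weibo"])
print(path)
print(json.dumps(body))
```

Because classification and day are already encoded in the index names, the query body itself only needs to carry the keyword condition; the time range and classification are enforced by which indexes are searched.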
d. Information display system (business system for short)
The business system is a user-facing system that mainly provides convenient interactive operations. The user can input search keywords and select an information classification, a time range or other search conditions; the business system then sends a retrieval request to the retrieval system. Finally, the business system marks the keywords in red, extracts key attributes, and displays the information to the user.
This embodiment also provides an Elasticsearch-based data retrieval system, which is used to implement the above embodiments and preferred embodiments; descriptions already given are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
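The keyword red marking performed by the business system could look like the following sketch, which wraps each keyword occurrence in an HTML span. The HTML output format and the function name `mark_keywords_red` are assumptions; the embodiment only states that keywords are marked in red. Note that a keyword that happens to coincide with part of an HTML entity (for example "amp") would need extra handling that this sketch omits.

```python
import html
import re


def mark_keywords_red(text: str, keywords) -> str:
    """Escape the text for HTML, then wrap every keyword occurrence in a red span."""
    escaped_text = html.escape(text)
    # Longer keywords first so that overlapping keywords prefer the longest match.
    ordered = sorted((k for k in keywords if k), key=len, reverse=True)
    if not ordered:
        return escaped_text
    pattern = "|".join(re.escape(html.escape(k)) for k in ordered)
    return re.sub(
        f"({pattern})",
        r'<span style="color:red">\1</span>',
        escaped_text,
        flags=re.IGNORECASE,
    )


print(mark_keywords_red("Elasticsearch is fast", ["fast"]))
```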
The embodiment also provides an Elasticsearch-based data retrieval system, as shown in fig. 3, including:
the first obtaining module 1 is configured to obtain Internet information, wherein the Internet information comprises a plurality of pieces of Internet data;
the first processing module 2 is configured to obtain, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, wherein the information classification is used to characterize the source location of the Internet data;
the second processing module 3 is configured to perform adaptation through the information index adaptation module according to the information classification and acquisition time, and to determine the index name corresponding to each piece of Internet data in the Elasticsearch cluster;
the third processing module 4 is configured to store the internet data into an index corresponding to the index name in the Elasticsearch cluster according to the index name;
the second obtaining module 5 is used for obtaining information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, information classification of the retrieval information and a retrieval time range;
the fourth processing module 6 is configured to generate a retrieval statement for the information to be retrieved according to the query syntax of the Elasticsearch;
the fifth processing module 7 is configured to search in the information index adaptation module according to the search statement to obtain an index search range corresponding to the search statement;
and the sixth processing module 8 is configured to perform retrieval in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
Optionally, the system further comprises: a seventh processing module, configured to display the retrieval result.
Optionally, the seventh processing module includes: the first acquisition unit is used for acquiring the display requirement information; the first processing unit is used for identifying the retrieval result according to the display requirement information to obtain the identified retrieval result; and displaying the identified retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, the system further comprises: an eighth processing module, configured to determine an index deletion time according to service requirements; and a ninth processing module, configured to delete, according to the index deletion time and a preset deletion period, the indexes with earlier index times.
Optionally, the system further comprises: a tenth processing module, configured to establish indexes in the Elasticsearch cluster in advance; and an eleventh processing module, configured to map the indexes one-to-one to information classifications and acquisition times.
The Elasticsearch-based data retrieval system in this embodiment is presented in the form of functional units, where a unit refers to an ASIC, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, the electronic device includes one or more processors 71 and a memory 72, where one processor 71 is taken as an example in fig. 4.
The electronic device may further include: an input device 73 and an output device 74.
The processor 71, the memory 72, the input device 73 and the output device 74 may be connected by a bus or other means, as exemplified by the bus connection in fig. 4.
The processor 71 may be a Central Processing Unit (CPU). The Processor 71 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof. A general purpose processor may be a microprocessor or any conventional processor or the like.
The memory 72 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the Elasticsearch-based data retrieval method in the embodiment of the present application. The processor 71 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 72, namely, implements the Elasticsearch-based data retrieval method of the above-described method embodiment.
The memory 72 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a processing device operated by the server, and the like. Further, the memory 72 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 72 may optionally include memory located remotely from the processor 71, which may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 73 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 74 may include a display device such as a display screen.
One or more modules are stored in the memory 72, which when executed by the one or more processors 71 perform the method shown in FIG. 1.
It will be understood by those skilled in the art that all or part of the processes in the method according to the above embodiments may be implemented by instructing relevant hardware through a computer program, and the executed program may be stored in a computer-readable storage medium, and when executed, may include the processes according to the embodiments of the data retrieval method based on the Elasticsearch. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (9)

1. An Elasticsearch-based data retrieval method, characterized by comprising: obtaining Internet information, the Internet information comprising a plurality of pieces of Internet data; obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, the information classification being used to characterize the source location of the Internet data; performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of Internet data in an Elasticsearch cluster; storing the Internet data, according to the index name, into the index corresponding to the index name in the Elasticsearch cluster; obtaining information to be retrieved, the information to be retrieved comprising a retrieval keyword, an information classification of the retrieval information, and a retrieval time range; generating a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch; searching in the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement; and performing retrieval in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
2. The Elasticsearch-based data retrieval method according to claim 1, characterized in that, after the step of performing retrieval in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result, the method further comprises: displaying the retrieval result.
3. The Elasticsearch-based data retrieval method according to claim 2, characterized in that the step of displaying the retrieval result comprises: obtaining display requirement information; marking the retrieval result according to the display requirement information to obtain a marked retrieval result; and displaying the marked retrieval result.
4. The Elasticsearch-based data retrieval method according to claim 1, characterized in that the display requirement information comprises a keyword color and preset attribute extraction information.
5. The Elasticsearch-based data retrieval method according to any one of claims 1-4, characterized in that, after the step of storing the Internet data, according to the index name, into the index corresponding to the index name in the Elasticsearch cluster, the method further comprises: determining an index deletion time according to service requirements; and deleting, according to the index deletion time and a preset deletion period, the indexes with earlier index times.
6. The Elasticsearch-based data retrieval method according to any one of claims 1-4, characterized in that, before the step of performing adaptation through the information index adaptation module according to the information classification and acquisition time, the method further comprises: establishing indexes in the Elasticsearch cluster in advance; and mapping the indexes one-to-one to information classifications and acquisition times.
7. An Elasticsearch-based data retrieval system, characterized by comprising: a first obtaining module, configured to obtain Internet information, the Internet information comprising a plurality of pieces of Internet data; a first processing module, configured to obtain, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, the information classification being used to characterize the source location of the Internet data; a second processing module, configured to perform adaptation through an information index adaptation module according to the information classification and acquisition time, and to determine the index name corresponding to each piece of Internet data in an Elasticsearch cluster; a third processing module, configured to store the Internet data, according to the index name, into the index corresponding to the index name in the Elasticsearch cluster; a second obtaining module, configured to obtain information to be retrieved, the information to be retrieved comprising a retrieval keyword, an information classification of the retrieval information, and a retrieval time range; a fourth processing module, configured to generate a retrieval statement from the information to be retrieved according to the query syntax of Elasticsearch; a fifth processing module, configured to search in the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement; and a sixth processing module, configured to perform retrieval in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
8. An electronic device, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor performs the Elasticsearch-based data retrieval method according to any one of claims 1-6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the Elasticsearch-based data retrieval method according to any one of claims 1-6.
CN202110336591.XA 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium Pending CN113051460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110336591.XA CN113051460A (en) 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113051460A true CN113051460A (en) 2021-06-29

Family

ID=76516243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110336591.XA Pending CN113051460A (en) 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113051460A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960037A (en) * 2017-03-22 2017-07-18 河海大学 A kind of distributed index the resources integration and share method across intranet and extranet
CN110222054A (en) * 2019-05-22 2019-09-10 福建大屏网络科技有限公司 A kind of method, apparatus, terminal device and storage medium improving retrieval rate
CN111026710A (en) * 2019-12-11 2020-04-17 华南师范大学 Data set retrieval method and system
CN111339244A (en) * 2020-02-29 2020-06-26 山东浪潮通软信息科技有限公司 Tax policy and regulation inquiry method, computer equipment and storage medium
CN111563095A (en) * 2020-04-30 2020-08-21 上海新炬网络信息技术股份有限公司 Data retrieval device based on HBase

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486138A (en) * 2021-07-20 2021-10-08 北京明略软件系统有限公司 Elasticissearch-based retrieval method, system and computer-readable storage medium
CN114090505A (en) * 2021-11-23 2022-02-25 成都深思科技有限公司 Intelligent resource scheduling and efficient concurrent data classification method
CN114153845A (en) * 2021-11-24 2022-03-08 北京皮尔布莱尼软件有限公司 Data storage and reading method, device, equipment and medium
CN114817644A (en) * 2022-04-21 2022-07-29 山东省计算中心(国家超级计算济南中心) A method and system for classification and intelligent search of government information resources based on Elasticsearch
CN115292370A (en) * 2022-08-15 2022-11-04 招银云创信息技术有限公司 Business document data processing method, device and medium
CN116401259A (en) * 2023-06-08 2023-07-07 北京江融信科技有限公司 Automatic pre-creation index method and system for elastic search database
CN116401259B (en) * 2023-06-08 2023-08-22 北京江融信科技有限公司 Automatic pre-creation index method and system for elastic search database
CN117891897A (en) * 2023-12-23 2024-04-16 曙光云计算集团股份有限公司 Retrieval method, device, computer equipment and storage medium
CN118467860A (en) * 2024-07-15 2024-08-09 北斗伏羲信息技术有限公司 Spatiotemporal data engine and grid data access and retrieval method

Similar Documents

Publication Publication Date Title
CN113051460A (en) Elasticissearch-based data retrieval method and system, electronic device and storage medium
CN110362544B (en) Log processing system, log processing method, terminal and storage medium
US11410087B2 (en) Dynamic query response with metadata
CN111259006A (en) A general distributed heterogeneous data integration physical aggregation, organization, publishing and service method and system
CN113010476B (en) Metadata searching method, device, equipment and computer readable storage medium
CN106982150B (en) A Hadoop-based mobile internet user behavior analysis method
WO2017166644A1 (en) Data acquisition method and system
CN108509658A (en) An XML file parsing method and device
CN103955529A (en) Internet information searching and aggregating presentation method
CN112148701A (en) Method and device for document retrieval
US20150213066A1 (en) System and method for creating data models from complex raw log files
CN105095211A (en) Acquisition method and device for multimedia data
WO2015096609A1 (en) Method and system for creating inverted index file of video resource
CN103399855B (en) Behavior intention determining method and device based on multiple data sources
CN101477527A (en) Multimedia resource retrieval method and apparatus
CN105183916A (en) Device and method for managing unstructured data
CN106528688B (en) Analysis evidence obtaining method aiming at Twitter
CN112559913B (en) Data processing method, device, computing equipment and readable storage medium
CN108255963A (en) A control method and device for Internet-based news information retrieval
US10491606B2 (en) Method and apparatus for providing website authentication data for search engine
CN112307318A (en) Content publishing method, system and device
CN111680072A (en) Social information data-based partitioning system and method
CN107291875B (en) Method and system for metadata organization and management based on metadata graph
CN110263082B (en) Data distribution analysis method and device of database, electronic equipment and storage medium
CN115630170A (en) Document recommendation method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629
