CN111797096A - Data indexing method and device based on ElasticSearch, computer equipment and storage medium - Google Patents
- Publication number
- CN111797096A (CN202010610262.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- index
- indexing
- main
- fragment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses an ElasticSearch-based data indexing method, a data indexing device, computer equipment and a storage medium. The method comprises the following steps: setting main fragments (primary shards) based on the distributed search engine ElasticSearch, and writing data into the main fragments; creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index; if a data query request instruction is detected, performing an index query on the index fragments through each node server to obtain the index data volume of each node server; merging the index data volumes of the node servers to obtain the total index data volume; and if the total index data volume is detected to exceed a preset value, indexing in a preset mode. The application also relates to blockchain technology, wherein the index data is stored in a blockchain. The method and device improve data indexing efficiency for mass data at the 100 TB scale and above.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an ElasticSearch-based data indexing method, apparatus, computer device, and storage medium.
Background
In the current age of big data, there are many data warehouses for storing massive data, and the distributed search engine ElasticSearch (ES) is one of them. ElasticSearch is an open-source, distributed, RESTful search server built on Lucene and is commonly used in cloud computing. It makes large amounts of data searchable, analyzable and explorable. Making full use of the horizontal scalability of ElasticSearch makes data more valuable in a production environment.
For mass data at the 100 TB scale and above in ElasticSearch, the data volume is so large that indexing speed drops sharply if there is no corresponding index management scheme. In the prior art, ElasticSearch performs a distributed search and merges the search result data to obtain the final data; faced with mass data at the 100 TB scale and above, the indexing efficiency of this scheme is very low and cannot meet the indexing requirement. A scheme that improves data indexing efficiency for mass data at the 100 TB scale and above is therefore needed.
Disclosure of Invention
The embodiment of the application aims to provide an ElasticSearch-based data indexing method that improves data indexing efficiency for mass data at the 100 TB scale and above.
In order to solve the above technical problem, an embodiment of the present application provides an ElasticSearch-based data indexing method, including:
setting a main fragment based on the distributed search engine ElasticSearch, and writing data into the main fragment;
creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index;
if a data query request instruction is detected, performing index query on the index fragments through each node server to obtain the index data volume of each node server;
merging the index data volume of each node server to obtain the total index data volume;
and if the total index data quantity is detected to exceed a preset value, indexing in a preset mode.
Further, before setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment, the method further includes:
setting a number for each piece of data, and writing the number into the distributed search engine ElasticSearch.
Further, setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment includes:
extracting data from the distributed search engine ElasticSearch, and establishing a reading task according to the extracted data;
and writing the data into the main fragment according to the reading task.
Further, after setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment, the method further includes:
when it is detected that the data of the main fragments has been completely written, creating copies (replicas) according to the number of main fragments, wherein the number of copies is set according to the number of main fragments and a preset proportion.
Further, creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index, includes:
creating one type of index task for main fragment data of the same type through the distributed system infrastructure Hadoop;
and, for index tasks of the same type, importing the main fragment into the corresponding index fragment to create an index.
Further, if it is detected that the total index data amount exceeds a preset value, indexing in a preset manner includes:
and if the total index data quantity is detected to exceed a preset value, adding a new node server for distributed index.
Further, if it is detected that the total index data amount exceeds a preset value, indexing in a preset manner further includes:
and if the total index data volume is detected to exceed a preset value, indexing in batches through the node server.
In order to solve the above technical problem, the invention adopts the following technical scheme: an ElasticSearch-based data indexing device is provided, including:
the data writing module is used for setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment;
the index creating module is used for creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index;
the index query module is used for performing index query on the index fragments through each node server to obtain the index data volume of each node server if a data query request instruction is detected;
the data merging module is used for merging the index data volume of each node server to obtain the total index data volume;
and the fragment indexing module is used for indexing in a preset mode if the total index data volume is detected to exceed a preset value.
In order to solve the above technical problem, the invention adopts the following technical scheme: a computer device is provided that includes one or more processors and a memory for storing one or more programs that cause the one or more processors to implement the ElasticSearch-based data indexing method described in any one of the above.
In order to solve the above technical problem, the invention adopts the following technical scheme: a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements the ElasticSearch-based data indexing method described in any one of the above.
In the scheme, main fragments are set through the distributed search engine and data is written into the main fragments; an index task is then created for the data of each main fragment, and indexing is performed on different node servers in a distributed manner, which avoids over-concentration of indexes, prevents data loss, and effectively improves indexing efficiency. If the total index data volume exceeds the preset value, indexing is carried out in batches through the node servers, so that sufficient memory is available for data indexing when handling mass data at the 100 TB scale and above, which improves data indexing efficiency.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an application environment of an ElasticSearch-based data indexing method provided in an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of the data indexing method based on ElasticSearch according to the embodiment of the present application;
FIG. 3 is a flowchart of an implementation of step S1 in the data indexing method based on ElasticSearch according to the embodiment of the present application;
FIG. 4 is a flowchart of an implementation of step S2 in the data indexing method based on ElasticSearch according to the embodiment of the present application;
FIG. 5 is a schematic diagram of an ElasticSearch-based data indexing apparatus provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a computer device provided in an embodiment of the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
The present invention will be described in detail below with reference to the accompanying drawings and embodiments.
Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a web browser application, a search-type application, an instant messaging tool, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data indexing method based on the ElasticSearch provided by the embodiment of the present application is generally executed by a server, and accordingly, a data indexing apparatus based on the ElasticSearch is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, fig. 2 shows an embodiment of the data indexing method based on the ElasticSearch.
It should be noted that, provided the result is substantially the same, the method of the present invention is not limited to the flow sequence shown in FIG. 2. The method includes the following steps:
S1: setting a main fragment based on the distributed search engine ElasticSearch, and writing data into the main fragment.
Specifically, the scheme is an indexing scheme for mass data at the 100 TB scale and above, based on the distributed search engine ElasticSearch. When the data volume exceeds 100 TB, indexing speed drops sharply if there is no corresponding index management scheme, which in turn reduces indexing efficiency.
Further, the number of main fragments (per index) is limited to no more than 7, and preferably no fewer than 3. Limiting the number of main fragments allows the fragments to be spread across multiple ElasticSearch node servers, which prevents data loss, while avoiding the excessive metadata-management overhead that too many fragments would impose on the whole cluster.
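The fragment-count limit above can be illustrated with a minimal sketch. It assumes the limit is enforced when the index settings body is built; the function names `build_index_settings` and `is_preferred_count` are hypothetical, and `index.number_of_shards` is the standard ElasticSearch setting for the primary shard count.

```python
MAX_MAIN_FRAGMENTS = 7        # upper limit per index stated in the text
PREFERRED_MIN_FRAGMENTS = 3   # counts of 3 or more are described as optimal


def build_index_settings(main_fragments: int) -> dict:
    """Return an index settings body, rejecting counts above the limit."""
    if main_fragments < 1 or main_fragments > MAX_MAIN_FRAGMENTS:
        raise ValueError(
            f"main fragment count must be between 1 and {MAX_MAIN_FRAGMENTS}"
        )
    return {"settings": {"index": {"number_of_shards": main_fragments}}}


def is_preferred_count(main_fragments: int) -> bool:
    """True when the count falls in the preferred range of 3 to 7."""
    return PREFERRED_MIN_FRAGMENTS <= main_fragments <= MAX_MAIN_FRAGMENTS
```

The returned dictionary is only a request body; sending it to a cluster is left out of the sketch.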
The distributed search engine ElasticSearch (ES) is an open-source, distributed, RESTful search server built on Lucene. It is a distributed, highly scalable, highly real-time search and data analysis engine that makes large amounts of data searchable, analyzable and explorable. Making full use of the horizontal scalability of ElasticSearch makes data more valuable in a production environment. In the invention, when the data volume reaches 100 TB or more, data indexing is performed based on the distributed search engine ElasticSearch by building the relevant indexes.
S2: creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index.
Specifically, an index task is created for the data of each main fragment; different index tasks are determined according to data type, and the data handled by index tasks of the same data type is imported into the corresponding index fragment, so that the index is established.
The data type refers to the division of data into different kinds of service data; data belonging to the same kind of service data has the same data type. Different types of service data, such as folder data, qualification picture information and picture data, can be processed by different index tasks, for example for a folder information table, a qualification picture table and a picture information table. Hadoop-based distributed tasks are used to create an index creation task for main fragment data of the same type; tasks for different types of service data are processed in parallel, and after processing the results of the same type are placed into the same index fragment. Each main fragment corresponds to one index fragment of the search cluster, which further improves indexing speed.
Index sharding means that the distributed search engine ElasticSearch divides a complete index into a plurality of shards. The advantage of creating index shards is that a large index can be split into multiple pieces and distributed to different node servers to form a distributed search. The number of index shards can only be specified when the index is created and cannot be changed afterwards.
Further, when a data index is created, each batch is limited to no more than 5000 pieces of data. With larger batches, there is a certain probability that the distributed search engine ElasticSearch cannot process the data internally in time, which causes data loss.
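The 5000-record batch limit can be sketched as a simple chunking helper. This is an illustration only; the name `iter_batches` is hypothetical, and what is done with each batch (for example, sending it as one bulk request) is outside the sketch.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_BATCH_SIZE = 5000  # per-batch limit stated in the text


def iter_batches(records: Iterable[dict],
                 size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most `size` records."""
    if size < 1 or size > MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

For 12001 records the helper yields batches of 5000, 5000 and 2001 records, so no single batch exceeds the limit.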
S3: if a data query request instruction is detected, performing an index query on the index fragments through each node server to obtain the index data volume of each node server.
Specifically, using the index created in step S2, if a data query request instruction is detected, each node server performs an index query on the index fragments and thereby obtains its index data volume.
A node server is a node of the distributed search engine ElasticSearch. Each running instance is called a node, and instances can run on the same machine or on different machines. A running instance is a server process.
Further, in this embodiment, the number of distributed node servers is no more than 50. If the data volume requires more node servers than this, the node servers are divided into a plurality of small clusters and the results are finally aggregated centrally. The reason is that if the distributed search engine ElasticSearch has too many distributed node servers, its internal resource allocation and metadata management become costly, which increases the server load.
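A minimal sketch of the 50-node limit follows. It assumes that node servers are simply partitioned in order into groups of at most 50 and that central aggregation means concatenating the per-cluster results; both choices, and the function names, are illustrative assumptions rather than details given in the text.

```python
from typing import List

MAX_NODES_PER_CLUSTER = 50  # node-server limit stated in the text


def split_into_clusters(nodes: List[str],
                        limit: int = MAX_NODES_PER_CLUSTER) -> List[List[str]]:
    """Partition node servers into small clusters of at most `limit` nodes."""
    return [nodes[i:i + limit] for i in range(0, len(nodes), limit)]


def aggregate_results(per_cluster_results: List[List[dict]]) -> List[dict]:
    """Central aggregation: merge the result lists returned by each cluster."""
    return [hit for results in per_cluster_results for hit in results]
```

With 120 node servers the sketch produces three clusters of 50, 50 and 20 nodes.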
Further, the number of parallel threads allowed by the system is increased to more than 6 million. This allows enough threads to exist in the system, so that the number of data indexing threads of the distributed search engine ElasticSearch is not overly restricted; note, however, that too many Linux threads greatly reduce system performance.
Further, swapping of system memory on each node server is limited, so that most of the memory is used for indexing and querying in the distributed search engine ElasticSearch and only part of it is left for other system management. ElasticSearch thus obtains enough memory, the memory occupied by other processes during indexing is reduced, and the indexing efficiency of ElasticSearch is greatly improved.
S4: merging the index data volumes of the node servers to obtain the total index data volume.
Further, the number of indexes x and the total size y of each index are related as follows:
when the data volume is more than 100 TB, the index data is controlled by 5 x-y. In this way, the amount of data per index in the distributed search engine ElasticSearch is not so large that queries become too slow and the index is updated slowly during data indexing (data writing); nor is it so small that storage space is wasted and the indexes become too numerous to manage.
S5: and if the total index data quantity is detected to exceed a preset value, indexing in a preset mode.
Specifically, an excessive total index data volume easily overloads the server, so when the total index data volume is detected to exceed a preset value, indexing is performed in a preset mode.
The preset modes include adding new node servers, indexing in batches through the node servers, and the like.
The preset value is set according to actual needs and is preferably expressed as a proportion: the total index data volume does not exceed 3/5 of the total memory, and the index data volume involved in each query does not exceed 1/5 of the total memory. When the preset value is exceeded, indexing and querying must be done in batches, or the data volume of the query must be limited. The distributed search engine ElasticSearch is a memory-intensive system, so the purpose is to reserve enough memory for data indexing (data writing): this avoids data indexing becoming too slow because all the memory is used for queries, and avoids query performance degrading because the indexes occupy too much memory.
For example, if the total index data volume is 12G and the total memory is 15G, the preset value may be set to 9G (3/5 of 15G). Because the 12G total index data volume exceeds the 9G preset value, batch indexing is performed. With the index set to 6 batches, the index data volume of each batch is 2G, which solves the problem of slow indexing when the index volume is too large and achieves the purpose of improving indexing efficiency.
Further, if it is detected that the total index data volume does not exceed the preset value, indexing is performed directly through the current node server.
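Steps S4 and S5 and the 12G/15G example can be traced with a short sketch. It assumes the per-node volumes are already known and that the batch count is chosen by the operator, as in the example; `plan_indexing` and the node names are hypothetical.

```python
from typing import Dict


def plan_indexing(per_node_volume_gb: Dict[str, float],
                  total_memory_gb: float,
                  batch_count: int) -> dict:
    """Merge per-node volumes (S4) and choose an indexing mode (S5)."""
    total = sum(per_node_volume_gb.values())   # S4: total index data volume
    preset = total_memory_gb * 3 / 5           # preset value: 3/5 of total memory
    if total <= preset:
        # Below the preset value: index directly on the current node server.
        return {"total": total, "preset": preset,
                "mode": "direct", "per_batch": total}
    # Above the preset value: index in batches through the node servers.
    return {"total": total, "preset": preset,
            "mode": "batched", "per_batch": total / batch_count}


plan = plan_indexing({"node-1": 5, "node-2": 4, "node-3": 3},
                     total_memory_gb=15, batch_count=6)
print(plan["mode"], plan["preset"], plan["per_batch"])
```

With the figures from the example, the merged volume is 12G, the preset value is 9G, and each of the 6 batches carries 2G, which also stays below the per-query limit of 1/5 of the total memory (3G).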
In this embodiment, main fragments are set through the distributed search engine and data is written into the main fragments; an index task is then created for the data of each main fragment, and indexing is performed on different node servers in a distributed manner, which avoids over-concentration of indexes, prevents data loss, and effectively improves indexing efficiency. If the total index data volume exceeds the preset proportion of the total memory, indexing is carried out in batches through the node servers, so that sufficient memory is available for data indexing when handling mass data at the 100 TB scale and above, which improves data indexing efficiency.
Before step S1, the method for indexing data based on ElasticSearch further includes:
and setting a number for each piece of data, and writing the number into the distributed search engine ElasticSearch.
Specifically, a corresponding number is set for each piece of data and written into the distributed search engine ElasticSearch, so that the corresponding data can be queried by its number.
Furthermore, the data is divided into two parts, frequently queried data and rarely queried data, and both parts use a common numbering. The frequently queried data is written into the distributed search engine ElasticSearch and the rarely queried data is written into the database HBase; when rarely queried data needs to be queried, its numbers are first looked up in ElasticSearch and the data is then retrieved from HBase. This reduces the data stored in ElasticSearch, saves resources, and improves indexing speed.
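The two-tier arrangement can be sketched with in-memory dictionaries standing in for ElasticSearch and HBase. The class `TieredStore` and its attributes are hypothetical; no real client API is used, and the sketch only shows the lookup order described above.

```python
from typing import Dict, Optional, Set


class TieredStore:
    """Frequently queried records live in the search tier, the rest in the cold tier."""

    def __init__(self) -> None:
        self.numbers: Set[str] = set()           # numbers kept in the search tier
        self.hot_records: Dict[str, dict] = {}   # stand-in for ElasticSearch documents
        self.cold_records: Dict[str, dict] = {}  # stand-in for HBase rows

    def write(self, number: str, record: dict, frequently_queried: bool) -> None:
        self.numbers.add(number)  # every number is written to the search tier
        if frequently_queried:
            self.hot_records[number] = record
        else:
            self.cold_records[number] = record

    def query(self, number: str) -> Optional[dict]:
        if number not in self.numbers:       # look up the number in the search tier first
            return None
        if number in self.hot_records:       # frequently queried data is answered directly
            return self.hot_records[number]
        return self.cold_records.get(number)  # rarely queried data comes from the cold tier
```

Keeping only the numbers of rarely queried records in the search tier is what reduces the data stored there.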
Furthermore, hotspot data can be preheated for queries and can also be used together with the database HBase, with indexing performed according to the query conditions and the numbers.
Data preheating means that a timed task retrieves the data at night or during periods of low access so that it is loaded into the filesystem cache; the next query can then read the data directly from memory, which increases query speed.
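A minimal sketch of the preheating task follows. The off-peak window of 01:00 to 05:00 is an assumed value, `run_query` is a hypothetical callable that executes one query against the search engine, and scheduling the function (for example, with cron) is left out.

```python
from datetime import time
from typing import Callable, Iterable

OFF_PEAK_START = time(1, 0)  # assumed start of the low-access window
OFF_PEAK_END = time(5, 0)    # assumed end of the low-access window


def is_off_peak(now: time) -> bool:
    """True when `now` falls inside the assumed low-access window."""
    return OFF_PEAK_START <= now < OFF_PEAK_END


def preheat(run_query: Callable[[str], object],
            hot_queries: Iterable[str],
            now: time) -> int:
    """Replay hotspot queries during off-peak hours so their data gets cached."""
    if not is_off_peak(now):
        return 0
    count = 0
    for query in hot_queries:
        run_query(query)  # the result is discarded; only the caching side effect matters
        count += 1
    return count
```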
In this embodiment, a number is set for each piece of data and written into the distributed search engine ElasticSearch, so that the corresponding data can be queried by inputting its number, which improves data indexing efficiency.
Referring to FIG. 3, FIG. 3 shows an embodiment of step S1. The specific implementation of setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment in step S1 is described as follows:
S11: extracting data from the distributed search engine ElasticSearch, and establishing a reading task according to the extracted data.
Specifically, before this step the data has been numbered and written into the distributed search engine ElasticSearch, so the corresponding data is extracted from ElasticSearch and a reading task is established, which makes it convenient to write the data into the main fragment in the subsequent step.
The reading task defines the steps of the data writing procedure according to the data characteristics, so that the data can be written into the fragments.
S12: writing the data into the main fragment according to the reading task.
Specifically, the data is written into the main fragment as specified by the reading task.
In this embodiment, data is extracted from the distributed search engine ElasticSearch, a reading task is established according to the extracted data, and the data is written into the main fragment according to the reading task, which facilitates indexing of the data and improves data indexing efficiency.
After step S1, the method for indexing data based on ElasticSearch further includes:
when it is detected that the data of the main fragments has been completely written, creating copies (replicas) according to the number of main fragments, wherein the number of copies is set according to the number of main fragments and a preset proportion.
Specifically, the distributed search engine ElasticSearch has a sharding function; it is used to set the main fragments and write data into them, and the copies are created only after the main fragment data has been completely written. The reason is that if copies were set at the beginning, each indexing operation would have to wait for the data to be written to the copies before the next piece of data could start, which increases indexing time, makes data indexing very slow, and reduces data indexing efficiency.
Furthermore, setting the relationship between main fragments and copies avoids two problems: too many copies occupying so much memory that the main fragments lack memory and queries slow down, and too few copies slowing down indexing.
It should be noted that the preset ratio of the number of main fragments to the number of copies is set according to the actual situation. Here the preset ratio of the number of main fragments to the number of copies is 2:1, which avoids both excessive memory occupation caused by too many copies and slow queries caused by too few copies.
In this embodiment, when it is detected that the data of the main fragments has been completely written, copies are created according to the preset ratio of main fragments to copies, which increases data writing speed, reduces memory occupation, and improves data indexing efficiency.
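The deferred creation of copies and the 2:1 ratio can be sketched as follows. Rounding down for odd fragment counts is an assumption, as is the function name; the text does not state how the resulting count maps onto ElasticSearch's per-index replica setting, so the sketch only computes the count.

```python
def copies_to_create(main_fragments: int,
                     write_complete: bool,
                     fragments_per_copy: int = 2) -> int:
    """Return the number of copies to create for the given main fragment count.

    No copies are created until the main fragment data is completely written;
    afterwards the count follows the preset main-fragment-to-copy ratio (2:1 here).
    """
    if main_fragments < 1 or fragments_per_copy < 1:
        raise ValueError("counts must be positive")
    if not write_complete:
        return 0
    return main_fragments // fragments_per_copy  # assumed rounding: floor
```

For 6 main fragments the sketch returns 0 while data is still being written and 3 once writing is complete.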
Referring to FIG. 4, FIG. 4 shows a specific implementation of step S2. The specific implementation of creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index in step S2, is described as follows:
S21: for main fragment data of the same type, creating one type of index task through the distributed system infrastructure Hadoop.
Specifically, index tasks are created according to the type of the main fragment data, which classifies the indexes, and one type of index task is created through the distributed system infrastructure Hadoop.
Hadoop is a distributed system infrastructure developed by the Apache Foundation. A user can develop distributed programs without knowing the underlying details of the distributed system, and can make full use of the power of the cluster for high-speed computation and storage. In the invention, one type of index task is created through the Hadoop distributed system infrastructure, which provides the basis for subsequently creating indexes.
S22: for index tasks of the same type, importing the main fragment into the corresponding index fragment to create an index.
Specifically, for index tasks of the same type, the main fragment is imported into the corresponding index fragment, so that the index is established quickly.
In this embodiment, one type of index task is created through the distributed system infrastructure Hadoop for main fragment data of the same type, and for index tasks of the same type the main fragment is imported into the corresponding index fragment to create an index, which achieves fast index creation and improves data indexing efficiency.
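Steps S21 and S22 can be sketched as grouping records by data type and running one index task per type in parallel. A local `ThreadPoolExecutor` stands in for the Hadoop distributed tasks, and the record layout, the `index-<type>` naming and the function names are all illustrative assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def group_by_type(records: List[dict]) -> Dict[str, List[dict]]:
    """S21: collect records of the same data type into one index task."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record["type"]].append(record)
    return dict(groups)


def run_index_task(data_type: str, records: List[dict]) -> Tuple[str, int]:
    """S22: import one type of data into its index fragment (simulated here)."""
    return f"index-{data_type}", len(records)


def build_indexes(records: List[dict]) -> Dict[str, int]:
    """Run the per-type index tasks in parallel and collect the results."""
    groups = group_by_type(records)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_index_task, data_type, items)
                   for data_type, items in groups.items()]
        return dict(future.result() for future in futures)
```

Each data type ends up in exactly one index fragment, while tasks for different types run concurrently.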
Further, step S5 further includes:
if the total index data amount is detected to exceed a preset value, adding a new node server for distributed indexing.
Further, the number of node servers does not exceed 50. If the total data amount requires more node servers than this, the node servers are divided into several small clusters and the results are finally aggregated centrally. The reason is that if the distributed search engine ElasticSearch has too many node servers, its internal resource allocation and metadata management consume considerable resources, which slows indexing.
In this embodiment, if it is detected that the total index data amount exceeds the preset value, a new node server is added to perform distributed indexing. This adds a further indexing path and improves the efficiency of data indexing.
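The node cap and the "several small clusters plus central aggregation" arrangement can be sketched as below. The limit of 50 comes from the text; the function names and the shape of the per-cluster results (plain lists of hits) are assumptions made for illustration.

```python
MAX_NODES_PER_CLUSTER = 50  # cap stated in the description


def split_into_clusters(node_names, max_nodes=MAX_NODES_PER_CLUSTER):
    """Partition node servers into clusters of at most max_nodes each."""
    if max_nodes <= 0:
        raise ValueError("max_nodes must be positive")
    return [
        node_names[start:start + max_nodes]
        for start in range(0, len(node_names), max_nodes)
    ]


def aggregate(cluster_results):
    """Central aggregation: concatenate the hit lists returned by each cluster."""
    merged = []
    for hits in cluster_results:
        merged.extend(hits)
    return merged
```

For example, 120 node servers would be split into clusters of 50, 50, and 20 nodes, each queried separately before the results are merged.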
Further, step S5 further includes:
and if the total index data volume is detected to exceed a preset value, indexing in batches through the node server.
Specifically, indexing in batches through the node servers reduces the load on the servers that are indexing at any one time and improves data indexing efficiency.
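Batch indexing can be illustrated by splitting the documents into fixed-size batches and building one bulk request body per batch. The body format (an action line followed by a source line, newline-delimited, with a trailing newline) follows the Elasticsearch bulk API; the function names, the batch size, and the assumption that every document carries an `id` field are hypothetical.

```python
import json


def batches(docs, batch_size):
    """Yield consecutive slices of docs with at most batch_size items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def bulk_body(index_name, docs):
    """Build a newline-delimited bulk body: one action line and one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk format requires a trailing newline
```

Each body would then be sent as one request, so that no single request carries the full data volume.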
Further, after step S4, the method for indexing data based on ElasticSearch further includes:
and if the total index data amount is detected to exceed a preset value, clearing the historical data.
Specifically, if the total index data amount is detected to exceed a preset value, the historical data is cleared so that the indexing speed and query speed of the distributed search engine ElasticSearch remain optimal. Once the total index data volume exceeds the preset value, adding further machines makes coordination within the cluster take too long and consumes too much memory and CPU, which degrades the indexing and query performance of ElasticSearch. Clearing the historical data avoids this.
The historical data consists of the index records and data query records created by the distributed search engine ElasticSearch.
In this embodiment, if it is detected that the total index data amount exceeds the preset value, the historical data is cleared, which reduces memory usage and thereby improves the efficiency of data indexing.
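A minimal sketch of the clearing rule follows. The source does not say which historical records are removed, so the age cutoff `keep_after` is an assumption, as are the `HistoryRecord` type and the function name.

```python
from dataclasses import dataclass


@dataclass
class HistoryRecord:
    kind: str          # "index" or "query" record
    created_at: float  # creation time in seconds since the epoch


def clear_history_if_needed(total_index_volume, preset_value, records, keep_after):
    """Return the records to keep.

    Nothing is cleared while the total stays within the preset value; once it
    is exceeded, records older than keep_after are dropped (assumed policy).
    """
    if total_index_volume <= preset_value:
        return list(records)
    return [record for record in records if record.created_at >= keep_after]
```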
Further, the index data is stored in a blockchain.
It is emphasized that, in order to further ensure the privacy and security of the index data, the index data may also be stored in the nodes of the blockchain.
The blockchain referred to herein is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information about a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
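The idea of data blocks "associated by using a cryptographic method" can be illustrated with a hash-linked list, in which each block stores the hash of its predecessor. This is a teaching sketch only, with no consensus or networking; the field names, the all-zero genesis hash, and the use of SHA-256 over a JSON encoding are assumptions.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, prev_hash, payload):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    encoded = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain, payload):
    """Append a block whose 'prev' field is the hash of the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "payload": payload,
        "hash": block_hash(index, prev_hash, payload),
    })
    return chain


def is_valid(chain):
    """Check that every block links to its predecessor and that its hash matches."""
    prev_hash = GENESIS_PREV
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous hash, altering stored index data in one block invalidates that block and every block after it.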
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or it may be a Random Access Memory (RAM).
Referring to fig. 5, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an ElasticSearch-based data indexing apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the ElasticSearch-based data indexing apparatus of the present embodiment includes: a data writing module 61, an index creating module 62, an index query module 63, a data merging module 64, and a fragment indexing module 65, wherein:
the data writing module 61 is configured to set the main fragments based on the distributed search engine ElasticSearch and write data into the main fragments;
the index creating module 62 is configured to create an index task for the data of each main fragment, and to import the main fragment into the corresponding index fragment according to the type of the index task to create an index;
the index query module 63 is configured to perform an index query on the index fragments through each node server if a data query request instruction is detected, to obtain the index data amount of each node server;
the data merging module 64 is configured to merge the index data amounts of the node servers to obtain a total index data amount;
and the fragment indexing module 65 is configured to index in a preset manner if it is detected that the total index data amount exceeds a preset value.
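The work of the data merging module and the fragment indexing module reduces to summing the per-node index data amounts and comparing the total with the preset value. A minimal sketch, in which the function names and the mode labels `"preset"` and `"normal"` are hypothetical:

```python
def total_index_volume(per_node_volumes):
    """Merge the index data amounts reported by each node server."""
    return sum(per_node_volumes.values())


def choose_indexing_mode(per_node_volumes, preset_value):
    """Switch to the preset indexing manner only when the total exceeds the preset value."""
    total = total_index_volume(per_node_volumes)
    return "preset" if total > preset_value else "normal"
```

The "preset" branch stands for whichever measure the embodiment selects, such as adding a node server or indexing in batches.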
Further, the ElasticSearch-based data indexing apparatus further comprises:
and the ID setting module is used for setting a number for each piece of data and writing the number into the distributed search engine ElasticSearch.
Further, the data writing module 61 includes:
the reading task creating unit is used for extracting data from a distributed search engine ElasticSearch and creating a reading task according to the extracted data;
and the reading task reading unit is used for writing the data into the main fragment according to the reading task.
Further, the ElasticSearch-based data indexing apparatus further comprises:
and the copy creating module is used for creating copies according to the number of the main fragments when the completion of the writing of the data of the main fragments is detected, wherein the number of the copies is set according to the number of the main fragments and a preset proportion.
Further, the index creation module 62 includes:
the index task creating unit is used for creating one type of index task for main fragment data of the same type through the distributed system infrastructure Hadoop;
and the index fragment importing unit is used for importing the main fragment into the corresponding index fragment according to the same type of index task and creating an index.
Further, the ElasticSearch-based data indexing apparatus further comprises:
and the distributed indexing module is used for adding a new node server to perform distributed indexing if the total index data volume is detected to exceed a preset value.
Further, the ElasticSearch-based data indexing apparatus further comprises:
and the batch indexing module is used for indexing in batches through the node server if the total index data volume is detected to exceed a preset numerical value.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 6, fig. 6 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 7 comprises a memory 71, a processor 72, and a network interface 73, which are communicatively connected to each other by a system bus. It is noted that only a computer device 7 having the three components memory 71, processor 72, and network interface 73 is shown, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 71 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 71 may be an internal storage unit of the computer device 7, such as a hard disk or internal memory of the computer device 7. In other embodiments, the memory 71 may also be an external storage device of the computer device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 7. Of course, the memory 71 may also comprise both an internal storage unit of the computer device 7 and an external storage device thereof. In this embodiment, the memory 71 is generally used for storing the operating system installed in the computer device 7 and various types of application software, such as the program code of the ElasticSearch-based data indexing method. Further, the memory 71 may also be used to temporarily store various types of data that have been output or are to be output.
The network interface 73 may comprise a wireless network interface or a wired network interface, and the network interface 73 is typically used to establish a communication connection between the computer device 7 and other electronic devices.
The present application further provides another embodiment, namely a computer-readable storage medium storing a computer program, where the computer program is executable by at least one processor to cause the at least one processor to perform the steps of the ElasticSearch-based data indexing method described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method of the embodiments of the present application.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be thoroughly understood. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.
Claims (10)
1. An ElasticSearch-based data indexing method, characterized by comprising the following steps:
setting a main fragment based on the distributed search engine ElasticSearch, and writing data into the main fragment;
respectively creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index;
if a data query request instruction is detected, performing index query on the index fragments through each node server to obtain the index data volume of each node server;
merging the index data volume of each node server to obtain the total index data volume;
and if the total index data quantity is detected to exceed a preset value, indexing in a preset mode.
2. The ElasticSearch-based data indexing method according to claim 1, wherein before the setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment, the method further comprises:
setting a number for each piece of data, and writing the number into the distributed search engine ElasticSearch.
3. The ElasticSearch-based data indexing method according to claim 1, wherein the setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment comprises:
extracting data from the distributed search engine ElasticSearch, and creating a reading task according to the extracted data;
and writing the data into the main fragment according to the reading task.
4. The ElasticSearch-based data indexing method according to claim 1, wherein after the setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment, the method further comprises:
when it is detected that the data of the main fragments has been completely written, creating copies according to the number of the main fragments, wherein the number of copies is set according to the number of the main fragments and a preset ratio.
5. The ElasticSearch-based data indexing method according to claim 1, wherein the respectively creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index comprises:
creating one type of index task for main fragment data of the same type through the distributed system infrastructure Hadoop;
and for index tasks of the same type, importing the main fragment into the corresponding index fragment to create an index.
6. The method for indexing data based on an ElasticSearch according to any of claims 1 to 5, wherein the indexing in a preset manner if the total index data amount is detected to exceed a preset value comprises:
and if the total index data quantity is detected to exceed a preset value, adding a new node server for distributed indexing.
7. The method for indexing data based on an ElasticSearch according to any of claims 1 to 5, wherein the indexing in a preset manner if it is detected that the total index data amount exceeds a preset value further comprises:
and if the total index data volume is detected to exceed a preset value, indexing in batches through the node server.
8. An ElasticSearch-based data indexing device, comprising:
the data writing module is used for setting a main fragment based on the distributed search engine ElasticSearch and writing data into the main fragment;
the index creating module is used for respectively creating an index task for the data of each main fragment, and importing the main fragment into the corresponding index fragment according to the type of the index task to create an index;
the index query module is used for performing index query on the index fragments through each node server to obtain the index data volume of each node server if a data query request instruction is detected;
the data merging module is used for merging the index data volume of each node server to obtain the total index data volume;
and the fragment indexing module is used for indexing in a preset mode if the total index data volume is detected to exceed a preset value.
9. A computer device comprising a memory in which a computer program is stored and a processor which, when executing the computer program, implements the ElasticSearch-based data indexing method as claimed in any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the ElasticSearch-based data indexing method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010610262.5A CN111797096A (en) | 2020-06-29 | 2020-06-29 | Data indexing method and device based on ElasticSearch, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010610262.5A CN111797096A (en) | 2020-06-29 | 2020-06-29 | Data indexing method and device based on ElasticSearch, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111797096A (en) | 2020-10-20 |
Family
ID=72809685
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010610262.5A Pending CN111797096A (en) | 2020-06-29 | 2020-06-29 | Data indexing method and device based on ElasticSearch, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111797096A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112181993A (en) * | 2020-10-27 | 2021-01-05 | 广州市网星信息技术有限公司 | Service data query method, device, server and storage medium |
CN112463825A (en) * | 2020-11-02 | 2021-03-09 | 中国建设银行股份有限公司 | Elasticisearch cluster protection device, method, storage medium and computer equipment |
CN112612865A (en) * | 2020-12-17 | 2021-04-06 | 杭州迪普科技股份有限公司 | Document storage method and device based on elastic search |
CN112800104A (en) * | 2020-12-08 | 2021-05-14 | 江苏苏宁云计算有限公司 | Method and device for optimizing ES query request link |
CN112883252A (en) * | 2021-02-05 | 2021-06-01 | 成都新希望金融信息有限公司 | Service query method, device, computer equipment and readable storage medium |
CN113010526A (en) * | 2021-04-19 | 2021-06-22 | 星辰天合(北京)数据科技有限公司 | Storage method and device based on object storage service |
CN113094395A (en) * | 2021-03-19 | 2021-07-09 | 杭州复杂美科技有限公司 | Data query method, computer device and storage medium |
CN113407749A (en) * | 2021-06-28 | 2021-09-17 | 北京百度网讯科技有限公司 | Picture index construction method and device, electronic equipment and storage medium |
CN114490523A (en) * | 2021-12-31 | 2022-05-13 | 医渡云(北京)技术有限公司 | Data writing method and device, storage medium and equipment |
CN117632953A (en) * | 2023-11-20 | 2024-03-01 | 广州致远电子股份有限公司 | Data cycle storage method, device, server and storage medium |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112181993A (en) * | 2020-10-27 | 2021-01-05 | 广州市网星信息技术有限公司 | Service data query method, device, server and storage medium |
CN112463825A (en) * | 2020-11-02 | 2021-03-09 | 中国建设银行股份有限公司 | Elasticisearch cluster protection device, method, storage medium and computer equipment |
CN112800104A (en) * | 2020-12-08 | 2021-05-14 | 江苏苏宁云计算有限公司 | Method and device for optimizing ES query request link |
CN112612865A (en) * | 2020-12-17 | 2021-04-06 | 杭州迪普科技股份有限公司 | Document storage method and device based on elastic search |
CN112883252A (en) * | 2021-02-05 | 2021-06-01 | 成都新希望金融信息有限公司 | Service query method, device, computer equipment and readable storage medium |
CN113094395A (en) * | 2021-03-19 | 2021-07-09 | 杭州复杂美科技有限公司 | Data query method, computer device and storage medium |
CN113010526A (en) * | 2021-04-19 | 2021-06-22 | 星辰天合(北京)数据科技有限公司 | Storage method and device based on object storage service |
CN113407749A (en) * | 2021-06-28 | 2021-09-17 | 北京百度网讯科技有限公司 | Picture index construction method and device, electronic equipment and storage medium |
CN113407749B (en) * | 2021-06-28 | 2024-04-30 | 北京百度网讯科技有限公司 | Picture index construction method and device, electronic equipment and storage medium |
CN114490523A (en) * | 2021-12-31 | 2022-05-13 | 医渡云(北京)技术有限公司 | Data writing method and device, storage medium and equipment |
CN117632953A (en) * | 2023-11-20 | 2024-03-01 | 广州致远电子股份有限公司 | Data cycle storage method, device, server and storage medium |
CN117632953B (en) * | 2023-11-20 | 2024-07-16 | 广州致远电子股份有限公司 | Data cycle storage method, device, server and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111797096A (en) | Data indexing method and device based on ElasticSearch, computer equipment and storage medium | |
US10452691B2 (en) | Method and apparatus for generating search results using inverted index | |
CN110795499B (en) | Cluster data synchronization method, device, equipment and storage medium based on big data | |
CN112199442B (en) | Method, device, computer equipment and storage medium for distributed batch downloading files | |
CN110427386B (en) | Data processing method, device and computer storage medium | |
CN105677904B (en) | Small documents storage method and device based on distributed file system | |
CN112507020A (en) | Data synchronization method and device, computer equipment and storage medium | |
WO2019041500A1 (en) | Pagination realization method and device, computer equipment and storage medium | |
CN112925792A (en) | Data storage control method, device, computing equipment and medium | |
CN111813517B (en) | Task queue allocation method and device, computer equipment and medium | |
US8543722B2 (en) | Message passing with queues and channels | |
CN113254106B (en) | Task execution method and device based on Flink, computer equipment and storage medium | |
CN115757492A (en) | Hotspot data processing method and device, computer equipment and storage medium | |
US11509662B2 (en) | Method, device and computer program product for processing access management rights | |
CN113282591B (en) | Authority filtering method, authority filtering device, computer equipment and storage medium | |
CN110162395B (en) | Memory allocation method and device | |
US20240220334A1 (en) | Data processing method in distributed system, and related system | |
WO2022011946A1 (en) | Data prediction method, apparatus, computer device, and storage medium | |
CN116842012A (en) | Method, device, equipment and storage medium for storing Redis cluster in fragments | |
CN112182107A (en) | Method and device for acquiring list data, computer equipment and storage medium | |
US20150106884A1 (en) | Memcached multi-tenancy offload | |
CN111143232B (en) | Method, apparatus and computer readable medium for storing metadata | |
CN115203672A (en) | Information access control method and device, computer equipment and medium | |
CN114020745A (en) | Index construction method and device, electronic equipment and storage medium | |
CN112667682A (en) | Data processing method, data processing device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||