KR101770271B1 - Field-Indexing Method for Message - Google Patents
- Publication number
- KR101770271B1
- Authority
- KR
- South Korea
- Prior art keywords
- query
- indexing
- field
- message
- present
- Prior art date
Classifications
- G06F17/30613—
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30616—
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a message indexing method performed by a computer, comprising a first step of executing a query tokenizer module on a message and a second step of indexing the information tokenized in the first step.
Description
Field of the Invention
The present invention relates to a method of indexing the fields of a message, and more particularly, to an indexing method that enables lightweight, high-speed search.
As the Internet develops and online services spread, the amount of data generated while operating the servers that provide those services, such as work histories (for example, web logs, firewall logs, and transaction logs), is increasing. Generally, a log contains a plurality of log lines. Each log line may record a visitor's access Internet Protocol (IP) address, visit time, visited web page, visit status, and the like.
Analyzing such logs often requires searching for specific words or strings, so indexing is becoming increasingly important for managing and retrieving log data.
Patent No. 1112568, which the inventor of the present invention holds as a patentee, discloses a log indexing method for efficient retrieval of vast amounts of data. According to the method disclosed in this patent, it is possible to perform reliable and quick log search through indexing without a normalization process of converting the log into a database. The entire contents of this patent are hereby incorporated herein by reference and may be used for the purposes of illustration of the present invention without any explicit reference herein. However, the contents of this patent should not be construed as limiting the scope of the present invention and should be used only for the purpose of helping understanding of the present invention.
It is an object of the present invention to provide an indexing method that is quicker and less burdensome than the log indexing method of the above-mentioned patent.
The present invention relates to a message indexing method performed by a computer, comprising a first step of executing a query tokenizer module on a message and a second step of indexing the information tokenized in the first step.
Preferably, the query used when executing the query tokenizer module in the first step includes a syntax for designating some fields of the message, and the second step is performed only on the designated fields.
The query when executing the query tokenizer module in the first step preferably also includes a syntax defining a field type of a specified field of a message.
According to the present invention, high-speed indexing is possible, the input/output load is reduced, and range searches and magnitude comparisons can be performed quickly at search time.
FIG. 1 is a flowchart of an indexing method according to the present invention.
FIG. 2 is a flowchart of a method of searching data indexed by the indexing method according to the present invention.
Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. The indexing method according to the present invention is performed by a computer. In this specification, a computer is defined as any electronic device that performs electronic operations and can be operated by a program. For example, it may be a personal computer (PC), a server computer, or a mobile device, provided it is suitable for the data processing of the present invention.
FIG. 1 shows a flowchart of the indexing method according to the present invention. First, the original data to be indexed is inserted (100). The original data may be web log data, firewall log data, financial transaction log data, and the like; any type of data to which the indexing method according to the present invention is applicable may be included. In this specification, the terms "original data" and "message" are used interchangeably.
When the original data is inserted, the query tokenizer module is executed on the original data (message) (110). The query tokenizer module is defined as a logical combination of general-purpose hardware and software that, unlike a conventional tokenizer, performs tokenization based on a query.
Executing the query tokenizer module allows some fields of the original data to be designated and tokenized based on the query.
Executing the tokenizer module extracts the tokens and the token meta information (120). The token meta information may include a field name and a field type (int, string, long, ip).
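Steps 110 and 120 can be illustrated with a minimal Python sketch. The patent does not disclose the query syntax or the message layout, so everything concrete here is an assumption: the "query" is modeled as a hypothetical `FIELD_SPEC` dictionary mapping field names to field types, and the message is assumed to be whitespace-separated `name=value` pairs.

```python
import ipaddress

# Hypothetical stand-in for the "query": designated field name -> field type.
FIELD_SPEC = {"src_ip": "ip", "port": "int", "bytes": "long"}

# Converters for the field types named in the description (int, string, long, ip).
CONVERTERS = {
    "int": int,
    "long": int,
    "string": str,
    "ip": lambda text: int(ipaddress.IPv4Address(text)),
}

def tokenize(message, field_spec):
    """Return (field_name, field_type, value) for each designated field."""
    tokens = []
    for part in message.split():
        name, sep, raw = part.partition("=")
        if not sep or name not in field_spec:
            continue  # fields not designated by the query are skipped
        field_type = field_spec[name]
        tokens.append((name, field_type, CONVERTERS[field_type](raw)))
    return tokens

tokens = tokenize("src_ip=10.0.0.1 port=443 user=alice bytes=1024", FIELD_SPEC)
```

In this sketch the `user` field is dropped because the hypothetical query does not designate it, and each remaining token carries its field name and type as meta information.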
Next, an index is created by indexing the tokenized information (130). Various specific indexing methods exist; for example, the indexing method disclosed in the above-mentioned patent of the present inventor can be applied. However, the scope of the present invention is not limited to any specific method of indexing the data after it has been tokenized by the query tokenizer module according to the present invention, and various publicly known indexing methods can be applied.
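Because the description leaves the indexing method open, step 130 can be sketched with any publicly known structure. The example below uses a plain inverted index as one illustrative choice; it is not the method of the referenced patent. The function name `build_index` and the `(log_id, tokens)` input shape are assumptions made for this sketch.

```python
from collections import defaultdict

def build_index(tokenized_logs):
    """Map each (field_name, value) pair to the list of log IDs containing it."""
    index = defaultdict(list)
    for log_id, tokens in tokenized_logs:
        for name, _field_type, value in tokens:
            index[(name, value)].append(log_id)
    return index

# Hypothetical tokenizer output: (log_id, [(field_name, field_type, value), ...]).
sample = [
    (1, [("port", "int", 443)]),
    (2, [("port", "int", 443), ("user", "string", "alice")]),
    (3, [("port", "int", 80)]),
]
index = build_index(sample)
```

Only the designated fields ever reach the index, which is where the reduction in index size described in the text would come from.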
When the query tokenizer module is executed and a field type is designated for each desired field, indexing can be performed faster than with the conventional indexing method. According to the indexing method of the present invention, indexing is performed only on specific (desired) fields of the data, for example port information, file size, and file name. Compared with the conventional method, which indexes every field including unnecessary ones, the amount of data to be indexed is reduced, so that a very fast search is possible. In addition, because the field type can be specified before indexing by executing the query tokenizer module, the index occupies less space than with the conventional method of indexing everything as strings, and the I/O load during retrieval decreases. For example, an IP address stored as a string can take up to 15 bytes, but stored as the ip type it takes 4 bytes, which reduces the time required to read the data. Furthermore, because the field type (string, int, ip) is known after indexing, operations appropriate to that type can be performed, which is very useful for range searches and magnitude comparisons. A range search can be performed at high speed by applying the type's own comparison directly instead of evaluating an OR conditional expression over strings, as was done previously.
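The storage and range-search points can be checked with a short Python sketch that uses the standard `ipaddress` and `bisect` modules. The helper names `pack_ip` and `ip_range`, and the choice of a sorted list of integer keys, are assumptions for illustration and are not taken from the patent.

```python
import bisect
import ipaddress

def pack_ip(text):
    """Encode an IPv4 address in its 4-byte binary form."""
    return ipaddress.IPv4Address(text).packed

def ip_range(sorted_keys, low, high):
    """Return the integer keys within [low, high] using numeric comparison."""
    lo = int(ipaddress.IPv4Address(low))
    hi = int(ipaddress.IPv4Address(high))
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

string_size = len("255.255.255.255".encode("ascii"))  # longest dotted-quad string
packed_size = len(pack_ip("255.255.255.255"))

# Hypothetical indexed addresses, stored as sorted integers.
keys = sorted(
    int(ipaddress.IPv4Address(a))
    for a in ["10.0.0.5", "10.0.0.200", "10.0.1.7", "192.168.0.1"]
)
hits = [str(ipaddress.IPv4Address(k)) for k in ip_range(keys, "10.0.0.0", "10.0.0.255")]
```

With typed keys, the range is resolved by two binary searches over the sorted integers, rather than by enumerating every matching address string and combining them with OR.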
FIG. 2 is a flowchart for searching information indexed by the indexing method according to the present invention.
First, the segment header is consulted to confirm whether the segment falls within the search period, and the field entry is consulted to check whether the field to be searched exists (200). Next, the posting group corresponding to the search expression (the list of identification information of the logs that contain the token) is extracted (210). Then a Boolean operation is performed on the postings (220), for example when searching for a plurality of search words.
Finally, the original logs are extracted based on the resulting list of log identification information (log IDs) (230).
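The four search steps of FIG. 2 can be sketched as follows. The segment layout (a dictionary with `start`, `end`, `fields`, `postings`, and `logs` keys), the function name `search`, and the sample data are all hypothetical; the patent does not specify the on-disk format.

```python
# Hypothetical in-memory segment; the layout is an assumption for illustration.
SEGMENT = {
    "start": 1000,  # header: earliest timestamp covered by the segment
    "end": 2000,    # header: latest timestamp covered by the segment
    "fields": {"port", "user"},  # field entries
    "postings": {("port", 443): [1, 2, 4], ("user", "alice"): [2, 3, 4]},
    "logs": {
        1: "port=443 user=bob",
        2: "port=443 user=alice",
        3: "port=80 user=alice",
        4: "port=443 user=alice",
    },
}

def search(segment, terms, start, end, op="and"):
    """Return the original logs matching the (field, value) terms in the period."""
    # Step 200: check the search period against the header, then the field entries.
    if segment["end"] < start or segment["start"] > end:
        return []
    # Step 210: extract one posting list per term (empty if the field is absent).
    postings = [
        set(segment["postings"].get(term, ())) if term[0] in segment["fields"] else set()
        for term in terms
    ]
    if not postings:
        return []
    # Step 220: Boolean operation across the posting lists.
    combine = set.intersection if op == "and" else set.union
    log_ids = sorted(combine(*postings))
    # Step 230: fetch the original logs by log ID.
    return [segment["logs"][log_id] for log_id in log_ids]

both = search(SEGMENT, [("port", 443), ("user", "alice")], 1500, 2500)
either = search(SEGMENT, [("port", 443), ("user", "alice")], 1500, 2500, op="or")
```

Checking the header first means a segment outside the search period is rejected before any posting list is read.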
While the present invention has been described with reference to the accompanying drawings, it is to be understood that the scope of the present invention is defined by the claims that follow, and should not be construed as limited to the above-described embodiments and / or drawings. It is to be expressly understood that improvements, changes and modifications that are obvious to those skilled in the art are also within the scope of the present invention as set forth in the claims.
Claims (3)
A first step of executing a query tokenizer module for the message,
And a second step of performing indexing on the information tokenized in the first step,
Wherein the first step is a step in which the query tokenizer module performs tokenization based on a query, the query including a statement defining a field type of a specified field of a message,
A method of indexing a message.
Wherein the query used when executing the query tokenizer module in the first step includes a syntax for designating some fields of a message,
Wherein the second step is performed only on the designated fields,
A method of indexing a message.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150113619A KR101770271B1 (en) | 2015-08-12 | 2015-08-12 | Field-Indexing Method for Message |
PCT/KR2016/006781 WO2017026647A1 (en) | 2015-08-12 | 2016-06-24 | Field indexing method for message |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150113619A KR101770271B1 (en) | 2015-08-12 | 2015-08-12 | Field-Indexing Method for Message |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170088290A Division KR101921123B1 (en) | 2017-07-12 | 2017-07-12 | Field-Indexing Method for Message |
Publications (2)
Publication Number | Publication Date |
---|---|
KR20170019603A (en) | 2017-02-22 |
KR101770271B1 (en) | 2017-08-22 |
Family
ID=57984373
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150113619A KR101770271B1 (en) | 2015-08-12 | 2015-08-12 | Field-Indexing Method for Message |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR101770271B1 (en) |
WO (1) | WO2017026647A1 (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140136513A1 (en) * | 2012-11-15 | 2014-05-15 | Ecole polytechnique fédérale de Lausanne (EPFL) | Query management system and engine allowing for efficient query execution on raw details |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AUPR645701A0 (en) * | 2001-07-18 | 2001-08-09 | Tralee Investments Ltd | Database adapter |
US7152056B2 (en) * | 2002-04-19 | 2006-12-19 | Dow Jones Reuters Business Interactive, Llc | Apparatus and method for generating data useful in indexing and searching |
US9047326B2 (en) * | 2012-10-12 | 2015-06-02 | A9.Com, Inc. | Index configuration for searchable data in network |
US10387491B2 (en) * | 2013-07-16 | 2019-08-20 | Semantic Technologies Pty Ltd | Ontology index for content mapping |
- 2015-08-12 KR KR1020150113619A patent/KR101770271B1/en active IP Right Grant
- 2016-06-24 WO PCT/KR2016/006781 patent/WO2017026647A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2017026647A1 (en) | 2017-02-16 |
KR20170019603A (en) | 2017-02-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10452691B2 (en) | Method and apparatus for generating search results using inverted index | |
US8972372B2 (en) | Searching code by specifying its behavior | |
US9811321B1 (en) | Script compilation | |
US8560519B2 (en) | Indexing and searching employing virtual documents | |
US10754628B2 (en) | Extracting web API endpoint data from source code to identify potential security threats | |
US20200089674A1 (en) | Executing conditions with negation operators in analytical databases | |
CN102915344B (en) | SQL (structured query language) statement processing method and device | |
US20160019266A1 (en) | Query generating method and query generating device | |
US20120166412A1 (en) | Super-clustering for efficient information extraction | |
CN112860730A (en) | SQL statement processing method and device, electronic equipment and readable storage medium | |
CN111368227A (en) | URL processing method and device | |
CN103914479A (en) | Resource request matching method and device | |
CN114297204A (en) | Data storage and retrieval method and device for heterogeneous data source | |
CN110674383B (en) | Public opinion query method, device and equipment | |
KR101921123B1 (en) | Field-Indexing Method for Message | |
KR101770271B1 (en) | Field-Indexing Method for Message | |
US10262056B2 (en) | Method and system for performing search queries using and building a block-level index | |
CN110968763A (en) | Data processing method and device | |
US9104730B2 (en) | Indexing and retrieval of structured documents | |
US10915594B2 (en) | Associating documents with application programming interfaces | |
US20150127624A1 (en) | Framework for removing non-authored content documents from an authored-content database | |
CN108073607B (en) | URL processing method and device | |
Jain et al. | Sampling semantic data stream: Resolving overload and limited storage issues | |
JP2014503916A (en) | Universal plug and play search condition conversion | |
CN113553347B (en) | Block chain-based data processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
E902 | Notification of reason for refusal | ||
AMND | Amendment | ||
E601 | Decision to refuse application | ||
AMND | Amendment | ||
GRNT | Written decision to grant | ||
X701 | Decision to grant (after re-examination) |