Disclosure of Invention
To solve the above problem, the present invention provides an index system for network session packets. The system comprises at least one index space for storing retrieval information of a given kind of information of a network session packet, an information compression module, a retrieval information projection module and a reading module.
The index space contains a plurality of storage bits; each storage bit corresponds to one piece of retrieval information, and all storage bits have the same initial value.
The information compression module performs a compression operation on one or several kinds of information of a specified network session packet to obtain the retrieval information corresponding to each kind of information.
The retrieval information projection module projects the relevant information of the network session packet onto the corresponding storage bit of the index space that stores retrieval information for that kind of information. A storage bit is marked as projected the first time it is projected onto; information of network session packets that yields the same retrieval information is projected onto the same storage bit.
The reading module finds the corresponding storage bits according to the retrieval information of one or more kinds of information of the network session packet to be read, and from them locates the corresponding network session packet.
Further, the retrieval information projection module divides time into a plurality of time period layers. Each time period layer divides time into a plurality of time units of equal length arranged in chronological order; each time unit is in turn divided into a plurality of shorter time units of equal length, arranged in chronological order, which form the next time period layer. The division is repeated until a preset minimum time unit is reached. The retrieval information projection module records, at a preset period, the stored values of all storage bits in each index space, and associates each recording moment with the corresponding time unit in each time period layer.
Further, the information type of the network session packet includes IP address information and/or port number information and/or network protocol ID number information.
Further, the information compression module compresses a given kind of information of the specified network session packet to obtain the retrieval information as follows:
Step one: take the parameters of the information as the input parameters of a Hash function and perform the Hash operation to obtain a hash value.
Step two: divide the hash value by the size of the index space corresponding to the information and take the remainder; the remainder is the retrieval information.
Furthermore, the retrieval information projection module projects the information hashed by the information compression module onto the storage bit, in the corresponding index space, whose bit number equals the remainder.
In the working method of the index system for network session packets, when the information is compressed to obtain the retrieval information and the compression operation is performed on an IP address, each byte of the IP address is used as a separate input parameter of the Hash function.
Furthermore, when the information is compressed to obtain the retrieval information and the compression operation is performed on a port number, each byte of the port number is used as a separate input parameter of the Hash function.
Furthermore, when the information is compressed to obtain the retrieval information, the number of bits in the index space is the divisor of the remainder operation.
Further, the initial value of each storage bit is 0, and the bit is set to 1 once it has been projected.
Further, the maximum length of a time unit is 24 hours.
The beneficial effects of the invention are as follows:
By defining a new indexing technique, the projection index, the invention places no fixed upper bound on the scale of data the index can handle, supports high-speed writing and querying, and incurs almost no cost when two or more indexes are combined into a higher-level index. Because the projection index is small, it can reside in memory, which saves hard-disk I/O; writing and reading are performed with bit operations at a cost close to zero, so performance can be greatly improved.
Detailed Description
The invention provides an index system for network session packets, which comprises at least one index space for storing retrieval information of a given kind of information of a network session packet, an information compression module, a retrieval information projection module and a reading module.
The index space contains a plurality of storage bits; each storage bit corresponds to one piece of retrieval information, and all storage bits have the same initial value.
The information compression module performs a compression operation on one or several kinds of information of a specified network session packet to obtain the retrieval information corresponding to each kind of information.
The retrieval information projection module projects the relevant information of the network session packet onto the corresponding storage bit of the index space that stores retrieval information for that kind of information. A storage bit is marked as projected the first time it is projected onto; information of network session packets that yields the same retrieval information is projected onto the same storage bit.
The reading module finds the corresponding storage bits according to the retrieval information of one or more kinds of information of the network session packet to be read, and from them locates the corresponding network session packet.
Further, the retrieval information projection module divides time into a plurality of time period layers. Each time period layer divides time into a plurality of time units of equal length arranged in chronological order; each time unit is in turn divided into a plurality of shorter time units of equal length, arranged in chronological order, which form the next time period layer. The division is repeated until a preset minimum time unit is reached. The retrieval information projection module records, at a preset period, the stored values of all storage bits in each index space, and associates each recording moment with the corresponding time unit in each time period layer. An example of the time division is shown in Fig. 2. In the figure, the time unit of the first time period layer is 1 day; each day is divided into time units of a whole number of hours, each hour into time units of a whole number of minutes, and each minute into time units of a whole number of seconds. The specific time unit lengths can be chosen according to actual needs. This embodiment preferably uses 24 hours as the longest time unit, so that queries can be resolved to the date.
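The layered division of time can be illustrated with a minimal sketch. The layer lengths below (day, hour, minute, second) follow the Fig. 2 example; the names LAYER_SECONDS and time_unit_path are hypothetical and are not taken from the invention.

```python
# Assumed layer lengths in seconds, longest first: day, hour, minute, second.
LAYER_SECONDS = (86400, 3600, 60, 1)


def time_unit_path(ts: int) -> tuple:
    """Return, for each time period layer, the index of the time unit containing ts.

    The first entry counts whole days since time 0; each later entry is the
    index of the unit inside the unit of the layer above it.
    """
    path = []
    remainder = int(ts)
    for length in LAYER_SECONDS:
        unit, remainder = divmod(remainder, length)
        path.append(unit)
    return tuple(path)


# 1 day + 1 hour + 1 minute + 1 second after time 0
print(time_unit_path(90061))  # (1, 1, 1, 1)
```

Each recording moment then maps to exactly one time unit per layer, which is what allows a query to be narrowed layer by layer.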
The information types of the network session packet include IP address information and/or port number information and/or network protocol ID number information. The information types are not limited to these and may be extended.
The information compression module compresses a given kind of information of the specified network session packet to obtain the retrieval information as follows:
Step one: take the parameters of the information as the input parameters of a Hash function and perform the Hash operation to obtain a hash value.
Step two: divide the hash value by the size of the index space corresponding to the information and take the remainder; the remainder is the retrieval information.
For example, assume that the index space storing IP address information has a size of N MB, and that the four bytes of the IP address are IP1, IP2, IP3 and IP4 (each 1 byte, i.e. 8 bits). With the retrieval information denoted POS, the calculation is as follows:
HashKey=HASH(IP1,IP2,IP3,IP4);
POS=HashKey%(N*1024*1024*8);
The modulus N*1024*1024*8 arises as follows: when the information is compressed to obtain the retrieval information, the number of bits in the index space is the divisor of the remainder operation, and an index space of N MB contains N*1024*1024*8 bits.
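The calculation of POS can be sketched as follows. The text does not name a specific Hash function, so SHA-256 truncated to 64 bits is used here purely as an assumed stand-in; the function name retrieval_position is likewise hypothetical.

```python
import hashlib


def retrieval_position(ip: str, n_mb: int) -> int:
    """Return POS, the storage-bit number for a dotted IPv4 address."""
    hasher = hashlib.sha256()  # assumption: stand-in for the unspecified HASH
    # Feed IP1, IP2, IP3, IP4 to the hash one byte at a time.
    for part in ip.split("."):
        hasher.update(bytes([int(part)]))
    hash_key = int.from_bytes(hasher.digest()[:8], "big")
    # The divisor is the number of bits in an N MB index space.
    total_bits = n_mb * 1024 * 1024 * 8
    return hash_key % total_bits


pos = retrieval_position("192.168.1.10", n_mb=1)
print(0 <= pos < 8 * 1024 * 1024)  # True
```

Because the hash is deterministic, the same address always yields the same POS, which is what lets the reading side recompute the storage bit from the query value alone.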
It should be emphasized that the retrieval information projection module projects the information hashed by the information compression module onto the storage bit, in the corresponding index space, whose bit number equals the remainder. The same kind of information from different network session packets may yield the same storage bit; such information is treated as belonging together and is projected onto the same storage bit, which greatly reduces the size of the index space. With a larger index space, fewer distinct values share one storage bit and the classification is finer; with a smaller index space, more values share one storage bit and the classification is coarser. In general, this design can work with an index space of any size.
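The effect of the index space size on classification granularity can be illustrated with a small experiment. This is only a sketch: CRC-32 from the standard library is an assumed stand-in for the Hash function, and occupied_bits is a hypothetical helper.

```python
import zlib


def occupied_bits(values, size_bytes: int) -> int:
    """Count the distinct storage bits that the given values are projected onto."""
    num_bits = size_bytes * 8
    return len({zlib.crc32(v) % num_bits for v in values})


# 1000 distinct addresses 10.0.x.y, each encoded as 4 bytes
addresses = [bytes([10, 0, i // 256, i % 256]) for i in range(1000)]

small = occupied_bits(addresses, 16)      # 128 storage bits: many values share a bit
large = occupied_bits(addresses, 65536)   # 524288 storage bits: few values share a bit
print(small, large)
```

With the small space at most 128 storage bits can be occupied, so a marked bit says little about which address set it; with the large space nearly every address gets its own bit. In both cases a marked bit can only indicate that a matching value may be present, never prove it.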
The working method of the invention is explained below. It consists of a flow that divides time into a plurality of time period layers, a storage flow, a stored-value recording flow and a reading flow.
The flow that divides time into a plurality of time period layers is as follows: each time period layer divides time into a plurality of time units of equal length arranged in chronological order; each time unit is in turn divided into a plurality of shorter time units of equal length, arranged in chronological order, which form the next time period layer; the division is repeated until a preset minimum time unit is reached. The retrieval information projection module records, at a preset period, the stored values of all storage bits of each index space, and associates each recording moment with the corresponding time unit in each time period layer.
The storage flow comprises the following steps:
Step 1: Designate in advance at least one index space for storing retrieval information of a given kind of information of a network session packet. The index space contains a plurality of storage bits; each storage bit corresponds to one piece of retrieval information, and all storage bits have the same initial value.
Step 2: Extract one or several kinds of information from the specified network session packet, and determine the index space, and its size, corresponding to each kind of information.
Step 3: Perform the Hash operation on each kind of information of each network session packet, and then perform the projection mapping.
The Hash operation on each kind of information comprises the following steps:
Step 3.1: Extract the parameters of the information.
Step 3.2: Use the parameters as the input of the Hash function to obtain a hash value.
Step 3.3: Divide the hash value by the size of the index space corresponding to the information and take the remainder.
Step 3.4: The retrieval information projection module maps the information to the storage bit, in the corresponding index space, whose bit number equals the remainder; the remainder serves as the retrieval information of the information, and the mapped storage bit is marked as projected.
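The storage flow above can be sketched as follows, with one index space per kind of information. The class and field names (IndexSpace, src_ip, dst_port, protocol), the index space sizes and the use of CRC-32 as the Hash function are all assumptions made for illustration.

```python
import zlib


class IndexSpace:
    """A bit array in which every storage bit starts at 0 (not projected)."""

    def __init__(self, size_bytes: int):
        self.bits = bytearray(size_bytes)
        self.num_bits = size_bytes * 8

    def project(self, params: bytes) -> int:
        hash_value = zlib.crc32(params)         # step 3.2 (assumed hash function)
        pos = hash_value % self.num_bits        # step 3.3: take the remainder
        self.bits[pos // 8] |= 1 << (pos % 8)   # step 3.4: mark the bit as projected
        return pos

    def is_projected(self, pos: int) -> bool:
        return bool(self.bits[pos // 8] & (1 << (pos % 8)))


def store_packet(spaces: dict, packet: dict) -> dict:
    """Project each extracted field into its own index space; return the bit numbers."""
    return {field: spaces[field].project(value) for field, value in packet.items()}


# Steps 1 and 2: one index space per kind of information, sizes chosen arbitrarily.
spaces = {"src_ip": IndexSpace(4096), "dst_port": IndexSpace(512), "protocol": IndexSpace(32)}
packet = {"src_ip": bytes([10, 0, 0, 1]), "dst_port": (443).to_bytes(2, "big"), "protocol": bytes([6])}
positions = store_packet(spaces, packet)
```

Projecting the same packet again sets no new bits, since a storage bit only changes the first time it is projected onto.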
The stored-value recording flow is as follows: record, at a preset period, the stored values of all storage bits in each index space, and associate each recording moment with the corresponding time unit in each time period layer.
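One possible way to realize this recording flow is sketched below. The text does not specify how a recording is associated with the time unit of each layer; the sketch assumes that each time unit keeps the bitwise OR of every recording that falls inside it. The layer lengths and the names unit_keys and record are hypothetical.

```python
LAYER_SECONDS = (86400, 3600, 60)  # assumed layers: day, hour, minute


def unit_keys(ts: int):
    """Yield one (layer, unit_start) key per time period layer for timestamp ts."""
    for layer, length in enumerate(LAYER_SECONDS):
        yield layer, ts - ts % length


def record(snapshots: dict, space: bytearray, ts: int) -> None:
    """OR the current stored values into the snapshot of every enclosing time unit."""
    for key in unit_keys(ts):
        old = snapshots.get(key, bytes(len(space)))
        snapshots[key] = bytes(a | b for a, b in zip(old, space))


snapshots = {}
space = bytearray(4)
space[0] |= 0b01
record(snapshots, space, ts=3720)   # 01:02:00 on day 0
print(sorted(snapshots))            # [(0, 0), (1, 3600), (2, 3720)]
```

Under this assumption, a bit marked in a day-level snapshot means it was marked in at least one recording of that day, which is the property a layer-by-layer search relies on.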
The reading flow comprises the following steps:
Step 1: Specify in advance one or several kinds of information of the network session flow to be read.
Step 2: The information compression module compresses each kind of information from step 1 in turn to obtain the corresponding retrieval information.
Step 3: The reading module finds the corresponding storage bits according to the retrieval information, and from them locates the corresponding network session packet.
Step 3 specifically comprises the following steps:
Step 3.1: Take the time period layer with the longest time unit as the first time period layer. Starting from a preselected time unit in the first time period layer, the system reads whether the storage bits corresponding to all the retrieval information are marked as projected. If so, it locks that time unit and proceeds to the next step; otherwise, it traverses the other time units of the first time period layer, reading the storage bits corresponding to all the retrieval information, until it finds and locks a time unit in which all of those storage bits are marked as projected. If no such time unit exists, the reading flow ends.
Step 3.2: Among the time units of the next time period layer into which the locked time unit is divided, the system traverses to find the time unit in which the storage bits corresponding to all the retrieval information are marked as projected, and locks that time unit.
Step 3.3: Repeat step 3.2 until the corresponding time unit has been locked in the time period layer with the shortest time unit.
Step 3.4: Read the network session flows within the last locked time unit and extract those that meet the requirements.
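Steps 3.1 to 3.3 can be sketched as a layer-by-layer search. The sketch makes several assumptions that are not stated in the text: there is a single index space, its recorded values are held in a dictionary keyed by (layer, start time of the time unit), the layers are day, hour and minute, and every matching time unit is returned rather than only the first one locked. All names are hypothetical.

```python
LAYER_SECONDS = (86400, 3600, 60)  # assumed layers: day, hour, minute


def bit_set(bitmap: bytes, pos: int) -> bool:
    """Return True if storage bit pos is marked as projected in bitmap."""
    return bool(bitmap[pos // 8] & (1 << (pos % 8)))


def drill_down(snapshots, positions, layer=0, start=0, end=None):
    """Return start times of the shortest time units in which every position is marked."""
    length = LAYER_SECONDS[layer]
    if end is None:
        # First layer: examine every recorded time unit (step 3.1).
        candidates = sorted(s for (l, s) in snapshots if l == layer)
    else:
        # Lower layer: examine only the sub-units of the locked unit (step 3.2).
        candidates = range(start, end, length)
    hits = []
    for unit_start in candidates:
        bitmap = snapshots.get((layer, unit_start))
        if bitmap is None or not all(bit_set(bitmap, p) for p in positions):
            continue
        if layer == len(LAYER_SECONDS) - 1:
            hits.append(unit_start)  # shortest time unit reached (step 3.3)
        else:
            hits.extend(drill_down(snapshots, positions, layer + 1,
                                   unit_start, unit_start + length))
    return hits


snapshots = {(0, 0): b"\x01", (1, 0): b"\x00", (1, 3600): b"\x01",
             (2, 3600): b"\x00", (2, 3720): b"\x01"}
print(drill_down(snapshots, [0]))  # [3720]
```

The session data stored for each returned time unit would then be read and filtered as in step 3.4, since a marked storage bit only indicates that a matching value may be present.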