[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN110266316B - Data compression and decompression method, device and equipment - Google Patents

Data compression and decompression method, device and equipment Download PDF

Info

Publication number
CN110266316B
CN110266316B CN201910380862.4A CN201910380862A CN110266316B CN 110266316 B CN110266316 B CN 110266316B CN 201910380862 A CN201910380862 A CN 201910380862A CN 110266316 B CN110266316 B CN 110266316B
Authority
CN
China
Prior art keywords
array
coding
result
data
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910380862.4A
Other languages
Chinese (zh)
Other versions
CN110266316A (en
Inventor
唐德荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd filed Critical Advanced New Technologies Co Ltd
Priority to CN201910380862.4A priority Critical patent/CN110266316B/en
Publication of CN110266316A publication Critical patent/CN110266316A/en
Application granted granted Critical
Publication of CN110266316B publication Critical patent/CN110266316B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3068Precoding preceding compression, e.g. Burrows-Wheeler transformation
    • H03M7/3071Prediction
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4093Variable length to variable length coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the specification discloses a data compression method, a data decompression method, a data compression device, a data decompression device and equipment. The data compression method comprises the following steps: generating a first array according to an original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence; generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array; coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result; coding the second array by adopting a variable length coding method to obtain a second coding result; and storing the first encoding result and the second encoding result according to a preset mapping relation.

Description

Data compression method, data decompression method, device and equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for data compression and decompression.
Background
In a scientific computing environment, it is often necessary to store large amounts of data on computers or to transfer data between computers. The basic idea of the data compression technology is to replace repeated data in the data to be compressed with symbols or codes which occupy less space, so that the compressed data occupies less disk storage control or shorter transmission time.
For the compression of long type, double type or float type data, the traditional method is to directly use variable length coding to compress, when the value in the sequence is generally small, the method has better compression effect, but under the condition that the value distribution in the sequence is not uniform, the method has poor compression effect. In view of the complement of the prior art, it is necessary to provide a more effective data compression method for data types with large occupation such as long type, double type or float type, so as to improve the compression rate of data in a numerical sequence with non-uniform numerical distribution.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, and a device for data compression and decompression, which are used to increase the data compression degree for data of types that occupy more space, such as long type, double type, or float type, and improve the data compression rate for a numerical sequence with any number distribution or uneven number distribution.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
an embodiment of the present specification provides a data compression method, including: generating a first array according to an original sequence, wherein elements in the first array are different from one another and contain all numerical values in the original sequence; generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array; coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result; coding the second array by adopting a variable length coding method to obtain a second coding result; and storing the first coding result and the second coding result according to a preset mapping relation.
An embodiment of the present specification provides a data decompression method, including: acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation; based on the data characteristics of the first compressed data, decompressing the first compressed data by adopting a numerical prediction-based coding method to obtain a first array; decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array; and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
An embodiment of this specification provides a data compression apparatus, including: a generating module, configured to generate a first array and a second array according to an original sequence, where elements in the first array are different from each other and include all values in the original sequence, and an element in the second array is a position index of an element in the original sequence in the first array; the first coding module is used for coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result; the second coding module is used for coding the second array by adopting a variable length coding method to obtain a second coding result; and the storage module is used for storing the first coding result and the second coding result according to a preset mapping relation.
The data decompression device provided by the embodiment of the specification comprises: the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring compressed data, and the compressed data comprises first compressed data and second compressed data with a preset mapping relation; the first data decompression module is used for decompressing the first compressed data by adopting a numerical prediction-based coding method based on the data characteristics of the first compressed data to obtain a first array; the second data decompression module is used for decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array; and the data generation module is used for obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
An embodiment of the present specification provides a data compression apparatus, including:
at least one processor; and (c) a second step of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to:
generating a first array according to an original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence;
generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array;
coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result;
coding the second array by adopting a variable length coding method to obtain a second coding result;
storing the first encoding result and the second encoding result according to a preset mapping relation,
or,
to enable the at least one processor to:
acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation;
decompressing the first compressed data by adopting a numerical prediction-based method based on the data characteristics of the first compressed data to obtain a first array;
decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array;
and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects: the original sequence is converted into two arrays, and coding is carried out by adopting different methods respectively based on the characteristics of the two arrays so as to obtain a final coding result. The data compression method increases the data compression degree of data occupying more places such as long type, double type or float type, and improves the data compression rate of numerical value sequences with unevenly distributed numerical values.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flowchart of a data compression method provided in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a data compression method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a numerical prediction-based encoding method according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram of a numerical prediction method in an encoding method based on numerical prediction according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of another numerical prediction method in the encoding method based on numerical prediction according to the embodiment of the present disclosure;
fig. 6 is a schematic diagram of a numerical prediction-based encoding method using two predictors according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a method for generating a first array from an original sequence according to an embodiment of the present disclosure;
fig. 8 is a schematic flow chart of a data decompression method provided in an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a data compression apparatus provided in an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a data decompression device according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of an apparatus corresponding to a data compression method and a data decompression method according to an embodiment of the present specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a traditional compression scheme of long type, double type or float type data, a variable length coding mode is directly used for compression, and the method brings negative optimization to the condition that values of long type, double type or float type data sequences are not uniformly distributed (when the values of long type, double type or float type data are larger, the number of compressed bytes is more than 8B); moreover, the conventional compression scheme has a poor compression effect on a sequence in which data is randomly scrambled. In view of this, it is necessary to propose a data compression method that is more effective for data of the long type, double type, or float type, and is not affected by the value distribution and/or randomness of the data.
For example, in one particular application scenario, the graph engine Sharkgraph can provide ultra-large graph storage and graph computation, and support timing graphs, and the point-edge attributes in Sharkgraph, which contain a large amount of data such as long/double types, can be stored in a column. Such a graph database usually occupies a very large storage space, and therefore, it is necessary to provide an effective data compression scheme to deal with the computation of the super-large graph so as to better utilize the storage space.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a data compression method according to an embodiment of the present disclosure. From the viewpoint of a program, the execution subject of the flow may be a program installed in an application server or an application client.
Referring to fig. 1, the data compression method may include the steps of:
s110: and generating a first array according to the original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence.
The original sequence may be a sequence of any data type, and preferably may be a sequence of 32-bit or 64-bit data with a larger number of bits, such as long type, double type, or float type. That is to say, the data compression method provided by the embodiment of the present disclosure can be applied to data sequences of any data type, wherein the compression effect on data sequences of long type, double type or float type is particularly prominent.
According to the embodiment, the first array can be generated by a sequence deduplication method according to the original sequence, so that all mutually different elements in the original sequence are contained in the first array. Any existing deduplication method may be employed to generate the first array, and the elements in the first array resulting from the different methods may have a consistent order with the elements in the original sequence, or may have a random order. Specifically, for example, if the original sequence includes [100,400,200,400,500,100,500], the order of elements in the first array may be identical to the original sequence, i.e., may be [100,400,200,500]; or the order of the elements in the first array may be random, e.g., may be [400,200,100,500], etc.
For example, the first array may be obtained through a distint statement, and the obtained first array may include elements different from each other in the original sequence, where an order of the elements in the first array may be consistent with an order in which the corresponding elements appear in the original sequence for the first time. The data type of the elements in the first array may be the same as the data type of the elements in the original sequence.
S120: and generating a second array according to the original sequence, wherein the elements in the second array are position indexes of the elements in the original sequence in the first array.
Wherein the elements in the second array are index values, in particular, the position indexes of the elements in the original sequence in the first array. Since the amount of elements in the original sequence is limited, the range of values of the elements in the second array, which is a set of position indices, is limited, and the values of the elements in the second array are typically relatively small, well suited for achieving efficient compression using variable length coding methods. The data type of the elements in the second array may be different from the data type of the elements in the original sequence, e.g., when the data type in the original sequence is Long, the elements in the second array may be 32int type data.
According to the embodiment, according to different array generation methods, the two steps of generating the first array according to the original sequence and generating the second array according to the original sequence may be performed sequentially or simultaneously, which is not limited in the present application.
S130: and coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result.
Specifically, the numerical prediction Method may include numerical prediction methods such as a Last Value Predictor (Last Value Predictor), a Stride Predictor (Stride Predictor), a Finite Context prediction Method (FCM), a Differential Finite Context prediction Method (DFCM), and the like, but is not limited thereto.
Specifically, the encoding method based on the numerical prediction method means that the element to be encoded is predicted to obtain the difference between the predicted value and the element to be encoded, and the difference is encoded and stored, that is, the element is encoded.
According to an embodiment, after obtaining the first encoding result, the first encoding result may be stored in a byte array.
S140: and coding the second array by adopting a variable length coding method to obtain a second coding result.
According to the embodiment, the variable length coding method, namely the variable length byte coding method, is a coding method for representing the same type of data by using byte data with different lengths, and can be beneficial to reducing the total number of bytes of the data, thereby reducing the occupied storage space. The variable length coding method includes a high order differentiation based variable length coding method such as UTF-8, base 128Varints, etc. and a variable length coding method using a proxy for differentiation.
According to an embodiment, after obtaining the second encoding result, the second encoding result may be stored in a byte array. According to an embodiment, when the elements in the second array are 32int type data, each data occupies 4 bytes before encoding; after encoding, each data may occupy 1 to 4 bytes.
S150: and storing the first coding result and the second coding result according to a preset mapping relation.
In particular, the first coding structure and the second coding result together constitute the compressed data of the original sequence. Wherein the mapping relation comprises that the element in the second encoding result represents the position index of the element in the original sequence in the first encoding result.
According to an embodiment, the encoding result including the first encoding result and the second encoding result may include a compression header, and the compression header may store the preset mapping relationship. Optionally, the compression header may further store therein storage structure-related information of the first encoding result and the second encoding result.
After the first array and the second array are encoded, the corresponding first encoding result and the second encoding result may be stored according to bytes, that is, the first encoding result and the second encoding result are stored as byte arrays. Here, the storage medium is not particularly limited.
Fig. 2 is a schematic diagram of a data compression method provided in an embodiment of the present disclosure.
As can be seen from fig. 1 and 2, the above-mentioned embodiments of the present disclosure provide a data compression method, which creatively compresses a sequence by converting an original sequence into two arrays, and performs specified compression according to the characteristics of the converted different arrays. The compression method of the embodiment of the disclosure has the advantages that the compression effect is independent of value distribution, the compression of numerical sequences in any numerical range can be realized, even for sequences with uneven numerical value distribution, the compression effect is very good, and a larger data compression ratio is obtained; when the values in the sequence are distributed densely or have many same values, the compression effect is extremely outstanding and has a larger compression ratio. The data compression method provided by the embodiment of the disclosure can reduce the total number of bytes of the file to be transmitted or stored, thereby realizing faster transmission of data, reducing the disk occupation space of the file, and improving the utilization rate of the storage space.
The following detailed description of the embodiments of the present disclosure will be made with reference to the accompanying drawings.
According to an embodiment, the generating a second array according to the original sequence (step S120) may specifically include: reading elements in the original sequence; searching for an element with the same value as the element in a first array; and acquiring subscript values of the elements in the first array, and storing the subscript values in a second array. The method of generating the second array is not limited to the method described herein.
According to an embodiment, the encoding the first array by using a coding method based on numerical prediction to obtain a first encoding result (step S130), may specifically include: predicting the elements in the first array by adopting a predictor to obtain a predicted value; performing XOR operation on the predicted value and the real value of the predicted element to obtain an XOR result; and coding the XOR result to obtain a first coding result.
According to an embodiment, if a predicted value of an element is close to a value of an element, and the predicted value of an element is identical to upper bits of the value of an element, their exclusive or result may have leading zeros, and thus, the leading zeros of the exclusive or result may be compressed. Obviously, the closer the predicted value is to the true value of the element, the more leading zeros in the xor result, and the greater the degree of compression that can be applied to the xor result.
Fig. 3 is a schematic diagram of an encoding method based on numerical prediction according to an embodiment of the present disclosure.
According to an optional embodiment, in step S130, the encoding the xor result to obtain a first encoding result may specifically include: coding the leading zeros of the XOR result to obtain a first coding section; determining the part except the leading zero in the exclusive-or result as a second coding segment; and obtaining a first coding result according to the first coding section and the second coding section.
Optionally, the first encoding result may sequentially include a first encoding section and a second encoding section of each element, and specifically, the first encoding result may sequentially include, from a start position to an end position, the first encoding section of the first element, the second encoding section of the first element, the first encoding section of the second element, the second encoding section of the second element, \8230 \ 8230:, the first encoding section of the nth element, and the second encoding section of the nth element.
Optionally, the first encoding result may include a first encoding segment and a second encoding segment in every two elements as a group, and specifically, the first encoding result may include, in sequence from a start position to an end position, a first encoding segment of a first element, a first encoding segment of a second element, a second encoding segment of the first element, a second encoding segment of the second element, \ 8230 \ 1, a first encoding segment of an n-1 element, a first encoding segment of an n-th element, a second encoding segment of an n-1 element, and a second encoding segment of an n-th element. Wherein the first code segment of the first element and the first code segment of the second element may be stored in one byte. It should be noted that, in the first encoding result, the storage manner of the first encoding section and the second encoding section of each element is not limited to the above two methods, but may be stored in any reasonable order.
Specifically, referring to fig. 3, data D n The encoding compression method of (3) may include: predictor generation prediction value Pred n (ii) a To-be-compressed value D n And the predicted value Pred n Performing XOR operation to obtain the XOR result Diff n Wherein, if the predicted value Pred n And a value D to be compressed n If they are close, the XOR result Diff n Comprises a plurality of leading zeros; encoding the XOR result Diff by a leading zero count value n The number of leading zeros in (a); obtaining an encoding result LZC n Bits n . Wherein a first coding section LZC is used n Represents a leading zero count value; using second code segments Bits n Representing the remaining bits. According to an embodiment, a preset number of bits (e.g., 4 bits) may be used to represent the first encoding segment LZC n
Accordingly, when data decompression is performed, the compressed data LZC is performed n Bits n Decompression to D n The method of (3) may comprise: first coding segment LZC reading a preset number of bits (e.g., 4 bits) n And the corresponding remaining bit (e.g., 64 minus LZC) n Bit number occupied by middle read leading zero) of the first coding segment Bits) of the second coding segment(s) n Obtaining Diff n (ii) a Using the same prediction value Pred as in data compression n And Diff n Performing XOR operation to obtain decompressed data D n
When the exclusive-or result includes not only leading zeros but also trailing zeros, the leading zeros and the trailing zeros may be encoded and compressed at the same time.
According to another optional embodiment, in step S130, the encoding the xor result to obtain a first encoding result may further include: coding the leading zeros of the XOR result to obtain a first coding section; coding the tail zero of the XOR result to obtain a second coding section; determining the part except the leading zero and the tail zero in the exclusive-or result as a third coding segment; and obtaining a first coding result according to the first coding segment, the second coding segment and the third coding segment, wherein the first coding result sequentially comprises the first coding segment, the third coding segment and the second coding segment.
Optionally, the first encoding result may sequentially include a first encoding segment, a second encoding segment, and a third encoding segment of each element, and specifically, the first encoding result may sequentially include, from a start position to an end position, the first encoding segment of the first element, the second encoding segment of the first element, the third encoding segment of the first element, the first encoding segment of the second element, the second encoding segment of the second element, the third encoding segment of the second element, \ 8230 \ 8230 \, the first encoding segment of the nth element, the second encoding segment of the nth element, and the third encoding segment of the nth element.
Optionally, the first encoding result may include a first encoding segment, a second encoding segment, and a third encoding segment every two elements as a group, and specifically, the first encoding result may include, in sequence from a start position to an end position, the first encoding segment of the first element, the first encoding segment of the second element, the second encoding segment of the first element, the second encoding segment of the second element, the third encoding segment of the first element, the third encoding segment of the second element, \\ 8230 \\ 8230 \ the first encoding segment of the n-1 element, the first encoding segment of the n-th element, the second encoding segment of the n-1 element, the third encoding segment of the n-1 element, and the third encoding segment of the n-1 element. Wherein the first code segment of the first element and the first code segment of the second element may be stored in one byte, and the third code segment of the first element and the third code segment of the second element may be stored in one byte. It should be noted that, in the first encoding result, the storage manners of the first encoding section, the second encoding section, and the third encoding section of each element are not limited to the above two methods, but may be stored in any reasonable order.
Specifically, for data D n The encoding compression method of (3) may include: predictor generation prediction value Pred n (ii) a Will compress the value D n And the predicted value Pred n Performing XOR operation to obtain the XOR result Diff n Wherein the result of the XOR is Diff n Comprises n1 leading zeros and n2 trailing zeros; encoding the n1 leading zeros by leading zero count values and by trailing zero count valuesTo encode the n2 tail zeros; obtaining an encoding result LZC n Bits n TZC n . Wherein, in the coding result, the first coding segment LZC is used n Indicating the count of leading zeros by a second coding section TZC n Indicating a tail zero count value; using third code segments Bits n Representing the remaining bits. According to an embodiment, a first preset number of bits (e.g., 4 bits) may be used to represent the first code segment LZC n The second coded segment TZC may be represented using a second preset number of bits (e.g., 4 bits) n
According to the embodiment, in practical application, the leading zeros and/or the trailing zeros may be compressed according to actual needs, so as to achieve that data is compressed to the maximum extent, and thus the maximum compression rate is obtained.
Fig. 4 is a schematic diagram of a numerical prediction method in an encoding method based on numerical prediction according to an embodiment of the present disclosure.
According to an alternative embodiment, in step S130, the predicting the elements in the first array to obtain a predicted value may specifically include: searching a corresponding historical value sequence based on a sequence formed by a plurality of previous elements of the elements to be predicted; and obtaining a corresponding predicted value based on the historical value sequence.
Specifically, according to a sequence formed by the first several (for example, the first 3) elements of the element to be predicted, a history value sequence consistent with the sequence may be searched in the history value table, and the searched history value sequence is used as an index to search for a corresponding predicted value in the predicted value table. And if the historical value sequence consistent with the current sequence is not found, giving an initial predicted value to the current element to be predicted.
More specifically, referring to fig. 4, the numerical prediction based coding method may specifically be a context-based numerical prediction method, where a context may be a history value formed by several adjacent elements. The history value table contains processed data sequences, for example, the data sequences may include data sequences formed by processed consecutive elements in the array to be compressed. The number of data in the data sequence can be set according to needs, and the larger the value is, the more accurate the prediction result is, but the more time is consumed for prediction, for example, the number can be set to 3. Wherein, the numerical value which appears after the data sequence in the historical value table is contained in the predicted value table. When prediction is carried out, firstly, a history value sequence corresponding to an adjacent element of a current element is searched in a history value table; and then searching a predicted value in a predicted value table by a hash equation by taking the historical value sequence as an index.
According to an embodiment, the value in the predictor may be updated when the value of the predicted element is known. Specifically, the real value of the element may be updated into the prediction value table, and the history value table may be updated according to the real value and the old history value of the element.
Fig. 5 is a schematic diagram of another numerical prediction method in an encoding method based on numerical prediction according to an embodiment of the present disclosure.
According to another optional embodiment, the predicting the elements in the first array to obtain the predicted value may specifically include: searching a corresponding historical difference sequence based on the difference sequences of the first elements of the elements to be predicted; obtaining a corresponding prediction difference value based on the historical difference value sequence; and obtaining a predicted value of the element to be predicted based on a previous element of the element to be predicted and the prediction difference value.
Specifically, a difference sequence formed by differences between every two adjacent elements in the first elements can be obtained according to the first elements of the elements to be predicted; searching a historical difference value sequence consistent with the difference value sequence in a historical difference value table; and using the searched historical difference sequence as an index to search a corresponding prediction difference in a prediction difference table. And if the historical difference sequence consistent with the current difference sequence is not found, giving an initial prediction difference value to the current difference value to be predicted.
More specifically, referring to fig. 5, the history value table may store a value D to be predicted n Previous value D of n-1 And a historical difference sequence, the historical difference sequence being a difference between adjacent elements of the processed dataThe sequence of which is constructed. The historical difference sequence may be, for example, a sequence of three differences between each of four consecutive data. The prediction difference table may include differences that occur after the history difference sequence in the history difference table. When making a prediction, for example, it may first be based on the current value D n A difference sequence is searched in a history value table, and a history difference sequence consistent with the difference sequence is searched in the history value table, for example, (delta 1, delta2, delta 3); then, by taking the historical difference sequence as an index, searching a prediction difference, such as dpre, in a prediction difference table through a hash equation; then, the previous value D is added n-1 Adding the prediction difference dpre to obtain a prediction value Pred n =D n-1 +dpre。
According to embodiments, the values in the predictor may be updated after the actual values of the predicted elements are known. Specifically, the real value of the predicted element may be updated into the history value table; the difference between the true value of the element being predicted and the value of the previous element may be calculated as a new difference, and then the new difference is stored in the prediction difference table and the history difference table is updated based on the new difference and the old history difference.
Fig. 4 and fig. 5 respectively describe the processes of numerical prediction by different predictors, and in an application, any of the predictors described in fig. 4 or fig. 5 may be used to obtain the predicted value according to actual needs. According to the embodiments of the present disclosure, in order to improve the accuracy of prediction, improve the prediction rate, and improve the compression rate, etc., more than one predictor may be employed at the same time.
Fig. 6 is a schematic diagram of a numerical prediction-based encoding method using two predictors according to an embodiment of the present disclosure.
Specifically, referring to fig. 6, the encoding the first array by using the coding method based on numerical prediction in step S130 to obtain a first encoding result may specifically include: predicting elements in the first array by adopting a first predictor to obtain a first predicted value; predicting the elements in the first array by adopting a second predictor to obtain a second predicted value; performing XOR operation on the first predicted value and the real value of the predicted element to obtain a first XOR result; performing XOR operation on the second predicted value and the real value of the predicted element to obtain a second XOR result; comparing the number of leading zeros of the first XOR result with the number of leading zeros of the second XOR result, and taking the result with more leading zeros as a final XOR result; and coding the final exclusive or result to obtain a first coding result.
According to an embodiment, the comparing of the first xor result and the second xor result comprises comparing the number of leading zeros and/or the number of trailing zeros in the xor result. For example, the number of leading zeros may be compared and the XOR result with the larger number of leading zeros may be used as the preferred final XOR result. For another example, the number of tail zeros may be compared, and the exclusive or result with the larger number of tail zeros may be used as the preferred final exclusive or result. As another example, the total number of leading zeros and trailing zeros may be compared, and an xor result with a larger sum of the numbers of leading zeros and trailing zeros may be selected, preferably the final xor result. The manner of selecting the preferred exclusive or result is not limited thereto.
Alternatively, the first predictor and the second predictor may be the same or different, for example, the predictors shown in fig. 4 or fig. 5 may be independently selected, respectively, or predictors other than the predictors shown in fig. 4 and fig. 5 may be selected.
In step S140, a specific process of encoding the second array will be described below by taking a variable length encoding method, base 128Varints, as an example.
The Base 128Varints encoding method may use one or more bytes to represent integer data. In a Base 128Varints coding string, a most significant bit (msb) in each byte is used as a flag bit, if the bit is 1, the next byte and the current byte jointly represent a current value, and if the bit is 0, the current byte is the last byte of the current value; the remaining seven bits of the byte will be used to store the data itself, representing a value of less than 2 to the power of 7, i.e., 0 to 127, at most, since only 7 bits are used for encoding. For a 7 th power value less than 2, a2 byte representation may be used; for a 14 power value less than 2, a2 byte representation may be used; for a value of the power of 21 less than 2, a 3-byte representation can be used; for 28 th power values less than 2, a 4 byte representation may be used.
The int32 type array [1,300,1024] is described as an example.
For the decimal number 1, binary (4 bytes) is represented as 00000000 00000000 0000 00000001. After variable length coding, only one byte is occupied to express, for example: 00000001. the number 1 may be represented using 1 byte.
For the decimal number 300, the binary (4 bytes) is represented as 00000000 00000000 00000001 00101100. The encoding process is as follows: (1) Starting from binary low bytes, segmenting every 7 bits to obtain 0000010 0101100; (2) reversing the sequence to obtain 0101100 0000010; and (3) increasing the identifier to obtain the encoding string 10101100 00000010. That is, the number 300 may be represented using 2 bytes. Accordingly, when decoding the encoded string 10101100 00000010: (1) The highest bit of the 1 st byte is 1, the backward reading is continued, and the highest bit of the 2 nd byte is 0, which indicates that the current value is represented by 2 bytes; (2) taking the lower 7 bits of two bytes 0101100 0000010; (3) reverse order, 0000010 0101100, decimal 300.
For the decimal number 1024, binary (4 bytes) is represented as 00000000 00000000 00000100 00000000. The encoding process is as follows: (1) Starting from the binary low byte, dividing every 7 bits to obtain 0001000 0000000; (2) reversing the sequence to obtain 0000000 0001000; and (3) increasing the identifier to obtain the code string 10000000 00001000. The number 1024 may be represented using 2 bytes.
For the initial array [1,300,1024] of int32 type fixed length coding, the number of bytes occupied before coding is 12, and the number of bytes occupied after variable length coding is 5. Variable length coding significantly reduces the memory space occupied by array storage.
It can be seen that the variable length coding compression has a more significant effect on the compression of smaller values. For the embodiment of the present disclosure, the position index value is stored in the second array, and under the condition that the number of elements in the array to be compressed is limited, the numerical range of the position index value is limited and is generally a relatively small numerical value, so that the advantage of variable length coding can be fully utilized for compression, the number of bytes can be very effectively reduced, and a significant compression effect is obtained.
According to the above description, the present disclosure provides a data compression method, in which an original sequence is first converted into two arrays, and different compression methods are selected for compression according to data characteristics of the two arrays, so as to achieve maximum compression of the original sequence and obtain a maximum compression ratio. In the process, a compression method based on numerical prediction is used, and the compression method can be applied to data arranged in any order, however, for an array with more disordered data arrangement, the time complexity and the operation complexity of the process of numerical prediction are higher. In view of this, the present disclosure provides an alternative data compression method, in which the arrays to be encoded are sorted first and then encoded by using a method based on numerical prediction. By adopting the ordered numerical prediction-based compression method, the time complexity and the operation complexity of the numerical prediction process can be reduced, and the compression rate can be further improved.
Fig. 7 is a schematic diagram of a method for generating a first array from an original sequence according to an embodiment of the present disclosure.
In another embodiment of the present disclosure, referring to fig. 7, the step of generating the first array according to the original sequence may specifically include: s111, extracting elements with different values in the original sequence to generate a first prepared array; s112, sorting the elements in the first prepared array according to a preset sorting rule to generate a first array.
Specifically, according to an embodiment, the predetermined sort rule may be set according to various sort rules as needed, such as a sequence of values from large to small, a sequence of values from small to large, a sequence of tail values from large to small, and a sequence of tail values from small to large. For example, sort of the first array may be implemented using a sort function, so that sorting is implemented in order of magnitude or magnitude, and such sorting results in higher order similarity of adjacent values, which may result in more leading zeros being generated when performing an exclusive-or operation. The manner of implementing the sorting is not limited thereto, and the sorting of the first array may be performed using any suitable method in the art.
According to the embodiment, the first array is sorted, so that the relevance of adjacent values in the first array is larger, more similarity of the adjacent values is mined, and when the exclusive-or operation is performed, the number of leading zeros and/or trailing zeros in the exclusive-or result is larger, so that the first array can be compressed to a larger extent, and the number of bytes occupied by the first compressed result is smaller, so that the storage space occupied by compressed data and/or the used transmission time are/is reduced.
According to the embodiment of the disclosure, the data compression method is provided, and the data compression method has an obvious compression effect on the data sequence with any value distribution; moreover, the method can achieve good compression effect on the sequences distributed in any order, namely, can achieve good compression rate on the data sequences distributed out of order/disorder.
To more clearly describe embodiments of the present invention, an example of data compression according to an embodiment is specifically shown below:
original sequence:
[24,22,11,20,11,6,25,33,7,41,26,46,34,47,49,26,26,10,2,39,35,8,43,3,28,29,22,1,5,38,36,42,44,29,24,30,0,15,34,6,14,31,38,34,44,15,17,5,19,41,23,30,37,25,44,27,33,23,11,27,32,3,43,29,18,45,13,4,5,20,11,0,18,3,32,32,31,17,0,30,7,26,47,30,20,46,35,10,19,18,43,11,29,5,6,39,33,31,14,23]
first array (after sorting):
[0,1,2,3,4,5,6,7,8,10,11,13,14,15,17,18,19,20,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,49]
the first encoding result:
[118B,1B,-1B,-1B,-1B,-10B,2B,110B,1B,1B,-17B,1B,-18B,1B,1B,-1B,102B,2B,1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-1B,-18B,1B,1B,-1B,-1B,-2B,1B,0B]
a second group of:
[20,18,10,17,10,6,21,29,7,36,22,41,30,42,43,22,22,9,2,35,31,8,38,3,24,25,18,1,5,34,32,37,39,25,20,26,0,13,30,6,12,27,34,30,39,13,14,5,16,36,19,26,33,21,39,23,29,19,10,23,28,3,38,25,15,40,11,4,5,17,10,0,15,3,28,28,27,14,0,26,7,22,42,26,17,41,31,9,16,15,38,10,25,5,6,35,29,27,12,19]
and a second encoding result:
[8B,20B,8B,18B,8B,10B,8B,17B,8B,10B,8B,6B,8B,21B,8B,29B,8B,7B,8B,36B,8B,22B,8B,41B,8B,30B,8B,42B,8B,43B,8B,22B,8B,22B,8B,9B,8B,2B,8B,35B,8B,31B,8B,8B,8B,38B,8B,3B,8B,24B,8B,25B,8B,18B,8B,1B,8B,5B,8B,34B,8B,32B,8B,37B,8B,39B,8B,25B,8B,20B,8B,26B,8B,0B,8B,13B,8B,30B,8B,6B,8B,12B,8B,27B,8B,34B,8B,30B,8B,39B,8B,13B,8B,14B,8B,5B,8B,16B,8B,36B,8B,19B,8B,26B,8B,33B,8B,21B,8B,39B,8B,23B,8B,29B,8B,19B,8B,10B,8B,23B,8B,28B,8B,3B,8B,38B,8B,25B,8B,15B,8B,40B,8B,11B,8B,4B,8B,5B,8B,17B,8B,10B,8B,0B,8B,15B,8B,3B,8B,28B,8B,28B,8B,27B,8B,14B,8B,0B,8B,26B,8B,7B,8B,22B,8B,42B,8B,26B,8B,17B,8B,41B,8B,31B,8B,9B,8B,16B,8B,15B,8B,38B,8B,10B,8B,25B,8B,5B,8B,6B,8B,35B,8B,29B,8B,27B,8B,12B,8B,19B]
the following takes the sorted first array (for example, long type first array) as an example to describe the process of encoding the first array to obtain the first encoding result:
it should be noted that, for the first array to be currently encoded, the xor result that only needs to pay attention to the high 14 bits may be preset, that is, there are at most 14 leading zeros, so that 4 bits may be used to store the number of leading zeros. Since 4 bits are used to place the leading zero encoding result, one byte can place the leading zero encoding result of two digits, and thus two digits are encoded in one group. The encoding process for the data (0, 1) in the first array is described below.
(1) Converting the Long type array into byte arrays [0x00,0x00 ], [0x00, 0x01], \8230;, [ 0x8230 ].
(2) Numerical prediction is performed using two predictors, where the first predictor can use the prediction method as shown in fig. 4, the initial prediction value (pred) n ) Set to 0;the second predictor may employ a prediction method, as shown in fig. 5, an initial prediction value (pred) n ) Is set to 0.
Exclusive-or operation is performed on the first digit [0x00,0x00 ] and the initial prediction value (for example, 0) of the first predictor p1 to obtain an exclusive-or result diff 1d [0x00,0x00 ]; meanwhile, the first digit [0x00,0x00 ] is subjected to exclusive OR operation with the initial prediction value (for example, 0) of the second predictor p2 to obtain an exclusive OR result diff 2d [0x00,0x00 ]. In case the two xor results are identical, the xor result of the first predictor p1 may be selected as the final xor result.
The parameters in the first predictor p1 and the second predictor p2 are updated according to the final exclusive-or result and the true value (here, 0) of the currently predicted value. Specifically, the history value table and the prediction value table in the predictor are updated separately. For example, the index of the prediction value table of the first predictor p1 may be updated in a manner of: ((lastIndex < < 6) > Lambda (update value > > 48)) & (table. Length-1), where "lastIndex" is the last index of the prediction value table, "update value this time" is the input value this time, and "table. Length" is the length of the prediction value table. For example, the index of the prediction value table of the second predictor p2 may be updated in a manner of: ((dfcmHash < < 2) > Lambda ((true value-lastValue) > > 40)) & (table. Length-1), where "dfcmHash" is the index this time, "true value" is the true value of the predicted value this time, "lastValue" is the true value of the last value, and "table. Length" is the length of the prediction value table.
(3) Performing exclusive or operation on the second digit [0x00, 0x01] and the updated predicted value of the first predictor p1 to obtain an exclusive or result diff 1e [0x00, 0x01]; meanwhile, the second digit [0x00, 0x01] is XOR-operated with the initial prediction value of the second predictor p2 to obtain an XOR result diff 2e [0x00, 0x01]. In case the two xor results are identical, the xor result of the first predictor p1 may be selected as the final xor result.
The parameters in the first predictor p1 and the second predictor p2 are updated according to the final exclusive-or result and the true value (here, 1) of the current predicted value.
(4) The exclusive or result diff 1d [0, 0x00] of the first digit 0, the number of leading zeros is 8, and in the data processing, when the number of leading zeros is greater than 4, 1 subtraction processing is performed, and zeroByteSize (0) =7 is written; the exclusive or result of the second digit 1, diff 1e, [ 0] 0, 00,0, 01], has a leading zero number of 7, and zeroByteSize (1) =6. The leading zeros of (0, 1) are encoded, one byte storage is used, diff 1d is left shifted by 4 bits to the upper 4 bits, diff 1e is put to the lower 4 bits, < code | = zeroByteSize (0) < <4|: code | = zeroByteSize (1), i.e., 118B.
(5) The non-leading zero portions of the first digit and the second digit are then stored. Specifically, for the number 0, the non-leading zeros portion need not be stored; for the number 1, 0000 00001, i.e., 1B, is stored.
Thus, the result of the encoding of data (0, 1) is stored as byte array [118B,1B ]. And repeating the steps to obtain a first coding result.
The data compression method according to the embodiment of the present disclosure is provided above, and based on the same idea, the embodiment of the present specification further provides a data decompression method corresponding to the data compression method.
Fig. 8 is a schematic flow chart of a data decompression method according to an embodiment of the present disclosure.
According to an embodiment, referring to fig. 8, a data decompression method includes the steps of:
s210: acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation;
s220: based on the data characteristics of the first compressed data, decompressing the first compressed data by adopting a numerical prediction-based coding method to obtain a first array;
s230: decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array;
s240: and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
According to an embodiment, the obtaining of the decompressed data according to the preset mapping relationship based on the first array and the second array specifically includes: taking the value of the element in the second array as a position index, and reading the value of the element in the first array; and storing the values of the elements in the first array read each time to obtain a decompressed array.
The data compression method and the data decompression method provided by the present disclosure are described above, and based on the same idea, the embodiments of the present specification further provide a device corresponding to the data compression method and a device corresponding to the data decompression method.
Fig. 9 is a schematic structural diagram of a data compression apparatus according to an embodiment of the present disclosure. As shown in fig. 9, the data compression apparatus may include:
a generating module 310, configured to generate a first array and a second array according to an original sequence, where elements in the first array are different from each other and include all values in the original sequence, and an element in the second array is a position index of an element in the original sequence in the first array;
a first encoding module 320, configured to encode the first array by using a numerical prediction based encoding method to obtain a first encoding result;
a second encoding module 330, configured to encode the second array by using a variable length coding method to obtain a second encoding result;
the storage module 340 is configured to store the first encoding result and the second encoding result according to a preset mapping relationship.
Fig. 10 is a schematic structural diagram of a data decompression device according to an embodiment of the present disclosure. As shown in fig. 10, the data decompression apparatus may include:
an obtaining module 410, configured to obtain compressed data, where the compressed data includes first compressed data and second compressed data having a preset mapping relationship;
a first data decompression module 420, configured to decompress the first compressed data by using a coding method based on numerical prediction based on data characteristics of the first compressed data to obtain a first array;
a second data decompression module 430, configured to decompress the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data, so as to obtain a second group;
and the data generation module 440 is configured to obtain decompressed data according to the preset mapping relationship based on the first array and the second array.
Based on the same idea, the embodiments of this specification further provide a device corresponding to the data compression and decompression method.
Fig. 11 is a schematic structural diagram of an apparatus corresponding to a data compression method and a data decompression method according to an embodiment of the present specification. As shown in fig. 11, a data processing apparatus 500 may include:
at least one processor 510; and the number of the first and second groups,
a memory 530 communicatively coupled to the at least one processor; wherein,
the memory 530 stores instructions 520 executable by the at least one processor 510 to enable the at least one processor 510 to:
generating a first array according to an original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence;
generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array;
coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result;
coding the second array by adopting a variable length coding method to obtain a second coding result;
storing the first encoding result and the second encoding result according to a preset mapping relation;
or,
to enable the at least one processor 510 to:
acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation;
based on the data characteristics of the first compressed data, decompressing the first compressed data by adopting a numerical prediction-based coding method to obtain a first array;
decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array;
and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
While particular embodiments of the present specification have been described above, in some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, as for the device and apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to the partial description of the method embodiments for relevant points.
The apparatus, the device, and the method provided in the embodiments of the present specification are corresponding, and therefore, the apparatus and the device also have beneficial technical effects similar to those of the corresponding method, and since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the corresponding apparatus and the device are not described again here.
In the 90 s of the 20 th century, improvements in a technology could clearly distinguish between improvements in hardware (e.g., improvements in circuit structures such as diodes, transistors, switches, etc.) and improvements in software (improvements in process flow). However, as technology advances, many of today's process flow improvements have been seen as direct improvements in hardware circuit architecture. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into the hardware circuit. Thus, it cannot be said that an improvement in the process flow cannot be realized by hardware physical blocks. For example, a Programmable Logic Device (PLD) (e.g., a Field Programmable Gate Array (FPGA)) is an integrated circuit whose Logic functions are determined by a user programming the Device. A digital system is "integrated" on a PLD by the designer's own programming without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Furthermore, nowadays, instead of manually manufacturing an Integrated Circuit chip, such Programming is often implemented by "logic compiler" software, which is similar to a software compiler used in program development and writing, but the original code before compiling is also written by a specific Programming Language, which is called Hardware Description Language (HDL), and HDL is not only one but many, such as ABEL (Advanced Boolean Expression Language), AHDL (alternate Hardware Description Language), traffic, CUPL (core universal Programming Language), HDCal, jhddl (Java Hardware Description Language), lava, lola, HDL, PALASM, rhyd (Hardware Description Language), and vhigh-Language (Hardware Description Language), which is currently used in most popular applications. It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, and an embedded microcontroller, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic for the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, respectively. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrases "comprising a," "8230," "8230," or "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus comprising the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (14)

1. A method of data compression, comprising:
generating a first array according to an original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence;
generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array;
coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result;
coding the second array by adopting a variable length coding method to obtain a second coding result;
and storing the first coding result and the second coding result according to a preset mapping relation.
2. The method according to claim 1, wherein generating the first array from the original sequence specifically comprises:
extracting elements with different values in the original sequence to generate a first prepared array;
and sequencing the elements in the first prepared array according to a preset sequencing rule to generate a first array.
3. The method of claim 1, wherein the encoding the first array using a numerical prediction based encoding method to obtain a first encoding result comprises:
predicting the elements in the first array by adopting a predictor to obtain a predicted value;
performing XOR operation on the predicted value and the real value of the predicted element to obtain an XOR result;
and coding the XOR result to obtain a first coding result.
4. The method according to claim 3, wherein the predicting the elements in the first array to obtain the predicted values specifically comprises:
searching a corresponding historical value sequence based on a sequence formed by a plurality of previous elements of the elements to be predicted;
and obtaining a corresponding predicted value based on the historical value sequence.
5. The method of claim 3, wherein predicting the elements in the first array to obtain predicted values comprises:
searching a corresponding historical difference sequence based on the difference sequences of the first elements of the elements to be predicted;
obtaining a corresponding prediction difference value based on the historical difference value sequence;
and obtaining a predicted value of the element to be predicted based on a previous element of the element to be predicted and the prediction difference value.
6. The method according to claim 3, wherein the encoding the xor result to obtain a first encoding result specifically comprises:
coding the leading zeros of the XOR result to obtain a first coding section;
determining the part except the leading zero in the exclusive-or result as a second coding segment;
and obtaining a first coding result according to the first coding section and the second coding section.
7. The method according to claim 3, wherein the encoding the xor result to obtain a first encoding result specifically includes:
coding the leading zeros of the XOR result to obtain a first coding section;
coding the tail zero of the XOR result to obtain a second coding section;
determining a part except the leading zero and the tail zero in the exclusive-or result as a third coding segment;
and obtaining a first coding result according to the first coding segment, the second coding segment and the third coding segment.
8. The method of claim 1, wherein the encoding the first array using a numerical prediction based encoding method to obtain a first encoding result comprises:
predicting elements in the first array by adopting a first predictor to obtain a first predicted value;
predicting the elements in the first array by adopting a second predictor to obtain a second predicted value;
performing XOR operation on the first predicted value and a real value of a predicted element to obtain a first XOR result;
performing XOR operation on the second predicted value and the real value of the predicted element to obtain a second XOR result;
comparing the leading zero number of the first exclusive-or result with the leading zero number of the second exclusive-or result, and taking the result with more leading zeros as a final exclusive-or result;
and coding the final exclusive or result to obtain a first coding result.
9. The method according to claim 1, wherein said encoding the second array by using a variable length coding method comprises:
and coding each element in the second array by adopting a variable length coding method based on high order differentiation.
10. A method of data decompression comprising:
acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation;
based on the data characteristics of the first compressed data, decompressing the first compressed data by adopting a numerical prediction-based coding method to obtain a first array;
decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array;
and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
11. The method according to claim 10, wherein obtaining the decompressed data according to the preset mapping relationship based on the first array and the second array specifically includes:
taking the value of the element in the second array as a position index, and reading the value of the element in the first array;
and storing the values of the elements in the first array read each time to obtain a decompressed array.
12. A data compression apparatus comprising:
a generating module, configured to generate a first array and a second array according to an original sequence, where elements in the first array are different from each other and include all values in the original sequence, and an element in the second array is a position index of an element in the original sequence in the first array;
the first coding module is used for coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result;
the second coding module is used for coding the second array by adopting a variable length coding method to obtain a second coding result;
and the storage module is used for storing the first coding result and the second coding result according to a preset mapping relation.
13. A data decompression apparatus comprising:
the device comprises an acquisition module, a mapping module and a storage module, wherein the acquisition module is used for acquiring compressed data which comprises first compressed data and second compressed data with a preset mapping relation;
the first data decompression module is used for decompressing the first compressed data by adopting a numerical prediction-based coding method based on the data characteristics of the first compressed data to obtain a first array;
the second data decompression module is used for decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second group;
and the data generation module is used for obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
14. A data processing apparatus comprising:
at least one processor; and (c) a second step of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
generating a first array according to an original sequence, wherein elements in the first array are different from each other and contain all numerical values in the original sequence;
generating a second array according to the original sequence, wherein elements in the second array are position indexes of the elements in the original sequence in the first array;
coding the first array by adopting a coding method based on numerical prediction to obtain a first coding result;
coding the second array by adopting a variable length coding method to obtain a second coding result;
storing the first encoding result and the second encoding result according to a preset mapping relation,
or,
to enable the at least one processor to:
acquiring compressed data, wherein the compressed data comprises first compressed data and second compressed data with a preset mapping relation;
decompressing the first compressed data by adopting a numerical prediction-based method based on the data characteristics of the first compressed data to obtain a first array;
decompressing the second compressed data according to a preset variable length coding rule based on the data characteristics of the first compressed data to obtain a second array;
and obtaining decompressed data according to the preset mapping relation based on the first array and the second array.
CN201910380862.4A 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment Active CN110266316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910380862.4A CN110266316B (en) 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910380862.4A CN110266316B (en) 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment

Publications (2)

Publication Number Publication Date
CN110266316A CN110266316A (en) 2019-09-20
CN110266316B true CN110266316B (en) 2023-02-21

Family

ID=67914392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910380862.4A Active CN110266316B (en) 2019-05-08 2019-05-08 Data compression and decompression method, device and equipment

Country Status (1)

Country Link
CN (1) CN110266316B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995273B (en) * 2019-10-21 2023-04-07 武汉神库小匠科技有限公司 Data compression method, device, equipment and medium for power database
CN113539264B (en) * 2021-07-13 2023-10-13 广东金鸿星智能科技有限公司 Voice command data transmission method and system for voice-controlled electric door
CN113630124B (en) * 2021-08-10 2023-08-08 优刻得科技股份有限公司 Method, system, equipment and medium for processing time sequence integer data
CN113630125A (en) * 2021-08-11 2021-11-09 深圳市联影高端医疗装备创新研究院 Data compression method, data encoding method, data decompression method, data encoding device, data decompression device, electronic equipment and storage medium
CN113987556B (en) * 2021-12-24 2022-05-10 杭州趣链科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114610952B (en) * 2022-02-28 2023-01-13 广州鼎甲计算机科技有限公司 Effective data indexing method, system, device and storage medium
CN115118788B (en) * 2022-06-29 2023-01-31 北京中科心研科技有限公司 Time sequence data compression method and device, wearable intelligent equipment and storage medium
CN116861271B (en) * 2023-09-05 2023-12-08 智联信通科技股份有限公司 Data analysis processing method based on big data

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6757437B1 (en) * 1994-09-21 2004-06-29 Ricoh Co., Ltd. Compression/decompression using reversible embedded wavelets
WO1997035422A1 (en) * 1996-03-19 1997-09-25 Mitsubishi Denki Kabushiki Kaisha Encoder, decoder and methods used therefor
CN100420308C (en) * 2002-04-26 2008-09-17 株式会社Ntt都科摩 Image encoding device and image decoding device
JP4787100B2 (en) * 2006-07-27 2011-10-05 パナソニック株式会社 Image encoding device
EP2511834A1 (en) * 2011-04-11 2012-10-17 Alcatel Lucent Method of encoding a data identifier
GB2491589B (en) * 2011-06-06 2015-12-16 Canon Kk Method and device for encoding a sequence of images and method and device for decoding a sequence of image
GB2501535A (en) * 2012-04-26 2013-10-30 Sony Corp Chrominance Processing in High Efficiency Video Codecs
US20140029864A1 (en) * 2012-07-30 2014-01-30 Dror Reif Compression encoding and decoding method and apparatus
GB2530312B (en) * 2014-09-19 2016-09-14 Imagination Tech Ltd Data compression
CN106202172B (en) * 2016-06-24 2019-07-30 中国农业银行股份有限公司 Text compression methods and device
CN107547907B (en) * 2016-06-27 2020-02-21 华为技术有限公司 Method and device for coding and decoding

Also Published As

Publication number Publication date
CN110266316A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN110266316B (en) Data compression and decompression method, device and equipment
US5300930A (en) Binary encoding method with substantially uniform rate of changing of the binary elements and corresponding method of incrementation and decrementation
US7233266B2 (en) Data compression/decompression device and data compression/decompression method
CN113810057B (en) Method, apparatus and system for semantic value data compression and decompression
Plaisance et al. Vectorized vbyte decoding
CN113572479B (en) Method and system for generating finite state entropy coding table
WO2007002468A2 (en) Modeling for enumerative encoding
KR20190038746A (en) Data encoding method, device and storage medium
KR20190038747A (en) DATA ENCODING METHOD, DEVICE AND STORAGE MEDIUM
Becher et al. Normal numbers and finite automata
US20070126608A1 (en) Efficient decoding of n-tuple variable bit length symbols
Funasaka et al. Adaptive loss‐less data compression method optimized for GPU decompression
KR20190038601A (en) Chen Framework, Chen coding and Chen cord
US7796059B2 (en) Fast approximate dynamic Huffman coding with periodic regeneration and precomputing
CN110120819B (en) Boolean circuit coding method, device and system
Steadman et al. Variable-length constrained sequence codes
JPH11340838A (en) Coder and decoder
Wang et al. A simplified variant of tabled asymmetric numeral systems with a smaller look-up table
CN111431539B (en) Compression method and device for neural network data and computer readable storage medium
US20030052802A1 (en) Method and apparatus for huffman decoding technique
CN115333544B (en) Data decompression circuit and method, chip and electronic equipment thereof
CN118244993B (en) Data storage method, data processing method and device, electronic equipment and medium
CN113364466A (en) Data processing system
US12119847B2 (en) Noniterative entropy coding
CN112685404A (en) Encoding method applied to key tree, decoding method applied to key tree and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant
GR01 Patent grant