CN113572479B - Method and system for generating finite state entropy coding table - Google Patents
Method and system for generating finite state entropy coding table Download PDFInfo
- Publication number
- CN113572479B CN113572479B CN202111107199.4A CN202111107199A CN113572479B CN 113572479 B CN113572479 B CN 113572479B CN 202111107199 A CN202111107199 A CN 202111107199A CN 113572479 B CN113572479 B CN 113572479B
- Authority
- CN
- China
- Prior art keywords
- value
- state
- character
- initial value
- entropy coding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 52
- 230000004044 response Effects 0.000 claims description 27
- 230000001133 acceleration Effects 0.000 abstract description 6
- 238000007906 compression Methods 0.000 description 25
- 230000006835 compression Effects 0.000 description 25
- 238000010586 diagram Methods 0.000 description 20
- 238000004422 calculation algorithm Methods 0.000 description 19
- 238000012545 processing Methods 0.000 description 11
- 238000004590 computer program Methods 0.000 description 7
- 238000004364 calculation method Methods 0.000 description 6
- 238000013144 data compression Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 238000013500 data storage Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 230000001360 synchronised effect Effects 0.000 description 3
- 238000003491 array Methods 0.000 description 2
- 230000006837 decompression Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 229910002056 binary alloy Inorganic materials 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention provides a method and a system for generating a finite state entropy coding table, wherein the method comprises the following steps: obtaining the occurrence frequency of each character in the data block to be coded; obtaining the state quantity of corresponding columns based on the preset maximum state value and each occurrence frequency, and further forming an empty table of the finite state entropy coding table; obtaining an initial value of each space based on the line number of each space in the empty table and the appearance frequency of the corresponding character, and further obtaining an initial table; traversing the initial table, and judging whether the initial value of the traversed mth row and nth column is greater than the initial value of the m-1 row and nth column; if so, taking the initial value of the nth row of the mth line as a temporary value of the nth row of the mth line, and judging whether the initial value is repeated with the initial value arranged before the temporary value; if not, taking the temporary value as the state value; if the traversal is completed, obtaining all state values, and judging whether the maximum state value exists; if so, a state table of the finite state entropy coding table is generated based on all the state values. The invention realizes the hardware acceleration of the finite state entropy coding.
Description
Technical Field
The invention relates to the technical field of data compression, in particular to a method and a system for generating a finite state entropy coding table.
Background
With the advent of the big data era, in the specific application fields of the internet of things, artificial intelligence and the like, the requirement on low time delay performance of mass data processing is continuously improved, and the lossless data compression technology is more and more important. The lossless data compression principle can be classified into a data statistics-based algorithm and a dictionary-based algorithm. The algorithm based on data statistics includes shannon-fanuo coding, huffman coding, arithmetic coding, run-length coding, Finite State Entropy (FSE), and the like; dictionary-based algorithms include LZ77(Lempel-Ziv77) encoding and LZ78(Lempel-Ziv 78) encoding, among others.
In order to improve the universality, 2 or more compression algorithms are generally adopted in the data compression scheme for hybrid compression. The fast lossless compression algorithm zstd (zstandard) is a hybrid compression algorithm consisting of LZ77 encoding, huffman encoding and FSE. Compared with other compression algorithms (such as Deflate algorithm, Bzip2 algorithm and Brotli algorithm), Zstd has better compression performance. Moreover, the Zstd belongs to an open source algorithm, provides 22 compression levels, is used for balancing the compression speed and the compression rate, and is widely applied to the fields of Linux kernel, FreeBSD operating system, AWS Redshift data warehouse and the like.
The lossless compression technology based on the software implementation mode has the advantages of high flexibility, universality, low cost and the like, but the software execution mode can only be sequentially executed, so that resources are occupied for a long time when a Central Processing Unit (CPU) processes mass data, the compression speed is greatly reduced, and the requirement of a specific application field on real-time compression processing of the mass data is difficult to meet. The hardware implementation is an effective way to solve the problems, and the purposes of improving the transmission speed, the resource utilization rate and the safety can be achieved by benefiting from the inherent parallel processing characteristics of the hardware.
For 3 major components in Zstd: statistical analysis was performed on LZ77, FSE and Huffman with a compression time ratio of about 4:1:1, where FSE ratio, although not large, had a large impact on the performance of Zstd. Because of the use of FSE, Zstd has better compression performance than other hybrid compression algorithms. In addition, hardware acceleration schemes for LZ77 and Huffman coding are mature, but FSE as a novel compression algorithm has precision similar to arithmetic coding and compression speed of Huffman coding, symbol recoding can be accurate to decimal place, and multiplication and division updating states are not needed in calculation. Therefore, the research on the hardware acceleration architecture of the FSE has important significance for realizing the overall acceleration of the Zstd algorithm, and is an effective method for meeting the requirements of specific application fields.
Disclosure of Invention
In view of the above, the present invention provides a method and a system for generating a finite state entropy coding table, so as to solve the problem of the prior art that a hardware acceleration scheme for finite state entropy coding is lacked.
Based on the above object, the present invention provides a method for generating a finite state entropy coding table, comprising the following steps:
obtaining the occurrence frequency of each character in the data block based on the number proportion of each character in the data block to be coded in the data block;
obtaining the state quantity of corresponding columns of each character in the finite state entropy coding table based on a preset maximum state value and the occurrence frequency of each character, and forming a null table of the finite state entropy coding table based on the state quantity of the corresponding columns of each character and the occurrence frequency of each character;
obtaining an initial value of each space based on the line number of each space in the empty table and the appearance frequency of the corresponding character, and obtaining an initial table of the finite state entropy coding table based on the filling of the initial value of each space;
traversing the initial table, and judging whether the initial value of the traversed mth row and nth column is greater than the initial value of the m-1 row and nth column;
in response to the initial value of the nth column in the mth row being greater than the initial value of the nth column in the m-1 th row, taking the initial value of the nth column in the mth row as a temporary value thereof, and judging whether the temporary value of the nth column in the mth row is repeated with the initial value arranged before the temporary value;
in response to the temporary value of the mth row and nth column not being repeated with the initial value arranged before, taking the temporary value of the mth row and nth column as the state value thereof;
responding to the completion of traversal, obtaining all state values of the finite state entropy coding table, and judging whether the maximum state value exists in all the state values;
in response to there being a maximum state value, a state table of the finite state entropy encoding table is generated based on all of the state values.
In some embodiments, the method further comprises:
and in response to the initial value of the nth column in the mth row being less than or equal to the initial value of the nth column in the m-1 row, increasing the initial value of the nth column in the m-1 row by a preset increment value to be used as a temporary value of the nth column in the mth row.
In some embodiments, the method further comprises:
and in response to the repetition of the temporary value of the nth column in the mth row and the initial value arranged before the temporary value, increasing the temporary value of the nth column in the mth row by a preset increment value as an updated temporary value until the updated temporary value is not repeated with the initial value arranged before the temporary value, and taking the updated temporary value as the state value of the nth column in the mth row.
In some embodiments, the method further comprises:
in response to no maximum state value, replacing a maximum value of all state values with a maximum state value to generate a state table of the finite state entropy encoding table.
In some embodiments, obtaining the state number of the corresponding column of each character in the finite state entropy coding table based on the preset maximum state value and the occurrence frequency of each character includes:
and multiplying the preset maximum state value by the occurrence frequency of each character respectively to obtain the state quantity of the corresponding column of each character in the finite state entropy coding table respectively.
In some embodiments, forming the empty table of the finite state entropy coding table based on the number of states of the corresponding column of each character and the frequency of occurrence of each character comprises:
and respectively taking the state quantity of each character as the quantity of the spaces of the corresponding column of the finite state entropy coding table, and arranging the sequence among the columns based on the occurrence frequency of each character to form an empty table of the finite state entropy coding table.
In some embodiments, ranking the order between the columns based on the magnitude of the frequency of occurrence of the characters comprises:
and arranging the corresponding columns from left to right according to the sequence of the appearance frequency of each character from large to small.
In some embodiments, obtaining the initial value of each space based on the number of lines of each space in the empty table and the occurrence frequency of the corresponding character includes:
and rounding the ratio of the number of lines of each space in the empty table to the appearance frequency of the corresponding character to zero to obtain the initial value of each space.
In some embodiments, traversing the initial table comprises:
the initial table is traversed from left to right and top to bottom.
In another aspect of the present invention, a system for generating a finite state entropy coding table is further provided, including:
the occurrence frequency obtaining module is configured to obtain the occurrence frequency of each character in the data block based on the number ratio of each character in the data block to be coded in the data block;
the empty table obtaining module is configured to obtain the state quantity of each character corresponding column in the finite state entropy coding table based on a preset maximum state value and the occurrence frequency of each character, and form an empty table of the finite state entropy coding table based on the state quantity of each character corresponding column and the occurrence frequency of each character;
the initial table obtaining module is configured to obtain an initial value of each space based on the number of lines of each space in the empty table and the occurrence frequency of the corresponding character, and obtain an initial table of the finite state entropy coding table based on filling of the initial value of each space;
the first judgment module is configured for traversing the initial table and judging whether the initial value of the traversed nth row of the mth line is greater than the initial value of the nth row of the m-1 line;
a second judging module, configured to, in response to the initial value of the mth row and nth column being greater than the initial value of the m-1 row and nth column, take the initial value of the mth row and nth column as its temporary value, and judge whether the temporary value of the mth row and nth column overlaps with the initial value arranged before it;
a state value module configured to take the temporary value of the mth row and the nth column as the state value thereof in response to the temporary value of the mth row and the nth column not being repeated with the initial value arranged before;
the third judgment module is configured to respond to the traversal completion, obtain all state values of the finite state entropy coding table, and judge whether the maximum state value exists in all the state values; and
and the state table generating module is used for responding to the maximum state value and generating the state table of the finite state entropy coding table based on all the state values.
In yet another aspect of the present invention, there is also provided a computer readable storage medium storing computer program instructions which, when executed by a processor, implement any one of the methods described above.
In yet another aspect of the present invention, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, the computer program executing any one of the above methods when executed by the processor.
The invention has at least the following beneficial technical effects:
1. the method for generating the finite state entropy coding table has the advantages that the data storage form is simple and convenient, only the form of one-dimensional arrays is used for storage, and the form similar to a Huffman tree binary tree chain table is not used for storage, so that the memory space can be greatly reduced;
2. the method for generating the finite state entropy coding table is suitable for FSE compression and decompression realized by hardware under the Zstd standard, is mainly used for a comparator and an adder, and is simple and convenient to calculate, so that the hardware resource overhead is effectively reduced, and the hardware utilization rate is improved;
3. further improves the compressing and decompressing speed of the Zstd algorithm and meets the increasing requirement of the specific application field on the compression performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a diagram illustrating a method for generating a finite state entropy coding table according to an embodiment of the present invention;
FIG. 2 is a diagram of a data block to be encoded according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an initial Table of FSE tables provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of an FSE Table calculation traversal process provided in the embodiment of the present invention;
FIG. 5 is a diagram illustrating an FSE Table iterative computation I provided in accordance with an embodiment of the present invention;
FIG. 6 is a diagram illustrating an FSE Table iterative computation II according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a traversal completed FSE Table provided in accordance with an embodiment of the present invention;
FIG. 8 is a diagram illustrating an FSE Table state Table provided in accordance with an embodiment of the present invention;
FIG. 9 is a diagram illustrating a system for generating a finite state entropy encoding table according to an embodiment of the present invention;
FIG. 10 is a diagram of a computer-readable storage medium for implementing a method for generating a finite state entropy encoding table according to an embodiment of the present invention;
fig. 11 is a schematic hardware configuration diagram of a computer device for executing a method for generating a finite state entropy coding table according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used for distinguishing two non-identical entities with the same name or different parameters, and it is understood that "first" and "second" are only used for convenience of expression and should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements does not include all of the other steps or elements inherent in the list.
In view of the above objects, a first aspect of the embodiments of the present invention proposes an embodiment of a method for generating a finite state entropy coding table. Fig. 1 is a schematic diagram illustrating an embodiment of a method for generating a finite state entropy coding table according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
step S10, obtaining the frequency of each character in the data block based on the number ratio of each character in the data block to be coded;
step S20, obtaining the state quantity of each character corresponding column in the finite state entropy coding table based on the preset maximum state value and the occurrence frequency of each character, and forming a null table of the finite state entropy coding table based on the state quantity of each character corresponding column and the occurrence frequency of each character;
step S30, obtaining the initial value of each space based on the line number of each space in the empty table and the appearance frequency of the corresponding character, and obtaining the initial table of the finite state entropy coding table based on the filling of the initial value of each space;
step S40, traversing the initial table, and judging whether the initial value of the traversed mth row and nth column is larger than the initial value of the m-1 row and nth column;
step S50, in response to the initial value of the mth row and nth column being greater than the initial value of the m-1 row and nth column, taking the initial value of the mth row and nth column as its temporary value, and determining whether the temporary value of the mth row and nth column overlaps with the initial value arranged before it;
step S60, in response to the temporary value of the mth row and nth column not being repeated with the initial value arranged before, taking the temporary value of the mth row and nth column as the state value thereof;
step S70, responding to the traversal completion, obtaining all state values of the finite state entropy coding table, and judging whether the maximum state value exists in all the state values;
step S80, in response to there being a maximum state value, generates a state table of the finite state entropy encoding table based on all the state values.
In the embodiment of the invention, m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1.
FSE belongs to an entropy coding of cans (table asymmetric digital systems) in the asymmetric digital systems (ANS). The current research is mainly directed to the important components uabs (unified asymmetry metric binary systems) of the tass and ANS, while there is little research on the hardware architecture of the FSE.
The method for generating the finite state entropy coding table has the advantages that the data storage form is simple and convenient, only the form of one-dimensional arrays is used for storage, and the form similar to a Huffman tree binary tree chain table is not needed for storage, so that the memory space can be greatly reduced; in addition, the method for generating the finite state entropy coding table of the embodiment of the invention is suitable for FSE (finite state entropy coding) compression and decompression realized by hardware under the specification standard of Zstd (fast lossless compression algorithm), is mainly used for a comparator and an adder, and is simple and convenient to calculate, thereby effectively reducing the hardware resource overhead and improving the hardware utilization rate; further improves the compressing and decompressing speed of the Zstd algorithm and meets the increasing requirement of the specific application field on the compression performance.
All calculation and storage modes related to the method for generating the finite state entropy coding table can be realized by hardware, the efficiency can be improved for software calculation, and the method is more flexible in application due to various realizable forms. If the method is realized in a hardware mode, the method can become a hardware acceleration technology aiming at network data storage promotion, can accelerate the compression based on limited entropy coding data, and effectively reduces the load of a server CPU (central processing unit). It can be absorbed in data compression and accelerate, and helping hand data center's performance promotes.
In some embodiments, the method further comprises: and in response to the initial value of the nth column in the mth row being less than or equal to the initial value of the nth column in the m-1 row, increasing the initial value of the nth column in the m-1 row by a preset increment value to be used as a temporary value of the nth column in the mth row.
In a preferred embodiment, the preset increment value is an integer greater than 0.
In some embodiments, the method further comprises: and in response to the repetition of the temporary value of the nth column in the mth row and the initial value arranged before the temporary value, increasing the temporary value of the nth column in the mth row by a preset increment value as an updated temporary value until the updated temporary value is not repeated with the initial value arranged before the temporary value, and taking the updated temporary value as the state value of the nth column in the mth row.
In the above embodiment, the values of each traversed space are compared with the values of a row in the same column and queried by the repeated values arranged before, and then corresponding operations are performed based on the comparison result to satisfy corresponding conditions, and corresponding operations are performed based on the query result to satisfy corresponding conditions, and then the next space is traversed.
In some embodiments, the method further comprises: in response to no maximum state value, replacing a maximum value of all state values with a maximum state value to generate a state table of the finite state entropy encoding table.
In some embodiments, obtaining the state number of the corresponding column of each character in the finite state entropy coding table based on the preset maximum state value and the occurrence frequency of each character includes: and multiplying the preset maximum state value by the occurrence frequency of each character respectively to obtain the state quantity of the corresponding column of each character in the finite state entropy coding table respectively.
In some embodiments, forming the empty table of the finite state entropy coding table based on the number of states of the corresponding column of each character and the frequency of occurrence of each character comprises: and respectively taking the state quantity of each character as the quantity of the spaces of the corresponding column of the finite state entropy coding table, and arranging the sequence among the columns based on the occurrence frequency of each character to form an empty table of the finite state entropy coding table.
In some embodiments, ranking the order between the columns based on the magnitude of the frequency of occurrence of the characters comprises: and arranging the corresponding columns from left to right according to the sequence of the appearance frequency of each character from large to small.
In some embodiments, obtaining the initial value of each space based on the number of lines of each space in the empty table and the occurrence frequency of the corresponding character includes: and rounding the ratio of the number of lines of each space in the empty table to the appearance frequency of the corresponding character to zero to obtain the initial value of each space.
In some embodiments, traversing the initial table comprises: the initial table is traversed from left to right and top to bottom.
Specifically, the method for generating the finite state entropy coding table according to an exemplary embodiment of the present invention is as follows:
the first step is as follows: the frequency of symbols in the Text of the data block to be encoded as shown in fig. 2 is counted and sorted, so as to obtain the following table:
TABLE 1 statistical table of character frequencies of data to be compressed and encoded
Character(s) | a | b | c | d | e | f |
Number of occurrences | 5 | 9 | 12 | 13 | 16 | 45 |
Frequency of occurrence | 0.05 | 0.09 | 0.12 | 0.13 | 0.16 | 0.45 |
The second step is that: the encoding and decoding process of FSE (Finite State Entropy coding) needs to use core Table FSE Table, and the primary task of the generation mode is to initially calculate the number of states in each symbol corresponding column in the Finite State Entropy coding Table:
the state number calculation method is the product of the maximum state number and the occurrence frequency of each symbol. The maximum number of states is 31 (in practice, the larger the number of states, the closer the compression effect is to the limit compression ratio).
Namely, the number of states of Symbol x in the FSE Table is:
Numstat_x = (Sumstat·P(Sx)) round
then the number of states in Symbol a column is:
Numstat_a = (Sumstat·P(Sa)) round = (31*0.05) round = (1.55) round=1
the number of corresponding columns of the state tables of other Symbol b, c, d, e and f columns is as follows:
Numstat_b = (Sumstat·P(Sb)) round = (31*0.09) round =2
Numstat_c = (Sumstat·P(Sc)) round = (31*0.12) round =3
Numstat_d = (Sumstat·P(Sd)) round = (31*0.13) round =4
Numstat_e = (Sumstat·P(Se)) round = (31*0.16) round =4
Numstat_f = (Sumstat·P(Sf)) round = (31*0.45) round =13
the third step: and forming an empty Table with determined size and shape according to the state number of the columns of the symbols obtained by calculating the FSE Table and the occurrence frequency of the symbols. And then respectively calculating the numerical value in each line of blank space, wherein the numerical value calculation rule of each frame is as follows:
ValNum_row,x = (Num_row/P(Sf)) round
for the first row, the element in the (1, f) position = 1/0.45 = 2.222 …, rounded to zero by 2; the value at the (1, e) position = 1/0.16 = 6.25, rounded to zero to 6, (1, d) = 1/0.13 = 7.692 …, rounded to zero to 7, … …. The second row (2, f) = 2/0.45 = 4.4444, which is approximately 4, … …, and thus the FSE Table initial Table as shown in fig. 3 is constructed.
The fourth step: the values of the elements in the initial table are adjusted. The entropy coding state table needs to satisfy the following two characteristics: (1) each value in the table is unique (i.e., no duplication is present); (2) each column is ordered by value from small to large. Now, the initial Table is adjusted based on a traversal comparison manner, and the traversal order is from left to right, and from top to bottom, that is, the schematic diagram of the FSE Table calculation traversal process shown in fig. 4.
Suppose that element a is searchedm,nThat is, in the element with table position (m, n), the judgment and processing of the condition one are performed first, and then the judgment and processing of the condition two are performed, wherein the judgment of the condition two should traverse all am,nSearching for the previous element。
Condition one judgment and processing (assuming that the preset increment value is 1):
if ∃ m, n is m.gtoreq.2
Has am,n≤am-1,n
Then am,n = am-1,n+1
And (2) judging and processing under the second condition (assuming that the preset increment value is 1):
This step is applied to the exemplary embodiment, as shown in the schematic diagram of the FSE Table iterative computation I shown in fig. 5, when the f column and the element 6 in the third row are traversed, it is found that it is larger than the element 4 in the previous row, that is, the judgment and processing of the condition two are performed. Find the element 6 of the third row of f columns and the element 6 of the first row of e columns repeat, replace the element of the third row of f columns by 7; then, the element 7 of the first row of the d columns is found to be replaced by 7, and the element of the third row of the f columns is replaced by 8; then, find replace with 8 and repeat with c element 8 of the first row, replace f element of the third row with 9; if it is found that there is no more repetition after the replacement is 9, the element replacement of the third row of the f columns is completed.
As shown in the schematic diagram of the FSE Table iterative computation II shown in fig. 6, when traversing to the fourth row of the f-column, the value of 8 is smaller than the third row 9 of the f-column, and since each column must be in an increasing state, the value of the third row is added with 1 and assigned to the fourth row, that is, the value of the fourth row is replaced with 9+1= 10; and replacing by 10, traversing the previous elements, and finishing the replacement if no superposition occurs.
By traversing all the elements of the operation Table by the method, the traversed FSE Table diagram as shown in FIG. 7 can be obtained, wherein the bold font is the adjusted numerical value.
The fifth step: the maximum state value is arranged in the FSE Table. This operation is to insert the FSE Table into the maximum state value 31 in this example, which must be present in the FSE Table Table. Observing that no 31 exists in the currently generated FSE Table, and the maximum value is 30, at this time, replacing 30 with 31, namely obtaining the schematic diagram of the FSE Table state Table shown in FIG. 8.
In a second aspect of the embodiments of the present invention, a system for generating a finite state entropy coding table is further provided. FIG. 9 is a diagram illustrating an embodiment of a system for generating a finite state entropy encoding table according to the present invention. As shown in fig. 9, a system for generating a finite state entropy encoding table includes: an appearance frequency obtaining module 10 configured to obtain an appearance frequency of each character in the data block based on a ratio of the number of each character in the data block to be encoded; the empty table obtaining module 20 is configured to obtain the state quantity of each column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the occurrence frequency of each character, and form an empty table of the finite state entropy coding table based on the state quantity of each column corresponding to each character and the occurrence frequency of each character; an initial table obtaining module 30 configured to obtain an initial value of each space based on the number of lines where each space is located in the empty table and the occurrence frequency of the corresponding character, and obtain an initial table of the finite state entropy coding table based on filling of the initial value of each space; the first judging module 40 is configured to traverse the initial table, and judge whether the initial value of the traversed nth row of the mth row is greater than the initial value of the nth row of the m-1 row; a second judging module 50 configured to, in response to the initial value of the mth row and nth column being greater than the initial value of the m-1 row and nth column, take the initial value of the mth row and nth column as its temporary value, and judge whether the temporary value of the mth row and nth column overlaps with the initial value arranged before it; a state value module 60 configured to take the temporary value of the mth row and the nth column as its state value in response to the temporary value of the mth row and the nth column not being repeated with the initial value arranged before it; a third determining module 70, configured to obtain all state values of the finite state entropy coding table in response to the traversal completion, and determine whether there is a maximum state value among all the state values; and a state table generating module 80 configured to generate a state table of the finite state entropy encoding table based on all the state values in response to having the maximum state value.
In a third aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, and fig. 10 is a schematic diagram of a computer-readable storage medium implementing a method for generating a finite-state entropy coding table according to an embodiment of the present invention. As shown in fig. 10, the computer-readable storage medium 3 stores computer program instructions 31. The computer program instructions 31, when executed by a processor, implement the method of any of the embodiments described above.
It is understood that all embodiments, features and advantages set forth above with respect to the method for generating a finite-state entropy coding table according to the present invention are equally applicable, without conflict with each other, to the system for generating a finite-state entropy coding table and to the storage medium according to the present invention.
In a fourth aspect of the embodiments of the present invention, there is further provided a computer device, including a memory 402 and a processor 401 as shown in fig. 11, where the memory 402 stores therein a computer program, and the computer program, when executed by the processor 401, implements the method of any one of the above embodiments.
Fig. 11 is a schematic hardware structure diagram of an embodiment of a computer device for executing the method for generating a finite-state entropy coding table according to the present invention. Taking the computer device shown in fig. 11 as an example, the computer device includes a processor 401 and a memory 402, and may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 11 illustrates an example of connection by a bus. The input device 403 may receive input numeric or character information and produce key signal inputs related to user settings and function control of the generation system of the finite state entropy encoding table. The output device 404 may include a display device such as a display screen.
The memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method for generating a finite-state entropy coding table in the embodiments of the present application. The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of a generation method of the finite state entropy encoding table, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 402 may optionally include memory located remotely from processor 401, which may be connected to local modules via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 401 executes various functional applications of the server and data processing, namely, the method for generating the finite state entropy coding table of the above method embodiment, by running the nonvolatile software program, instructions and modules stored in the memory 402.
Finally, it should be noted that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as synchronous RAM (DRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Claims (6)
1. A method for generating a finite state entropy coding table, comprising the steps of:
obtaining the occurrence frequency of each character in a data block to be coded based on the number proportion of each character in the data block;
obtaining the state quantity of corresponding columns of each character in the finite state entropy coding table based on the preset maximum state value and the occurrence frequency of each character, and forming an empty table of the finite state entropy coding table based on the state quantity of corresponding columns of each character and the occurrence frequency of each character, wherein the empty table further comprises:
multiplying the preset maximum state value by the occurrence frequency of each character respectively to obtain the state quantity of the corresponding column of each character in the finite state entropy coding table respectively; and
respectively taking the state quantity of each character as the quantity of spaces of corresponding columns of the finite state entropy coding table, and arranging the sequence among the columns based on the occurrence frequency of each character to form an empty table of the finite state entropy coding table, wherein the arranging the sequence among the columns based on the occurrence frequency of each character comprises: arranging corresponding columns from left to right according to the sequence of the appearance frequency of each character from large to small;
obtaining an initial value of each space based on the number of lines where each space is located in the empty table and the occurrence frequency of the corresponding character, wherein the initial value comprises the following steps: zero-rounding is carried out on the ratio of the number of lines where each space is located in the empty table to the occurrence frequency of the corresponding character to obtain the initial value of each space, and the initial table of the finite state entropy coding table is obtained based on filling of the initial value of each space;
traversing the initial table, and judging whether the initial value of the traversed mth row and nth column is greater than the initial value of the m-1 row and nth column, wherein m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1;
in response to the initial value of the mth row and nth column being greater than the initial value of the m-1 row and nth column, taking the initial value of the mth row and nth column as a temporary value thereof, and determining whether the temporary value of the mth row and nth column is repeated with the initial value arranged before the temporary value;
in response to the temporary value of the mth row and nth column not being repeated with the initial value arranged before, taking the temporary value of the mth row and nth column as its state value;
responding to the completion of traversal, obtaining all state values of the finite state entropy coding table, and judging whether the maximum state value exists in all the state values;
in response to there being the maximum state value, generating a state table of the finite state entropy encoding table based on all of the state values.
2. The method of claim 1, further comprising:
and in response to the initial value of the nth column in the mth row being less than or equal to the initial value of the nth column in the m-1 row, increasing the initial value of the nth column in the m-1 row by a preset increment value to be used as a temporary value of the nth column in the mth row.
3. The method of claim 2, further comprising:
and in response to the repetition of the temporary value of the mth row and the nth column with the initial value arranged before the temporary value, increasing the temporary value of the mth row and the nth column by the preset increment value as an updated temporary value until the updated temporary value is not repeated with the initial value arranged before the temporary value, and taking the updated temporary value as the state value of the mth row and the nth column.
4. The method of claim 1, further comprising:
in response to the absence of the maximum state value, replacing a largest value of the all state values with the maximum state value to generate a state table of the finite state entropy encoding table.
5. The method of claim 1, wherein traversing the initial table comprises:
traversing the initial table from left to right and from top to bottom.
6. A system for generating a finite state entropy coding table, comprising:
the device comprises an appearance frequency obtaining module, a coding module and a decoding module, wherein the appearance frequency obtaining module is configured to obtain the appearance frequency of each character in a data block to be coded based on the number ratio of each character in the data block;
the empty table obtaining module is configured to obtain the state quantity of each column corresponding to each character in the finite state entropy coding table based on a preset maximum state value and the occurrence frequency of each character, and form an empty table of the finite state entropy coding table based on the state quantity of each column corresponding to each character and the occurrence frequency of each character, and further includes:
multiplying the preset maximum state value by the occurrence frequency of each character respectively to obtain the state quantity of the corresponding column of each character in the finite state entropy coding table respectively; and
respectively taking the state quantity of each character as the quantity of spaces of corresponding columns of the finite state entropy coding table, and arranging the sequence among the columns based on the occurrence frequency of each character to form an empty table of the finite state entropy coding table, wherein the arranging the sequence among the columns based on the occurrence frequency of each character comprises: arranging corresponding columns from left to right according to the sequence of the appearance frequency of each character from large to small;
an initial table obtaining module configured to obtain an initial value of each space based on the number of lines where each space is located in the empty table and the occurrence frequency of the corresponding character, the initial table obtaining module including: zero-rounding is carried out on the ratio of the number of lines where each space is located in the empty table to the occurrence frequency of the corresponding character to obtain the initial value of each space, and the initial table of the finite state entropy coding table is obtained based on filling of the initial value of each space;
the first judging module is configured to traverse the initial table and judge whether an initial value of an nth column of an mth row traversed by the initial table is greater than an initial value of an nth column of an m-1 row, wherein m is an integer greater than or equal to 2, and n is an integer greater than or equal to 1;
a second judging module, configured to, in response to the initial value of the mth row and nth column being greater than the initial value of the m-1 row and nth column, take the initial value of the mth row and nth column as a temporary value thereof, and judge whether the temporary value of the mth row and nth column overlaps with the initial value arranged before the temporary value;
a state value module configured to take the temporary value of the mth row and nth column as its state value in response to the temporary value of the mth row and nth column not being repeated with the initial value arranged before it;
a third judging module, configured to obtain all state values of the finite state entropy coding table in response to completion of traversal, and judge whether the maximum state value exists among all the state values; and
and the state table generating module is used for responding to the maximum state value and generating the state table of the finite state entropy coding table based on all the state values.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107199.4A CN113572479B (en) | 2021-09-22 | 2021-09-22 | Method and system for generating finite state entropy coding table |
PCT/CN2022/074614 WO2023045204A1 (en) | 2021-09-22 | 2022-01-28 | Method and system for generating finite state entropy coding table, medium, and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111107199.4A CN113572479B (en) | 2021-09-22 | 2021-09-22 | Method and system for generating finite state entropy coding table |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113572479A CN113572479A (en) | 2021-10-29 |
CN113572479B true CN113572479B (en) | 2021-12-21 |
Family
ID=78173917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111107199.4A Active CN113572479B (en) | 2021-09-22 | 2021-09-22 | Method and system for generating finite state entropy coding table |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113572479B (en) |
WO (1) | WO2023045204A1 (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113572479B (en) * | 2021-09-22 | 2021-12-21 | 苏州浪潮智能科技有限公司 | Method and system for generating finite state entropy coding table |
CN114513210B (en) * | 2022-04-20 | 2022-08-02 | 苏州浪潮智能科技有限公司 | State selection method, system, storage medium and device for finite state entropy coding |
CN115441878A (en) * | 2022-08-05 | 2022-12-06 | 海飞科(南京)信息技术有限公司 | FSE code table rapid establishing method for text compression |
US12057864B2 (en) | 2022-09-20 | 2024-08-06 | Hong Kong Applied Science and Technology Research Institute Company Limited | Hardware implementation of frequency table generation for Asymmetric-Numeral-System-based data compression |
WO2024105793A1 (en) * | 2022-11-15 | 2024-05-23 | 株式会社メガチップス | Memory system, decoding circuit, and encoded data generating method |
CN117155405B (en) * | 2023-08-09 | 2024-07-12 | 海飞科(珠海)信息技术有限公司 | Quick tANS coding and decoding conversion table establishment method based on gradient descent |
CN116933734B (en) * | 2023-09-15 | 2023-12-19 | 山东济矿鲁能煤电股份有限公司阳城煤矿 | Intelligent diagnosis method for cutter faults of shield machine |
CN117171399B (en) * | 2023-11-02 | 2024-02-20 | 云图数据科技(郑州)有限公司 | New energy data optimized storage method based on cloud platform |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107565971A (en) * | 2017-09-07 | 2018-01-09 | 华为技术有限公司 | A kind of data compression method and device |
CN110602498A (en) * | 2019-09-20 | 2019-12-20 | 唐驰鹏 | Self-adaptive finite state entropy coding method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101725223B1 (en) * | 2011-03-25 | 2017-04-11 | 삼성전자 주식회사 | Data compressing method of storage device |
US11483009B2 (en) * | 2019-05-08 | 2022-10-25 | Intel Corporation | Self-checking compression |
US11973519B2 (en) * | 2020-06-23 | 2024-04-30 | Intel Corporation | Normalized probability determination for character encoding |
CN111787325B (en) * | 2020-07-03 | 2022-03-08 | 北京博雅慧视智能技术研究院有限公司 | Entropy encoder and encoding method thereof |
CN112953550B (en) * | 2021-03-23 | 2023-01-31 | 上海复佳信息科技有限公司 | Data compression method, electronic device and storage medium |
CN113572479B (en) * | 2021-09-22 | 2021-12-21 | 苏州浪潮智能科技有限公司 | Method and system for generating finite state entropy coding table |
-
2021
- 2021-09-22 CN CN202111107199.4A patent/CN113572479B/en active Active
-
2022
- 2022-01-28 WO PCT/CN2022/074614 patent/WO2023045204A1/en active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107565971A (en) * | 2017-09-07 | 2018-01-09 | 华为技术有限公司 | A kind of data compression method and device |
CN110602498A (en) * | 2019-09-20 | 2019-12-20 | 唐驰鹏 | Self-adaptive finite state entropy coding method |
Non-Patent Citations (1)
Title |
---|
有限状态熵编码的硬件加速设计与实现;邢琳;《中国优秀硕士学位论文全文数据库信息科技辑》;20210215(第2期);第23-27页 * |
Also Published As
Publication number | Publication date |
---|---|
WO2023045204A1 (en) | 2023-03-30 |
CN113572479A (en) | 2021-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113572479B (en) | Method and system for generating finite state entropy coding table | |
CN105893337B (en) | Method and apparatus for text compression and decompression | |
CN111008230B (en) | Data storage method, device, computer equipment and storage medium | |
US7688233B2 (en) | Compression for deflate algorithm | |
US9240237B2 (en) | Semiconductor device and method of writing/reading entry address into/from semiconductor device | |
CN108628898B (en) | Method, device and equipment for data storage | |
Ledwon et al. | High-throughput FPGA-based hardware accelerators for deflate compression and decompression using high-level synthesis | |
US10756758B1 (en) | Length-limited huffman encoding | |
CN110266316A (en) | A kind of data compression, decompressing method, device and equipment | |
CN113300715B (en) | Data processing method, device, hardware compression equipment and medium | |
US20230318621A1 (en) | Compression And Decompression In Hardware For Data Processing | |
CN110825323A (en) | Storage and reading method of floating point number data and computer readable storage medium | |
CN113630125A (en) | Data compression method, data encoding method, data decompression method, data encoding device, data decompression device, electronic equipment and storage medium | |
US20190379393A1 (en) | Dynamic dictionary-based data symbol encoding | |
Choi et al. | Design of FPGA-based LZ77 compressor with runtime configurable compression ratio and throughput | |
CN113824449B (en) | Static Huffman parallel coding method, system, storage medium and equipment | |
Shcherbakov et al. | A parallel adaptive range coding compressor: algorithm, FPGA prototype, evaluation | |
CN117155405B (en) | Quick tANS coding and decoding conversion table establishment method based on gradient descent | |
Yuan et al. | Parallel implementation of lossy data compression for temporal data sets | |
CN107623524B (en) | Hardware-based Huffman coding method and system | |
Wang et al. | A simplified variant of tabled asymmetric numeral systems with a smaller look-up table | |
CN107729577B (en) | Data searching method based on multidimensional hash table, terminal equipment and storage medium | |
CN113381769B (en) | Decoder based on FPGA | |
CN115811317A (en) | Stream processing method and system based on self-adaptive non-decompression direct calculation | |
CN115115044A (en) | Configurable sparse convolution hardware acceleration method and system based on channel fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |