CN114640357B - Data encoding method, apparatus and storage medium - Google Patents
- Publication number: CN114640357B
- Application number: CN202210541766.5A
- Authority: CN (China)
- Legal status: Active
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
- H03M7/60—General implementation details not specific to a particular type of compression
- H03M7/6047—Power optimization with respect to the encoder, decoder, storage or transmission
Abstract
The application relates to the field of data encoding, and in particular discloses a data encoding method, device and storage medium, the method comprising the following steps: acquiring a first Huffman tree set corresponding to data to be encoded, the first Huffman tree set comprising a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and de-duplicating the Huffman trees of the first Huffman tree set according to the radix-sorting result to obtain a second Huffman tree set; calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and retaining the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, from the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded. Based on this method, the computational resources consumed by the data encoding process can be reduced.
Description
Technical Field
The present application relates to the field of data encoding, and in particular, to a data encoding method, device, and storage medium.
Background
Currently, Arithmetic Coding and Huffman Coding are the two main Entropy Coding methods. Arithmetic coding approaches the optimal entropy more closely, achieves higher compression efficiency than Huffman coding, and is more widely applied. However, the arithmetic coding algorithm is complex and is not suitable for scenarios with limited computing resources or strict power-consumption requirements (such as high-definition real-time software decoding). The complexity of Huffman coding is far lower than that of arithmetic coding, so Huffman coding has great application value in such resource-limited or power-constrained scenarios.
Generally, when Huffman coding is used, the probability distribution of the data to be encoded is updated first, and the Huffman tree is then rebuilt according to the updated distribution. This requires continuously creating Huffman trees online, which is a complex and frequent operation that still occupies a large amount of computing and storage resources. It is therefore necessary to reduce the complexity of Huffman coding so that it can be applied where resources are limited or power consumption is critical.
Disclosure of Invention
The application provides a data encoding method, a device and a storage medium, which are used for encoding data with low complexity, so that high-efficiency data encoding can be realized under the condition of resource limitation, the consumption of computing resources and storage resources is reduced, and the operation cost in the data encoding process is reduced.
In a first aspect, the present application provides a data encoding method, the method comprising: acquiring a first Huffman tree set corresponding to data to be encoded, the first Huffman tree set comprising a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and de-duplicating the Huffman trees of the first Huffman tree set according to the radix-sorting result to obtain a second Huffman tree set, wherein the de-duplication includes retaining any one of the Huffman trees having the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and retaining the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, from the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded.
In a second aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement any one of the data encoding methods provided in the embodiments of the present application when executing the computer program.
In a third aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a data encoding method as any one provided in embodiments of the present application.
The application discloses a data encoding method, a device and a storage medium, wherein the method comprises the following steps: acquiring a first Huffman tree set corresponding to data to be encoded, the first Huffman tree set comprising a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and de-duplicating the Huffman trees of the first Huffman tree set according to the radix-sorting result to obtain a second Huffman tree set, wherein the de-duplication includes retaining any one of the Huffman trees having the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and retaining the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, from the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded. According to this technical scheme, repeated items are removed via the codeword lengths and similar items are merged with low loss, which reduces the complexity of the Huffman coding algorithm, lowers the consumption of computing and storage resources, and achieves efficient data encoding where resources are limited or power-consumption requirements are strict.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a schematic view of a data encoding scenario provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of a data encoding method provided in an embodiment of the present application;
FIG. 3 is a diagram illustrating a Huffman tree generation process according to an embodiment of the present disclosure;
fig. 4 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
In order to realize efficient data encoding under resource-limited conditions and reduce the consumption of computing and storage resources, the present application provides a data encoding method, device and storage medium.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a scene of data encoding according to an embodiment of the present application. As shown in fig. 1, the method may be applied to a terminal, and in particular, to a terminal installed with an image compression application program, where the terminal is configured to acquire an image uploaded by a user, encode and compress the image uploaded by the user, and transmit the compressed image to a server for storage through a network. The server is used for storing the image sent by the terminal. The terminal is in communication connection with the server through a network.
The server may be an independent server, a server cluster, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and a wearable device.
It should be noted that, in the embodiments of the present application, related data may be acquired and processed based on artificial intelligence techniques, for example, acquiring the first Huffman tree set corresponding to the data to be encoded. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data encoding method according to an embodiment of the present disclosure. The data coding method is used for realizing high-efficiency data coding under the condition of resource limitation, reducing the consumption of computing resources and storage resources and reducing the operation cost in the data coding process.
As shown in fig. 2, the data encoding method specifically includes: step S101 to step S105.
S101, a first Huffman tree set corresponding to data to be coded is obtained, and the first Huffman tree set comprises a plurality of Huffman trees.
Specifically, the data to be encoded is acquired and scanned, the character types included in the data to be encoded are identified, and a Huffman tree set is created in advance according to those character types, the set comprising the Huffman trees corresponding to the character types. Any two Huffman trees in the set are compared for symmetry; if the comparison result shows that the two Huffman trees have symmetry, either one of them is deleted, and the Huffman trees without symmetry are kept, obtaining the first Huffman tree set.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a huffman tree generation process. As shown in fig. 3, the character types to be encoded include A, B, C and D, each character type has its corresponding probability value (0.35, 0.4, 0.15, and 0.1), and according to the 4 character types and their corresponding probability values, a plurality of different huffman trees can be created, each huffman tree includes a root node and a leaf node, where each leaf node corresponds to one character type, each character type has a corresponding code word, the code word is a binary path from the root node to the leaf node (0/1 indicates left/right), and the number of bits of a binary number included in the code word is the length of the code word.
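The construction illustrated above can be sketched in a few lines of Python. This is a minimal illustration (the function name is ours, not from the patent) that tracks only codeword lengths, since the later de-duplication steps operate on length sequences rather than tree shapes:

```python
import heapq
import itertools

def huffman_code_lengths(probs):
    """Build a Huffman tree for the given symbol probabilities and
    return the codeword length of each symbol, in input order."""
    counter = itertools.count()  # tiebreaker so tuples never compare lists
    # Each heap entry: (probability, tiebreak, symbol indices in this subtree)
    heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
    return lengths
```

For the probabilities above (A=0.35, B=0.4, C=0.15, D=0.1) this yields the lengths [2, 1, 3, 3], so the most probable character B receives the one-bit codeword.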
It should be noted that given N weights as N leaf nodes, a binary Tree is constructed, and if the weighted path length of the Tree reaches the minimum, such binary Tree is called an optimal binary Tree, also called a Huffman Tree (Huffman Tree). The huffman tree is the tree with the shortest path length and the node with the larger weight is closer to the root.
In some embodiments, a Huffman tree includes a left subtree and a right subtree; for example, the root node of the Huffman tree may be used as the partition point to obtain its left subtree and right subtree. Exchanging the positions of the left subtree and the right subtree of either Huffman tree yields a contrast Huffman tree; if the contrast Huffman tree is identical to the other Huffman tree, the two Huffman trees are determined to have symmetry.
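The root-level exchange described above can be sketched as follows, with nested tuples standing in for internal nodes (representation and names are ours):

```python
def exchange_subtrees(tree):
    """Exchange the left and right subtrees at the root, producing the
    'contrast Huffman tree' used for the symmetry comparison."""
    left, right = tree
    return (right, left)

def have_symmetry(t1, t2):
    """Two Huffman trees have symmetry if exchanging the subtrees of
    one makes it identical to the other."""
    return exchange_subtrees(t1) == t2
```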
Illustratively, let \(S_N\) denote the set of all possible Huffman trees for the \(N\) character types, and let \(t_i \in S_N\) be the i-th Huffman tree. Obviously, \(S_N\) is the set of all binary trees with \(N\) leaf nodes. If all \(S_n\) with \(n < N\) have already been constructed, then:

\(S_N = \bigcup_{n=1}^{N-1} S_n \times S_{N-n}\)

where \(S_n \times S_{N-n}\) represents: taking a binary tree from \(S_n\) as the left subtree and a binary tree from \(S_{N-n}\) as the right subtree, forming all combinations of trees with \(N\) leaf nodes. If \(|S_n|\) represents the number of Huffman trees in \(S_n\) (obviously \(|S_1| = 1\)), then:

\(|S_N| = \sum_{n=1}^{N-1} |S_n| \cdot |S_{N-n}|\)

As can be seen from the formula, \(|S_N|\) increases rapidly (exponentially) with \(N\); for \(N = 16\), \(|S_{16}| = 9694845\). Obviously, this occupies enormous computational and memory resources, and therefore, in order to reduce the burden on the computing system, the number of Huffman trees in \(S_N\) must be reduced.

Since a Huffman tree and the tree obtained by exchanging its left and right subtrees are duplicates (only the left and right subtrees are exchanged), only one of each such pair needs to be retained. After removing this symmetric repetition (restricting every combination so that the left subtree has at least as many leaves as the right subtree, a reading consistent with the figures reported here), the recurrence becomes:

\(|S'_N| = \sum_{n=\lceil N/2 \rceil}^{N-1} |S'_n| \cdot |S'_{N-n}|\)

It is calculated that, for \(N = 16\), \(|S'_{16}| = 11813\). It can be seen that the effect of such de-duplication is significant: a large number of repeated Huffman trees are deleted from \(S_N\), reducing the consumption of storage resources. However, this de-duplication alone is not sufficient; when \(N\) is large there are still many repetitions, so the symmetrically de-duplicated Huffman tree set needs to be de-duplicated again.
S102, the codeword lengths of each Huffman tree are sorted to obtain a plurality of codeword length sequences.
Specifically, the codeword length of each huffman tree is obtained, and the codeword lengths are sorted in an increasing order to generate a plurality of codeword length sequences, where each huffman tree corresponds to one codeword length sequence.
Illustratively, a huffman tree with N =8 has a total of 8 codeword lengths, which are 1,2, 4, 3, 6, 5, 7, and 7, respectively, and the codeword length sequence obtained after increasing sorting is {1,2,3,4,5,6,7,7 }.
S103, radix sorting is performed on the multiple codeword length sequences, and the Huffman trees of the first Huffman tree set are de-duplicated according to the radix-sorting result to obtain a second Huffman tree set, the de-duplication including retaining any one of the Huffman trees having the same codeword length sequence.
Specifically, radix sorting (radix sort) is performed on all Huffman trees in the first Huffman tree set according to their codeword length sequences after ascending rearrangement, which arranges Huffman trees with identical codeword length sequences next to each other; if the codeword length sequences of several Huffman trees are the same, only one of them needs to be retained, and after the repeated items are deleted, the second Huffman tree set is obtained.
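A sketch of this de-duplication step (Python's built-in sort stands in for the radix sort; names are ours):

```python
def dedup_by_length_sequence(trees_with_lengths):
    """Group Huffman trees by their ascending codeword-length sequence
    and keep one representative per group. Each input item is a pair
    (tree_id, length_sequence); the built-in stable sort plays the role
    of the radix sort applied in the patent."""
    ordered = sorted(trees_with_lengths, key=lambda item: item[1])
    kept = []
    for tree, lengths in ordered:
        # Duplicates are adjacent after sorting; keep the first of each run.
        if not kept or kept[-1][1] != lengths:
            kept.append((tree, lengths))
    return kept
```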
It should be noted that, after symmetric de-duplication, the number of remaining Huffman trees is still large when N is large. To further reduce complexity, radix sort is applied to the codeword length sequences of the Huffman trees in the first Huffman tree set, so that repeated sequences are arranged together.
In some embodiments, let \(P_i = (p_{i,1}, \ldots, p_{i,N})\) be the probability distribution corresponding to the character types of \(t_i\), rearranged in decreasing order, and let \(L_i = (l_{i,1}, \ldots, l_{i,N})\) be the codeword lengths of \(t_i\), rearranged in increasing order. The coding rate \(R_i\) of \(t_i\) is then:

\(R_i = \sum_{k=1}^{N} p_{i,k} \cdot l_{i,k}\)

According to this calculation formula of the coding rate, if two Huffman trees have the same codeword length sequence after ascending rearrangement, their coding efficiency is the same, and therefore only one of them needs to be retained. In this way a large number of repeated items can be removed and, without any loss of code rate, only Huffman trees with distinct coding rates are kept.
In some embodiments, the first Huffman tree set may also be a Huffman tree set that has not undergone symmetric de-duplication. If two Huffman trees have symmetry, their codeword length sequences are identical, so after radix sorting they are arranged together, and the repetition between them is removed in step S103.
Thus, when the number N of character types is small, this lossless de-duplication technique controls the complexity within a reasonable range without affecting compression efficiency. But when N is relatively large, the number of Huffman trees in the second Huffman tree set is still large; in scenarios where resources are limited or power-consumption requirements are strict, the number of Huffman trees must be reduced further, so as to further lower the occupation of computing and storage resources.
S104, according to the sorting result, a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set are calculated, and the Huffman tree corresponding to the larger of the first and second code rate increments is retained, obtaining a third Huffman tree set.
Specifically, according to the sorting result, every two adjacent Huffman trees in the second Huffman tree set are taken as a control group comprising a first Huffman tree and a second Huffman tree; the probability distribution corresponding to the first Huffman tree is encoded according to the second Huffman tree, and the resulting increase in coding rate is the first code rate increment; the probability distribution corresponding to the second Huffman tree is encoded according to the first Huffman tree, and the resulting increase in coding rate is the second code rate increment.
Illustratively, let the codeword lengths of \(t_j\) in increasing order form the sequence \(L_j = (l_{j,1}, \ldots, l_{j,N})\). The corresponding optimal probability distribution is:

\(P_j = (2^{-l_{j,1}}, \ldots, 2^{-l_{j,N}})\)

If the probability distribution of the i-th Huffman tree is encoded with the codewords of the j-th Huffman tree, the coding rate increases. The code rate increment is:

\(\Delta R_{i,j} = \sum_{k=1}^{N} p_{i,k} \cdot (l_{j,k} - l_{i,k})\)

\(\Delta R_{i,j}\) is the rate increment generated when the probability distribution \(P_i\) of \(t_i\) is encoded with the codewords of \(t_j\); conversely, \(\Delta R_{j,i}\) is the rate increment generated when \(P_j\) is encoded with the codewords of \(t_i\). \(\Delta R_{i,j}\) and \(\Delta R_{j,i}\) may differ in size, and both reflect the similarity of the two Huffman trees. Since the second Huffman tree set is obtained by radix sorting on the codeword length sequences, the Huffman trees in it are essentially arranged by similarity, so \(\Delta R_{i,j}\) takes its smallest values when the i-th and j-th trees are adjacent (i.e., i = j - 1 or i = j + 1).
In some embodiments, the two adjacent trees may be the (i +2 a) th and (i +1+2 a) th trees, where a =0, 1,2,3, ….
In some embodiments, the number of Huffman trees in the third Huffman tree set is obtained; if this number is greater than a preset value, Huffman trees continue to be deleted from the third Huffman tree set according to the first code rate increment and the second code rate increment, until the number is less than or equal to the preset value.
Illustratively, the code rate increments generated by substituting each of two adjacent Huffman trees for the other are calculated: \(\Delta R_{i,i+1}\) and \(\Delta R_{i+1,i}\). The minimum of all such increments is denoted \(\Delta R_{min}\), and the Huffman tree whose probability distribution corresponds to \(\Delta R_{min}\) is deleted. For example, if \(\Delta R_{i,i+1}\) is less than \(\Delta R_{i+1,i}\), the i-th Huffman tree corresponding to \(\Delta R_{i,i+1}\) is deleted. The above two steps are repeated until the number of remaining Huffman trees is less than the preset value. In this way, the Huffman tree set can be further compressed at the cost of a small coding rate loss, so as to adapt to use scenarios with limited resources or strict power-consumption requirements.
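The iterative pruning described here can be sketched as a loop (a simplified sketch; the patent's exact tie-breaking and neighbour-pairing rules may differ, and names are ours):

```python
def prune_to_size(length_seqs, max_count):
    """Iteratively thin a similarity-ordered list of ascending codeword
    length sequences: find the smallest substitution cost among all
    adjacent pairs and delete the tree whose distribution it belongs to,
    until at most max_count sequences remain."""
    def rate_increment(li, lj):
        # Cost of encoding tree li's optimal (dyadic) distribution
        # with tree lj's codewords.
        return sum(2.0 ** -a * (b - a) for a, b in zip(li, lj))
    seqs = list(length_seqs)
    while len(seqs) > max_count:
        best_inc, victim = None, None
        for i in range(len(seqs) - 1):
            for inc, idx in ((rate_increment(seqs[i], seqs[i + 1]), i),
                             (rate_increment(seqs[i + 1], seqs[i]), i + 1)):
                if best_inc is None or inc < best_inc:
                    best_inc, victim = inc, idx
        del seqs[victim]
    return seqs
```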
In some embodiments, since encoding and decoding are usually implemented by looking up an encoding table, and the longest codeword length in the table determines the table's size, the longest codeword length can be limited in order to bound the table. In this case, Huffman trees whose longest codeword length exceeds a preset value are deleted.
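A minimal sketch of this codeword-length cap (names are ours):

```python
def limit_table_length(length_seqs, max_codeword_len):
    """Discard trees whose longest codeword exceeds the limit, which
    bounds the size of a lookup-based encoding table."""
    return [seq for seq in length_seqs if max(seq) <= max_codeword_len]
```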
By the above technical means, the number of Huffman trees in the set can be further reduced while the loss of compression performance is kept extremely small, lowering the demand on computing and storage resources.
And S105, determining a Huffman tree for data coding of the data to be coded according to the third Huffman tree set.
Specifically, the coding rate of each huffman tree in the third huffman tree set is calculated, and a fourth huffman tree set is determined according to the coding rate; and determining the Huffman tree of the data to be coded for data coding from the third Huffman tree set or the fourth Huffman tree set.
For example, the coding rate of each Huffman tree in the third Huffman tree set is calculated, the coding rates are sorted in increasing order, and the Huffman trees with the lowest coding rates are taken as the fourth Huffman tree set; for example, the fourth Huffman tree set is generated from the eight Huffman trees with the lowest coding rates.
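Selecting the fourth set by rate can be sketched as follows (function and parameter names are illustrative; keep=8 mirrors the "top eight" example):

```python
def select_fourth_set(third_set, probs, keep=8):
    """Rank the third set (list of ascending codeword length sequences)
    by coding rate under the current probability distribution and keep
    the lowest-rate trees."""
    def rate(lengths):
        # Average length: largest probabilities paired with shortest codewords.
        p = sorted(probs, reverse=True)
        return sum(pk * lk for pk, lk in zip(p, sorted(lengths)))
    return sorted(third_set, key=rate)[:keep]
```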
In some embodiments, only the fourth set of huffman trees may be reserved due to limited computational and memory resources of the usage scenario. Adaptively, the number of huffman trees in the fourth huffman tree set can also be reduced, for example, only the huffman trees with the first three coding rate orders in the fourth huffman tree set are reserved, so as to adapt to the requirement of the extreme use scenario.
In some embodiments, a change granularity of data to be encoded is obtained; if the change granularity is smaller than the preset granularity, determining a Huffman tree for data coding of the data to be coded from the fourth Huffman tree set; and if the change granularity is larger than or equal to the preset granularity, determining the Huffman tree of the data to be coded for data coding from the third Huffman tree set.
In some embodiments, the granularity of the change unit of the data to be encoded is obtained, and whether a local update or a global update is required is determined according to that granularity: in a local update, the Huffman tree used to encode the data is determined from the fourth Huffman tree set; in a global update, it is determined from the third Huffman tree set.
Illustratively, the data to be encoded is image data, and the granularity of the change unit of the image data includes: CU units and CTU units, CTU units being much larger than CU units. When the change unit is a CU unit, since the granularity of the change unit is small, and thus the change in the probability distribution is small, the encoding code rate of the change unit only needs to be calculated according to the huffman tree in the fourth huffman tree set, and the optimal huffman tree is determined from the fourth huffman tree set.
When the change unit is a CTU unit, the data update reaches a large scale and the probability distribution changes greatly; continuing to use only the local candidates could drift ever further from the global optimum. It is therefore necessary to perform a global calculation once: the coding rate of the change unit is computed against the Huffman trees in the third Huffman tree set, the globally optimal Huffman tree is re-acquired, and the globally optimal fourth Huffman tree set is regenerated.
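The CU/CTU decision can be sketched as one function (all names are illustrative; the rate computation reuses the average-length formula, and keep=8 follows the earlier example):

```python
def update_coding_tree(change_unit, probs, third_set, fourth_set, keep=8):
    """Local/global update rule: a CU-level change searches only the
    cached fourth set; a CTU-level change re-ranks the full third set
    and rebuilds the fourth set. Sequences are ascending codeword
    lengths; probs is the current symbol probability distribution."""
    def rate(lengths):
        p = sorted(probs, reverse=True)
        return sum(pk * lk for pk, lk in zip(p, sorted(lengths)))
    if change_unit == "CTU":  # global update: rebuild the cached subset
        fourth_set = sorted(third_set, key=rate)[:keep]
    best = min(fourth_set, key=rate)
    return best, fourth_set
```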
It should be noted that the CTU is a processing unit of H.265/High Efficiency Video Coding (HEVC). A CTU (Coding Tree Unit) is a coding tree unit, and a CU (Coding Unit) is a coding unit. A Coding Tree Unit may comprise one Coding Unit or be divided into a plurality of smaller coding units.
Based on the data encoding method provided by the embodiments of the present application, efficient data encoding can be realized under resource-limited conditions, the consumption of computing and storage resources is reduced, and the operation cost of the data encoding process is lowered.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 4, the computer device includes a processor, a memory and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.
The storage medium may store an operating system and a computer program. The computer program comprises program instructions, which when executed, can make a processor execute any one of the data encoding methods provided by the embodiments of the present application.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for the execution of a computer program on a storage medium, which when executed by a processor causes the processor to perform any of the data encoding methods. The storage medium may be non-volatile or volatile.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
It should be understood that the Processor may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Illustratively, in one embodiment, the processor is configured to execute a computer program stored in the memory to perform the steps of: acquiring a first Huffman tree set corresponding to data to be encoded, the first Huffman tree set comprising a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and de-duplicating the Huffman trees of the first Huffman tree set according to the radix-sorting result to obtain a second Huffman tree set, wherein the de-duplication includes retaining any one of the Huffman trees having the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and retaining the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, from the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded.
In some embodiments, before acquiring the first Huffman tree set corresponding to the data to be encoded, the processor is further configured to: acquire a Huffman tree set corresponding to the data to be encoded; perform a symmetry comparison on any two Huffman trees in the Huffman tree set; and if the comparison result of the symmetry comparison indicates that the two Huffman trees are symmetric, delete either one of them to obtain the first Huffman tree set.
In some embodiments, when performing the symmetry comparison between any two Huffman trees, the processor is further configured to: exchange the positions of the left subtree and the right subtree of either Huffman tree to obtain a comparison Huffman tree; and if the comparison Huffman tree is identical to the other Huffman tree, determine that the two Huffman trees are symmetric.
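A minimal sketch of this symmetry comparison, assuming the exchange is applied recursively at every node (the patent's wording leaves the depth of the exchange open). Trees are represented here as nested `(left, right)` tuples with string leaves; the names `mirror` and `symmetric` are illustrative.

```python
def mirror(tree):
    """Swap the left and right subtrees at every node."""
    if not isinstance(tree, tuple):
        return tree  # a leaf symbol is its own mirror
    left, right = tree
    return (mirror(right), mirror(left))

def symmetric(t1, t2):
    """Two Huffman trees are symmetric if mirroring one yields the other."""
    return mirror(t1) == t2
```

Symmetric trees produce codes of identical lengths (only the 0/1 labels differ), so one of the pair can be deleted without losing any candidate code.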
In some embodiments, when sorting the codeword lengths of each Huffman tree to obtain the plurality of codeword length sequences, the processor is further configured to: sort the codeword lengths of each Huffman tree in ascending order to generate the plurality of codeword length sequences.
In some embodiments, when calculating, according to the sorting result, the first code rate increment and the second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, the processor is further configured to: take every two adjacent Huffman trees in the second Huffman tree set as a comparison group according to the sorting result, wherein the comparison group comprises a first Huffman tree and a second Huffman tree; encode the probability distribution corresponding to the first Huffman tree according to the second Huffman tree, wherein the resulting increase in coding rate is the first code rate increment; and encode the probability distribution corresponding to the second Huffman tree according to the first Huffman tree, wherein the resulting increase in coding rate is the second code rate increment.
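The code rate increments can be illustrated as follows. This sketch makes two assumptions not stated in the patent: each tree is summarized by its codeword-length multiset, and within a tree the shorter codewords are assigned to the more probable symbols (rank pairing). The names `rate` and `rate_increments` are hypothetical.

```python
def rate(probs, lengths):
    """Average code length (bits/symbol) when symbols with the given
    probabilities are coded with the given codeword lengths, pairing the
    most probable symbol with the shortest codeword."""
    return sum(p * l for p, l in zip(sorted(probs, reverse=True), sorted(lengths)))

def rate_increments(p1, len1, p2, len2):
    """First increment: extra rate from coding tree 1's distribution with
    tree 2's codeword lengths; second increment is the symmetric case."""
    d1 = rate(p1, len2) - rate(p1, len1)
    d2 = rate(p2, len1) - rate(p2, len2)
    return d1, d2
```

A large increment means the neighboring tree is a poor substitute for that distribution, which is the signal the pruning step uses to decide which tree to retain.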
In some embodiments, the processor is further configured to: obtain the number of Huffman trees in the third Huffman tree set; and if the number is greater than a preset value, continue deleting Huffman trees from the third Huffman tree set according to the first code rate increment and the second code rate increment until the number is less than or equal to the preset value.
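One hedged reading of this pruning loop, with the increment computation supplied as a callback. Which tree "corresponds to" which increment is an assumption here (the sketch associates the first increment with the first tree of the pair); the function names are illustrative.

```python
def prune_to_limit(trees, rate_increment, preset):
    """Repeatedly compare an adjacent pair of trees and drop the one whose
    code rate increment is smaller, until at most `preset` trees remain.
    `rate_increment(a, b)` is assumed to return the pair (d_ab, d_ba)."""
    trees = list(trees)
    while len(trees) > preset:
        d1, d2 = rate_increment(trees[0], trees[1])
        # Retain the tree with the larger increment (assumed: d1 belongs
        # to trees[0]); delete the other member of the pair.
        trees.pop(1 if d1 >= d2 else 0)
    return trees
```

With a stub callback, `prune_to_limit([3, 1, 2], lambda a, b: (a, b), 1)` keeps only the tree whose increment dominates each comparison.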
In some embodiments, when determining, according to the third Huffman tree set, the Huffman tree used for data encoding of the data to be encoded, the processor is further configured to: calculate the coding rate of each Huffman tree in the third Huffman tree set, and determine a fourth Huffman tree set according to the coding rates; and determine the Huffman tree used for data encoding of the data to be encoded from the third Huffman tree set or the fourth Huffman tree set.
In some embodiments, when determining, from the third Huffman tree set or the fourth Huffman tree set, the Huffman tree used for data encoding, the processor is further configured to: acquire the change granularity of the data to be encoded; if the change granularity is smaller than a preset granularity, determine the Huffman tree used for data encoding of the data to be encoded from the fourth Huffman tree set; and if the change granularity is greater than or equal to the preset granularity, determine it from the third Huffman tree set.
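The granularity-based selection reduces to a simple threshold test over the two candidate pools; all names here are illustrative, not taken from the patent.

```python
def select_pool(granularity, preset_granularity, third_set, fourth_set):
    """Choose the candidate pool by how fast the data statistics change:
    fine-grained (rapidly changing) data favours the smaller,
    rate-filtered fourth set; coarse-grained data keeps the larger third
    set available for a better match."""
    if granularity < preset_granularity:
        return fourth_set
    return third_set
```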
Embodiments of the present application further provide a computer-readable storage medium storing a computer program. The computer program comprises program instructions, and a processor executes the program instructions to implement any one of the data encoding methods provided in the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, for example, a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the computer device.
While the invention has been described with reference to specific embodiments, its scope is not limited thereto, and those skilled in the art can readily conceive of equivalent modifications or substitutions within the disclosed technical scope. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. A method of encoding data, the method comprising:
acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees;
sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences;
performing radix sorting on the plurality of codeword length sequences, and performing de-duplication on the plurality of Huffman trees of the first Huffman tree set according to a sorting result of the radix sorting to obtain a second Huffman tree set, wherein the de-duplication comprises retaining only one of any Huffman trees having the same codeword length sequence;
taking, according to the sorting result, every two adjacent Huffman trees in the second Huffman tree set as a comparison group, wherein the comparison group comprises a first Huffman tree and a second Huffman tree; encoding the probability distribution corresponding to the first Huffman tree according to the second Huffman tree, wherein the resulting increase in coding rate is a first code rate increment; encoding the probability distribution corresponding to the second Huffman tree according to the first Huffman tree, wherein the resulting increase in coding rate is a second code rate increment; and retaining the Huffman tree corresponding to the larger of the first code rate increment and the second code rate increment to obtain a third Huffman tree set;
and determining, according to the third Huffman tree set, the Huffman tree used for data encoding of the data to be encoded.
2. The method according to claim 1, further comprising, before the acquiring of the first Huffman tree set corresponding to the data to be encoded:
acquiring a Huffman tree set corresponding to the data to be encoded;
performing a symmetry comparison on any two Huffman trees in the Huffman tree set;
and if a comparison result of the symmetry comparison indicates that the two Huffman trees are symmetric, deleting either one of them to obtain the first Huffman tree set.
3. The method of claim 2, wherein each Huffman tree comprises a left subtree and a right subtree, and wherein the performing the symmetry comparison on any two Huffman trees in the Huffman tree set comprises:
exchanging the positions of the left subtree and the right subtree of either Huffman tree to obtain a comparison Huffman tree;
and determining that the two Huffman trees are symmetric if the comparison Huffman tree is identical to the other Huffman tree.
4. The method of claim 1, wherein the sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences comprises:
sorting the codeword lengths of each Huffman tree in ascending order to generate the plurality of codeword length sequences.
5. The method of claim 1, further comprising:
obtaining the number of Huffman trees in the third Huffman tree set;
and if the number is greater than a preset value, continuing to delete Huffman trees from the third Huffman tree set according to the first code rate increment and the second code rate increment until the number is less than or equal to the preset value.
6. The method according to claim 1, wherein the determining, according to the third Huffman tree set, the Huffman tree used for data encoding of the data to be encoded comprises:
calculating the coding rate of each Huffman tree in the third Huffman tree set, and determining a fourth Huffman tree set according to the coding rates;
and determining the Huffman tree used for data encoding of the data to be encoded from the third Huffman tree set or the fourth Huffman tree set.
7. The method according to claim 6, wherein the determining, from the third Huffman tree set or the fourth Huffman tree set, the Huffman tree used for data encoding of the data to be encoded comprises:
acquiring a change granularity of the data to be encoded;
if the change granularity is smaller than a preset granularity, determining the Huffman tree used for data encoding of the data to be encoded from the fourth Huffman tree set;
and if the change granularity is greater than or equal to the preset granularity, determining the Huffman tree used for data encoding of the data to be encoded from the third Huffman tree set.
8. A computer device, wherein the computer device comprises a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and implementing the data encoding method of any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to carry out a data encoding method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210541766.5A CN114640357B (en) | 2022-05-19 | 2022-05-19 | Data encoding method, apparatus and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114640357A CN114640357A (en) | 2022-06-17 |
CN114640357B true CN114640357B (en) | 2022-09-27 |
Family
ID=81953155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210541766.5A Active CN114640357B (en) | 2022-05-19 | 2022-05-19 | Data encoding method, apparatus and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114640357B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105656604A (en) * | 2016-01-21 | 2016-06-08 | 北京邮电大学 | Bit interleaved polar code modulation method and apparatus |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2530012A (en) * | 2014-08-05 | 2016-03-16 | Illumina Cambridge Ltd | Methods and systems for data analysis and compression |
CN107483059B (en) * | 2017-07-31 | 2020-06-12 | 广东工业大学 | Multi-channel data coding and decoding method and device based on dynamic Huffman tree |
US10693493B1 (en) * | 2019-02-14 | 2020-06-23 | International Business Machines Corporation | Reducing latch count to save hardware area for dynamic Huffman table generation |
CN110868223B (en) * | 2019-12-06 | 2023-10-27 | 广东海洋大学 | Numerical operation implementation method and circuit for Huffman coding |
- 2022-05-19: Application CN202210541766.5A filed; granted as patent CN114640357B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN114640357A (en) | 2022-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11722148B2 (en) | Systems and methods of data compression | |
CN107291935B (en) | Spark and Huffman coding based CPIR-V nearest neighbor privacy protection query method | |
WO2023159820A1 (en) | Image compression method, image decompression method, and apparatuses | |
CN112463784A (en) | Data deduplication method, device, equipment and computer readable storage medium | |
CN112672168A (en) | Point cloud compression method and device based on graph convolution | |
CN117914951A (en) | Communication data transmission compression method and system | |
CN114640356A (en) | Big data compression method, system and storage medium based on neural network | |
CN114222129A (en) | Image compression encoding method, image compression encoding device, computer equipment and storage medium | |
CN111737406B (en) | Text retrieval method, device and equipment and training method of text retrieval model | |
WO2023051335A1 (en) | Data encoding method, data decoding method, and data processing apparatus | |
Lei et al. | Compressing deep convolutional networks using k-means based on weights distribution | |
CN110442489A (en) | The method and storage medium of data processing | |
CN115905168A (en) | Adaptive compression method and compression apparatus, computer device, storage medium | |
CN114640357B (en) | Data encoding method, apparatus and storage medium | |
CN113343020A (en) | Image processing method and device based on artificial intelligence and electronic equipment | |
CN112101548A (en) | Data compression method and device, data decompression method and device, and electronic device | |
CN116760661A (en) | Data storage method, apparatus, computer device, storage medium, and program product | |
WO2023205969A1 (en) | Point cloud geometric information compression method and apparatus, point cloud geometric information decompression method and apparatus, point cloud video encoding method and apparatus, and point cloud video decoding method and apparatus | |
WO2024011426A1 (en) | Point cloud geometry data augmentation method and apparatus, encoding method and apparatus, decoding method and apparatus, and encoding and decoding system | |
CN112886967B (en) | Data compression coding processing method and device | |
CN110751274A (en) | Neural network compression method and system based on random projection hash | |
Gilmary et al. | Compression techniques for dna sequences: A thematic review | |
CN110782003A (en) | Neural network compression method and system based on Hash learning | |
CN118202339A (en) | Compression method and storage device for database data | |
CN115982634A (en) | Application program classification method and device, electronic equipment and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||