
CN114640357B - Data encoding method, apparatus and storage medium - Google Patents


Info

Publication number
CN114640357B
CN114640357B (application CN202210541766A)
Authority
CN
China
Prior art keywords
huffman
huffman tree
tree
data
trees
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210541766.5A
Other languages
Chinese (zh)
Other versions
CN114640357A (en)
Inventor
涂承杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuanxiang Information Technology Co ltd
Original Assignee
Shenzhen Yuanxiang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuanxiang Information Technology Co ltd filed Critical Shenzhen Yuanxiang Information Technology Co ltd
Priority to CN202210541766.5A priority Critical patent/CN114640357B/en
Publication of CN114640357A publication Critical patent/CN114640357A/en
Application granted granted Critical
Publication of CN114640357B publication Critical patent/CN114640357B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/60 General implementation details not specific to a particular type of compression
    • H03M7/6047 Power optimization with respect to the encoder, decoder, storage or transmission

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to the field of data encoding, and particularly discloses a data encoding method, device, and storage medium, wherein the method comprises the following specific steps: acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and deduplicating the Huffman trees of the first Huffman tree set according to the radix sorting result to obtain a second Huffman tree set; calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and keeping the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, according to the third Huffman tree set, the Huffman tree used for encoding the data to be encoded. Based on this method, the computational resources consumed by the data encoding process can be reduced.

Description

Data encoding method, apparatus and storage medium
Technical Field
The present application relates to the field of data encoding, and in particular, to a data encoding method, device, and storage medium.
Background
Currently, arithmetic coding (Arithmetic Coding) and Huffman coding (Huffman Coding) are the two main entropy coding (Entropy Coding) methods. Arithmetic coding has the advantage of approaching the optimal entropy more closely, offers higher compression efficiency than Huffman coding, and is more widely applied. However, the arithmetic coding algorithm is complex and is not suitable for scenarios with limited computing resources or strict power-consumption requirements (such as high-definition real-time software decoding). The complexity of Huffman coding is far lower than that of arithmetic coding, so Huffman coding has great application value in scenarios with limited resources or strict power-consumption requirements.
Generally, in the process of using Huffman coding, the probability distribution of the data to be encoded is updated first, and the Huffman tree is then rebuilt according to the updated probability distribution. This requires continuously creating Huffman trees online, a complex and frequent operation that occupies a large amount of computing and storage resources. It is therefore necessary to reduce the complexity of Huffman coding so that it can be applied in scenarios where resources are limited or power consumption is critical.
Disclosure of Invention
The application provides a data encoding method, a device and a storage medium, which are used for encoding data with low complexity, so that high-efficiency data encoding can be realized under the condition of resource limitation, the consumption of computing resources and storage resources is reduced, and the operation cost in the data encoding process is reduced.
In a first aspect, the present application provides a data encoding method, the method comprising: acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and deduplicating the Huffman trees of the first Huffman tree set according to the radix sorting result to obtain a second Huffman tree set, wherein the deduplication includes keeping any one of the Huffman trees that share the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and keeping the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, according to the third Huffman tree set, the Huffman tree used for encoding the data to be encoded.
In a second aspect, the present application provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement any one of the data encoding methods provided in the embodiments of the present application when executing the computer program.
In a third aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a data encoding method as any one provided in embodiments of the present application.
The application discloses a data encoding method, device, and storage medium, wherein the method comprises the following steps: acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the codeword length sequences, and deduplicating the Huffman trees of the first Huffman tree set according to the radix sorting result to obtain a second Huffman tree set, wherein the deduplication includes keeping any one of the Huffman trees that share the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and keeping the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set; and determining, according to the third Huffman tree set, the Huffman tree used for encoding the data to be encoded. According to this technical solution, repeated items are removed by codeword length and similar items are merged with low loss, which reduces the complexity of the Huffman coding algorithm, reduces the consumption of computing and storage resources, and achieves efficient data encoding under conditions where resources are limited or power-consumption requirements are strict.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic view of a data encoding scenario provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of a data encoding method provided in an embodiment of the present application;
FIG. 3 is a diagram illustrating a Huffman tree generation process according to an embodiment of the present disclosure;
fig. 4 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flowcharts shown in the figures are illustrative only and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
In order to realize efficient data encoding under resource-limited conditions and reduce the consumption of computing and storage resources, the present application provides a data encoding method, a device, and a storage medium.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a scene of data encoding according to an embodiment of the present application. As shown in fig. 1, the method may be applied to a terminal, and in particular, to a terminal installed with an image compression application program, where the terminal is configured to acquire an image uploaded by a user, encode and compress the image uploaded by the user, and transmit the compressed image to a server for storage through a network. The server is used for storing the image sent by the terminal. The terminal is in communication connection with the server through a network.
The server may be an independent server, a server cluster, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The terminal can be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant and a wearable device.
It should be noted that, in the embodiment of the present application, related data may be acquired and processed based on an artificial intelligence technique, for example, a first huffman tree set corresponding to data to be encoded is acquired. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Referring to fig. 2, fig. 2 is a schematic flowchart of a data encoding method according to an embodiment of the present disclosure. The data coding method is used for realizing high-efficiency data coding under the condition of resource limitation, reducing the consumption of computing resources and storage resources and reducing the operation cost in the data coding process.
As shown in fig. 2, the data encoding method specifically includes: step S101 to step S105.
S101, a first Huffman tree set corresponding to data to be coded is obtained, and the first Huffman tree set comprises a plurality of Huffman trees.
Specifically, the data to be encoded is acquired and scanned, the character types contained in it are identified, and a Huffman tree set is created in advance according to the character types, where the Huffman tree set comprises Huffman trees corresponding to the character types. Any two Huffman trees in the Huffman tree set are compared for symmetry; if the comparison result shows that two Huffman trees have symmetry, either one of them is deleted, and the Huffman trees without symmetry are kept to obtain the first Huffman tree set.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a huffman tree generation process. As shown in fig. 3, the character types to be encoded include A, B, C and D, each character type has its corresponding probability value (0.35, 0.4, 0.15, and 0.1), and according to the 4 character types and their corresponding probability values, a plurality of different huffman trees can be created, each huffman tree includes a root node and a leaf node, where each leaf node corresponds to one character type, each character type has a corresponding code word, the code word is a binary path from the root node to the leaf node (0/1 indicates left/right), and the number of bits of a binary number included in the code word is the length of the code word.
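To make the codeword construction of Fig. 3 concrete, the following sketch (our illustration; the patent does not prescribe an implementation, and the function name and tuple representation are assumptions) builds a Huffman tree for the four character types with the standard greedy merge and reads off the 0/1 root-to-leaf codewords:

```python
import heapq
from itertools import count

def build_huffman_codes(probs):
    """Build a Huffman tree for {symbol: probability} and return
    {symbol: codeword}, where a codeword is the 0/1 path from the
    root to the symbol's leaf (0 = left, 1 = right)."""
    tiebreak = count()  # keeps heap comparisons away from tree payloads
    heap = [(p, next(tiebreak), sym) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)   # the two least-probable
        p2, _, right = heapq.heappop(heap)  # subtrees become siblings
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):      # internal node: (left, right)
            walk(node[0], path + "0")
            walk(node[1], path + "1")
        else:                            # leaf: a symbol
            codes[node] = path or "0"
    walk(root, "")
    return codes

codes = build_huffman_codes({"A": 0.35, "B": 0.4, "C": 0.15, "D": 0.1})
```

With the Fig. 3 probabilities, B receives a 1-bit codeword, A a 2-bit codeword, and C and D 3-bit codewords; the exact 0/1 labels depend on tie-breaking.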
It should be noted that given N weights as N leaf nodes, a binary Tree is constructed, and if the weighted path length of the Tree reaches the minimum, such binary Tree is called an optimal binary Tree, also called a Huffman Tree (Huffman Tree). The huffman tree is the tree with the shortest path length and the node with the larger weight is closer to the root.
In some embodiments, the Huffman tree includes a left subtree and a right subtree; for example, the root node of the Huffman tree may be used as a partition point to obtain the left subtree and the right subtree. The positions of the left subtree and the right subtree of any Huffman tree are exchanged to obtain a contrast Huffman tree; if the contrast Huffman tree is identical to another Huffman tree, the two Huffman trees are determined to have symmetry.
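The symmetry comparison described above can be sketched as follows, representing a tree as a nested (left, right) tuple with strings as leaves; the literal reading that only the root's subtrees are exchanged is an assumption on our part:

```python
def root_swap(tree):
    """Exchange the left and right subtrees at the root, producing
    the 'contrast' Huffman tree described in the text."""
    left, right = tree
    return (right, left)

def have_symmetry(t1, t2):
    """Two Huffman trees have symmetry if one equals the other
    with its root subtrees exchanged."""
    return root_swap(t1) == t2
```

For example, `have_symmetry(("A", ("B", "C")), (("B", "C"), "A"))` holds, so only one of the two trees would be kept.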
Illustratively, let the set of all possible Huffman trees over the N character types be $H_N$, and let $h_i$ denote the i-th Huffman tree in $H_N$. Obviously, $H_N$ is the set of all binary trees with N leaf nodes. If the sets $H_n$ for all n < N have already been constructed, then

$H_N = \bigcup_{n=1}^{N-1} H_n \times H_{N-n}$

where $H_n \times H_{N-n}$ represents all combinations of trees with N leaf nodes formed by taking one binary tree from $H_n$ as the left subtree and one binary tree from $H_{N-n}$ as the right subtree. If $|H_N|$ denotes the number of Huffman trees in $H_N$ (obviously $|H_1| = 1$), then:

$|H_N| = \sum_{n=1}^{N-1} |H_n| \cdot |H_{N-n}|$

As can be seen from this formula, $|H_N|$ grows exponentially with N; for N = 16, $|H_{16}| = 9694845$. Obviously, this occupies enormous computing and storage resources, so in order to reduce the burden on the computing system, it is necessary to reduce the number of Huffman trees in $H_N$.
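The recurrence above can be checked directly; a short sketch (ours, not the patent's) tabulates $|H_N|$ bottom-up and reproduces the N = 16 count:

```python
def num_binary_trees(N_max):
    """|H_N| = sum_{n=1}^{N-1} |H_n| * |H_{N-n}| with |H_1| = 1:
    the number of binary trees with N leaves (the Catalan numbers,
    shifted by one)."""
    H = [0] * (N_max + 1)
    H[1] = 1
    for N in range(2, N_max + 1):
        H[N] = sum(H[n] * H[N - n] for n in range(1, N))
    return H

H = num_binary_trees(16)  # H[16] matches the 9694845 quoted above
```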
Since a Huffman tree with left subtree $T_L$ and right subtree $T_R$ and its mirror with the two subtrees exchanged are duplicates of each other, only one of each such pair needs to be kept. After this symmetric repetition is removed, the deduplicated set $\tilde{H}_N$ satisfies:

$|\tilde{H}_N| = \sum_{n=1}^{\lfloor N/2 \rfloor} |\tilde{H}_n| \cdot |\tilde{H}_{N-n}|$

It can be calculated that, for N = 16, $|\tilde{H}_{16}| = 11813$. The effect of this deduplication is thus significant: a large number of repeated Huffman trees are deleted from $H_N$, which reduces the consumption of storage resources. However, this deduplication alone is not sufficient; when N is large, $\tilde{H}_N$ still contains many repetitions, so the deduplicated Huffman tree set needs to be deduplicated again.
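Assuming the recurrence reconstructed above, in which the left subtree is drawn from the smaller half of each split, a sketch (ours) reproduces the N = 16 count quoted in the text:

```python
def num_asymmetric_trees(N_max):
    """Tree counts after removing mirror duplicates:
    |H~_N| = sum_{n=1}^{floor(N/2)} |H~_n| * |H~_{N-n}|, |H~_1| = 1."""
    H = [0] * (N_max + 1)
    H[1] = 1
    for N in range(2, N_max + 1):
        H[N] = sum(H[n] * H[N - n] for n in range(1, N // 2 + 1))
    return H

H_tilde = num_asymmetric_trees(16)  # H_tilde[16] matches the 11813 above
```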
S102, sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences.
Specifically, the codeword lengths of each Huffman tree are obtained and sorted in increasing order to generate a plurality of codeword length sequences, where each Huffman tree corresponds to one codeword length sequence.
Illustratively, a Huffman tree with N = 8 has 8 codeword lengths, for example 1, 2, 4, 3, 6, 5, 7, and 7; the codeword length sequence obtained after sorting in increasing order is {1, 2, 3, 4, 5, 6, 7, 7}.
S103, performing radix sorting on the plurality of codeword length sequences, and deduplicating the plurality of Huffman trees of the first Huffman tree set according to the radix sorting result to obtain a second Huffman tree set, wherein the deduplication includes keeping any one of the Huffman trees that share the same codeword length sequence.
Specifically, radix sorting (radix sort) is performed on all Huffman trees in the first Huffman tree set according to their incrementally rearranged codeword length sequences, which arranges trees with repeated codeword length sequences together. If the codeword length sequences of several Huffman trees are the same, only one of them needs to be kept; after the repeated items are deleted, the second Huffman tree set is obtained.
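The deduplication of step S103 can be sketched as follows; instead of an explicit radix sort, a dictionary keyed by the sorted length tuple is used to group identical sequences (our simplification with the same grouping effect):

```python
def dedup_by_length_sequence(trees_with_lengths):
    """Keep one Huffman tree per distinct codeword length sequence.
    trees_with_lengths: iterable of (tree_id, codeword_lengths) pairs."""
    kept = {}
    for tree, lengths in trees_with_lengths:
        key = tuple(sorted(lengths))    # incrementally rearranged sequence
        kept.setdefault(key, tree)      # first tree with this sequence wins
    return list(kept.values())

second_set = dedup_by_length_sequence([
    ("t1", [1, 2, 3, 3]),
    ("t2", [3, 1, 3, 2]),  # same sorted sequence as t1, so a duplicate
    ("t3", [2, 2, 2, 2]),
])
```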
It should be noted that, even after symmetric deduplication, the number of trees in $\tilde{H}_N$ is still large when N is large. To further reduce complexity, the codeword length sequences of the Huffman trees in the first Huffman tree set are radix sorted (radix sort), which arranges the trees with repeated codeword length sequences together.
In some embodiments, let the probability values corresponding to the character types, rearranged in decreasing order, be $P = (p_1, p_2, \ldots, p_N)$, and let the codeword lengths of a Huffman tree $h_i$, rearranged in increasing order, be $L = (l_1, l_2, \ldots, l_N)$. The code rate $R$ of $h_i$ is then:

$R = \sum_{k=1}^{N} p_k \cdot l_k$

According to this calculation formula of the coding rate, if two Huffman trees have the same incrementally rearranged codeword length sequence, their coding efficiency is the same, and therefore only one of them needs to be kept. In this way, a large number of repeated items can be removed and, without any loss of code rate, only Huffman trees with distinct coding rates are retained.
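The code rate formula above can be sketched directly; the example pairs the Fig. 3 probabilities (sorted decreasingly) with the codeword lengths 1, 2, 3, 3:

```python
def code_rate(probs_desc, lengths_asc):
    """R = sum_k p_k * l_k: the average codeword length in bits/symbol,
    pairing the k-th largest probability with the k-th smallest length."""
    return sum(p * l for p, l in zip(probs_desc, lengths_asc))

R = code_rate([0.4, 0.35, 0.15, 0.1], [1, 2, 3, 3])
```

Here R = 0.4 + 0.7 + 0.45 + 0.3 = 1.85 bits per symbol.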
In some embodiments, the first Huffman tree set may also be a Huffman tree set that has not undergone symmetric deduplication. If two Huffman trees have symmetry, their codeword length sequences are the same, so after radix sorting they are arranged together, and the repeated items between them can be removed in step S103.
Thus, when the number N of character types is small, this lossless deduplication technique controls the complexity within a reasonable range without affecting compression efficiency. But when N is relatively large, the number of Huffman trees in $\tilde{H}_N$ is still large, and in scenarios where resources are limited or power-consumption requirements are strict, the number of Huffman trees in $\tilde{H}_N$ still needs to be reduced, to further lower the occupation of computing and storage resources.
S104, according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and keeping the Huffman tree corresponding to the larger of the two increments to obtain a third Huffman tree set.
Specifically, according to the sorting result, every two adjacent Huffman trees in the second Huffman tree set are taken as a control group comprising a first Huffman tree and a second Huffman tree; the probability distribution corresponding to the first Huffman tree is encoded with the second Huffman tree, and the resulting increase in coding rate is the first code rate increment; the probability distribution corresponding to the second Huffman tree is encoded with the first Huffman tree, and the resulting increase in coding rate is the second code rate increment.
Illustratively, let the codeword lengths of the i-th Huffman tree $h_i$, rearranged in increasing order, be $L_i = (l_{i,1}, l_{i,2}, \ldots, l_{i,N})$. The corresponding optimal probability distribution is:

$p_{i,k} = 2^{-l_{i,k}}, \quad k = 1, \ldots, N$

If the probability distribution of the i-th Huffman tree is encoded with the codewords of the j-th Huffman tree, the coding rate increases; the code rate increment is:

$\Delta R_{i,j} = \sum_{k=1}^{N} 2^{-l_{i,k}} \cdot (l_{j,k} - l_{i,k})$

Here $\Delta R_{i,j}$ is the code rate increment produced when the probability distribution of $h_i$ is encoded with the codewords of $h_j$; conversely, $\Delta R_{j,i}$ is the code rate increment produced when the probability distribution of $h_j$ is encoded with the codewords of $h_i$. $\Delta R_{i,j}$ and $\Delta R_{j,i}$ may differ in size, and together they reflect the similarity of the two Huffman trees. Because $\tilde{H}_N$ is the set obtained by radix sorting the codeword length sequences, the Huffman trees in $\tilde{H}_N$ are essentially ranked by similarity, so $\Delta R_{i,j}$ takes its smallest value when the i-th and j-th trees are adjacent (i.e., i = j − 1 or i = j + 1).
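A sketch of the code rate increment, assuming the formula reconstructed above; the example compares the two complete length sequences that exist for N = 4:

```python
def rate_increment(L_i, L_j):
    """Delta R_{i,j}: the extra code rate when the optimal distribution
    of tree i (p_k = 2**-l_{i,k}) is encoded with tree j's codewords."""
    return sum(2.0 ** -li * (lj - li) for li, lj in zip(L_i, L_j))

# Encode tree i's optimal distribution with tree j's codewords.
d = rate_increment([1, 2, 3, 3], [2, 2, 2, 2])
```

Here d = 0.5·(+1) + 0.25·0 + 0.125·(−1) + 0.125·(−1) = 0.25 extra bits per symbol; encoding a tree with its own codewords costs nothing extra.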
In some embodiments, the two adjacent trees may be the (i +2 a) th and (i +1+2 a) th trees, where a =0, 1,2,3, ….
In some embodiments, the number of Huffman trees in the third Huffman tree set is obtained; if this number is larger than a preset value, Huffman trees continue to be deleted from the third Huffman tree set according to the first code rate increment and the second code rate increment until the number is smaller than or equal to the preset value.
Illustratively, the code rate increments produced by substituting either of two adjacent Huffman trees for the other are calculated: $\Delta R_{i,i+1}$ and $\Delta R_{i+1,i}$. The smaller of $\Delta R_{i,i+1}$ and $\Delta R_{i+1,i}$ is denoted $\Delta R_{\min}$, and the Huffman tree corresponding to $\Delta R_{\min}$ is deleted. For example, if $\Delta R_{i,i+1}$ is less than $\Delta R_{i+1,i}$, then $\Delta R_{i,i+1}$ corresponds to $h_i$, and $h_i$ is deleted. The above two steps are repeated until the number of trees in $\tilde{H}_N$ is less than the preset value. In this way, the number of Huffman trees in the set can be further compressed at a small cost in coding rate, so as to adapt to use scenarios where resources are limited or power-consumption requirements are strict.
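The iterative deletion described above can be sketched as a greedy loop (our illustration; the function names and the tie-breaking are assumptions, and each tree is represented only by its sorted codeword length sequence):

```python
def rate_increment(L_i, L_j):
    """Delta R_{i,j} = sum_k 2**-l_{i,k} * (l_{j,k} - l_{i,k})."""
    return sum(2.0 ** -li * (lj - li) for li, lj in zip(L_i, L_j))

def prune_to_size(length_seqs, max_trees):
    """Greedily shrink a similarity-sorted list of codeword length
    sequences: repeatedly find the adjacent pair whose cheapest
    substitution costs the least extra rate, delete the tree on the
    cheap side of that pair, and stop when at most max_trees remain."""
    seqs = list(length_seqs)
    while len(seqs) > max_trees:
        best = None  # (rate increment, index of the tree to delete)
        for i in range(len(seqs) - 1):
            candidates = (
                (rate_increment(seqs[i], seqs[i + 1]), i),      # drop tree i
                (rate_increment(seqs[i + 1], seqs[i]), i + 1),  # drop tree i+1
            )
            for d, idx in candidates:
                if best is None or d < best[0]:
                    best = (d, idx)
        del seqs[best[1]]
    return seqs
```

For example, `prune_to_size([[1, 2, 3, 3], [2, 2, 2, 2]], 1)` keeps a single tree.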
In some embodiments, since encoding and decoding are usually implemented by looking up the encoding table, the longest codeword length in the encoding table determines the length of the encoding table, and in order to limit the length of the encoding table, the longest codeword length can be limited. In this case, the huffman tree having the longest codeword length exceeding the preset value is deleted.
Through the above technical means, the number of Huffman trees in $\tilde{H}_N$ can be further reduced while the loss of compression performance is controlled to be extremely small, lowering the demands on computing and storage resources.
And S105, determining, according to the third Huffman tree set, the Huffman tree used for encoding the data to be encoded.
Specifically, the coding rate of each Huffman tree in the third Huffman tree set is calculated, and a fourth Huffman tree set is determined according to the coding rates; the Huffman tree used for encoding the data to be encoded is then determined from the third Huffman tree set or the fourth Huffman tree set.
For example, the coding rate of each Huffman tree in the third Huffman tree set is calculated, the coding rates are sorted in increasing order, and the Huffman trees with the lowest coding rates are taken as the fourth Huffman tree set; for instance, the fourth Huffman tree set is generated from the Huffman trees whose coding rates rank in the first eight.
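The selection in step S105 can be sketched as follows (ours; the ranking criterion is the coding rate formula from above, and `top_n = 8` mirrors the "first eight" example):

```python
def pick_fourth_set(third_set, probs_desc, top_n=8):
    """Rank the third Huffman tree set (each tree given by its sorted
    codeword length sequence) by coding rate under the current,
    decreasingly sorted probability distribution, and keep the top_n
    cheapest trees as the fourth set; the first entry is the tree
    chosen for encoding."""
    def rate(lengths):
        return sum(p * l for p, l in zip(probs_desc, lengths))
    ranked = sorted(third_set, key=rate)
    return ranked[:top_n]

fourth_set = pick_fourth_set(
    [(2, 2, 2, 2), (1, 2, 3, 3)],
    [0.4, 0.35, 0.15, 0.1],
    top_n=2,
)
```

With these probabilities, the tree with lengths (1, 2, 3, 3) costs 1.85 bits/symbol versus 2.0 for (2, 2, 2, 2), so it ranks first.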
In some embodiments, only the fourth set of huffman trees may be reserved due to limited computational and memory resources of the usage scenario. Adaptively, the number of huffman trees in the fourth huffman tree set can also be reduced, for example, only the huffman trees with the first three coding rate orders in the fourth huffman tree set are reserved, so as to adapt to the requirement of the extreme use scenario.
In some embodiments, a change granularity of data to be encoded is obtained; if the change granularity is smaller than the preset granularity, determining a Huffman tree for data coding of the data to be coded from the fourth Huffman tree set; and if the change granularity is larger than or equal to the preset granularity, determining the Huffman tree of the data to be coded for data coding from the third Huffman tree set.
In some embodiments, the granularity of the change unit of the data to be encoded is obtained, and whether a local update or a global update is required is determined according to the granularity, where a local update determines the Huffman tree used for encoding the data to be encoded from the fourth Huffman tree set, and a global update determines it from the third Huffman tree set.
Illustratively, the data to be encoded is image data, and the granularity of the change unit of the image data includes: CU units and CTU units, CTU units being much larger than CU units. When the change unit is a CU unit, since the granularity of the change unit is small, and thus the change in the probability distribution is small, the encoding code rate of the change unit only needs to be calculated according to the huffman tree in the fourth huffman tree set, and the optimal huffman tree is determined from the fourth huffman tree set.
When the change unit is a CTU unit, the data update reaches a large scale and the probability distribution changes greatly, so continuing with local updates would eventually drift away from the global optimum. It is therefore necessary to perform one global calculation of the coding rate of the change unit according to the Huffman trees in the third Huffman tree set, reacquire the globally optimal Huffman tree, and regenerate the globally optimal fourth Huffman tree set.
It should be noted that the CTU is the processing unit of H.265/High Efficiency Video Coding (HEVC). A CTU (Coding Tree Unit) is a coding tree unit, and a CU (Coding Unit) is a coding unit. A coding tree unit (CTU) may comprise a single coding unit (CU) or be divided into a plurality of smaller coding units.
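The granularity rule described above can be illustrated with a short Python sketch. This is a hypothetical illustration, not the patented implementation: the names (`rate_of`, `select_tree`, `preset_granularity`) are assumptions, and a Huffman tree is represented simply as its list of codeword lengths.

```python
# Hypothetical sketch of the local/global update rule; names and the
# tree representation (a list of codeword lengths) are assumptions.

def rate_of(lengths, probs):
    """Coding rate of a probability distribution under a tree's codeword lengths."""
    return sum(p * l for p, l in zip(probs, lengths))

def select_tree(unit_size, probs, third_set, fourth_set, preset_granularity):
    """A small change unit (e.g. a CU) searches only the small fourth set; a
    change unit at or above the preset granularity (e.g. a CTU) triggers a
    global search over the third set."""
    candidates = fourth_set if unit_size < preset_granularity else third_set
    return min(candidates, key=lambda t: rate_of(t, probs))
```

With a small fourth set, the local update computes only a few coding rates, which is the resource saving the embodiment describes.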
Based on the data encoding method provided by the embodiments of the present application, efficient data encoding can be achieved under resource constraints, reducing the consumption of computational and storage resources and lowering the operating cost of the data encoding process.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 4, the computer device includes a processor, a memory and a network interface connected by a system bus, wherein the memory may include a storage medium and an internal memory.
The storage medium may store an operating system and a computer program. The computer program comprises program instructions, which when executed, can make a processor execute any one of the data encoding methods provided by the embodiments of the present application.
The processor provides computing and control capability and supports the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program on a storage medium, which when executed by a processor causes the processor to perform any of the data encoding methods. The storage medium may be non-volatile or volatile.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
It should be understood that the processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Illustratively, in one embodiment, the processor is configured to execute a computer program stored in the memory to perform the following steps: acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees; sorting the codeword lengths of each Huffman tree to obtain a plurality of codeword length sequences; performing radix sorting on the plurality of codeword length sequences, and performing de-duplication on the plurality of Huffman trees of the first Huffman tree set according to the radix sorting result to obtain a second Huffman tree set, wherein the de-duplication comprises retaining any one of the Huffman trees that share the same codeword length sequence; according to the sorting result, calculating a first code rate increment and a second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, and retaining the Huffman tree corresponding to the larger of the first code rate increment and the second code rate increment to obtain a third Huffman tree set; and determining, according to the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded.
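The steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not the patented implementation: each tree is reduced to its sorted codeword-length sequence, Python's lexicographic `sorted` stands in for the radix sort, the pairwise pruning is applied in a single pass, and "keep the tree on the side of the larger increment" is one reading of the pruning rule.

```python
# Simplified sketch of the pruning pipeline; helper names and the tree
# representation (codeword-length lists plus aligned distributions) are
# illustrative assumptions.

def coding_rate(lengths, probs):
    # expected bits per symbol when probs is encoded with these codeword lengths
    return sum(p * l for p, l in zip(probs, lengths))

def prune_to_third_set(first_set, dists):
    # first_set: list of codeword-length lists; dists[i] is the probability
    # distribution from which first_set[i] was built
    seqs = [tuple(sorted(t)) for t in first_set]             # sort each tree's lengths
    order = sorted(range(len(seqs)), key=lambda i: seqs[i])  # stands in for radix sort
    second = []                                              # de-duplicate: keep one tree
    for i in order:                                          # per codeword-length sequence
        if not second or seqs[i] != seqs[second[-1]]:
            second.append(i)
    kept = set(second)
    for a, b in zip(second, second[1:]):                     # adjacent pairs
        inc1 = coding_rate(seqs[b], dists[a]) - coding_rate(seqs[a], dists[a])
        inc2 = coding_rate(seqs[a], dists[b]) - coding_rate(seqs[b], dists[b])
        # keep the tree whose distribution suffers the larger increment (one
        # reading of the pruning rule), drop the other
        kept.discard(b if inc1 >= inc2 else a)
    return [first_set[i] for i in sorted(kept)]
```

Sorting each length sequence first makes trees that differ only in symbol order collapse to the same key, which is what allows the de-duplication step to remove them.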
In some embodiments, before the obtaining of the first huffman tree set corresponding to the data to be encoded is implemented, the processor is further specifically configured to implement: acquiring a Huffman tree set corresponding to data to be encoded; carrying out symmetry comparison on any two Huffman trees in the Huffman tree set; and if the comparison result of the symmetry comparison shows that the two Huffman trees have symmetry, deleting any one Huffman tree to obtain a first Huffman tree set.
In some embodiments, when implementing the symmetry comparison between any two Huffman trees, the processor is further specifically configured to: exchange the positions of the left sub-tree and the right sub-tree of either Huffman tree to obtain a comparison Huffman tree; and if the comparison Huffman tree is identical to the other Huffman tree, determine that the two Huffman trees have symmetry.
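The symmetry check can be illustrated with a small sketch (not the patented implementation). Representing a tree as nested `(left, right)` tuples and recursively mirroring it is one plausible reading of the subtree exchange described above; the function names are assumptions.

```python
# Illustrative symmetry comparison: a tree is either a leaf (any non-tuple
# value) or a (left, right) tuple.

def mirror(tree):
    """Swap the left and right sub-trees at every internal node."""
    if not isinstance(tree, tuple):
        return tree  # leaf node: nothing to swap
    left, right = tree
    return (mirror(right), mirror(left))

def have_symmetry(t1, t2):
    """Two Huffman trees are symmetric if mirroring one yields the other."""
    return mirror(t1) == t2
```

Since a symmetric pair assigns the same codeword lengths to the same symbols, deleting either member of the pair loses no coding performance.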
In some embodiments, when sorting the codeword lengths of the Huffman trees to obtain the plurality of codeword length sequences, the processor is further specifically configured to: sort the codeword lengths of each Huffman tree in ascending order to generate the plurality of codeword length sequences.
In some embodiments, when calculating, according to the sorting result, the first code rate increment and the second code rate increment respectively corresponding to two adjacent Huffman trees in the second Huffman tree set, the processor is further specifically configured to: take every two adjacent Huffman trees in the second Huffman tree set as a comparison group according to the sorting result, the comparison group comprising a first Huffman tree and a second Huffman tree; encode the probability distribution corresponding to the first Huffman tree according to the second Huffman tree, the resulting increment of the coding rate being the first code rate increment; and encode the probability distribution corresponding to the second Huffman tree according to the first Huffman tree, the resulting increment of the coding rate being the second code rate increment.
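The two increments for one comparison group can be written down directly. This is a hedged sketch: trees are represented by codeword-length lists aligned with their symbol probability distributions, and all names here are illustrative assumptions rather than the patent's.

```python
# Sketch of the two code-rate increments for one comparison group.

def coding_rate(lengths, probs):
    # expected bits per symbol when probs is encoded with these codeword lengths
    return sum(p * l for p, l in zip(probs, lengths))

def rate_increments(lengths1, probs1, lengths2, probs2):
    # first increment: the second tree encodes the first tree's distribution
    inc1 = coding_rate(lengths2, probs1) - coding_rate(lengths1, probs1)
    # second increment: the first tree encodes the second tree's distribution
    inc2 = coding_rate(lengths1, probs2) - coding_rate(lengths2, probs2)
    return inc1, inc2
```

Each increment measures how much worse the neighboring tree would encode this tree's data, i.e. the cost of substituting one tree for the other.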
In some embodiments, the processor is configured to execute the computer program stored in the memory, and is further specifically configured to: acquire the quantity value of the Huffman trees in the third Huffman tree set; and if the quantity value is larger than a preset value, continue deleting Huffman trees from the third Huffman tree set according to the first code rate increment and the second code rate increment until the quantity value is smaller than or equal to the preset value.
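One way the iterative deletion might proceed is sketched below. This is an assumption-laden illustration, not the patented procedure: among adjacent pairs, the tree whose removal causes the smallest code-rate increment is deleted repeatedly until the preset size is reached; the helper names and data layout are illustrative.

```python
# Sketch, under assumed representations, of shrinking the third Huffman
# tree set to a preset size.

def coding_rate(lengths, probs):
    return sum(p * l for p, l in zip(probs, lengths))

def shrink_to_preset(trees, dists, preset):
    # trees: codeword-length lists in sorted order; dists aligned with trees
    trees, dists = list(trees), list(dists)
    while len(trees) > preset:
        best = None  # (cost, index of tree to delete)
        for i in range(len(trees) - 1):
            # first increment: cost of deleting trees[i] (its data moves to a neighbor)
            inc1 = coding_rate(trees[i + 1], dists[i]) - coding_rate(trees[i], dists[i])
            # second increment: cost of deleting trees[i + 1]
            inc2 = coding_rate(trees[i], dists[i + 1]) - coding_rate(trees[i + 1], dists[i + 1])
            cost, victim = min((inc1, i), (inc2, i + 1))
            if best is None or cost < best[0]:
                best = (cost, victim)
        del trees[best[1]], dists[best[1]]
    return trees
```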
In some embodiments, when determining, according to the third Huffman tree set, the Huffman tree with which the data to be encoded is encoded, the processor is further specifically configured to: calculate the coding rate of each Huffman tree in the third Huffman tree set, and determine a fourth Huffman tree set according to the coding rates; and determine the Huffman tree with which the data to be encoded is encoded from the third Huffman tree set or the fourth Huffman tree set.
In some embodiments, when determining, from the third Huffman tree set or the fourth Huffman tree set, the Huffman tree with which the data to be encoded is encoded, the processor is further specifically configured to: acquire the change granularity of the data to be encoded; if the change granularity is smaller than a preset granularity, determine the Huffman tree with which the data to be encoded is encoded from the fourth Huffman tree set; and if the change granularity is larger than or equal to the preset granularity, determine the Huffman tree with which the data to be encoded is encoded from the third Huffman tree set.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions, and a processor executes the program instructions to implement any of the data encoding methods provided in the embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of encoding data, the method comprising:
acquiring a first Huffman tree set corresponding to data to be encoded, wherein the first Huffman tree set comprises a plurality of Huffman trees;
sorting the codeword lengths of the Huffman trees to obtain a plurality of codeword length sequences;
performing radix sorting on the multiple code word length sequences, and performing de-duplication processing on the multiple huffman trees of the first huffman tree set according to a sorting result of the radix sorting to obtain a second huffman tree set, wherein the de-duplication processing includes reserving any one huffman tree with the same code word length sequence;
according to the sorting result, taking every two adjacent Huffman trees in the second Huffman tree set as a control group, wherein the control group comprises a first Huffman tree and a second Huffman tree; coding the probability distribution corresponding to the first Huffman tree according to the second Huffman tree, wherein the increment of the generated coding code rate is a first code rate increment; coding the probability distribution corresponding to the second Huffman tree according to the first Huffman tree, wherein the increment of the generated coding code rate is a second code rate increment; reserving the Huffman tree corresponding to the larger one of the first code rate increment and the second code rate increment to obtain a third Huffman tree set;
and determining the Huffman tree of the data to be coded for data coding according to the third Huffman tree set.
2. The method according to claim 1, further comprising, before said obtaining a first huffman tree set corresponding to data to be encoded:
acquiring a Huffman tree set corresponding to data to be encoded;
carrying out symmetry comparison on any two Huffman trees in the Huffman tree set;
and if the comparison result of the symmetry comparison indicates that the two Huffman trees have symmetry, deleting any one Huffman tree to obtain the first Huffman tree set.
3. The method of claim 2, wherein the huffman tree comprises a left sub-tree and a right sub-tree, and wherein the symmetrically comparing any two huffman trees in the huffman tree set comprises:
exchanging the positions of the left sub-tree and the right sub-tree of any one Huffman tree to obtain a comparison Huffman tree;
determining that two of said Huffman trees have symmetry if said comparison Huffman tree is identical to another of said Huffman trees.
4. The method of claim 1, wherein the sorting the codeword lengths of the Huffman tree to obtain a plurality of codeword length sequences comprises:
sorting the codeword lengths of each Huffman tree in ascending order to generate the plurality of codeword length sequences.
5. The method of claim 1, further comprising:
obtaining a quantity value of the Huffman trees in the third Huffman tree set;
and if the quantity value is larger than a preset value, continuing to delete Huffman trees from the third Huffman tree set according to the first code rate increment and the second code rate increment until the quantity value is smaller than or equal to the preset value.
6. The method according to claim 1, wherein said determining the huffman tree for data encoding of the data to be encoded according to the third set of huffman trees comprises:
calculating the coding rate of each Huffman tree in the third Huffman tree set, and determining a fourth Huffman tree set according to the coding rate;
determining the Huffman tree of the data to be encoded for data encoding from the third set of Huffman trees or the fourth set of Huffman trees.
7. The method according to claim 6, wherein said determining the Huffman tree from the third set of Huffman trees or the fourth set of Huffman trees for data encoding of the data to be encoded comprises:
acquiring the change granularity of the data to be coded;
if the change granularity is smaller than the preset granularity, determining the Huffman tree of the data to be coded for data coding from a fourth Huffman tree set;
and if the change granularity is larger than or equal to the preset granularity, determining the Huffman tree of the data to be coded for data coding from a third Huffman tree set.
8. A computer device, wherein the computer device comprises a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and implementing the data encoding method of any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to carry out a data encoding method as claimed in any one of claims 1 to 7.
CN202210541766.5A 2022-05-19 2022-05-19 Data encoding method, apparatus and storage medium Active CN114640357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210541766.5A CN114640357B (en) 2022-05-19 2022-05-19 Data encoding method, apparatus and storage medium


Publications (2)

Publication Number Publication Date
CN114640357A CN114640357A (en) 2022-06-17
CN114640357B true CN114640357B (en) 2022-09-27

Family

ID=81953155


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105656604A (en) * 2016-01-21 2016-06-08 北京邮电大学 Bit interleaved polar code modulation method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2530012A (en) * 2014-08-05 2016-03-16 Illumina Cambridge Ltd Methods and systems for data analysis and compression
CN107483059B (en) * 2017-07-31 2020-06-12 广东工业大学 Multi-channel data coding and decoding method and device based on dynamic Huffman tree
US10693493B1 (en) * 2019-02-14 2020-06-23 International Business Machines Corporation Reducing latch count to save hardware area for dynamic Huffman table generation
CN110868223B (en) * 2019-12-06 2023-10-27 广东海洋大学 Numerical operation implementation method and circuit for Huffman coding




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant