[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN101299611A - Data compression method based on set run - Google Patents

Data compression method based on set run Download PDF

Info

Publication number
CN101299611A
CN101299611A CNA2008101228946A CN200810122894A CN101299611A CN 101299611 A CN101299611 A CN 101299611A CN A2008101228946 A CNA2008101228946 A CN A2008101228946A CN 200810122894 A CN200810122894 A CN 200810122894A CN 101299611 A CN101299611 A CN 101299611A
Authority
CN
China
Prior art keywords
character
data
characteristic
characteristic character
run
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101228946A
Other languages
Chinese (zh)
Other versions
CN101299611B (en
Inventor
茅文深
常传文
顾晶
姜黎
李化雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN2008101228946A priority Critical patent/CN101299611B/en
Publication of CN101299611A publication Critical patent/CN101299611A/en
Application granted granted Critical
Publication of CN101299611B publication Critical patent/CN101299611B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a data compression method based on the set run, including reading the data to be compressed in the memorizer of the computer; looking for the characteristic character set in the data; coding; storing the compressed data into the memorizer of the computer. Looking for the characteristic character set in the data includes the following steps: counting the isolated character; judging whether presenting the isolated character, if true, executing the next step, else converting all the characters into the characteristic characters, wherein the continuous occurrence number mapped by each character is the numerical value, and converting to the coding step; sorting the isolated character; traversing the number of each element that may occur of all the characteristic character set, selecting the number of the characteristic character set when the compression ratio reach the minimum; mapping each element in the characteristic character set till the element appears continuously. The invention advances the efficiency of the run length coding, and obtains better data compression effect in the computer data compression process.

Description

A kind of data compression method based on set run
Technical field
The present invention relates to a kind of data compression method of computer realm, particularly a kind of data compression method based on set run with good compression ratio.
Background technology
Along with computer technology and fast development of information technology, a large amount of information is represented, is stored and transmit in digitized mode, as radar, image, voice etc., nowadays an important problem that faces is exactly that these information have taken huge space, is example with the radar signal, and a radar triggers 3000 pixels, run-down (containing 4096 triggerings) needs 15 seconds, its data transfer rate needs 6.3Mbps, and preserving a width of cloth radar image needs the 12M byte, and data volume is very big.Equally, the lot of data amount has brought inconvenience also for analysis and transmission, brings big pressure to communication, with respect to improving memory device to capacity and increase communication bandwidth, adopts the cost of data compression method lower, can obtain effect preferably.
So-called data compression is by changing the expression mode of information, expression information as much as possible in the finite information space.Compression algorithm mainly contains two classes: class compression is reversible, and promptly the data after the compression can recover initial data fully, without any information loss, are called lossless compress; Another kind of compression is irreversible, promptly from the compression after data can't recover original data fully, information has certain loss, is called lossy compression method.Common lossless compression method has huffman coding, Run-Length Coding, arithmetic coding etc., and is simple because of Run-Length Coding, algorithm complex is low, be easy to advantages such as hardware realization, uses comparatively extensive.
Run-Length Coding is a kind of lossless compression-encoding method of simplicity.It mainly is the action of doing compression at the repetition word string of a succession of appearance.Such as, for aaaaaaa, can be expressed as 7a, 7 expression count values, a then is a data value, for data: aaaaaabbbcccc clocklike, then is expressed as 6a3b4c equally.The Run-Length Coding algorithm has good Code And Decode advantage, but when meeting abcde so each other all during unduplicated data, coding will return 1a1b1c1d1e, and data capacity is preceding 2 times of compression, and this can not put up with.This " morbid state " situation that is a kind of.In order to avoid the appearance of " morbid state " situation as far as possible, need improve the basic skills of Run-Length Coding.Improved method is in the specific implementation counting byte and encoded byte to be distinguished, and utilizes high two signs as compression of counting byte.The isolated data all different to the front and back adjacent byte has only when high 2 complete 1 (being C0) of counting byte just to add 1 counting, otherwise directly exports this pixel value, the situation of therefore having avoided compression back length to double.So just make high 2 of count word joint itself also to be complete 1, promptly counting byte is C0H+n (consecutive identical byte number).When the value of single image data during more than or equal to C0, then output C1 earlier exports this image data value again, otherwise directly exports these data.
If any following a series of data: D2,20,30,30,30, C0, C1, C1, E2, E2, E2 ..., E2 (132), E0, E0, D4, data are after compression: C1, D2,20, C3,30, C1, C0, C2, C1, FF, E2, FF, E2, C6, E2, C2, E0, C1, D4 can see from this compression process, isolated character D2, C0, D4 front have counting byte C1, and do not have before 20.Can effectively avoid compressing the abnormal conditions that expand in the back like this.
Can above-mentioned improvement not address the problem: 1,, " morbid state " situation still inevitably can occur, make that compression back data are 1 times of data before the compression if when the isolated in a large number character of certain data appears at greater than C0.2, above-mentioned coding can only maximumly be represented continuous 63 identical byte numbers, can't regulate, and has limited the raising of compression efficiency.3, Gu Ding threshold value C0 has limited compression effectiveness, does not adjust accordingly according to the data type feature.
Summary of the invention
Goal of the invention: the present invention is directed to the deficiencies in the prior art, a kind of data compression method based on set run with higher compression ratios is provided.
Technical scheme: the invention discloses a kind of data compression method, may further comprise the steps based on set run:
(1) reads data to be compressed in the computer storage;
(2) characteristic character set in the searching data;
(3) encode;
(4) data after will compressing deposit computer storage in;
The characteristic character set that described step (2) is sought in the data may further comprise the steps:
(101) the isolated character of statistics is promptly traveled through the character in the character set by computer processor;
(102) judge whether to exist isolated character, be execution in step (103) then, otherwise all characters are all turned to characteristic character, the continuous occurrence number of each characteristic character mapping is its numerical value itself, and changes above-mentioned steps (3) over to;
(103) number of times that will isolate the character appearance sorts from small to large;
(104) travel through the various element numbers that all characteristic character set may occur, choose the number that the value that makes compression ratio r reaches the characteristic character set element of minimum;
(105) each element map in the characteristic character set is arrived continuous occurrence number.
Among the present invention, preferably, in the described step (105) mapping policy be in the characteristic character set each characteristic character to shine upon the sequence number that sorts in continuous occurrence number and the step (103) identical.Each characteristic character set is assumed to be that element need shine upon continuous occurrence number in the A set, and mapping policy only need guarantee get final product continuously, the strategy of recommendation be A gather in each characteristic character to shine upon continuous occurrence number identical with the sequence number that step (103) sorts.
Among the present invention, preferably, set the initial value of a compression ratio r in the described step (104), the r value of follow-up each element number is according to the method recursion of difference.The various element numbers that traversal characteristic character set A may occur on the basis of step (103) ordering are chosen the value that makes r and are reached minimum A set element number.Can calculate an initial value, as 1, the r value recursion of follow-up each element number gets final product, and takes this significantly to accelerate search speed.
Step described in the present invention (3) coding may further comprise the steps: (201) judge whether to isolated character, if then carry out step (202), look into A set mapping table otherwise carry out step (203) that current character exported again in the character of output mapping earlier; (202) judge whether to be A set character that look into A set mapping table if then carry out step (204), output earlier is mapped as 1 A set character, exports current character again; Otherwise carrying out step (205) directly exports character.
Among the present invention, described set run compression algorithm is derived as follows:
Suppose certain sequence, its length is n, definition X IjFor j time number of times appears in value i continuously in this sequence, 0≤i≤255,1≤j≤n wherein.Then must have:
n = Σ i = 0 255 Σ J = 1 n j X ij
If set M is the set that 0~255 numeral is formed, establishing set A is all characteristic character set, is the subclass of M, and LA is an A set element number.If set B is all non-characteristic character set, then have:
A ∪ B=M
Figure A20081012289400052
Element number LB=256-LA in the set B
The effect of characteristic character another one is the number of times that expression occurs continuously, so each characteristic character in the set A will shine upon a numerical value, it is 1~LA-1 that this numerical value can represent to occur continuously the number scope.Former Run-Length Coding algorithm is just gathered a special case of algorithm, and promptly set A is the element more than or equal to 192, and 193 can represent to occur 1 time continuously, and 194 can represent to occur 2 times continuously, and the rest may be inferred.Suppose set A for 5,7,56......96,222,254}, in cataloged procedure, 5 may be mapped as continuous appearance 1 time, 7 are mapped as 2 times, 56 are mapped as 3 times, coding back code length is divided into following three parts:
(a) the isolated character code length in the set B: K 1 = Σ i ∈ B X i 1 ; - - - ( 1 )
(b) the isolated character code length in the set A: K 2 = 2 Σ i ∈ A X i 1 ; - - - ( 2 )
(c) at least two identical characters code length: K3-K4 appear continuously, wherein:
K 3 = 2 Σ i ∈ M Σ j = 2 n UpInt ( j / ( LA - 1 ) ) X ij ; UpInt () is last bracket function; (3)
K 4 = Σ i ∈ B Σ j ∈ B X ij G is { j% (LA-1)=1 and n 〉=j 〉=2}.(4)
With former distance of swimming algorithm is example, K4 has considered following situation: if 165 (less than threshold values 192) and 224 (greater than threshold values 192) each occurred continuously 64 times, the former is 255 165 165 3 bytes (not being 255 165 193 165 4 bytes) behind the coding, behind latter's coding is 255 224 193 224, four bytes.
Wherein K 1 + K 2 = Σ i ∈ M X i 1 + Σ i ∈ A X i 1 , Order
K 5 = Σ i ∈ A X i 1 , K 6 = Σ i ∈ A X i 1 . (5)
Total code length is: L=K5+K6+K3-K4;
The definition compression ratio r = L n = K 5 + K 6 + K 3 - K 4 n = K 5 n + K 6 n + K 3 n - K 4 n - - - ( 6 )
Order r 1 = K 5 n , r 2 = K 6 n , r 3 = K 3 n , r 4 = K 4 n
For reaching compression effectiveness preferably, just require to make r as far as possible little, under the certain situation of sequence, r1 is a constant, the division with set A and B does not change, and does not do consideration; The division of r2 and A set is in close relations, and r3 is only relevant with A set element number; With respect to r1~r3, r4 is indivisible, also can not consider.Problem is converted into the division methods of seeking set A and B, makes r2+r3 reach minimum.
Suppose that the element number of set A fixes, the value of r3 is exactly a fixed value, make the r2 value reach minimum, and the division of set A just must be made up of the less element of isolated character appearance, thus, can design following steps:
The first, seek characteristic character set, promptly described A set.Particularly comprise following three parts: (a) the isolated character occurrence number of sequence is sorted according to from small to large order, the set A element is preferentially chosen the isolated character that less number of times occurs, and as isolated character, then character set M is the A set.(b) the various element numbers that traversal A may occur on the basis of step (a) ordering are chosen the value that makes r and are reached minimum A set element number.(c) element is being born the task of shining upon continuous occurrence number in each A set, and mapping policy only need guarantee to get final product continuously.
The second, adopt the general coding method in this area that the data of seeking after the characteristic character collection is handled are encoded.Among the present invention, preferably, the more traditional Run-Length Coding algorithm of described set run algorithm has following feature: (a) divide characteristic character no longer with threshold reference, and constitute with two mutually disjoint set.(b) number of A set no longer is fixed as 63, so the continuous occurrence number of denotable maximum is unfixing yet, excavates the potentiality of Run-Length Coding to greatest extent.
In the process of data compression, in order to keep synchronously with decoding end, and, can take data are divided into key frame and non-key frame key frame calculated characteristics character set in order to obtain better effect, and write in the encoding code stream, non-key frame directly utilizes the characteristic character set that draws, and the interval of key frame can be adjusted according to data situation, and following three kinds of situations can be arranged: 1, if data type is stably, can be once with a key frame initialization, coding all is non-key frame afterwards.2, if data type is gradual, a key frame and some non-key frames are combined as cluster, key frame is before, and the characteristic character of key frame was assembled fruit in the non-key frame in bunch utilized bunch.3, if data type is non-stationary, changes acutely, abandoning non-key frame, is key frame all.
Beneficial effect: the present invention is from the angle research Run-Length Coding of set theory, a kind of new Run-Length Coding algorithm--set run has been proposed, do not re-use threshold value in the algorithm, and the universe character is divided into two mutually disjoint set, be respectively characteristic character and non-characteristic character, represent continuous occurrence number by characteristic character is reasonably shone upon, thereby the efficient of Run-Length Coding algorithm has been used the limit.Through evidence, the present invention improves the efficiency of algorithm of Run-Length Coding greatly, is applied in and can obtains better data compression effect in the computer data compression process.
Experiment is for the data of having chosen three types of radars, image, text, and wherein radar data has been chosen two types, and totally four groups of data are also found in experiment, is gradual with the A set of data type, this also for practical application provides may.Fig. 4 is the relation between various types of data A set length and the compression ratio, and the figure cathetus partly is all types of former distance of swimming compression algorithm rates.Fig. 5 is under the optimum A set situation, and the algorithm of set run algorithm and former distance of swimming algorithm relatively.Provable by above experimental result, the present invention has designed a kind of new run length encoding method, and this method has been introduced set theory in the coding, has overcome the lower problem of former Run-Length Coding code efficiency, has improved the data compression ratio greatly.
Description of drawings
Fig. 1 is an outline flowchart of the present invention.
Fig. 2 is coding flow process flow chart among the present invention.
Fig. 3 is the present invention's detailed flow chart on Fig. 1 basis.
Fig. 4 is graph of a relation between various types of data A set length and the compression ratio.
Fig. 5 is that set run algorithm and former distance of swimming algorithm compare.
Embodiment
Below in conjunction with accompanying drawing the present invention is done further explanation.
As Fig. 1, shown in Figure 3, a kind of data compression method based on set run of the present invention may further comprise the steps: step 1, read the data to be compressed in the computer storage; Step 2 is sought the characteristic character set in the data; Step 3 is encoded; Step 4 deposits the data after the compression in computer storage.Wherein, described step 2 is sought the characteristic character set in the data, may further comprise the steps: step 101, and the isolated character of statistics is promptly traveled through the character in the character set by computer processor; Step 102 judges whether to exist isolated character, is execution in step 103 then, otherwise all characters are all turned to characteristic character, and change above-mentioned steps 3 over to; The number of times that step 103 will isolate the character appearance sorts from small to large; The various element numbers that all characteristic character set of step 104 traversal may occur are chosen the number that the value that makes compression ratio r reaches the characteristic character set element of minimum; Step 105 arrives continuous occurrence number with each element map in the characteristic character set.In the present embodiment, in the described step 105 mapping policy be in the characteristic character set each characteristic character to shine upon the sequence number of ordering in continuous occurrence number and the step 103 identical.Element need shine upon continuous occurrence number in each characteristic character set A set, and mapping policy only need guarantee get final product continuously, the strategy of recommendation be A gather in each characteristic character to shine upon continuous occurrence number identical with the sequence number that step 103 sorts.In the present embodiment, set the initial value of a compression ratio r in the described step 104, the r value recursion of follow-up each element number.The various element numbers that traversal characteristic character set A may occur on the basis of step 103 ordering are chosen the value that makes r and are reached minimum A set element number.Can calculate an initial value, as 1, the r value recursion of follow-up each element number gets final product, and takes this significantly to accelerate search speed.
As shown in Figure 2, among the present invention, the core thinking of coding and the universal method of this area are similar.Seek the improvement of the characteristic character set in the data in according to the present invention, cataloged procedure may further comprise the steps: step 201 judges whether into isolated character, if then carry out step 202; Otherwise carry out step 203, look into A set mapping table, current character exported again in the character of output mapping earlier; Step 202 judges whether to A set character, if then carry out step 204, looks into A set mapping table, and output earlier is mapped as 1 A set character, exports current character again; Otherwise carry out step 205, directly character is exported.This coding method is basic identical with former Run-Length Coding, and difference is following two parts: (a) can represent that maximum read-around ratio no longer is fixed as 63, change A set number into, can occur as constant at the coding initial phase.Need inquire about phase map element in the A set when (b) occurrence number is write code stream continuously.
The thinking of described compression method and effect for a better understanding of the present invention, following data compression is that example is introduced data handling procedure of the present invention in detail.
Read the data to be compressed in the computer storage, data to be compressed (data are according to row storage, add * and go up target and be isolated character) as shown in table 1, totally 100 bytes, wherein isolated character amounts to 59:
Table 1 input data sequence
2 2 2 194 * 197 * 199 * 214 * 217 * 238 * 255 *
240 * 81 81 81 35 * 32 32 32 243 * 242 *
241 * 245 * 247 * 137 137 137 86 86 84 * 88 *
18 * 209 * 210 * 211 211 213 213 218 * 219 * 221
221 219 * 71 71 72 72 72 73 73 82
82 82 82 82 82 82 225 * 224 * 230 * 232 *
233 * 228 * 229 * 233 * 230 * 204 * 199 * 201 * 198 * 195 *
199 * 200 200 199 * 196 * 207 * 202 * 2 * 3 * 5 *
8 * 9 * 119 119 119 119 119 240 * 249 * 253 *
254 * 250 * 251 * 250 * 251 * 252 * 194 * 195 * 194 * 195 *
If adopt original Run-Length Coding algorithm, coding result is shown in table 2 (data are arranged according to row), and coding back length is 137 bytes.
Table 2 adopts original Run-Length Coding algorithm coding result
195 2 193 194 193 197 193 199 193 214
193 217 193 238 193 255 193 240 195 81
35 195 32 193 243 193 242 193 241 193
245 193 247 195 137 194 86 84 88 18
193 209 193 210 194 211 194 213 193 218
193 219 194 221 193 219 194 71 195 72
194 73 199 82 193 225 193 224 193 230
193 232 193 233 193 228 193 229 193 233
193 230 193 204 193 199 193 201 193 198
193 195 193 199 194 200 193 199 193 196
193 207 193 202 2 3 5 8 9 197
119 193 240 193 249 193 253 193 254 193
250 193 251 193 250 193 251 193 252 193
194 193 195 193 194 193 195
Isolated character according to the method for the invention his-and-hers watches 1 data is added up, and can be added up and according to the isolated character form after the ordering from small to large, and is as shown in table 3, and adding * in the table, to go up target be the isolated character that occurs in the data.
Among the present invention, so-called A set length refers to the number of element in the A set, and isolated character sorts as shown in table 3 according to occurrence number from small to large.According to the principle that characteristic set is chosen, preferentially choose the little character of occurrence number, the number scope that A set can be chosen element is 1~256: if elect as at 1 o'clock, then the A set just is preceding 1 of table 3, is { 0}; If elect as at 2 o'clock, then A set just is preceding 2 of table 3, and so that 0,1}; If elect as at 3 o'clock, then A set just is preceding 3 of table 3, and so that 0,1,4}; If elect as at 4 o'clock, then A set just be preceding 4 of table 3, for { 0,1,4, if 6}...... elects as at 256 o'clock, then the A set is all characters of table 3 just.
Isolated character occurrence number ordering back result in table 3 data
Sequence number Character Occurrence number Sequence number Character Occurrence number Sequence number Character Occurrence number
0 0 0 85 94 0 170 179 0
1 1 0 86 95 0 171 180 0
2 4 0 87 96 0 172 181 0
3 6 0 88 97 0 173 182 0
4 7 0 89 98 0 174 183 0
5 10 0 90 99 0 175 184 0
6 11 0 91 100 0 176 185 0
7 12 0 92 101 0 177 186 0
8 13 0 93 102 0 178 187 0
9 14 0 94 103 0 179 188 0
10 15 0 95 104 0 180 189 0
11 16 0 96 105 0 181 190 0
12 17 0 97 106 0 182 191 0
13 19 0 98 107 0 183 192 0
14 20 0 99 108 0 184 193 0
15 21 0 100 109 0 185 200 0
16 22 0 101 110 0 186 203 0
17 23 0 102 111 0 187 205 0
18 24 0 103 112 0 188 206 0
19 25 0 104 113 0 189 208 0
20 26 0 105 114 0 190 211 0
21 27 0 106 115 0 191 212 0
22 28 0 107 116 0 192 213 0
23 29 0 108 117 0 193 215 0
24 30 0 109 118 0 194 216 0
25 31 0 110 119 0 195 220 0
26 32 0 111 120 0 196 221 0
27 33 0 112 121 0 197 222 0
28 34 0 113 122 0 198 223 0
29 36 0 114 123 0 199 226 0
30 37 0 115 124 0 200 227 0
Connect table
31 38 0 116 125 0 201 231 0
32 39 0 117 126 0 202 234 0
33 40 0 118 127 0 203 235 0
34 41 0 119 128 0 204 236 0
35 42 0 120 129 0 205 237 0
36 43 0 121 130 0 206 239 0
37 44 0 122 131 0 207 244 0
38 45 0 123 132 0 208 246 0
39 46 0 124 133 0 209 248 0
40 47 0 125 134 0 210 * 84 * 1 *
41 48 0 126 135 0 211 * 214 * 1 *
42 49 0 127 136 0 212 * 35 * 1 *
43 50 0 128 137 0 213 * 196 * 1 *
44 51 0 129 138 0 214 * 217 * 1 *
45 52 0 130 139 0 215 * 218 * 1 *
46 53 0 131 140 0 216 * 197 * 1 *
47 54 0 132 141 0 217 * 198 * 1 *
48 55 0 133 142 0 218 * 2 * 1 *
49 56 0 134 143 0 219 * 201 * 1 *
50 57 0 135 144 0 220 * 224 * 1 *
51 58 0 136 145 0 221 * 225 * 1 *
52 59 0 137 146 0 222 * 202 * 1 *
53 60 0 138 147 0 223 * 18 * 1 *
54 61 0 139 148 0 224 * 228 * 1 *
55 62 0 140 149 0 225 * 229 * 1 *
56 63 0 141 150 0 226 * 204 * 1 *
57 64 0 142 151 0 227 * 232 * 1 *
58 65 0 143 152 0 228 * 88 * 1 *
59 66 0 144 153 0 229 * 8 * 1 *
60 67 0 145 154 0 230 * 207 * 1 *
61 68 0 146 155 0 231 * 9 * 1 *
62 69 0 147 156 0 232 * 238 * 1 *
63 70 0 148 157 0 233 * 209 * 1 *
64 71 0 149 158 0 234 * 241 * 1 *
65 72 0 150 159 0 235 * 242 * 1 *
66 73 0 151 160 0 236 * 243 * 1 *
67 74 0 152 161 0 237 * 210 * 1 *
68 75 0 153 162 0 238 * 245 * 1 *
69 76 0 154 163 0 239 * 5 * 1 *
70 77 0 155 164 0 240 * 247 * 1 *
71 78 0 156 165 0 241 * 3 * 1 *
Connect table
72 79 0 157 166 0 242 * 249 * 1 *
73 80 0 158 167 0 243 * 252 * 1 *
74 81 0 159 168 0 244 * 253 * 1 *
75 82 0 160 169 0 245 * 254 * 1 *
76 83 0 161 170 0 246 * 255 * 1 *
77 85 0 162 171 0 247 * 250 * 2 *
78 86 0 163 172 0 248 * 251 * 2 *
79 87 0 164 173 0 249 * 230 * 2 *
80 89 0 165 174 0 250 * 233 * 2 *
81 90 0 166 175 0 251 * 240 * 2 *
82 91 0 167 176 0 252 * 219 * 2 *
83 92 0 168 177 0 253 * 195 * 3 *
84 93 0 169 178 0 254 * 194 * 3 *
255 * 199 * 4 *
Isolated character in the his-and-hers watches 3, utilize formula (6) definition compression ratio:
r = L n = K 5 + K 6 + K 3 - K 4 n = K 5 n + K 6 n + K 3 n - K 4 n Substitution travels through searching, and the compression ratio (not considering r4) that various element number is corresponding different is as shown in table 4, adds * in the form and goes up the length of element that the target position is optimum.Make the r value of formula (6) reach the A set length that minimum is optimum, in order to obtain the A set length of each set length value, just must all calculate each length, if directly calculate according to formula (6), amount of calculation can be very big, can draw from above-mentioned explanation, length is that length is the subclass of (L+1) for the A set of (L), only many elements, so-called recursive operation, promptly only needing computational length is 1 A set r value, later method recursion according to difference, promptly only need to calculate the value of r (L+1)-r (L), promptly can be drawn by formula (3) (4) (5), K5 is a constant, be 0 after the difference, K6 only has the occurrence number that increases that element, and K4 is indivisible after difference, need not to calculate.Through above-mentioned processing, can significantly reduce the amount of calculation of the compression ratio r that calculates every kind of length.
As can be seen from the table, the characteristic set element number all is identical in the compression ratio that 8~210 scope draws, and this mainly is because data acquisition system length very little, has only 100 bytes to cause.
The compression ratio that the various element numbers of table 4 are corresponding down.
Element number Compression ratio Element number Compression ratio Element number Compression ratio Element number Compression ratio Element number Compression ratio
3 0.93 54 0.73 105 0.73 156 0.73 207 0.73
4 0.79 55 0.73 106 0.73 157 0.73 208 0.73
5 0.77 56 0.73 107 0.73 158 0.73 209 0.73
6 0.75 57 0.73 108 0.73 159 0.73 210 0.73
Connect table
7 0.75 58 0.73 109 0.73 160 0.73 211 0.74
8 * 0.73 * 59 0.73 110 0.73 161 0.73 212 0.75
9 0.73 60 0.73 111 0.73 162 0.73 213 0.76
10 0.73 61 0.73 112 0.73 163 0.73 214 0.77
11 0.73 62 0.73 113 0.73 164 0.73 215 0.78
12 0.73 63 0.73 114 0.73 165 0.73 216 0.79
13 0.73 64 0.73 115 0.73 166 0.73 217 0.80
14 0.73 65 0.73 116 0.73 167 0.73 218 0.81
15 0.73 66 0.73 117 0.73 168 0.73 219 0.82
16 0.73 67 0.73 118 0.73 169 0.73 220 0.83
17 0.73 68 0.73 119 0.73 170 0.73 221 0.84
18 0.73 69 0.73 120 0.73 171 0.73 222 0.85
19 0.73 70 0.73 121 0.73 172 0.73 223 0.86
20 0.73 71 0.73 122 0.73 173 0.73 224 0.87
21 0.73 72 0.73 123 0.73 174 0.73 225 0.88
22 0.73 73 0.73 124 0.73 175 0.73 226 0.89
23 0.73 74 0.73 125 0.73 176 0.73 227 0.90
24 0.73 75 0.73 126 0.73 177 0.73 228 0.91
25 0.73 76 0.73 127 0.73 178 0.73 229 0.92
26 0.73 77 0.73 128 0.73 179 0.73 230 0.93
27 0.73 78 0.73 129 0.73 180 0.73 231 0.94
28 0.73 79 0.73 130 0.73 181 0.73 232 0.95
29 0.73 80 0.73 131 0.73 182 0.73 233 0.96
30 0.73 81 0.73 132 0.73 183 0.73 234 0.97
31 0.73 82 0.73 133 0.73 184 0.73 235 0.98
32 0.73 83 0.73 134 0.73 185 0.73 236 0.99
33 0.73 84 0.73 135 0.73 186 0.73 237 1.00
34 0.73 85 0.73 136 0.73 187 0.73 238 1.01
35 0.73 86 0.73 137 0.73 188 0.73 239 1.02
36 0.73 87 0.73 138 0.73 189 0.73 240 1.03
37 0.73 88 0.73 139 0.73 190 0.73 241 1.04
38 0.73 89 0.73 140 0.73 191 0.73 242 1.05
39 0.73 90 0.73 141 0.73 192 0.73 243 1.06
40 0.73 91 0.73 142 0.73 193 0.73 244 1.07
41 0.73 92 0.73 143 0.73 194 0.73 245 1.08
42 0.73 93 0.73 144 0.73 195 0.73 246 1.09
43 0.73 94 0.73 145 0.73 196 0.73 247 1.10
44 0.73 95 0.73 146 0.73 197 0.73 248 1.12
45 0.73 96 0.73 147 0.73 198 0.73 249 1.14
46 0.73 97 0.73 148 0.73 199 0.73 250 1.16
Connect table
47 0.73 98 0.73 149 0.73 200 0.73 251 1.18
48 0.73 99 0.73 150 0.73 201 0.73 252 1.20
49 0.73 100 0.73 151 0.73 202 0.73 253 1.22
50 0.73 101 0.73 152 0.73 203 0.73 254 1.25
51 0.73 102 0.73 153 0.73 204 0.73
52 0.73 103 0.73 154 0.73 205 0.73
53 0.73 104 0.73 155 0.73 206 0.73
By on can draw, optimum characteristic set number is 8, tables look-up 3 then, can draw characteristic set is preceding 8 characters of table 3, i.e. A set is as shown in table 5:
Table 5 characteristic character set
Sequence number Character
0 0
1 1
2 4
3 6
4 7
5 10
6 11
7 12
According to mapping policy, can represent that continuous number of times carries out following mapping to each character among the characteristic character set A, as shown in table 6:
Table 6 characteristic character and continuous occurrence number mapping table
The mapping number of times Character
0 0
1 1
2 4
3 6
4 7
5 10
6 11
7 12
So far, choosing of characteristic character set finished.
Choose preceding 10 bytes and encode as example, data are 2,2,2,194,197,199,214,217,238,255, and the front occurs 32 continuously, table look-up 6, what number of times were shone upon in expression 3 is character 6, so, output 6,2, next an isolated character 194, because not in characteristic character, directly output, thereafter 197,199,214,217 all is isolated character and not in characteristic character set, all directly exports.Next coming in order are analogized, the output code flow result who uses the set run coding to draw is (data by rows) shown in the table 7, add * in the table and go up target for non-isolated character having occurred, and used characteristic character, final coding back length is 87 bytes, much smaller than 137 byte lengths of original Run-Length Coding.
This set run of table 7 coding output code flow
6 * 2 * 194 197 199 214 217 238 255 240
6 * 81 * 35 6 * 32 * 243 242 241 245 247
6 * 137 * 4 * 86 * 84 88 18 209 210 4 *
211 * 4 * 213 * 218 219 4 * 221 * 219 4 * 71 *
6 * 72 * 4 * 73 * 12 * 82 * 225 224 230 232
233 228 229 233 230 204 199 201 198 195
199 4 * 200 * 199 196 207 202 2 3 5
8 9 10 * 119 * 240 249 253 254 250 251
250 251 252 194 195 194 195
So far all codings are finished compressed file and are promptly finished, and use the data after technology well known in the art will be compressed to deposit computer storage in.Decode according to A set during decompression and get final product.
Compression method of the present invention can be applied to following occasion: 1, the use of radar data recorder, can implement and the compression radar data.2, when channel resource is limited, can be used for chnnel coding, save communication bandwidth.3, be used for the harmless audio frequency and video occasion of various requirement.4, be used to replace the entropy coding of existing lossy coding, replaceable as the distance of swimming compression among the existing JPEG is the set run coding method of this patent, realizes better compression effectiveness.
The invention provides a kind of thinking and method of the data compression method based on set run; the method and the approach of this technical scheme of specific implementation are a lot; the above only is a preferred implementation of the present invention; should be understood that; for those skilled in the art; under the prerequisite that does not break away from the principle of the invention, can also make some improvements and modifications, these improvements and modifications also should be considered as protection scope of the present invention.The all available prior art of each component part not clear and definite in the present embodiment is realized.

Claims (4)

1, a kind of data compression method based on set run may further comprise the steps:
(1) reads data to be compressed in the computer storage;
(2) characteristic character set in the searching data;
(3) encode;
(4) data after will compressing deposit computer storage in;
It is characterized in that described step (2) is sought the characteristic character set in the data, may further comprise the steps:
(101) the isolated character of statistics;
(102) judge whether to exist isolated character, be execution in step (103) then, otherwise all characters are all turned to characteristic character, the continuous occurrence number of each characteristic character mapping is its numerical value itself, and changes above-mentioned steps (3) over to;
(103) number of times that will isolate the character appearance sorts from small to large;
(104) travel through the various element numbers that all characteristic character set may occur, choose the number that the value that makes compression ratio r reaches the characteristic character set element of minimum;
(105) each element map in the characteristic character set is arrived continuous occurrence number.
2, a kind of data compression method according to claim 1 based on set run, it is characterized in that, in the described step (105) mapping policy be in the characteristic character set each characteristic character to shine upon the sequence number that sorts in continuous occurrence number and the step (103) identical.
3, a kind of data compression method based on set run according to claim 1 is characterized in that, sets the initial value of a compression ratio r in the described step (104), the r value recursion of follow-up each element number.
4, a kind of data compression method based on set run according to claim 1 is characterized in that, described step (3) is encoded, and may further comprise the steps:
Judge whether to isolated character that (201) if then carry out step (202), look into A set mapping table otherwise carry out step (203), current character exported again in the character of output mapping earlier;
(202) judge whether character into characteristic character set, look into the characteristic character set mapping table if then carry out step (204), output earlier is mapped as the character of 1 characteristic character set, exports current character again; Otherwise carrying out step (205) directly exports character.
CN2008101228946A 2008-06-30 2008-06-30 Data compression method based on set run Expired - Fee Related CN101299611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101228946A CN101299611B (en) 2008-06-30 2008-06-30 Data compression method based on set run

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101228946A CN101299611B (en) 2008-06-30 2008-06-30 Data compression method based on set run

Publications (2)

Publication Number Publication Date
CN101299611A true CN101299611A (en) 2008-11-05
CN101299611B CN101299611B (en) 2011-10-05

Family

ID=40079318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101228946A Expired - Fee Related CN101299611B (en) 2008-06-30 2008-06-30 Data compression method based on set run

Country Status (1)

Country Link
CN (1) CN101299611B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102064833A (en) * 2010-12-17 2011-05-18 曙光信息产业(北京)有限公司 Regular expression compressing method for DFA (Discriminant Function Analysis)
CN102595496A (en) * 2012-03-08 2012-07-18 西北大学 Context-adaptive quotient and remainder encoding method used for sensing data of wireless sensing nodes
EP2950451A1 (en) * 2014-05-28 2015-12-02 Nxp B.V. Signal-based data compression
CN105191145A (en) * 2013-03-01 2015-12-23 古如罗技微系统公司 Data encoder, data decoder and method
CN105359418A (en) * 2013-03-01 2016-02-24 古如罗技微系统公司 Encoder, decoder and method
CN106170760A (en) * 2014-07-11 2016-11-30 华为技术有限公司 A kind of method and device of the expection compression ratio calculating data
CN106507108A (en) * 2016-12-07 2017-03-15 中国石油大学(华东) Method and device for image encoding and decoding
CN106685429A (en) * 2016-12-29 2017-05-17 广州华多网络科技有限公司 Integer compression method and device
CN107565970A (en) * 2017-08-17 2018-01-09 郑州云海信息技术有限公司 A kind of the mixing lossless compression method and device of feature based identification

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102064833A (en) * 2010-12-17 2011-05-18 曙光信息产业(北京)有限公司 Regular expression compressing method for DFA (Discriminant Function Analysis)
CN102595496A (en) * 2012-03-08 2012-07-18 西北大学 Context-adaptive quotient and remainder encoding method used for sensing data of wireless sensing nodes
CN105191145A (en) * 2013-03-01 2015-12-23 古如罗技微系统公司 Data encoder, data decoder and method
CN105359418A (en) * 2013-03-01 2016-02-24 古如罗技微系统公司 Encoder, decoder and method
CN105359418B (en) * 2013-03-01 2018-11-09 古如罗技微系统公司 Encoder, decoder and coding-decoding method
CN105191145B (en) * 2013-03-01 2018-09-25 古如罗技微系统公司 Data encoder, data decoder and decoding method
CN105277940B (en) * 2014-05-28 2018-04-06 恩智浦有限公司 Data compression based on signal
EP2950451A1 (en) * 2014-05-28 2015-12-02 Nxp B.V. Signal-based data compression
CN105277940A (en) * 2014-05-28 2016-01-27 恩智浦有限公司 Signal-based data compression
US9448300B2 (en) 2014-05-28 2016-09-20 Nxp B.V. Signal-based data compression
CN106170760A (en) * 2014-07-11 2016-11-30 华为技术有限公司 A kind of method and device of the expection compression ratio calculating data
CN106507108A (en) * 2016-12-07 2017-03-15 中国石油大学(华东) Method and device for image encoding and decoding
CN106507108B (en) * 2016-12-07 2018-04-17 杜昀晓 Image coding, decoded method and apparatus
CN106685429A (en) * 2016-12-29 2017-05-17 广州华多网络科技有限公司 Integer compression method and device
CN106685429B (en) * 2016-12-29 2020-07-10 广州华多网络科技有限公司 Integer compression method and device
CN107565970A (en) * 2017-08-17 2018-01-09 郑州云海信息技术有限公司 A kind of the mixing lossless compression method and device of feature based identification
CN107565970B (en) * 2017-08-17 2021-01-15 苏州浪潮智能科技有限公司 Hybrid lossless compression method and device based on feature recognition

Also Published As

Publication number Publication date
CN101299611B (en) 2011-10-05

Similar Documents

Publication Publication Date Title
CN101299611B (en) Data compression method based on set run
CN112953550B (en) Data compression method, electronic device and storage medium
US5973630A (en) Data compression for use with a communications channel
EP0695040B1 (en) Data compressing method and data decompressing method
CN102122960B (en) Multi-character combination lossless data compression method for binary data
CN106407285B (en) A kind of optimization bit file compression & decompression method based on RLE and LZW
CN108810553B (en) Mobile node monitoring data sequence compression method based on sparse processing
CN101667843B (en) Methods and devices for compressing and uncompressing data of embedded system
CN112003625A (en) Huffman coding method, system and equipment
CN100517979C (en) Data compression and decompression method
US6919826B1 (en) Systems and methods for efficient and compact encoding
CN112968706B (en) Data compression method, FPGA chip and FPGA online upgrading method
CN110021369B (en) Gene sequencing data compression and decompression method, system and computer readable medium
CN110233626B (en) Mechanical vibration signal edge data lossless compression method based on two-dimensional adaptive quantization
CN116016606B (en) Sewage treatment operation and maintenance data efficient management system based on intelligent cloud
CN115882866A (en) Data compression method based on data difference characteristic
CN103546161A (en) Lossless compression method based on binary processing
CN105120276A (en) Adaptive Motion JPEG coding method and system
CN110995753A (en) Combined Compression Method of Remote Communication Messages in Electricity Information Collection System
CN104394415B (en) A kind of method of video big data distribution decoding
US6292115B1 (en) Data compression for use with a communications channel
JP2022048930A (en) Data compression method, data compression device, data compression program, data decompression method, data decompression device, and data decompression program
Mathpal et al. A research paper on lossless data compression techniques
Mahmood et al. An Efficient 6 bit Encoding Scheme for Printable Characters by table look up
Ibrahim et al. Comparison between (rle and huffman) algorithmsfor lossless data compression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111005

Termination date: 20130630