CN1777038A

CN1777038A - Two-dimensional vector data compression method

Info

Publication number: CN1777038A
Application number: CN 200510019931
Authority: CN
Inventors: 吴华意; 朱海军; 刘异
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2005-12-01
Filing date: 2005-12-01
Publication date: 2006-05-24
Anticipated expiration: 2025-12-01
Also published as: CN100546196C

Abstract

The method divides original vector data into blocks of vector data. Next, loss compression based on DCT transform is carried out for the blocks of vector data according to one of two modes. Then, entropy coding is carried out for the obtained DCT transform so as to obtain compressed blocks of vector data. Finally, all compressed blocks of vector data are organized as binary bit steam in specific format. Main effects of the invention are: compression ratio near to 20 is reached for vector data formed natural in wide range such as contour line, road network and water system etc; SNR reaches to 60dB; and maximal absolute deformation is about 10-5 order of magnitude of dynamic range of original vector data. Features are: controllable distortion generated in reconstructing error and compression process; providing technical base for asymptotic transmission of vector data flexibly and in high efficiency by combining asymptotic compression technique in space domain and in frequency domain.

Description

A kind of compression method of two-dimensional vector data

Technical field

The present invention relates to field of information processing, particularly a kind of compression method of two-dimensional vector data.

Background technology

1.GIS the characteristics of middle two-dimensional vector data reach its necessity of compressing:

The two-dimensional space vector data is one of current GIS-Geographic Information System key data type of managing and using, and has characteristics such as data volume is big, skewness, topological relation complexity.Vector data is generally represented by the form of coordinate string among the GIS, coordinate data wherein is generally floating-point format, along with the ability of obtaining data improve and the user to the requirement of data precision, in order to express the real geographical world better, GIS vector data amount is very huge, needs a large amount of storage and transfer resource.Network service emerging GIS applications such as (referring to document [1]) in geography information, vector data need be connected with unsettled Internet via limited bandwidth and sends the user to, need the data volume size of the vector data of transmission to become decision application key of success, directly adopt original coordinate data can't meet the demands as the carrier of two-dimensional space vector data.

H.263/4 or the like in multi-medium data fields such as image, video and audio frequency, ripe international standard is all arranged at present, as JPEG, JPEG2000, MPEG1/2/4.These standards provide a kind of to huge multi-medium data effectively possess high compression ratio and high performance compression method in, huge commercial opportunity also is provided, making based on the online browse of these multi-medium datas and using in real time becomes possibility.To the GIS vector data, also there is not at present a kind of so special compression method can under the prerequisite of the geography information of not losing vector data and being contained, reduce the data volume of vector data effectively.The storage and the efficiency of management that this has not only influenced spatial data also become the bottleneck of geography information network service, hindered combining of GIS-Geographic Information System and Internet Distributed Application greatly.

Under this background, by decomposition that the complexity analyzing of two-dimensional space vector data inside and information are constituted and to the utilization of existing Techniques for Multimedia Compression with in the innovation aspect the vector data compression, the present invention proposes two-dimensional vector data is carried out compression method based on block transform coding.

2. the characteristic analysis of two-dimensional vector data and compression scheme are selected:

Compare with view data, two-dimensional vector data has complex inner structure.Vector data file constitutes by being positioned at inner all the interested object vectors of mapping scope, and the arrangement of the point of these object vector inside is orderly, yet the arrangement of object vector in vector data file inside is unordered.

As everyone knows, in view data, a main forms of its redundancy is exactly the correlation on the gray value between the adjacent image point.Existing image compression international standard has adopted the block transform coding method, by being the approximate incoherent conversion coefficient of frequency domain with view data from space field transformation, then conversion coefficient is carried out independently scalar quantization and utilizes this correlation.This method has been avoided the optimum in theory still too big vector quantization method of amount of calculation of direct employing, has also effectively utilized the correlation of initial data inside simultaneously, is " optimum " scheme of a kind of reality.

We find, also exist very strong long serial correlation in the sample point coordinate sequence data that constitutes vector data, thereby also exist a large amount of redundancies.As the digitized representations in the geographical world of reality, these coordinate sequences have shown the variation tendency of geographic object (as river, road etc.).Because mild property variation and a large amount of existence of cyclic variation in geographic object, make the inner sample point coordinate data sequence of vector data also have corresponding characteristics and show the said correlation in our front, show that also it is feasible that vector data is compressed.

It not is directly to be transplanted on the vector data Image Data Compression algorithm of maturation so simple that vector data is compressed.No matter how big view data can be regarded a two-dimensional matrix with several components (as RGB) as to data volume, and that the internal structure of this rule is a vector data is not available.Therefore, seek effective compression expression of vector data, must fully take into account of the influence of the structure of vector data compression algorithm and compression performance.

During the linear transformation adopted in selecting the vector data compression scheme, we have also adopted DCT (discrete cosine transform, english abbreviation) conversion as numerous image compression algorithms.Although from the meaning KL conversion (Kahunen-Lo é ve) of decorrelation is optimum linear transformation [2], but the transformation matrix of KL conversion is relevant with the statistical nature of the required data of handling, can not determine in advance, and it calculates and does not still have fast algorithm so far.Theoretical proof is to single order Markov signal, and dct transform is good being similar to the KL conversion when its coefficient correlation ρ approaches 1.Experiment shows that its coefficient correlation approaches 1 really very much when adopting the single order Markov process to carry out modeling to vector data, and because the dct transform matrix can be determined in advance, its calculating has multiple fast algorithm, and it all is feasible therefore upward adopting dct transform that vector data is compressed to practice theoretically.

3. quantize distortion with the two-dimensional vector data compression:

Quantizing process can produce the loss of precision, and reduce distortion is one of target of two-dimensional vector data compression pursuit as far as possible.The MSE (Mean Square Error, mean square error) that adopt also have other criterion as distortion metrics [3] more in the lossy compression method of multi-medium data at present, and these criterions respectively have its pluses and minuses.As the MSE criterion, it has calculates simply, explicit physical meaning, advantage such as is convenient to be optimized, and its outstanding shortcoming then is not meet with human apperceive characteristic in some aspects, bigger MSE value not necessarily correspondence relatively poor perceived quality.In addition, MSE has embodied the effect of a kind of " on average ", i.e. Zheng Ti distortion reached given standard, therefore, MSE can't carry out local control to the error of single pixel or audio sample value, may or to rebuild at reconstructed image have other data error more remarkable in the audio frequency, but but be the requirement that meets distortion control on the whole.

To the dct transform coefficient of vector data, adopt different quantization tables to carry out the uniform quantization meeting and obtain different compression result, obtain different distortions.Present patent application does not specify quantization table, does not specify the module of distortion yet.

4. the progressive compression and the transmission of vector data:

Progressive compression and transmission are emerging technology that is applied to multi-medium data in recent years, are characterized in according to user's the requirement and the factors such as ability of the network bandwidth, demonstration and computing equipment, and the data of transmitting suitable size are given the user.This transmission course can be ended by the user at any time, and the data that obtained have then constituted the expression of a compression of former multi-medium data.The advantage of this technology is that the user can obtain the compression expression of initial data in real time, and needn't wait for that whole transfer of data finishes, and this user for low bandwidth is even more important.The utilization of this technology need be carried out inherent combination to multi-medium data, and the JPEG2000 Standard of image compression provides good support (referring to document [4]) to this technology.

For vector data, the asymptotic transmission plan that Many researchers proposes only limits to choice and the transmission to spatial data points, that is to say the progressive expression that only limits to spatial domain, and it and the existing original storage mode of vector data are closely-related.Owing to there is not deep vector data frequency domain to represent mode as support, so these progressive transmission schemes can't relate in the vector data frequency domain and represent the frequency domain progressive transmission mode of carrying out on the basis.

Comprehensively with on, transform coding method is applied to the compression of two-dimensional space vector data, and its key problem in technology is frequency domain analysis, redundancy analysis, the evaluation of vector data compression artefacts, the Data Rate Distribution algorithm that is suitable for the vector data coding, the asymptotic transmission of vector data frequency domain of vector data or the like.With regard to above these problems, the researcher does not also propose total solution at present, just in research, have spatial database, the service of spatial data network, WebGIS or the like related, but do not carry out deep research, more do not form the method for system.

Summary of the invention

Technical problem to be solved by this invention is: by the decomposition that the frequency domain analysis of two-dimensional space vector data and information are constituted, use for reference Techniques for Multimedia Compression, the vector data compression scheme is proposed, a kind of novel two-dimensional space vector data compression method based on transition coding is provided, and the quick and effective of realization two-dimensional vector data stored, manages and transmit.

The technical scheme that the present invention solves the problems of the technologies described above is: behind the two-dimensional vector data piecemeal, use and based on the coding method of dct transform two-dimensional vector data is carried out lossy compression method, and be output as the bit stream of specified format.

What the present invention proposed carries out compression method based on transition coding to two-dimensional vector data, has following significant effect:

One. through test, in that the far-ranging vector data that forms naturally has reached under the situation of nearly 20 compression ratio to contour, road network, water system etc., its signal to noise ratio is up to 60dB, and maximum definitely distortion only is 10 of a former vector data dynamic range ^-5About the order of magnitude;

They are two years old. and the self adaptation piecemeal to the two-dimensional space vector data has generated the vector data piece that is suitable for block transform coding in the labyrinth that does not destroy spatial data self;

They are three years old. and adopt suitable quantization method, can control reconstruction error, thus the distortion that is produced in the control compression process;

They are four years old. in conjunction with the progressive compress technique of spatial domain and frequency domain, for providing technical foundation with the asymptotic transmission of vector data efficiently flexibly.By can satisfy user's demand better in conjunction with the new asymptotic more efficiently transmission plan that expression proposed of spatial domain and frequency domain vector data.

Description of drawings

Fig. 1: the basic flow sheet of compression method of the present invention

Fig. 2: the block algorithm block diagram, wherein N is 2 ⁿ, n specifies for compression is preceding.

Fig. 3: vector data compression main program block diagram.

Fig. 4: the compressing vector data piece is carried out the merogenesis schematic diagram according to given accuracy.

Fig. 5: jvg compressed file structure.

Fig. 6: the data before the compression.

Fig. 7: the data that compression back decompress(ion) obtains.

Fig. 8: the overlapping demonstration of before the compression and compression back data.Wherein: before the compression be black, and compression is a grey afterwards, black color dots by the grey lines and short-term be distortion cause not quite identical.

Fig. 9: an overlapping demonstration of local data after amplifying 3 times.

Figure 10: an overlapping demonstration of local data after amplifying 8 times.

Embodiment

The invention will be further described below in conjunction with example and accompanying drawing.

The present invention is a kind of compression method of two-dimensional space vector data, specifically: at first original vector data is divided into the vector data piece, according to one of two kinds of patterns the vector data piece is carried out lossy compression method based on dct transform again, then the dct transform coefficient that obtains is carried out entropy coding and obtain the compressing vector data piece, at last all compressing vector data pieces are organized as the bit stream of specified format, Fig. 1 is the basic flow sheet of compression method among the present invention.

The present invention adopts following steps that two-dimensional vector data is compressed.

One. the two-dimensional vector data collection is divided into vector data piece (abbreviation piecemeal)

1. the arrangement of two-dimensional vector data:

The handled original vector data of the present invention is a data set that comprises one or more geometric objects (generally being a plurality of), can be a file, also can be binary system or text data stream.A geometric object is formed (for example, having the polygon on island, the bifurcated in river etc.) by one or more independent parts, and each part is formed according to being linked in sequence of storage by the point that quantity does not wait.

If the data of compression do not meet above-mentioned model, before employing the technology of the present invention, must put in order in advance, perhaps arrangement in real time is the form of appointment in the process of implementing compression.

2. the two-dimensional vector data collection is divided into the vector data piece

The process that two-dimensional vector data is divided into the vector data piece is exactly a plurality of vector data pieces that each geometric object are divided into identical size respectively according to the needs of compression algorithm.For the ease of carrying out the quick calculating of dct transform, all contain 2 in each vector data piece ⁿIndividual two-dimemsional number strong point, n is the integer greater than 1, specifies before compression.

For consistent with the original structure of vector data, piecemeal carries out in object vector inside.The number of the point of storage can not all be 2 in the geometric object (and various piece) ⁿMultiple, extract the vector data piece from the various piece order after, to the remaining point of various piece, artificially the point that replenishes sufficient amount is to constitute a complete vector data piece, block algorithm is seen Fig. 2.N among Fig. 2 is 2 ⁿBlock algorithm travels through each part of a geometric object successively, by the original order extraction 2 of point ⁿThe vector data piece of size is wherein had a few complete vector data piece from some parts, is called GDB (GeometryData Block).To remaining less than 2 behind each extracting section GDB ⁿPoint, reconstruct a complete vector data piece after artificially replenishing the point of sufficient amount, be called GSDB (Geometry Supplement Data Block).The coordinate of the point that replenishes equals the mean value of original coordinate of having a few among the GSDB.Compression to two kinds of vector data pieces of GDB and GSDB is the same, but in the compressing vector data file that forms, and the head of compressing vector data piece has comprised the information of the quantity of the actual data point that comprises in the type of indicating this piece and this piece.

Two. based on the lossy compression method (being called for short coding) of dct transform

Referring to Fig. 3, adopt following step that vector data is compressed.

1.DCT conversion:

To all vector data pieces, select one of two kinds of patterns that x and y coordinate data are carried out the one dimension dct transform respectively.First kind of pattern is at first the data of vector data piece to be quantized, then the integer dct transform that quantized data be can't harm; Second kind of pattern is directly to use the floating-point dct transform that diminishes, and conversion quantizes afterwards, if do not consider the rounding error in the floating-point operation, it is harmless that the floating-point dct transform also can be regarded as, but the follow-up quantification to conversion coefficient then diminishes.

One-dimensional sequence x (i), i=0,1, the dct transform of Λ N-1 and being inversely transformed into

X (k) = c_{k} Σ_{i = 0}^{N - 1} x (i) \cos \frac{(2 i + 1) kπ}{2 N}

x (i) = Σ_{k = 0}^{N - 1} c_{k} X (k) \cos \frac{(2 i + 1) kπ}{2 N}

Wherein

c_{0} = 1 / \sqrt{N}, c_{k} = \sqrt{2 / N}, 1 \leq k \leq N - 1 .

The integer dct transform is derived by the floating-point dct transform and is got, specifically can be referring to document [6].This patent does not limit the type of integer dct transform of concrete use and parameter wherein, as long as forward integer dct transform (Forward IntegerDCT) and reverse integer dct transform (Inverse Integer DCT) correspondence mutually.

2. quantize and reconstruction:

Quantification in first kind of pattern is an example with the uniform quantization of x component, and its method is: suppose that current vector data is [XMIN, XMAX] in the scope of x direction, specifying the quantization step of x data is Δ _x, so the x data in all vector data pieces are carried out following quantification:

Wherein

Be downward rounding operation, follow-up conversion and coding are to q _x(i) carry out, and can't harm.

To having obtained q behind the compressing vector data file decompress(ion) _x(i), it is redeveloped into the interval mid point of former quantification, just:

\hat{x} (i) = q_{x} (i) Δ_{x} + XMIN

Quantification in second kind of pattern is an example with the uniform quantization of x component, and its method is: suppose that the DCT coefficient after current vector data is through the floating-point dct transform is X (k), and k=0,1, Λ, N-1, the appointment quantization step is a Δ _Ix, so all X (k) are carried out following quantification:

Wherein

Be downward rounding operation, follow-up coding is to Q _x(k), k=0,1, Λ, N-1 carries out, and can't harm.Notice that generally be different for its quantization step of different conversion coefficients of x coordinate data this moment, do not have only a quantization step directly coordinate data being quantized and do not resemble.

Similarly, to obtaining Q behind the compressing vector data file decompress(ion) _X(k), it is redeveloped into the interval mid point of former quantification, just

\hat{X} (k) = Q_{X} (k) Δ_{kX}

Attention: no matter be first kind of pattern or second kind of pattern, the lossy compression method of vector data piece has all been obtained one group of integer dct transform coefficient.In follow-up harmless entropy coding (ground floor coding), will no longer distinguish this two kinds of patterns, and uniformly these coefficients will be called the integer dct transform coefficient.For diminishing quantification, this patent does not specify the step-length of quantification and the tolerance of quantizing distortion, and different quantization steps can produce different quantizing distortion, is not quite similar in different distortion metrics lower compression effects yet.How passing through to set distortion metrics and optimize quantization method, is the content of opening under the compress technique framework of this patent.

The parameter that quantizes generally is different with the difference of vector data, different quantization parameters will produce different quantization transform coefficients and last compression effectiveness, therefore this patent does not limit special quantization parameter especially yet, and the user of patent can select quantization parameter as required or select the method for parameter.

3.DC the processing of coefficient:

Dct transform coefficient can be divided into two classes, and first coefficient is called the DC coefficient, is the average of participating in all data of conversion; Other coefficient is called the AC coefficient.Because the DC coefficient is directly relevant with the position of vector data piece, therefore often have very strong correlation between the DC coefficient of all vector data pieces of same geometric object, be suitable for adopting predictive coding to compress.In the present invention, in a geometric object (comprising its a plurality of parts), the DC coefficient in first vector data piece, the DC coefficient of all subsequent vector data blocks is all calculated the difference of the DC coefficient of itself and previous compression data block, then this difference is carried out huffman coding.

The example of difference huffman coding table sees attached list 1.First row of this table are capable number, and the 3rd row are Huffman code words of this row, and every row comprises integer range ((2 ⁱ-1) ,+(2 ⁱ-1)), but do not comprise between the mesozone ((2 ^I-1-1) ,+(2 ^I-1-1)).The code word that difference is encoded comprises two parts, the monobasic Huffman code word that first is expert at for this difference; Second portion is the complement of one's (not needing sign bit) of this difference.For the DC coefficient in first vector data piece in the geometric object, directly use this table to encode.This coding method can be referring to document [6].The difference huffman coding of DC coefficient can illustrate as follows.The DC coefficient of supposing first vector data piece in the current geometric object is 1787, the DC coefficient of second vector data piece is 1783, the DC coefficient of first vector data piece is carried out direct coding obtain 111111110|11011111011, actual coding is difference-4 between the two in second vector data piece, is encoded to 100|011.

4. ground floor is encoded:

To each vector data piece, through all obtaining two integer dct transform coefficient pieces (corresponding respectively to x and y coordinate data) after the lossy compression method of first kind of pattern or second kind of pattern.First component of this piece is the DC coefficient, and it is handled and discusses in front, the distance of swimming huffman coding that all remaining AC coefficients can't harm in the following manner.

At first the AC coefficient block is divided into (nonzero coefficient, zero run-length length) right.The zero run-length lengths table is shown in the number that this nonzero coefficient front is zero AC coefficient.Right to each, at first find the line number R of nonzero coefficient in the difference huffman coding table, in distance of swimming huffman coding table, find the code word of position then corresponding to Z/R (Z is row number, and R is row number), wherein Z is the length of zero run-length.In this table, R is since 1 counting, and Z is since 0 counting.This code word back adds that the radix-minus-one complement (not needing sign bit) of nonzero coefficient is exactly this right coding.Last a string zero (number is indefinite, also may be 0) of AC coefficient block represented with EOB (End Of Block, block end).Distance of swimming huffman coding table comprised (zero run-length length, nonzero coefficient is expert at) to the Huffman code word of EOB.The example of a distance of swimming huffman coding table sees attached list 2.The distance of swimming huffman coding of AC coefficient block can illustrate as follows.Suppose the size N=16 of discrete cosine transform block, the AC coefficient block of desire coding is 10,0,0,2,0,0,0,0,0,0,0,0,0,0,0, so this piece be divided into 2 (nonzero coefficient, zero run-length length) to an EOB.The front of first nonzero coefficient 10 does not have zero coefficient, and it is arranged in difference huffman coding table the 4th row, and therefore its corresponding code word is that it is encoded to 1011|110 at 1011 of 0/4 place in distance of swimming huffman coding table.Obtain equally the back nonzero coefficient 2 be encoded to 11111001|10.Last this string AC coefficient is encoded as:

101111011111001101010

Wherein last 1010 is codings of EOB.

In addition, the length of zero run-length must not exceed 15, if exceed then represent it with a code word ZRL, and restarts zero run-length length is counted.For instance, if between two non zero AC coefficients 17 zero AC coefficients are arranged, be split into a ZRL and zero run-length length so and be 2 (nonzero coefficient, zero run-length length) right.

5. the second layer is encoded:

If directly the bit stream that obtains is output as binary file after the ground floor end-of-encode, so resulting compression vector file will have unique precision.Have only after the whole file decompression, the decompress(ion) vector file that just can obtain having given accuracy, only decompress(ion) part compression position flow then can not get the significant expression to former vector file.By reorganization to compression position flow, can make compression position flow carry out the part decompress(ion) after, obtain lower accuracy or decompress(ion) vector file more among a small circle.And if the user needs more high accuracy or wider vector data, then can further carry out decompress(ion) until ending to remaining compression position flow.This organizational form makes compression position flow have progressive character, and the user can need select corresponding bit-stream to carry out decompress(ion) according to it, just can use and need not carry out decompress(ion) to whole file.Above-mentioned precision or scope according to vector data organized compression position flow so that export the process that bit stream has approximate progressive character and is called second layer coding.

Second layer coding is by the realization of cutting apart to the compressing vector data piece.Suppose that the precision that former compressing vector data reaches is ε _Total, this ε _TotalRefer between all data points and the former data point behind the decompress(ion) a certain prior given threshold value (for example, 10 that do not exceed in x direction and y deflection error ^-5).Given one group of S accuracy class { ε from low to high ₁, ε ₂, Λ, ε _S, ε wherein _S=ε _TotalAll compressing vector data pieces all will be split into the S joint so, make that decompressed data has reached ε when only decompress(ion) the 1st saves ₁Accuracy class, and the like.Similarly, to whole compressing vector data file, (during the individual accuracy class of 1≤k≤S), whole decompress(ion) vector data file has also reached k accuracy class when will wherein all compressing vector data pieces all unziping to k.Therefore will compress the k section that the set that belongs to the compression position flow that k saves in all compressing vector data pieces of vector file (perhaps its certain scope, explanation hereinafter) is called this compression vector file (perhaps its certain scope).Having comprised several (nonzero coefficient, zero run-length length) right codings in the joint that forms after cutting apart single compressing vector data piece, also may be zero, can not interrupt (nonzero coefficient, zero run-length length) right coding but cut apart.Form joint afterwards in the code word that must add EOB at last (in the above example, being 1010) of these joints,, so directly represent this joint with EOB if this joint is empty.Schematic diagram of cutting apart with five accuracy classes is seen Fig. 4.The mode of determining and the compressing vector data piece being cut apart according to the accuracy class of determining of accuracy class can be specified by using, and is not the content of this patent.

Among the present invention vector data being carried out second layer coding has dual mode, is respectively precision mode of priority and scope mode of priority.In the precision mode of priority, the low precision version of whole vector data at first is extracted, and the data of decompress(ion) will progressively improve the precision of whole vector data then.In the scope mode of priority, at first whole vector data is divided into several scopes, all scope non-overlapping copies all are made up of several geometric objects.Then each scope is carried out the preferential tissue of precision.The division of scope is determined by using, is not the content of this patent.Because the precision mode of priority also can be regarded as whole file is considered as a scope, can think that therefore the compressing vector data bit stream is made up of one or more scope, a scope then is made up of one or more section.

Illustrate this two kinds of second layer coded systems below.Suppose to have and comprised three geometric object A among the vector data A ₁, A ₂, A ₃All geometric objects all are divided into three joints, i.e. A according to three precision levels _IjI, j=1,2,3, represent that the j of i geometric object saves.A is split into two scopes under the scope mode of priority, comprises geometric object A respectively ₁, A ₂With geometric object A ₃If according to the precision mode of priority, the arrangement mode of all joints is in compressed file so:

A ₁₁，A ₂₁，A ₃₁，A ₁₂，A ₂₂，A ₃₂，A ₁₃，A ₂₃，A ₃₃

If according to the scope mode of priority, the arrangement mode of all joints is in compressed file so:

A ₁₁，A ₂₁，A ₁₂，A ₂₂，A ₁₃，A ₂₃，A ₃₁，A ₃₂，A ₃₃

Three. the output of compressing vector data

To all compressing vector data pieces, just formed last vector data compressed file in interpolation various marks and control information under the given coded system and after organizing, be called the jvg file, form (see figure 5) by file header and file body.Below the form of jvg file is done detailed explanation.All bytes all have the littleendian byte order (the most remarkable position in byte backmost) that uses on the Intel platform.

Referring to subordinate list 3, the file header of compressed file has comprised following two parts:

First is whole about the compressing vector data file and the information of the compression algorithm taked.

This part is essential.Here and the mark of in the formation of follow-up code stream, using form by two bytes, wherein first byte is 0XFF, second byte be value in the scope except 0X00 and 0XFF (be complete 0 and complete 1 byte), specifically sees attached list 3.

Second portion is the needed table of compression algorithm, comprises the dct transform coefficient quantization table, DC coefficient difference and AC coefficient be can't harm the coding schedule of entropy coding.

This part is optional.If do not provide these tables, may obtain from other acquiescence approach by compression and decompress(ion) program so.Specify as follows:

It at first is the definition of dct transform coefficient quantization table.The definition of quantization table is begun by mark DQT (referring to subordinate list 4), first kind of undefined quantization table of mode is made up of a double-precision floating point constant C and the individual integer of N (N is the discrete cosine transform block size), and the corresponding quantization step of each dct transform coefficient is the integer that C multiply by correspondence position; The undefined quantization table of the second way is made up of N double-precision floating points, and these numbers are the quantization step of corresponding dct transform coefficient.

Be the definition of DC coefficient difference huffman coding table then.In order to preserve this table, at first provide the line number R that one one byte signless integer is represented this table, the difference that every row comprises can be directly number be calculated by the row of this row, does not therefore need to preserve.Next be a R byte signless integer, represent the length of the pairing Huffman code word of every row.The length of Huffman code word does not allow to surpass 16.Each is preserved with two bytes in order with all code words at last, and real code word is positioned at the low side of these two bytes, because code word size does not allow to surpass 16, therefore two bytes are enough.With subordinate list 1 is example, and at first the byte of Bao Cuning is 0X11 (17 hexadecimal representations), is following 17 bytes then, represents the length of each code word:

0X02，0X03，0X03，…

Ensuing 34 bytes are:

0X00，0X00，0X00，0X02，0X00，0X03…

Be AC coefficient distance of swimming huffman coding table at last.In order to preserve this table, at first provide line number R and columns Z that two one byte signless integers are represented this table.R has represented the maximum number of lines of non zero AC coefficient in DC coefficient difference huffman coding table, and Z has represented the maximum length of the zero run-length of non zero AC coefficient front.Then, next preserve RZ code word in this table according to the order of row major again, and (Biao beginning just) inserts the code word of preserving EOB in (0,0) position, locate to insert the code word of preserving ZRL in (0,15).Amount to RZ+2 code word.When preserving these code words, equally at first preserve a byte signless integer of RZ+2 expression correspondence code word length, each is preserved with two bytes in order with all code words then.

If above-mentioned quantization table and coding schedule all provide, six tables (corresponding x and y coordinate data respectively) will be arranged so in compressed file.

The file body of jvg file has comprised actual compressing vector data.If be the ground floor coding, this part only has a compressing vector data section so, and this section is made up of one or more compressing vector data pieces.If be second layer coding, this part has one or more scopes so, and each scope is made up of one or more sections, and each section is made up of the joint of one or more compressing vector data pieces in order.In file body, need some marks to represent the beginning and the end of different scopes and section, and each joint ends up by EOB.

At first do some explanations below with regard to the compressing vector data piece in the ground floor coding.By the block algorithm of front as can be known the compressing vector data piece may handle by GDB or GSDB, and this GDB or GSDB may be arranged in a part that geometric object is different, these informational needs are preserved in the compressing vector data piece.For this reason at additional two bits of the beginning of each compressing vector data piece, its meaning is shown in subordinate list 5, if this compressing vector data piece is come by the GSDB compression then next byte is preserved actual the counting of comprising in this compressing vector data piece.Because GDB or GSDB will produce two compressing vector data pieces (corresponding x and y coordinate data respectively), and these two pieces always in succession, so only preserve these information previous beginning.Can rebuild the geometric object of forming by a plurality of parts by the points information that comprises among build portion information and the GSDB.If in addition a compressing vector data piece is divided into a plurality of joints according to required precision, so only in first segment, preserve former build portion information.

Determined to adopt ground floor still be second layer coding and specifically whole vector data is divided into scope and the compressing vector data piece in each scope is divided into the mode of the joint and the section of being combined as after get final product all bit streams are organized as file body.Accompanying drawing 5 is examples of file body.In compressed file, be difficult to find the border between each geometric object, but when decompress(ion), can be combined as geometric object according to the header information of each compressing vector data piece coordinate point range after with decompress(ion).

Also have 2 needs explanations at last.The one, each coding unit in the jvg file (as the compressing vector data piece or cut apart the formed joint of this piece) all must be a byte-aligned, therefore, must replenish enough 0 at last so that its figure place is 8 integral multiple at this coding unit so if the figure place of coding unit is not 8 integral multiple.The 2nd, for fear of bit stream and the mark in the compressed file (definition of the face that sees before) obscured in the coding unit, if in the regulation coding unit if 0XFF byte (just one is 1 byte entirely), inserting one so in this byte back is 0 byte entirely.If therefore decoder runs into a 0XFF byte, if the byte of its back is 0X00, should directly abandon this byte so, if not, should explain so these two bytes are interpreted as a mark and take corresponding action according to the value of this mark.Fig. 6 to Figure 10 is an example that compresses actual river data with the art of this patent, employing ground floor coding, size before the compression is 5,831,572 bytes (5.56M), the size after the compression are 269,716 bytes (264kB), comprised file header and quantization table, coding schedule etc., compression ratio has reached 21.6.

Subordinate list

Subordinate list 1 difference huffman coding table example

Capable number	The coefficient difference	The Huffman code word
Capable number	The coefficient difference	The Huffman code word	0	0	00
1	-1.1	010	0	0	00
1	-1.1	010	2	-3，-2，2，3	011
3	-7，…，-4，4，…，7	100	2	-3，-2，2，3	011
3	-7，…，-4，4，…，7	100	4	-15，…，-8，8，…，15	101
5	-31，…，-16，16，…，31	110	4	-15，…，-8，8，…，15	101
5	-31，…，-16，16，…，31	110	6	-63，…，-32，32，…，63	1110
7	-127，…，-64，64，…，127	11110	6	-63，…，-32，32，…，63	1110
7	-127，…，-64，64，…，127	11110	8	-255，…，-128，128，…，255	111110
9	-511，…，-256，256，…，511	1111110	8	-255，…，-128，128，…，255	111110
9	-511，…，-256，256，…，511	1111110	10	-1023，…，-512，512，…，1023	11111110
11	-2047，…，-1024，1024，…，2047	111111110	10	-1023，…，-512，512，…，1023	11111110
11	-2047，…，-1024，1024，…，2047	111111110	12	-4095，…，-2048，2048，…，4095	11111111110
13	-8191，…，-4096，4096，…，8191	111111111110	12	-4095，…，-2048，2048，…，4095	11111111110
13	-8191，…，-4096，4096，…，8191	111111111110	14	-16383，…，-8192，8192，…，16383	1111111111110
15	-32767，…，-16384，16384，…，32767	11111111111110	14	-16383，…，-8192，8192，…，16383	1111111111110
15	-32767，…，-16384，16384，…，32767	11111111111110	16	32768	11111111111111

Subordinate list 2AC coefficient distance of swimming huffman coding table example

Z/R	Code word	Z/R	Code word	Z/R	Code word
Z/R	Code word	Z/R	Code word	Z/R	Code word		0/0(EOB)	1010
0/1	00	1/1	1100	2/1	11100	…	0/0(EOB)	1010
0/1	00	1/1	1100	2/1	11100	…	0/2	01	1/2	11011	2/2	11111001	…
0/3	100	1/3	1111001	2/3	1111110111	…	0/2	01	1/2	11011	2/2	11111001	…
0/3	100	1/3	1111001	2/3	1111110111	…	0/4	1011	1/4	111110110	2/4	111111110100	…
0/5	11010	1/5	11111110110	2/5	1111111110001001	…	0/4	1011	1/4	111110110	2/4	111111110100	…
0/5	11010	1/5	11111110110	2/5	1111111110001001	…	…	…	…	…	…	…	…

Subordinate list 3 jvg file headers

The starting position	Explanation	Value	Type	Explanation
The starting position	Explanation	Value	Type	Explanation	Byte 0	Compressed file is described	“jveg”	Character	Essential
Byte 4	Compressed file begins	SOF	Assigned tags	Essential	Byte 0	Compressed file is described	“jveg”	Character	Essential
Byte 4	Compressed file begins	SOF	Assigned tags	Essential	Byte 6	Compressed file length	File size (byte)	Integer	Essential
Byte 10	The discrete cosine transform block size	2 ⁿ, n is specified by the user	Integer	Essential	Byte 6	Compressed file length	File size (byte)	Integer	Essential
Byte 10	The discrete cosine transform block size	2 ⁿ, n is specified by the user	Integer	Essential	Byte 14	The compressed file version	1000	Integer	Essential
Byte 18	The vector data scope	Xmin	Double-precision floating points	Essential	Byte 14	The compressed file version	1000	Integer	Essential
Byte 18	The vector data scope	Xmin	Double-precision floating points	Essential	Byte 26	The vector data scope	Ymin	Double-precision floating points	Essential
Byte 34	The vector data scope	Xmax	Double-precision floating points	Essential	Byte 26	The vector data scope	Ymin	Double-precision floating points	Essential
Byte 34	The vector data scope	Xmax	Double-precision floating points	Essential	Byte 42	The vector data scope	Ymax	Double-precision floating points	Essential
Byte 50	Direct quantization step	Xstep	Double-precision floating points	Optional	Byte 42	The vector data scope	Ymax	Double-precision floating points	Essential
Byte 50	Direct quantization step	Xstep	Double-precision floating points	Optional	Byte 58	Direct quantization step	Ystep	Double-precision floating points	Optional

Employed mark in the subordinate list 4 jvg files

Value	Title	Explanation
Value	Title	Explanation	0XFF60	SOF ₀	Floating-point DCT, the ground floor coding
0XFF61	SOF ₁	Floating-point DCT, second layer coding, precision is preferential	0XFF60	SOF ₀	Floating-point DCT, the ground floor coding
0XFF61	SOF ₁		0XFF62	SOF ₂	Floating-point DCT, second layer coding, scope is preferential
0XFF63	SOF ₃	Integer DCT	0XFF62	SOF ₂
0XFF63	SOF ₃	Integer DCT	0XFF70	SOV	Compressing vector data begins
0XFF71	EOV	Compressing vector data finishes	0XFF70	SOV	Compressing vector data begins
0XFF71	EOV	Compressing vector data finishes	0XFF72	SOR	Scope begins
0XFF73	EOR	End of extent (EOE)	0XFF72	SOR	Scope begins
0XFF73	EOR	End of extent (EOE)	0XFF74	SOS	The section beginning
0XFF75	EOS	Section finishes	0XFF74	SOS	The section beginning
0XFF75	EOS	Section finishes	0XFF80	DQT ₀	First kind of dct transform coefficient quantization table that mode defines
0XFF81	DQT ₁	The dct transform coefficient quantization table of second way definition	0XFF80	DQT ₀
0XFF81	DQT ₁		0XFF82	DHT ₀	Definition DC coefficient difference huffman coding table
0XFF83	DHT ₁	Definition AC coefficient huffman coding table	0XFF82	DHT ₀	Definition DC coefficient difference huffman coding table

Subordinate list 5 compressing vector data build portion information

The value of two bits of compressing vector data build portion	Meaning
	Meaning	11	This piece is a GDB
10	This piece is a GSDB, but is not the ending of a geometric object	11	This piece is a GDB
10		00	This piece is a GSDB, and the corresponding ending of a geometric object

List of references:

[1]H.Wu，Hanwu Zhang，Xiaojing Liu，Xia Sun，Adaptive Architecture of GeospatialInformation Service over the Internet with QoGIS Embedded，in：ISPRS Workshopon Service and Application of Spatial Data Infrastructure，XXXVI(4/W6)，Oct14-16，2005，Hangzhou，China

[2] Hu Guangshu. Digital Signal Processing---theory, algorithm and realization. Beijing: publishing house of Tsing-Hua University, 1997

[3]A.Ortega and K.Ramchandran，Rate Distortion Methods for Image and VideoCompression，IEEE Signal Proc.Magazine，pp.23-49，Nov.1998.

[4]M.Rabbani and R.Joshi，An Overview of the JPEG2000 Still Image CompressionStandard，Signal Processing：Image Communication，vol.17，issue.1，pp.3-48，Jan.2002.

[5]V.K.Goyal，Theoreticai Foundations of Transform Coding，IEEE SignalProcessing Magazine，pp.9-21，Sept.2001.

[6]Y.Zeng，L.Cheng，G.Bi，and A.C.Kot，Integer DCTs and Fast Algorithms.IEEETransactions on Signal Processine，vol.49，no.11，pp.2774-2782，Nov.2001.

[7] David Salomon[U.S.] work, Wu Lenan etc. translate, data compression principle and application. Beijing: Electronic Industry Press, 2003

Claims

1. the compression method of a two-dimensional space vector data, it is characterized in that behind the two-dimensional vector data piecemeal, use is carried out lossy compression method based on the coding method of dct transform to two-dimensional vector data, and is output as the bit stream of specified format, and DCT is the english abbreviation of discrete cosine; Concrete steps are as follows:

(1) two-dimensional vector data is divided into the vector data piece, follow-up cataloged procedure carries out individually at each vector data piece, and this step is called piecemeal,

(2) x in each vector data piece and y coordinate data are carried out lossy compression method based on dct transform, this step is called coding,

(3) compression data block that all codings are obtained adds various marks and control information formation compression position flow, outputs to binary compressing vector data file, and this step is called output.

2. the compression method of two-dimensional vector data according to claim 1 is characterized in that adopting the following steps piecemeal:

To handled two-dimensional vector data, the object vector that is comprised with vector data inside is a unit, is divided into the vector data piece of identical size according to the needs of compression algorithm; All contain 2 in each vector data piece ⁿIndividual two-dimemsional number strong point, for consistent with the original structure of vector data, piecemeal carries out in object vector inside, less than 2 ⁿThe vector data piece of individual data point constitutes a complete vector data block after artificially replenishing some data points; N is the integer greater than 1, specifies before compression.

3. the compression method of two-dimensional vector data according to claim 1 is characterized in that adopting the following steps coding:

A. to all vector data pieces, select one of two kinds of patterns that x and y coordinate data are carried out the one dimension dct transform respectively; First kind of pattern is earlier coordinate data to be quantized, and then the rounded coordinate data after quantizing be can't harm the integer dct transform, obtains the integer transform coefficient; Second kind of pattern is directly coordinate data to be carried out the floating-point dct transform, obtains the floating-point transform coefficient,

B. all conversion coefficients that second kind of mode treatment obtained carry out uniform quantization, obtain quantization transform coefficient,

C. to the integer transform coefficient in the quantization transform coefficient of each vector data piece or the first kind of pattern in order block-by-block can't harm entropy coding, generate the compressing vector data piece, the cataloged procedure of being made up of step a, b and c is called ground floor encodes, perhaps,

D. the further tissue compression vector data of the requirement piece that increases progressively according to the reconstruction precision of vector data makes compression position flow have progressive character, and like this, the cataloged procedure of being made up of step a, b, c and d is called second layer coding; Second layer coding is only at the quantization transform coefficient that obtains under second kind of pattern.

4. the compression method of two-dimensional vector data according to claim 1 is characterized in that output: the lossy compression method data block is added various marks and control information, and the bit stream that obtains is outputed to the compressed file of specified format, expansion jvg by name; This compressed file is made up of file header and two parts of file body.