CN107682016A - Data compression method, data decompression method and related system - Google Patents
- Publication number: CN107682016A
- Application number: CN201710884914.2A
- Authority: CN (China)
- Prior art keywords: data, block, recombination, data block, similar
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention disclose a data compression method, a data decompression method and a related system. After the original data is divided into multiple data blocks, similar data blocks are migrated and recombined so that redundant data can be eliminated, which improves the compression ratio of the data. The method of the embodiments comprises: dividing the original data into multiple data blocks; detecting the similarity of the multiple data blocks; migrating and recombining the similar data blocks in turn to generate recombined data; and compressing the recombined data to generate compressed data. The embodiments also provide a data decompression method and a related system, likewise serving to improve the compression ratio of the data.
Description
Technical field
The present invention relates to the field of computer data processing, and in particular to a data compression method, a data decompression method and a related system.
Background
Data compression is a technique that, without losing useful information, reduces the volume of data in order to save storage space and improve the efficiency of transmission, storage and processing; equivalently, it reorganizes data according to some algorithm so as to reduce redundancy and storage space.
Current data compression techniques fall into two main categories, lossy compression and lossless compression. Most existing lossless compression techniques are derived from the dictionary coding techniques LZ77 and LZ78. Dictionary coding mainly relies on a cache based on a "sliding window": the current character sequence is matched against the character strings cached in the sliding window and, if it is a repeat, it is represented by a relatively short code, which eliminates redundancy at the level of character strings.
In existing lossless compression techniques, however, the size of the sliding window is the main limit on finding redundant data. On the one hand, a larger sliding window makes redundant data easier to find, so more redundancy is eliminated; but as the window grows, the time spent searching for matching redundant strings also grows exponentially, so most compression algorithms limit the window size (for example, the maximum sliding window of bzip2 is 900 KB). On the other hand, if the sliding window is too small, redundant data lying in different windows cannot be eliminated because the copies are too far apart, and a large amount of redundant data remains in the storage system; meanwhile, string matching on non-redundant data is also very time-consuming, which lowers the data compression speed of the storage system.
Summary of the invention
Embodiments of the invention provide a data compression method, a data decompression method and a related system, in which, after the original data is divided into multiple data blocks, similar data blocks are migrated and recombined so that redundant data can be eliminated. This solves the problem in conventional compression techniques that redundant data lying far apart cannot be eliminated because of the limit on the sliding window size.
One aspect of the present invention provides a data compression method, comprising:
dividing the original data into multiple data blocks;
detecting the similarity of the multiple data blocks;
migrating and recombining the similar data blocks in turn to generate recombined data;
compressing the recombined data to generate compressed data.
Optionally, after the original data is divided into multiple data blocks and before the similarity of the multiple data blocks is detected, the method further comprises:
recording the order, offset and block length of the multiple data blocks to generate an original file spectrum.
Optionally, after the similar data blocks are migrated and recombined in turn to generate the recombined data and before the recombined data is compressed, the method further comprises:
updating the offsets of the multiple data blocks according to the recombined data to obtain a new original file spectrum;
compressing the new original file spectrum to generate a compressed file spectrum.
Optionally, migrating and recombining the similar data blocks in turn to generate the recombined data comprises:
migrating and recombining the similar data blocks in turn to generate multiple similarity linked lists;
reading the corresponding data block contents from the original data according to the multiple similarity linked lists to generate the recombined data.
Optionally, detecting the similarity of the multiple data blocks comprises:
detecting the similarity of the multiple data blocks by the super-feature method, the Simhash method or the Minhash method.
Another aspect of the present invention provides a data decompression method, comprising:
decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original file spectrum, respectively;
reading the multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original file spectrum;
writing the multiple data blocks in turn according to the order of the multiple data blocks recorded in the new original file spectrum, to obtain the original data.
The present invention also provides a data compression system, comprising:
a blocking unit, configured to divide the original data into multiple data blocks;
a detection unit, configured to detect the similarity of the multiple data blocks;
a recombination unit, configured to migrate and recombine the similar data blocks in turn to generate recombined data;
a compression unit, configured to compress the recombined data to generate compressed data.
The present invention also provides a data decompression system, comprising:
a decompression unit, configured to decompress the compressed data and the compressed file spectrum to obtain the recombined data and the new original file spectrum, respectively;
a reading unit, configured to read the multiple data blocks from the recombined data according to the offset and block length of the multiple data blocks recorded in the new original file spectrum;
a writing unit, configured to write the multiple data blocks in turn according to the order of the multiple data blocks recorded in the new original file spectrum, to obtain the original data.
The present invention also provides a computer apparatus comprising a processor which, when executing a computer program stored in a memory, implements the following steps:
dividing the original data into multiple data blocks;
detecting the similarity of the multiple data blocks;
migrating and recombining the similar data blocks in turn to generate recombined data;
compressing the recombined data to generate compressed data.
The present invention also provides a computer apparatus comprising a processor which, when executing a computer program stored in a memory, implements the following steps:
decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original file spectrum, respectively;
reading the multiple data blocks from the recombined data according to the offset and block length of the multiple data blocks recorded in the new original file spectrum;
writing the multiple data blocks in turn according to the order of the multiple data blocks recorded in the new original file spectrum, to obtain the original data.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
dividing the original data into multiple data blocks;
detecting the similarity of the multiple data blocks;
migrating and recombining the similar data blocks in turn to generate recombined data;
compressing the recombined data to generate compressed data.
The present invention also provides a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the following steps:
decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original file spectrum, respectively;
reading the multiple data blocks from the recombined data according to the offset and block length of the multiple data blocks recorded in the new original file spectrum;
writing the multiple data blocks in turn according to the order of the multiple data blocks recorded in the new original file spectrum, to obtain the original data.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
In the present invention, the original data is divided into multiple data blocks, the similarity of the multiple data blocks is detected, the similar data blocks are migrated and recombined to generate recombined data, and the recombined data is then compressed to obtain compressed data. Because the present invention migrates and recombines similar data blocks so that they are placed together, as much redundancy as possible is eliminated from the similar data blocks. This solves the problem in conventional data compression that redundant data lying far apart cannot be eliminated because of the limit on the sliding window size.
Brief description of the drawings
Fig. 1 is a schematic diagram of the process of the data compression method;
Fig. 2 is a schematic diagram of one embodiment of a data compression method in the embodiments of the present invention;
Fig. 3 is a schematic diagram of another embodiment of a data compression method in the embodiments of the present invention;
Fig. 4 is a schematic diagram of the organization of a file spectrum;
Fig. 5 is a schematic diagram of the process of the data decompression method;
Fig. 6 is a schematic diagram of one embodiment of a data decompression method in the embodiments of the present invention;
Fig. 7 is a schematic diagram of one embodiment of a data compression system in the embodiments of the present invention;
Fig. 8 is a schematic diagram of another embodiment of a data compression system in the embodiments of the present invention;
Fig. 9 is a schematic diagram of one embodiment of a data decompression system in the embodiments of the present invention.
Detailed description of the embodiments
Embodiments of the invention provide a data compression method, a data decompression method and a related system, in which, after the original data is divided into multiple data blocks, similar data blocks are migrated and recombined so that redundant data can be eliminated. This solves the problem in conventional compression techniques that redundant data lying far apart cannot be eliminated because of the limit on the sliding window size, and also improves the compression ratio of the data.
To make this document easier to understand, the technical terms used in it are first explained as follows:
Data chunking: data chunking divides a file into multiple data blocks using a chunking algorithm. The choice of chunking algorithm affects not only the chunking speed but also, to a large extent, how well similar data blocks can be detected. Existing chunking algorithms mainly follow two basic strategies: fixed-length chunking and content-based chunking. Fixed-length chunking marks cut boundaries according to position; it is simple to implement and fast. Because of the boundary-shift problem, however, its redundancy detection is not ideal. Content-based chunking determines chunk boundaries from the local content of the data stream; it effectively solves the boundary-shift problem and divides the data stream into data blocks of variable length. By comparison, content-based chunking adapts better to workloads whose content is modified frequently and finds more redundant data, and it is widely used in storage systems based on data deduplication.
Similarity detection: similarity detection identifies data blocks whose contents are highly similar, so that similar redundancy in the storage system can be found and eliminated. Storage systems generally judge the similarity between files by comparing representative fingerprints of the files. Commonly used similarity detection methods include methods based on super-feature values, Simhash, Minhash and so on.
Data migration: data migration changes the order of part of the data in a file so that similar data is clustered together, which improves how well the file compresses. Data migration also provides a mechanism for recovering metadata, and its basic unit is the data block. After a file is divided into multiple data blocks, similar data blocks are identified by a similarity detection method and then moved so that they become physically adjacent, making the file data more compressible.
Data compression: data compression is a mainstream technique for eliminating redundant data. It mainly eliminates redundant information by coding: on the premise that the original information is not lost, the original content is transformed so that repeated byte sequences are represented by codes with fewer bytes, which eliminates part of the redundant data. Claude Elwood Shannon (1916-2001) first proposed the concept of "information entropy": any information contains redundancy, and the amount of redundancy is related to the probability of occurrence, or uncertainty, of each symbol (digit, letter or word) in the information. Shannon's information entropy theory laid the theoretical foundation of data compression. With the continuous growth of electronic digital information, data compression has gradually developed into lossless compression, lossy compression and so on. Most existing lossless compression techniques are derived from the dictionary coding techniques LZ77 and LZ78. Dictionary coding mainly relies on a cache based on a "sliding window": the current character sequence is matched against the character strings cached in the sliding window and, if it is a repeat, it is represented by a relatively short code, which eliminates redundancy at the level of character strings.
For ease of understanding, Fig. 1 gives a schematic diagram of the process of the data compression method. The data compression method of the present invention is described below with reference to Fig. 1. Referring to Fig. 2, one embodiment of a data compression method in the embodiments of the present invention comprises:
201. Divide the original data into multiple data blocks;
It can be understood that data compression eliminates redundant data on the premise that the original data is not lost, in order to reduce storage space and speed up data transmission.
The present invention is based on the idea of clustering and combining similar data, so as to eliminate as much redundant data as possible. To cluster and combine similar data, the original data must first be chunked so that the resulting blocks can be compared for similarity.
Data chunking divides the original data into multiple data blocks using a chunking algorithm. The average chunk granularity is about 8 KB (it can also be set to 4 KB or 16 KB as needed), and the chunking algorithm can be either a content-based chunking algorithm or fixed-length chunking.
Fixed-length chunking marks cut boundaries according to position; it is simple to implement and fast. Because of the boundary-shift problem, its redundancy detection is mediocre. Content-based chunking determines chunk boundaries from the local content of the data stream; it effectively solves the boundary-shift problem and divides the data stream into data blocks of variable length. By comparison, content-based chunking adapts better to workloads whose content is modified frequently and finds more redundant data.
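The difference between the two chunking strategies can be illustrated as follows. This is a minimal sketch under stated assumptions: the patent does not name a specific content-based algorithm, so a simple gear-style rolling hash is used here as a stand-in, and `GEAR`, `fixed_chunks`, `content_chunks` and `avg_bits` are hypothetical names. A boundary is declared wherever the top `avg_bits` bits of the rolling hash are zero, which gives an average chunk size of about 2**avg_bits bytes (8 KB for the default of 13).

```python
import random

_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one random 64-bit value per byte value
MASK64 = (1 << 64) - 1

def fixed_chunks(data, size=8 * 1024):
    """Fixed-length chunking: boundaries depend only on position."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_chunks(data, avg_bits=13):
    """Content-based chunking: boundaries depend only on the preceding bytes."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out, so h reflects only the last 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - avg_bits) == 0:      # cut when the top avg_bits bits are zero
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])        # trailing chunk without a boundary
    return chunks
```

Inserting a single byte at the front of a file shifts every fixed-length boundary, so none of the old chunks reappear, whereas the content-based boundaries move with the data and almost all chunks are unchanged. A production chunker would normally also enforce minimum and maximum chunk sizes, which this sketch omits.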
202. Detect the similarity of the multiple data blocks;
After chunking, the original data forms multiple data blocks, and the data compression system performs similarity detection on them. Many similarity detection algorithms are available, for example the super-feature method, Simhash or Minhash.
How these algorithms are used to detect the similarity of multiple data blocks is described in detail in the following embodiments.
It should be noted that the similarity detection algorithms of this embodiment include but are not limited to the algorithms above; no specific limitation is made here.
203. Migrate and recombine the similar data blocks in turn to generate recombined data;
After similarity detection, the similar data blocks are clustered and recombined to form multiple similarity linked lists. The data compression system reads the corresponding data blocks from the original data in turn according to the similarity linked lists, and then writes the data blocks that were read in turn, which generates the recombined data.
How the similarity linked lists are generated after similarity detection, and how the recombined data is obtained from them, is described in detail in the following embodiments.
204. Compress the recombined data to generate compressed data.
After the multiple data blocks have been assembled into recombined data, the data compression system compresses the recombined data with a conventional compression method, so that as much redundancy as possible is removed from the similar data blocks and the compression ratio of the original data increases.
In the present invention, the original data is divided into multiple data blocks, the similarity of the multiple data blocks is detected, the similar data blocks are migrated and recombined to generate recombined data, and the recombined data is then compressed to obtain compressed data. Because the present invention migrates and recombines similar data blocks so that they are placed together, as much redundancy as possible is eliminated from the similar data blocks. This solves the problem in conventional data compression that redundant data lying far apart cannot be eliminated because of the limit on the sliding window size.
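Steps 201 to 204 can be sketched end to end as follows. This is only an illustration under several assumptions that are not in the patent: fixed 8 KB blocks, a crude two-feature similarity test (the largest and smallest CRC32 over 8-byte windows) in place of a real similarity detector, and `zlib` as the conventional compressor. The names `feature_pair` and `compress_with_recombination` are hypothetical.

```python
import random
import zlib

BLOCK = 8 * 1024

def feature_pair(block, w=8):
    """Two content features of a block; similar blocks usually share at least one."""
    fps = [zlib.crc32(block[i:i + w]) for i in range(max(1, len(block) - w + 1))]
    return ("max", max(fps)), ("min", min(fps))

def compress_with_recombination(data):
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]  # step 201
    index, groups = {}, {}
    for i, blk in enumerate(blocks):                                  # step 202
        feats = feature_pair(blk)
        rep = next((index[f] for f in feats if f in index), i)
        for f in feats:
            index.setdefault(f, rep)
        groups.setdefault(rep, []).append(i)
    order = [i for chain in groups.values() for i in chain]           # step 203
    recombined = b"".join(blocks[i] for i in order)
    return order, zlib.compress(recombined, 9)                        # step 204

# Example input: block x and a near-copy x2 separated by 40 KB of unrelated data.
_rng = random.Random(3)
x = bytes(_rng.getrandbits(8) for _ in range(BLOCK))
x2 = bytearray(x)
x2[4000] ^= 0xFF
fillers = bytes(_rng.getrandbits(8) for _ in range(5 * BLOCK))
data = x + fillers + bytes(x2)

plain_size = len(zlib.compress(data, 9))
order, packed = compress_with_recombination(data)
print(order, plain_size, len(packed))
```

The returned `order` records where each original block ended up; it plays the role that the file spectrum plays in the later embodiment and is needed to undo the recombination.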
Building on the embodiment of Fig. 2, the data compression method in the embodiments of the present invention is described in detail below. Referring to Fig. 3, another embodiment of a data compression method in the embodiments of the present invention comprises:
301. Divide the original data into multiple data blocks;
To cluster and recombine similar data as intended by the present invention, the original data must be chunked to obtain multiple data blocks. Data chunking divides the original data into multiple data blocks using a chunking algorithm. The average chunk granularity is about 8 KB (it can also be set to 4 KB or 16 KB as needed), and the chunking algorithm can be either a content-based chunking algorithm or fixed-length chunking.
The content and characteristics of the content-based and fixed-length chunking algorithms are described in detail in step 201 of the Fig. 2 embodiment and are not repeated here.
302. Record the order, offset and block length of the multiple data blocks to generate an original file spectrum;
So that the compressed data can later be restored to the original data, the data compression system needs to record the order of the multiple data blocks in the original data, together with the offset and block length of each data block. The order is used to restore each data block to its position in the original data, and the offset and block length are used to read out the content of each data block accurately. The file that records the order, offset and block length of the multiple data blocks is called the original file spectrum.
Fig. 4 is a schematic diagram of the organization of a file spectrum and gives an example of the original file spectrum of a file named TEST. A file spectrum mainly contains file information (file size, file name length, file name and number of data blocks) and data block metadata (the offset and block length of each data block).
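A file spectrum of this kind might be represented as in the sketch below. The exact layout of Fig. 4 is not reproduced in the text, so the binary format here (little-endian header with file size, name length and block count, followed by the name and one offset/length pair per block) is an assumption, as are the names `BlockEntry`, `FileSpectrum` and `build_spectrum`. The order of the blocks is given implicitly by their position in the list.

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class BlockEntry:
    offset: int   # where the block starts
    length: int   # block length in bytes

@dataclass
class FileSpectrum:
    name: str
    size: int
    blocks: List[BlockEntry]   # list position = order of the block in the original data

    def to_bytes(self):
        raw_name = self.name.encode("utf-8")
        out = struct.pack("<QHI", self.size, len(raw_name), len(self.blocks)) + raw_name
        for b in self.blocks:
            out += struct.pack("<QI", b.offset, b.length)
        return out

    @classmethod
    def from_bytes(cls, raw):
        size, name_len, count = struct.unpack_from("<QHI", raw, 0)
        pos = struct.calcsize("<QHI")
        name = raw[pos:pos + name_len].decode("utf-8")
        pos += name_len
        blocks = []
        for _ in range(count):
            offset, length = struct.unpack_from("<QI", raw, pos)
            blocks.append(BlockEntry(offset, length))
            pos += struct.calcsize("<QI")
        return cls(name, size, blocks)

def build_spectrum(name, chunks):
    """Record offset and length of each chunk in its original order (step 302)."""
    blocks, offset = [], 0
    for chunk in chunks:
        blocks.append(BlockEntry(offset, len(chunk)))
        offset += len(chunk)
    return FileSpectrum(name, offset, blocks)
```

For example, `build_spectrum("TEST", chunks)` yields a spectrum whose serialized form can be stored next to the data and parsed back with `FileSpectrum.from_bytes`.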
303. Detect the similarity of the multiple data blocks;
After chunking, the original data forms multiple data blocks, and the data compression system performs similarity detection on them. Many similarity detection algorithms are available, for example the super-feature method, Simhash or Minhash.
The super-feature method is used as an example below. Suppose there are N data blocks. Applying one hash algorithm to each of the N data blocks yields N hash values, which serve as N super-feature values. To improve the ability to recognize similarity among the data blocks, however, several hash algorithms are applied to each of the N data blocks, so that each data block corresponds to multiple super-feature values and therefore to multiple super-feature index entries. Each super-feature value of each data block is then compared with the super-feature values of the other data blocks; if the super-feature values of two data blocks are found to contain an identical value, the two data blocks are considered similar.
It should be noted that this embodiment detects the similarity of the multiple data blocks using a global super-feature index, which widens the scope of similarity detection and improves the detection of similar data blocks. The similarity detection algorithm of this embodiment may also be Simhash or Minhash; the specific detection algorithm is not limited here.
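The super-feature comparison with a global index can be sketched as follows. The patent text does not spell out how the hash values are computed, so this sketch borrows a common construction as an assumption: each feature is the maximum of a different linear transform of the CRC32 fingerprints of all 8-byte windows in the block, and every `GROUP` features are combined into one super-feature value. All names and constants (`WINDOW`, `NUM_FEATURES`, `GROUP`, `super_features`, `find_similar`) are hypothetical.

```python
import random
import zlib

WINDOW = 8         # width of the sliding window in bytes (assumed)
NUM_FEATURES = 6   # features computed per block (assumed)
GROUP = 2          # features combined into one super-feature (assumed)

_rng = random.Random(7)
# One (a, b) pair per feature; an odd a makes x -> a*x + b a permutation of 32-bit values.
TRANSFORMS = [(_rng.getrandbits(32) | 1, _rng.getrandbits(32)) for _ in range(NUM_FEATURES)]

def super_features(block):
    """Return the super-feature values of one data block."""
    fps = [zlib.crc32(block[i:i + WINDOW]) for i in range(max(1, len(block) - WINDOW + 1))]
    feats = [max((a * fp + b) & 0xFFFFFFFF for fp in fps) for a, b in TRANSFORMS]
    return [(g, tuple(feats[g * GROUP:(g + 1) * GROUP])) for g in range(NUM_FEATURES // GROUP)]

def find_similar(blocks):
    """For each block, return the index of the earliest block sharing a super-feature
    value with it (or its own index if there is none), using one global index."""
    index = {}          # super-feature value -> representative block index
    similar_to = []
    for i, block in enumerate(blocks):
        sfs = super_features(block)
        match = next((index[sf] for sf in sfs if sf in index), i)
        similar_to.append(match)
        for sf in sfs:
            index.setdefault(sf, match)
    return similar_to
```

Because a small edit changes only a few windows, most features of an edited block keep their value, so the edited block and its original normally still share at least one super-feature value, while unrelated blocks almost never do.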
304. Migrate and recombine the similar data blocks in turn to generate multiple similarity linked lists;
In step 303, if the data compression system finds data blocks with an identical super-feature value, it adds these data blocks in turn to the corresponding similarity linked list and records the offset and block length of each data block in that list, thereby obtaining multiple similarity linked lists.
As shown in Fig. 1, data blocks A, C and F are similar, so they are recorded as similarity linked list 1; data blocks B, D and E are similar, so they are recorded as similarity linked list 2. If a data block shares no super-feature value with any other data block, a new similarity linked list is created to hold that data block.
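Building the similarity linked lists of step 304 can be sketched as below. The sketch assumes that similarity detection has already produced, for every block, the index of the earliest block it is similar to (its own index if it has no similar block), and that the original file spectrum supplies each block's offset and length. Plain Python lists stand in for linked lists, and `build_similarity_lists` is a hypothetical name.

```python
def build_similarity_lists(similar_to, block_table):
    """Group blocks by representative; each entry records (index, offset, length)."""
    lists = {}
    for index, rep in enumerate(similar_to):
        offset, length = block_table[index]
        lists.setdefault(rep, []).append((index, offset, length))
    return list(lists.values())

# Blocks A..F as in Fig. 1: A, C, F are similar and B, D, E are similar.
similar_to = [0, 1, 0, 1, 1, 0]
block_table = [(0, 2), (2, 3), (5, 2), (7, 3), (10, 2), (12, 2)]  # (offset, length), sizes assumed
lists = build_similarity_lists(similar_to, block_table)
print(lists)
```

A block with no similar block simply ends up in a list of its own, which matches the rule of creating a new list for it.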
305. Read the corresponding data block contents from the original data according to the multiple similarity linked lists to generate recombined data;
After similarity detection of the multiple data blocks, multiple similarity linked lists are obtained. The data compression system traverses each similarity linked list in turn, reads the content of each data block from the original data according to the order, offset and block length of the data blocks in that list, and then writes each data block that was read to a file in turn, generating the recombined data.
As shown in Fig. 1, according to the order, offset and block length of each data block recorded in similarity linked list 1, the contents of data blocks A, C and F are read from the original data in turn and written to the file in turn; according to the order, offset and block length of each data block recorded in similarity linked list 2, the contents of data blocks B, D and E are read from the original data in turn and then written to the file in turn. Proceeding in the same way through the similarity linked lists in order, the content of each data block is read out and written in turn, generating the recombined data A, C, F, B, D, E shown in Fig. 1.
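Step 305 amounts to copying byte ranges from the original data in list order. The sketch below assumes each list entry is an (index, offset, length) tuple; the function name `recombine` and the tiny example blocks are hypothetical and only stand in for the blocks A to F of Fig. 1.

```python
def recombine(original, similarity_lists):
    """Read each block from the original data in list order and append it."""
    out = bytearray()
    for chain in similarity_lists:
        for _index, offset, length in chain:
            out += original[offset:offset + length]
    return bytes(out)

# Six tiny stand-in blocks: A1, B22, C3, D44, E5, F6.
original = b"A1B22C3D44E5F6"
similarity_lists = [
    [(0, 0, 2), (2, 5, 2), (5, 12, 2)],   # list 1: A, C, F
    [(1, 2, 3), (3, 7, 3), (4, 10, 2)],   # list 2: B, D, E
]
recombined = recombine(original, similarity_lists)
print(recombined)
```

The result contains the same bytes as the original, only in the order A, C, F, B, D, E.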
306. Update the offsets of the multiple data blocks according to the recombined data to obtain a new original file spectrum;
After the recombined data is generated, the position of each data block has changed, so the offset of each data block has changed accordingly. As shown in Fig. 1, suppose that in the original data block A is 1 KB, block B is 2 KB and block C is 3 KB. The offset of block C in the original data is then the sum of the block lengths of A and B, i.e. 3 KB. After the recombined data is generated, the position of block C has changed, and its offset becomes the block length of A, i.e. 1 KB. So that the original data can later be restored from the original file spectrum and the recombined data, the data compression system needs to update the offsets of the multiple data blocks in the original file spectrum according to the recombined data. For ease of description, the updated original file spectrum is called the new original file spectrum.
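The offset update of step 306 can be worked through numerically. The sketch below uses the block lengths given in the text for A, B and C (1 KB, 2 KB, 3 KB); the lengths of D, E and F are not stated in the text and are assumed to be 1 KB each. `update_offsets` is a hypothetical helper that walks the recombined order and accumulates lengths.

```python
def update_offsets(lengths, recombined_order):
    """Return, in original block order, each block's offset inside the recombined data."""
    new_offsets = [0] * len(lengths)
    position = 0
    for index in recombined_order:
        new_offsets[index] = position
        position += lengths[index]
    return new_offsets

KB = 1024
lengths = [1 * KB, 2 * KB, 3 * KB, 1 * KB, 1 * KB, 1 * KB]  # A..F; D, E, F are assumed
order = [0, 2, 5, 1, 3, 4]                                    # A, C, F, B, D, E
new_offsets = update_offsets(lengths, order)
print([off // KB for off in new_offsets])
```

Block C, which sat at offset 3 KB in the original data, is now at offset 1 KB, directly after block A, as described above. The entries stay in original order, so the spectrum still encodes where each block belongs when the data is restored.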
307. Compress the new original file spectrum to obtain a compressed file spectrum;
After the new original file spectrum is obtained, it is compressed to obtain the compressed file spectrum. To make later decompression convenient, the compressed file spectrum can be stored in association with the compressed data produced later.
308. Compress the recombined data to obtain compressed data;
After the recombined data is obtained in step 305, the similar data blocks have been clustered and recombined, so compression can eliminate as much of the redundancy among similar data blocks as possible and yields smaller compressed data.
Furthermore, the present invention solves the problem in conventional compression methods that redundancy lying too far apart cannot be eliminated because of the limit on the sliding window size.
It should be noted that step 308 can also be performed before step 307, that is, there is no ordering constraint between step 307 and step 308. For convenience in practice, step 307 and step 308 can also be merged, that is, the new original file spectrum and the recombined data are compressed together to obtain the compressed data and the compressed file spectrum.
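Steps 307 and 308, and the option of storing the two results together, might look like the sketch below. The container layout (a 4-byte length prefix, the compressed file spectrum, then the compressed data), the JSON encoding of the spectrum and the names `pack` and `unpack` are all assumptions made for illustration; the patent only says that the two can be stored in association.

```python
import json
import struct
import zlib

def pack(recombined, spectrum):
    """Compress the file spectrum and the recombined data and store them together."""
    comp_spec = zlib.compress(json.dumps(spectrum).encode("utf-8"), 9)   # step 307
    comp_data = zlib.compress(recombined, 9)                             # step 308
    return struct.pack("<I", len(comp_spec)) + comp_spec + comp_data

def unpack(container):
    """Inverse of pack: return (recombined data, new original file spectrum)."""
    (spec_len,) = struct.unpack_from("<I", container, 0)
    comp_spec = container[4:4 + spec_len]
    spectrum = json.loads(zlib.decompress(comp_spec).decode("utf-8"))
    recombined = zlib.decompress(container[4 + spec_len:])
    return recombined, spectrum

spectrum = {"name": "TEST", "size": 14,
            "blocks": [[0, 2], [6, 3], [2, 2], [9, 3], [12, 2], [4, 2]]}  # [offset, length]
container = pack(b"A1C3F6B22D44E5", spectrum)
```

Since the two parts are compressed independently, the order of steps 307 and 308 does not matter, which is consistent with the note above.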
In the present invention, the original data is divided into multiple data blocks, the similarity of the multiple data blocks is detected, the similar data blocks are migrated and recombined to generate recombined data, and the recombined data is then compressed to obtain compressed data. Because the present invention migrates and recombines similar data blocks so that they are placed together, as much redundancy as possible is eliminated from the similar data blocks. This solves the problem in conventional data compression that redundant data lying far apart cannot be eliminated because of the limit on the sliding window size.
The data compression method of the present invention has been described above; the data decompression method of the present invention is described below. Referring to Fig. 6, one embodiment of a data decompression method in the embodiments of the present invention comprises:
601. Decompress the compressed data and the compressed file spectrum to obtain the recombined data and the new original file spectrum, respectively;
Following the embodiment of Fig. 3, once the compressed data and the compressed file spectrum have been obtained, restoring the original data requires the data decompression system to decompress both of them; after decompression, the recombined data and the new original file spectrum are obtained. Fig. 5 is a schematic diagram of the process of the data decompression method.
As shown in Fig. 5, after the compressed data and the compressed file spectrum are decompressed, the recombined data and the new original file spectrum are obtained.
602. Read the multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original file spectrum;
After the compressed data and the compressed file spectrum are decompressed, the recombined data and the new original file spectrum are obtained. The new original file spectrum records the order and block length of each data block in the original data, as well as the offset of each data block in the recombined data. The data decompression system therefore uses the order and block length of each data block in the original data and the offset of each data block in the recombined data, as recorded in the new original file spectrum, to read the content of each data block from the recombined data in the order of the original data blocks.
As shown in Fig. 5, according to the order A, B, C, D, E, F of the multiple data blocks recorded in the new original file spectrum, and the offset and block length of each data block in the recombined data, the content of each data block is read from the recombined data A, D, F, B, C, E in the order of the data blocks recorded for the original data.
603. Write the multiple data blocks in turn according to the order of the multiple data blocks recorded in the new original file spectrum, to obtain the original data.
In step 602, the data decompression system reads the content of each data block from the recombined data in the order of the data blocks recorded for the original data; it then writes the content of each data block in turn, which restores the original data.
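Steps 602 and 603 can be sketched as a single loop: the new original file spectrum lists the blocks in their original order, each with its offset in the recombined data and its length, so reading them in that order and appending them restores the original data. The function name `restore` and the tiny example (the stand-in blocks A1, B22, C3, D44, E5, F6 recombined in the order A, C, F, B, D, E of Fig. 1) are hypothetical.

```python
def restore(recombined, spectrum_blocks):
    """Read blocks from the recombined data in original order and write them in turn."""
    out = bytearray()
    for offset, length in spectrum_blocks:       # entries are in original block order
        out += recombined[offset:offset + length]
    return bytes(out)

recombined = b"A1C3F6B22D44E5"                    # blocks in order A, C, F, B, D, E
spectrum_blocks = [(0, 2), (6, 3), (2, 2), (9, 3), (12, 2), (4, 2)]  # A..F: (offset, length)
original = restore(recombined, spectrum_blocks)
print(original)
```

The reads jump around inside the recombined data while the writes are sequential, which is the access pattern discussed in the note that follows.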
It should be noted that if the data is stored on a magnetic disk, data on the disk is written sequentially, whereas reading the data blocks in the order recorded in the original data is non-sequential. Reading the content of each data block from the recombined data therefore incurs a certain disk I/O overhead and can shorten the life of the disk. If the magnetic disk is replaced with an SSD, which supports random reads and random writes, the problem of high disk I/O overhead can be solved.
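The non-sequential access pattern can be made concrete with a small sketch. The function name and the offsets below are hypothetical; the offsets correspond to blocks A to F with lengths 2, 4, 1, 3, 2, 1 stored in the recombined order A, D, F, B, C, E.

```python
def count_non_sequential_reads(read_plan):
    """Count reads that do not start where the previous read ended.

    read_plan is a list of (offset, length) pairs in the order they are read.
    """
    jumps = 0
    for (prev_offset, prev_length), (offset, _) in zip(read_plan, read_plan[1:]):
        if offset != prev_offset + prev_length:
            jumps += 1
    return jumps


# Offsets inside the recombined data, listed in original order A..F.
plan_original_order = [(0, 2), (6, 4), (10, 1), (2, 3), (11, 2), (5, 1)]
jumps_original_order = count_non_sequential_reads(plan_original_order)

# The same reads sorted by offset, i.e. a purely sequential scan.
jumps_sequential = count_non_sequential_reads(sorted(plan_original_order))
```

In this example four of the five transitions in original order require a seek, while the sorted plan requires none, which is the overhead the paragraph above attributes to magnetic disks.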
In the present invention, corresponding to the data compression method, the compressed data and the compressed file spectrum are decompressed to obtain the recombined data and the new original spectrum respectively. According to the order, offset and block length of the multiple data blocks recorded in the new original spectrum, the multiple data blocks are read from the recombined data and then written in sequence, which restores the original data.
The data compression method in the present invention has been described above; the data compression system in the present invention is described below. Referring to FIG. 7, one embodiment of a data compression system in the embodiments of the present invention includes:
Blocking unit 701, for dividing original data into multiple data blocks;
Detection unit 702, for detecting the similarity of the multiple data blocks;
Recombination unit 703, for migrating and recombining similar data blocks in sequence to generate recombined data;
Compression unit 704, for compressing the recombined data to generate compressed data.
It should be noted that the function of each unit in this embodiment is similar to that of the data compression system described in the embodiment of FIG. 2, and is not repeated here.
In the present invention, the original data is divided into multiple data blocks by the blocking unit 701, the similarity of the multiple data blocks is detected by the detection unit 702, similar data blocks are migrated and recombined to generate recombined data, and the recombined data is then compressed by the compression unit 704 to obtain compressed data. Because the present invention migrates and recombines similar data blocks so that they are grouped together, redundancy among similar data blocks is eliminated as far as possible. This solves the problem in conventional data compression that redundancy between data located far apart cannot be eliminated because of the limited sliding window size.
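The effect of the sliding window can be illustrated with a short sketch. Everything here is an assumption made for the example: the fixed 4096-byte block length, the min-hash features used as a stand-in for the similarity detection, the greedy grouping, and the use of zlib (whose DEFLATE window is 32 KiB) as the compressor. The patent does not prescribe any of these choices.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096    # assumed fixed block length
SHINGLE = 8          # assumed width of the byte k-grams used as features
NUM_FEATURES = 3     # assumed number of min-hash features per block


def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def features(block):
    """Return NUM_FEATURES min-hash values; similar blocks tend to share them."""
    grams = {block[i:i + SHINGLE] for i in range(max(1, len(block) - SHINGLE + 1))}
    return [
        min(hashlib.blake2b(g, digest_size=8, salt=bytes([n])).digest() for g in grams)
        for n in range(NUM_FEATURES)
    ]


def regroup(blocks):
    """Return block indices reordered so that similar blocks are adjacent."""
    groups = []   # each group is a list of block indices
    owner = {}    # (feature number, feature value) -> group index
    for idx, block in enumerate(blocks):
        keys = list(enumerate(features(block)))
        hit = next((owner[k] for k in keys if k in owner), None)
        if hit is None:          # no shared feature: start a new group
            hit = len(groups)
            groups.append([])
        groups[hit].append(idx)
        for k in keys:
            owner.setdefault(k, hit)
    return [idx for group in groups for idx in group]


def compress(data):
    blocks = split_blocks(data)
    order = regroup(blocks)
    recombined = b"".join(blocks[i] for i in order)
    return zlib.compress(recombined), order


# Demo: two near-identical blocks separated by 64 KiB of unrelated data,
# which is farther apart than the 32 KiB DEFLATE window.
rng = random.Random(0)


def rand_block():
    return bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))


a = rand_block()
a_variant = a[:-4] + b"\x00\x00\x00\x00"
data = a + b"".join(rand_block() for _ in range(16)) + a_variant
packed, order = compress(data)
direct = zlib.compress(data)
```

In this demo the regrouped stream places block 17 directly after block 0, so the compressor can encode the second block as references to the first, and `packed` comes out smaller than `direct`. The `order` list is the information that a spectrum would need to record so that the original order can be restored.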
For ease of understanding, the data compression system in the embodiments of the present invention is described in detail below. Referring to FIG. 8, another embodiment of the data compression system in the embodiments of the present invention includes:
Blocking unit 801, for dividing original data into multiple data blocks;
Detection unit 802, for detecting the similarity of the multiple data blocks;
Recombination unit 803, for migrating and recombining similar data blocks in sequence to generate recombined data;
First compression unit 804, for compressing the recombined data to generate compressed data.
Further, the data compression system also includes:
First generation unit 805, for recording the order, offset and block length of the multiple data blocks to generate an original spectrum;
Updating unit 806, for updating the offsets of the multiple data blocks according to the recombined data to obtain a new original spectrum;
Second compression unit 807, for compressing the new original spectrum to generate a compressed file spectrum.
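The roles of units 805 to 807 can be sketched as three small functions. This is a sketch under stated assumptions: the dictionary layout of a spectrum entry, the function names, and the choice of JSON plus zlib for the compressed file spectrum are all hypothetical and are not taken from the patent.

```python
import json
import zlib


def build_spectrum(block_lengths):
    """Original spectrum: order, offset in the original data, and length of each block."""
    spectrum, offset = [], 0
    for order, length in enumerate(block_lengths):
        spectrum.append({"order": order, "offset": offset, "length": length})
        offset += length
    return spectrum


def update_spectrum(spectrum, recombined_order):
    """New original spectrum: same order and length, offset now points into the recombined data."""
    by_order = {e["order"]: e for e in spectrum}
    updated, offset = {}, 0
    for order in recombined_order:
        length = by_order[order]["length"]
        updated[order] = {"order": order, "offset": offset, "length": length}
        offset += length
    return [updated[o] for o in sorted(updated)]


def compress_spectrum(spectrum):
    """Compressed file spectrum, here serialized as JSON and deflated with zlib."""
    return zlib.compress(json.dumps(spectrum).encode("utf-8"))


# Blocks A..F with hypothetical lengths, recombined in the order A, D, F, B, C, E.
spec = build_spectrum([2, 4, 1, 3, 2, 1])
new_spec = update_spectrum(spec, [0, 3, 5, 1, 2, 4])
packed_spec = compress_spectrum(new_spec)
```

After the update, the entries are still listed in original order, but each offset refers to the block's position in the recombined data, which is what the decompression side needs.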
The recombination unit 803 includes:
First generation module 8031, for migrating and recombining similar data blocks in sequence to generate multiple similarity linked lists;
Second generation module 8032, for reading the corresponding data block contents from the original data according to the multiple similarity linked lists to generate recombined data.
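The two modules can be sketched as follows. The similarity labels are taken as given (equal labels mean the blocks were judged similar), and the function names, the `next_block` link array and the sample data are hypothetical; the grouping of A, D, F and of C, E is an assumption chosen to reproduce the recombined order A, D, F, B, C, E used in the example of FIG. 5.

```python
def build_similar_chains(labels):
    """Link blocks with equal similarity labels into chains.

    Returns (heads, next_block): the first block of each chain in
    first-seen order, and for every block the index of the next similar
    block (or None at the end of a chain).
    """
    heads, tail_of, next_block = [], {}, [None] * len(labels)
    for i, label in enumerate(labels):
        if label in tail_of:
            next_block[tail_of[label]] = i  # append to the existing chain
        else:
            heads.append(i)                 # start a new chain
        tail_of[label] = i
    return heads, next_block


def generate_recombined(original, spectrum, heads, next_block):
    """Walk each chain and copy block contents out of the original data.

    spectrum[i] is the (offset, length) of block i in the original data.
    """
    out, order = bytearray(), []
    for head in heads:
        i = head
        while i is not None:
            offset, length = spectrum[i]
            out += original[offset:offset + length]
            order.append(i)
            i = next_block[i]
    return bytes(out), order


original = b"aabbbbcdddeef"                       # blocks A..F
spectrum = [(0, 2), (2, 4), (6, 1), (7, 3), (10, 2), (12, 1)]
labels = ["x", "y", "z", "x", "z", "x"]           # A, D, F similar; C, E similar
heads, next_block = build_similar_chains(labels)
recombined, order = generate_recombined(original, spectrum, heads, next_block)
```

Walking the chains yields the block order 0, 3, 5, 1, 2, 4, that is A, D, F, B, C, E.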
The detection unit 802 includes:
Detection module 8021, for detecting the similarity of the multiple data blocks by the super-feature method, the Simhash method or the Minhash method.
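As an illustration of one of the named techniques, the following is a minimal MinHash sketch. The shingle width, the number of hash functions and the use of salted BLAKE2b as the hash family are assumptions made for the example; the patent does not specify these parameters.

```python
import hashlib
import random


def shingles(block, width=8):
    """Set of overlapping byte k-grams of the block."""
    return {block[i:i + width] for i in range(max(1, len(block) - width + 1))}


def minhash_signature(block, num_hashes=16):
    """For each salted hash function, keep the smallest hash over all shingles."""
    grams = shingles(block)
    signature = []
    for n in range(num_hashes):
        salt = n.to_bytes(2, "big")
        signature.append(
            min(hashlib.blake2b(g, digest_size=8, salt=salt).digest() for g in grams)
        )
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


rng = random.Random(1)
base = bytes(rng.getrandbits(8) for _ in range(1024))
near = base[:-4] + b"\x00\x00\x00\x00"             # differs only in the last bytes
other = bytes(rng.getrandbits(8) for _ in range(1024))

sim_near = estimated_similarity(minhash_signature(base), minhash_signature(near))
sim_other = estimated_similarity(minhash_signature(base), minhash_signature(other))
```

Blocks that share most of their shingles agree in most signature positions, so `sim_near` is expected to be close to 1 while `sim_other` is expected to be close to 0. A detection module could treat blocks whose estimate exceeds a chosen threshold as similar.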
It should be noted that the functions of the above units and modules are similar to those of the data compression system in the embodiment of FIG. 3, and are not repeated here.
In the present invention, the original data is divided into multiple data blocks by the blocking unit 801, the similarity of the multiple data blocks is detected by the detection unit 802, similar data blocks are migrated and recombined to generate recombined data, and the recombined data is then compressed by the first compression unit 804 to obtain compressed data. Because the present invention migrates and recombines similar data blocks so that they are grouped together, redundancy among similar data blocks is eliminated as far as possible. This solves the problem in conventional data compression that redundancy between data located far apart cannot be eliminated because of the limited sliding window size.
The data compression system has been described above; the data decompression system is described below. Referring to FIG. 9, one embodiment of the data decompression system in the embodiments of the present invention includes:
Decompression unit 901, for decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original spectrum respectively;
Reading unit 902, for reading the multiple data blocks from the recombined data according to the offset and block length of the multiple data blocks recorded in the new original spectrum;
Writing unit 903, for writing the multiple data blocks in sequence according to the order of the multiple data blocks recorded in the new original spectrum to obtain the original data.
It should be noted that the function of each unit in this embodiment is similar to that of the data decompression system in the embodiment of FIG. 6, and is not repeated here.
In the present invention, corresponding to the data compression method, the decompression unit 901 decompresses the compressed data and the compressed file spectrum to obtain the recombined data and the new original spectrum respectively; the reading unit 902 reads the multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original spectrum; and the multiple data blocks are then written in sequence, which restores the original data.
The data compression system and the data decompression system in the embodiments of the present invention have been described above from the perspective of modular functional entities. The computer device in the embodiments of the present invention is described below from the perspective of hardware processing:
The computer device is used to implement the functions on the data compression system side. One embodiment of the computer device in the embodiments of the present invention includes:
A processor and a memory;
The memory is used to store a computer program. When the processor executes the computer program stored in the memory, the following steps can be implemented:
Dividing original data into multiple data blocks;
Detecting the similarity of the multiple data blocks;
Migrating and recombining similar data blocks in sequence to generate recombined data;
Compressing the recombined data to generate compressed data.
In some embodiments of the invention, the processor can also be used to implement the following steps:
Recording the order, offset and block length of the multiple data blocks to generate an original spectrum.
In some embodiments of the invention, the processor can also be used to implement the following steps:
Updating the offsets of the multiple data blocks according to the recombined data to obtain a new original spectrum;
Compressing the new original spectrum to generate a compressed file spectrum.
In some embodiments of the invention, the processor can also be used to implement the following steps:
Migrating and recombining similar data blocks in sequence to generate multiple similarity linked lists;
Reading the corresponding data block contents from the original data according to the multiple similarity linked lists to generate recombined data.
In some embodiments of the invention, the processor can also be used to implement the following steps:
Detecting the similarity of the multiple data blocks by the super-feature method, the Simhash method or the Minhash method.
The computer device can also be used to implement the functions on the data decompression system side. In another embodiment of the computer device in the embodiments of the present invention, the following steps are implemented:
Decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original spectrum respectively;
Reading the multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original spectrum;
Writing the multiple data blocks in sequence according to the order of the multiple data blocks recorded in the new original spectrum to obtain the original data.
It can be understood that, on either the data compression system side or the data decompression system side, when the processor in the computer device described above executes the computer program, it can also implement the functions of the units in the corresponding device embodiments above, which are not repeated here. For example, the computer program can be divided into one or more modules/units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program in the data compression system or the data decompression system. For example, the computer program can be divided into the units of the data compression system described above, and each unit can implement the specific functions described for the corresponding data compression system.
The computer device can be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the processor and the memory are merely examples of the computer device and do not limit it; the computer device may include more or fewer components, combine certain components, or use different components. For example, the computer device may also include input/output devices, network access devices, buses and the like.
The processor can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor can be a microprocessor or any conventional processor. The processor is the control center of the computer device and connects the parts of the whole computer device through various interfaces and lines.
The memory can be used to store computer programs and/or modules. The processor implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and by calling the data stored in the memory. The memory can mainly include a program storage area and a data storage area, where the program storage area can store an operating system, an application program required by at least one function, and the like, and the data storage area can store data created according to the use of the terminal, and the like. In addition, the memory can include high-speed random access memory, and can also include nonvolatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or another volatile solid-state storage device.
The present invention also provides a computer-readable storage medium for implementing the functions on the data compression system side. A computer program is stored on it, and when the computer program is executed by a processor, the processor can be used to perform the following steps:
Dividing original data into multiple data blocks;
Detecting the similarity of the multiple data blocks;
Migrating and recombining similar data blocks in sequence to generate recombined data;
Compressing the recombined data to generate compressed data.
In some embodiments of the invention, when the computer program stored on the computer-readable storage medium is executed by a processor, the processor can specifically be used to perform the following steps:
Recording the order, offset and block length of the multiple data blocks to generate an original spectrum.
Updating the offsets of the multiple data blocks according to the recombined data to obtain a new original spectrum;
Compressing the new original spectrum to generate a compressed file spectrum.
Migrating and recombining similar data blocks in sequence to generate multiple similarity linked lists;
Reading the corresponding data block contents from the original data according to the multiple similarity linked lists to generate recombined data.
Detecting the similarity of the multiple data blocks by the super-feature method, the Simhash method or the Minhash method.
The present invention also provides another computer-readable storage medium for implementing the functions on the data decompression system side. A computer program is stored on it, and when the computer program is executed by a processor, the processor can be used to perform the following steps:
Decompressing the compressed data and the compressed file spectrum to obtain the recombined data and the new original spectrum respectively;
Reading the multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original spectrum;
Writing the multiple data blocks in sequence according to the order of the multiple data blocks recorded in the new original spectrum to obtain the original data.
It can be understood that if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a corresponding computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the corresponding embodiments above can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which can be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium can include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium can be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, and can be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk or an optical disc.
The above embodiments are only used to illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (12)
- 1. A data compression method, characterized by comprising: dividing original data into multiple data blocks; detecting the similarity of the multiple data blocks; migrating and recombining similar data blocks in sequence to generate recombined data; compressing the recombined data to generate compressed data.
- 2. The method according to claim 1, characterized in that after the dividing of the original data into multiple data blocks and before the detecting of the similarity of the multiple data blocks, the method further comprises: recording the order, offset and block length of the multiple data blocks to generate an original spectrum.
- 3. The method according to claim 2, characterized in that after the migrating and recombining of similar data blocks in sequence to generate recombined data and before the compressing of the recombined data to generate compressed data, the method further comprises: updating the offsets of the multiple data blocks according to the recombined data to obtain a new original spectrum.
- 4. The method according to any one of claims 1 to 3, characterized in that the migrating and recombining of similar data blocks in sequence to generate recombined data comprises: migrating and recombining the similar data blocks in sequence to generate multiple similarity linked lists; reading the corresponding data block contents from the original data according to the multiple similarity linked lists to generate recombined data.
- 5. The method according to claim 4, characterized in that the detecting of the similarity of the multiple data blocks comprises: detecting the similarity of the multiple data blocks by the super-feature method, the Simhash method or the Minhash method.
- 6. A data decompression method, characterized by comprising: decompressing compressed data and a compressed file spectrum to obtain recombined data and a new original spectrum respectively; reading multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original spectrum; writing the multiple data blocks in sequence according to the order of the multiple data blocks recorded in the new original spectrum to obtain original data.
- 7. A data compression system, characterized by comprising: a blocking unit, for dividing original data into multiple data blocks; a detection unit, for detecting the similarity of the multiple data blocks; a recombination unit, for migrating and recombining similar data blocks in sequence to generate recombined data; a compression unit, for compressing the recombined data to generate compressed data.
- 8. A data decompression system, characterized by comprising: a decompression unit, for decompressing compressed data and a compressed file spectrum to obtain recombined data and a new original spectrum respectively; a reading unit, for reading multiple data blocks from the recombined data according to the order, offset and block length of the multiple data blocks recorded in the new original spectrum; a writing unit, for writing the multiple data blocks in sequence according to the order of the multiple data blocks recorded in the new original spectrum to obtain original data.
- 9. A computer device, characterized by comprising a processor, wherein the processor, when executing a computer program stored in a memory, implements the steps of the data compression method according to any one of claims 1 to 5.
- 10. A computer device, characterized by comprising a processor, wherein the processor, when executing a computer program stored in a memory, implements the steps of the data decompression method according to claim 6.
- 11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the data compression method according to any one of claims 1 to 5.
- 12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the data decompression method according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710884914.2A CN107682016B (en) | 2017-09-26 | 2017-09-26 | Data compression method, data decompression method and related system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710884914.2A CN107682016B (en) | 2017-09-26 | 2017-09-26 | Data compression method, data decompression method and related system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107682016A true CN107682016A (en) | 2018-02-09 |
CN107682016B CN107682016B (en) | 2021-09-17 |
Family
ID=61137381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710884914.2A Active CN107682016B (en) | 2017-09-26 | 2017-09-26 | Data compression method, data decompression method and related system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107682016B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101667843A (en) * | 2009-09-22 | 2010-03-10 | 中兴通讯股份有限公司 | Methods and devices for compressing and uncompressing data of embedded system |
CN102065098A (en) * | 2010-12-31 | 2011-05-18 | 网宿科技股份有限公司 | Method and system for synchronizing data among network nodes |
CN102737132A (en) * | 2012-06-25 | 2012-10-17 | 天津神舟通用数据技术有限公司 | Multi-rule combined compression method based on database row and column mixed storage |
CN103067022A (en) * | 2012-12-19 | 2013-04-24 | 中国石油天然气集团公司 | Nondestructive compressing method, uncompressing method, compressing device and uncompressing device for integer data |
CN103020317A (en) * | 2013-01-10 | 2013-04-03 | 曙光信息产业(北京)有限公司 | Device and method for data compression based on data deduplication |
CN104142924A (en) * | 2013-05-06 | 2014-11-12 | 中国移动通信集团福建有限公司 | Method and device for compressing flash picture format |
CN104283567A (en) * | 2013-07-02 | 2015-01-14 | 北京四维图新科技股份有限公司 | Method for compressing or decompressing name data, and equipment thereof |
US9767154B1 (en) * | 2013-09-26 | 2017-09-19 | EMC IP Holding Company LLC | System and method for improving data compression of a storage system in an online manner |
CN107251438A (en) * | 2015-02-16 | 2017-10-13 | 三菱电机株式会社 | Data compression device, data decompression device, data compression method, uncompressing data and program |
CN105204781A (en) * | 2015-09-28 | 2015-12-30 | 华为技术有限公司 | Compression method, device and equipment |
CN107087184A (en) * | 2017-04-28 | 2017-08-22 | 华南理工大学 | A kind of multi-medium data recompression method |
Non-Patent Citations (1)
Title |
---|
蔡明 等: "一种新的数据无损压缩编码方法", 《电子与信息学报》 * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108427538A (en) * | 2018-03-15 | 2018-08-21 | 深信服科技股份有限公司 | Storage data compression method, device and the readable storage medium storing program for executing of full flash array |
CN110083743A (en) * | 2019-03-28 | 2019-08-02 | 哈尔滨工业大学(深圳) | A kind of quick set of metadata of similar data detection method based on uniform sampling |
CN110083743B (en) * | 2019-03-28 | 2021-11-16 | 哈尔滨工业大学(深圳) | Rapid similar data detection method based on unified sampling |
CN112099725A (en) * | 2019-06-17 | 2020-12-18 | 华为技术有限公司 | Data processing method and device and computer readable storage medium |
WO2020253406A1 (en) * | 2019-06-17 | 2020-12-24 | 华为技术有限公司 | Data processing method and device, and computer readable storage medium |
US11797204B2 (en) | 2019-06-17 | 2023-10-24 | Huawei Technologies Co., Ltd. | Data compression processing method and apparatus, and computer-readable storage medium |
EP3896564A4 (en) * | 2019-06-17 | 2022-04-13 | Huawei Technologies Co., Ltd. | Data processing method and device, and computer readable storage medium |
CN110781155B (en) * | 2019-10-18 | 2022-06-24 | 赛尔网络有限公司 | Data storage reading method, system, equipment and medium based on IPFS |
CN110781155A (en) * | 2019-10-18 | 2020-02-11 | 赛尔网络有限公司 | Data storage reading method, system, equipment and medium based on IPFS |
CN110888918A (en) * | 2019-11-25 | 2020-03-17 | 湖北工业大学 | Similar data detection method and device, computer equipment and storage medium |
CN111984615A (en) * | 2020-08-04 | 2020-11-24 | 中国人民银行数字货币研究所 | Method, device and system for sharing files |
CN111984615B (en) * | 2020-08-04 | 2024-05-28 | 中国人民银行数字货币研究所 | File sharing method, device and system |
US12086107B2 (en) | 2020-08-04 | 2024-09-10 | Digital Currency Institute, The People's Bank Of China | File sharing method, apparatus, and system |
CN112665886A (en) * | 2020-12-11 | 2021-04-16 | 浙江中控技术股份有限公司 | Data conversion method for high-frequency original data of vibration measurement of large-scale rotating machinery |
WO2022206334A1 (en) * | 2021-03-30 | 2022-10-06 | 华为技术有限公司 | Data compression method and apparatus |
CN115858478A (en) * | 2023-02-24 | 2023-03-28 | 山东中联翰元教育科技有限公司 | Data rapid compression method of interactive intelligent teaching platform |
CN115858478B (en) * | 2023-02-24 | 2023-05-12 | 山东中联翰元教育科技有限公司 | Data rapid compression method of interactive intelligent teaching platform |
CN118337221A (en) * | 2024-06-13 | 2024-07-12 | 陕西颐刚盛讯科技有限责任公司 | Network security data transmission method |
CN118337221B (en) * | 2024-06-13 | 2024-09-03 | 陕西颐刚盛讯科技有限责任公司 | Network security data transmission method |
Also Published As
Publication number | Publication date |
---|---|
CN107682016B (en) | 2021-09-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN107682016A (en) | A kind of data compression method, data decompression method and related system | |
CN108427538B (en) | Storage data compression method and device of full flash memory array and readable storage medium | |
CN107506153B (en) | Data compression method, data decompression method and related system | |
CN107046812B (en) | Data storage method and device | |
US9514178B2 (en) | Table boundary detection in data blocks for compression | |
US20140258248A1 (en) | Delta Compression of Probabilistically Clustered Chunks of Data | |
US20130282677A1 (en) | Data compression system for DNA sequence | |
CN107305586B (en) | Index generation method, index generation device and search method | |
CN107111623A (en) | Parallel history search and encoding for dictionary-based compression | |
CN111125033B (en) | Space recycling method and system based on full flash memory array | |
CN108027713A (en) | Data de-duplication for solid state drive controller | |
CN103236847A (en) | Multilayer Hash structure and run coding-based lossless compression method for data | |
US10078646B2 (en) | Hardware efficient fingerprinting | |
CN103838753A (en) | Storage and verification method and device for exchange codes | |
US10534755B2 (en) | Word, phrase and sentence deduplication for text repositories | |
CN109947731A (en) | Method and device for deleting repeated data | |
CN111124939A (en) | Data compression method and system based on full flash memory array | |
CN111124940A (en) | Space recovery method and system based on full flash memory array | |
CN111124259A (en) | Data compression method and system based on full flash memory array | |
US12124420B2 (en) | Systems, methods and devices for eliminating duplicates and value redundancy in computer memories | |
US9176973B1 (en) | Recursive-capable lossless compression mechanism | |
CN112395275A (en) | Data deduplication via associative similarity search | |
CN111198857A (en) | Data compression method and system based on full flash memory array | |
Xue et al. | A comprehensive study of present data deduplication | |
EP3051699B1 (en) | Hardware efficient Rabin fingerprints |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||