CN107423321A

CN107423321A - Method and device for cloud storage of large batches of small files

Info

Publication number: CN107423321A
Application number: CN201710206089.0A
Authority: CN
Inventors: 郑晟
Original assignee: Shanghai Feixun Data Communication Technology Co Ltd
Current assignee: Beijing yizhiyun Technology Co., Ltd
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2017-12-01
Anticipated expiration: 2037-03-31
Also published as: CN107423321B

Abstract

The invention relates to a method and device for cloud storage of large batches of small files. The method comprises: S1, file upload: files are selectively spliced according to a file splicing method to obtain a processed file, which is then uploaded; S2, establishing or finding storage grids: a data block whose size exceeds a set threshold is divided into multiple storage grids of equal size, or a data block whose remaining storage grids are sufficient to store the processed file from step S1 is found; S3, the processed file from step S1 is parsed and stored in one or more storage grids of the data block from step S2. By merging small files before upload, the invention reduces the number of connections established and improves upload speed; each file is stored in a suitably sized storage space, which makes file updates and access more convenient.

Description

Method and device for cloud storage of large batches of small files

Technical field

The invention belongs to the field of computer application technology, and in particular relates to a method and device for cloud storage of large batches of small files.

Background art

Cloud storage of files uses RESTful API interfaces for upload, which in essence means the HTTP protocol. If files are uploaded one at a time, a large amount of time is spent establishing HTTP connections while the actual data transfer takes very little, so the number of HTTP connections established needs to be reduced. In addition, a large number of small files occupies substantial storage resources, which is both uneconomical and harmful to access performance. To solve these technical problems, the following schemes have been proposed:

First, optimization strategies for massive numbers of small files generally include merging small files for storage. Facebook's open-source Haystack and Taobao's TFS, for example, both adopt this strategy. Merging files reduces the number of inodes, that is, the amount of metadata, so that the metadata can essentially be held entirely in memory and a file can be read with a single disk I/O. This greatly improves read performance and solves the read-performance problem for massive numbers of small files. The overall idea is to build a secondary index and fill one large data block with multiple small files: uploading small files becomes uploading a medium-sized file, and addressing a small file becomes addressing the data block plus addressing an offset within the data block.

Second, a Chinese patent discloses a read-write solution for millions of small files [application number: CN201410560613.0]. When storing small files, this scheme allocates large contiguous regions of disk space to hold them, that is, logically contiguous data is stored on contiguous space of the disk array as far as possible. The disk space is divided into multiple blocks of 64KB each. The basic idea is: each small file can be stored only in a single block and cannot span 2 blocks; each file owns one or more blocks, these blocks hold only that file's data, and each file's data is stored on contiguous disk space. By storing logically contiguous data in contiguous physical disk space as far as possible, using cache technology to act as the metadata server, and simplifying file information nodes to raise cache utilization, the scheme improves small-file access performance.

Of the above schemes, the former uses file merging. Although it reduces the number of files, the merged files form a contiguous byte sequence, so modifying any one file is relatively costly; the method is therefore suitable only where updates are infrequent. The latter, which uses contiguous disk space, also has a defect: in cloud storage, storage is virtualized, so the scheme is difficult to put into practice.

Summary of the invention

In view of the above problems, an object of the invention is to provide a method for cloud storage of large batches of small files that merges small files for upload and improves upload speed.

Another object of the invention, in view of the above problems, is to provide a device for cloud storage of large batches of small files that reduces the number of connections.

To achieve the above objects, the invention adopts the following technical solution:

The method of the invention for cloud storage of large batches of small files comprises the following steps:

S1: File upload: selectively splice files according to the file splicing method to obtain a processed file, and upload it;

S2: Establish or find storage grids: take a data block whose size exceeds the set threshold and divide it into multiple storage grids of equal size, or find a data block whose remaining storage grids are sufficient to store the processed file from step S1;

S3: Parse the processed file from step S1 and store it in one or more storage grids of the data block from step S2.

With this technical solution, small files are merged for upload and the merged files are stored in the storage grids of a data block. For each processed file, its specific grids within the data block can be determined quickly and accurately, so updates are very convenient.

In the above method for cloud storage of large batches of small files, the file splicing method in step S1 is:

Determine the size of each file. A file larger than the preset size is treated as a non-small file and uploaded independently; a file no larger than the preset size is treated as a small file and needs to be spliced;

Splicing combines small files into a medium-sized processed file whose maximum size does not exceed the set threshold.

In the above method, the following step is also performed in step S1 before splicing:

Determine in advance whether the medium-sized processed file would exceed the set threshold once the current small file is added. If so, do not splice; otherwise, splice.

In the above method, if the pre-check finds that the medium-sized processed file would exceed the set threshold once the current small file is added, then, in addition to not splicing, the current small file is marked as waiting to be spliced, so that it can later be spliced with other small files into another medium-sized processed file.

In the above method, in step S2 each processed file is stored in only one data block, and when a processed file occupies multiple storage grids of a data block, those storage grids may be contiguous or non-contiguous.

In the above method, when storage grids are established, a data block specification record table is built for the corresponding data blocks according to their storage grid size, and data blocks whose storage space is full are excluded when searching for storage grids for a processed file.

In the above method, in step S2, when there is a processed file to store, a data block with suitable storage grids is preferentially found or established according to the current processed file, and the current processed file is stored in it.

In the above method, step S2 further comprises the following steps:

S2-1: Set a reference value Y. If the size of the processed file is an integer multiple of Y,

then storage grid size = Y × (processed file size / Y);

otherwise, storage grid size = Y × (int(processed file size / Y) + 1);

where int denotes taking the integer part (rounding down).

S2-2: Query the processed-file/data-block relation table for the processed file that currently needs to be stored, to check whether it has been stored before. If no record exists, it is a new file: go to step S2-4. If a record exists, it is a file to be updated: go to step S2-3;

S2-3: Obtain the data block in which the processed file was originally stored, and compare the storage grid size calculated in step S2-1 with the storage grid size of the original file. If the processed file needs the same storage grid size as the original, directly update the corresponding storage grids in the original data block. If the storage grid sizes differ, first release the storage space of the corresponding storage grids in the original data block, then go to step S2-4;

S2-4: Find all data blocks whose storage grid size equals the size calculated in step S2-1 and traverse them until one with a free storage grid is found; store the processed file in one of that block's free storage grids, then update the related data tables. If no suitable data block exists, take a data block and divide it into multiple storage grids of the size calculated in step S2-1.

In the above method, in step S2-2 the relative path combined with the file name is used as the query key to check whether the file exists.

A device for cloud storage of large batches of small files that uses the above method.

Compared with the prior art, the above method and device have the following advantages: 1. Small files are uploaded by file merging, which reduces the number of connections and improves upload speed. 2. Data blocks with different storage grid specifications are introduced to store small files, which makes the fullest possible use of storage space and makes file updates and access more convenient.

Brief description of the drawings

Fig. 1 is the workflow diagram of Embodiment 1 of the invention;

Fig. 2 is a partial workflow diagram of Embodiment 1 of the invention;

Fig. 3 is a partial workflow diagram of Embodiment 2 of the invention.

Detailed description of the embodiments

The invention can be used to store large batches of small files more efficiently; it reduces the number of connections established and makes files easy to find and update. Preferred embodiments of the invention are described below with reference to the drawings to further explain its technical solution, but the invention is not limited to these embodiments.

Embodiment 1

As shown in Fig. 1, this embodiment discloses a method for cloud storage of large batches of small files, comprising the following steps:

S1: File upload: selectively splice files according to the file splicing method to obtain a processed file, and upload it;

Specifically, as shown in Fig. 2, the file splicing method is:

Determine the size of each file. A file larger than the preset size is treated as a non-small file and uploaded independently; a file no larger than the preset size is treated as a small file and needs to be spliced. Since files larger than 512k are generally not considered small files, the preset size in this embodiment is set to 512k.

Splicing combines small files into a medium-sized processed file whose maximum size does not exceed the set threshold;

The specific splicing method is as follows:

File path: 512 bytes;

File name: 512 bytes;

File size: 4 bytes;

File content: 0~512K.

Files are spliced in the following format:
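The field list above can be sketched in Python as a fixed-width record header followed by the content. This is only an illustration under stated assumptions: the four fields are assumed to be concatenated in the order listed, the path and name are assumed to be UTF-8 and NUL-padded to 512 bytes, and the 4-byte size is assumed to be a big-endian unsigned integer. None of these details is given in the text, and the names `encode_record` and `decode_records` are hypothetical.

```python
import struct

PATH_LEN = 512              # fixed-width file path field, in bytes
NAME_LEN = 512              # fixed-width file name field, in bytes
MAX_CONTENT = 512 * 1024    # small-file limit used in this embodiment

# Assumed header layout: path, name, then a 4-byte big-endian length.
HEADER = struct.Struct(f">{PATH_LEN}s{NAME_LEN}sI")


def encode_record(path: str, name: str, content: bytes) -> bytes:
    """Serialize one small file as header + content."""
    p, n = path.encode("utf-8"), name.encode("utf-8")
    if len(p) > PATH_LEN or len(n) > NAME_LEN:
        raise ValueError("path or name does not fit its fixed-width field")
    if len(content) > MAX_CONTENT:
        raise ValueError("not a small file")
    # struct pads the "s" fields with NUL bytes up to their declared width.
    return HEADER.pack(p, n, len(content)) + content


def decode_records(blob: bytes):
    """Yield (path, name, content) for each record in a processed file."""
    offset = 0
    while offset < len(blob):
        p, n, size = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        content = blob[offset:offset + size]
        offset += size
        yield (p.rstrip(b"\0").decode("utf-8"),
               n.rstrip(b"\0").decode("utf-8"),
               content)
```

With this layout each record carries 1028 bytes of header, so the receiving side can split a processed file back into its small files without any external index.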

Of course, the file size must be checked before splicing: determine whether the medium-sized processed file would exceed the set threshold (for example 1M) once the current small file is added. If so, do not splice; otherwise, splice. In this way a processed file of no more than 1M is obtained each time and uploaded, so that small files are spliced while the spliced file is kept from growing too large.

Further, if the pre-check finds that the medium-sized processed file would exceed the set threshold once the current small file is added, then, in addition to not splicing, the current small file is marked as waiting to be spliced, so that it can later be spliced with other small files into another medium-sized processed file.
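The size check and the deferral of a file that would overflow the batch can be sketched as follows. This is a simplified illustration, not the patented implementation: it works on file sizes only, ignores any per-file header overhead, and the function name `plan_uploads` is hypothetical. The 512k and 1M limits are the values used in this embodiment.

```python
SMALL_FILE_LIMIT = 512 * 1024   # preset size from this embodiment
BATCH_LIMIT = 1024 * 1024       # set threshold (1M) from this embodiment


def plan_uploads(sizes):
    """Group file sizes into uploads.

    Returns (standalone, batches). standalone holds the indices of
    non-small files, which are uploaded independently. Each batch is a
    list of indices whose sizes sum to at most BATCH_LIMIT.
    """
    standalone, batches = [], []
    current, current_size = [], 0
    for i, size in enumerate(sizes):
        if size > SMALL_FILE_LIMIT:
            standalone.append(i)
            continue
        # Pre-check: would adding this file push the batch over the limit?
        if current and current_size + size > BATCH_LIMIT:
            batches.append(current)          # close the current batch
            current, current_size = [], 0    # the file waits for the next one
        current.append(i)
        current_size += size
    if current:
        batches.append(current)
    return standalone, batches
```

Each batch then costs one upload request instead of one request per small file, which is where the reduction in connection count comes from.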

S2: Establish or find storage grids according to the size of the processed file: to speed up processing, a data block specification record table is built for the corresponding data blocks according to their storage grid size when storage grids are established, and data blocks whose storage space is full are excluded when searching for storage grids for a processed file. To make full use of resources, this embodiment searches first. To use data block storage space even more fully, the search looks as far as possible for a data block that has enough remaining capacity and whose storage grid specification, multiplied by N, equals or is close to the size of the processed file, where N is an integer, so that the space of each storage grid is fully used.

When no data block with storage grids able to hold the processed file is found, a data block whose size exceeds the set threshold is taken and divided into multiple storage grids of equal size. Likewise, the storage grid specification is chosen so that N times it equals the size of the processed file, where N is an integer.

Further, for convenient querying and updating, each processed file is stored in only one data block. When a processed file occupies multiple storage grids of a data block, those grids may be contiguous or non-contiguous: because each storage grid has its own number, it is enough to know the numbers of the grids the file occupies, even if they are not contiguous.
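To illustrate why non-contiguous grids are enough, the sketch below writes a file into an ordered list of grid numbers and reads it back. It assumes, beyond what the text states, that grids are numbered from 0 and that grid g occupies bytes g × gridsize to (g + 1) × gridsize of the block; `write_file` and `read_file` are hypothetical helper names.

```python
def write_file(block: bytearray, gridsize: int, grid_numbers, data: bytes) -> None:
    """Spread data over the listed grids, in the order given."""
    if len(data) > gridsize * len(grid_numbers):
        raise ValueError("not enough grids for this file")
    for i, g in enumerate(grid_numbers):
        chunk = data[i * gridsize:(i + 1) * gridsize]
        start = g * gridsize
        # Same-length slice assignment, so the block size never changes.
        block[start:start + len(chunk)] = chunk


def read_file(block: bytes, gridsize: int, grid_numbers, length: int) -> bytes:
    """Reassemble a file from its grids and trim it to its real length."""
    parts = [block[g * gridsize:(g + 1) * gridsize] for g in grid_numbers]
    return b"".join(parts)[:length]
```

For example, a 7-byte file stored in grids 3 and 1 of a block with 4-byte grids is recovered intact as long as the grid numbers are kept in order.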

S3: File storage: parse the processed file from step S1 and store it in one or more storage grids of the data block from step S2.

This embodiment is described in detail using a data block of size 8M:

Here the data block is divided into storage grids of size 8k, so the number of storage grids is 1024. Of course, to accommodate processed files of various sizes, the size of the storage grids into which a data block is divided can vary dynamically. The system establishes the following data tables relating to storage grids and data blocks:

Table 1: Data block information table: blockinfo

This table records the usage of each data block, namely the number of storage grids currently in use. If usednum equals 8M/gridsize, the data block is completely used; if it is less, there are free storage grids and the block can continue to be used.

Table 2: Processed-file/data-block relation table: fileblock

This table records the data block in which each processed file is located.

Table 3: Processed-file storage grid information table: filegrid

This table records which storage grids a file occupies and the numerical order of the storage grids that make up the file.

In the following example:

the file with file id 1 is in the data block with id 1 and occupies the storage grid numbered 12. Its storage size is determined by the gridsize corresponding to blockid=1: if gridsize is 8K, the file occupies 8K of storage space.

In this embodiment, the storage grid size gridsize of a data block is n times 8K, where n is an integer and 1<=n<=64, giving 64 specifications in total.
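The three tables and the example row can be sketched with Python's built-in sqlite3 module. The table names blockinfo, fileblock and filegrid and the fields blockid, gridsize and usednum come from the text; the remaining column names (fileid, gridno, seq), the column types and the helper functions are assumptions made for illustration.

```python
import sqlite3

BLOCK_SIZE = 8 * 1024 * 1024   # 8M data block from this embodiment

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blockinfo (blockid INTEGER PRIMARY KEY, gridsize INTEGER, usednum INTEGER);
CREATE TABLE fileblock (fileid INTEGER PRIMARY KEY, blockid INTEGER);
CREATE TABLE filegrid  (fileid INTEGER, gridno INTEGER, seq INTEGER);
""")

# The example from the text: file 1 lives in block 1, grid number 12.
conn.execute("INSERT INTO blockinfo VALUES (1, 8192, 1)")
conn.execute("INSERT INTO fileblock VALUES (1, 1)")
conn.execute("INSERT INTO filegrid VALUES (1, 12, 0)")


def is_full(blockid: int) -> bool:
    """A block is full when usednum equals 8M / gridsize."""
    gridsize, usednum = conn.execute(
        "SELECT gridsize, usednum FROM blockinfo WHERE blockid = ?", (blockid,)
    ).fetchone()
    return usednum == BLOCK_SIZE // gridsize


def space_used(fileid: int) -> int:
    """Storage occupied by a file: number of grids times the block's gridsize."""
    (gridsize,) = conn.execute(
        "SELECT b.gridsize FROM fileblock f JOIN blockinfo b "
        "ON b.blockid = f.blockid WHERE f.fileid = ?", (fileid,)
    ).fetchone()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM filegrid WHERE fileid = ?", (fileid,)
    ).fetchone()
    return gridsize * count
```

Here `space_used(1)` evaluates to 8192, matching the 8K stated for the example file.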

This embodiment merges small files for upload and stores the merged files in the storage grids of a data block. For each processed file, its specific grids within the data block can be determined quickly and accurately, so updates are very convenient.

Further, this embodiment also includes a device for cloud storage of large batches of small files that uses the above method.

Embodiment 2

As shown in Fig. 3, this embodiment is similar to Embodiment 1. The difference is that in step S2 of this embodiment, when there is a processed file to store, a data block with suitable storage grids is preferentially found or established according to the current processed file, and the current processed file is stored in it.

A suitable storage grid is one whose capacity equals or nearly equals the storage the current processed file needs, that is, the current processed file fits in exactly one storage grid. Of course, when the size of the processed file is not an integer multiple of 8k, the specification of the grid established or found may be slightly larger than the processed file.

In other words, the specification of the storage grid found or established in this embodiment is equal or nearly equal to the size of the processed file.

A suitable storage grid is obtained using step S2-1:

S2-1: Set a reference value Y. If the size of the processed file is an integer multiple of Y,

then storage grid size = Y × (processed file size / Y);

otherwise, storage grid size = Y × (int(processed file size / Y) + 1);

where int denotes taking the integer part (rounding down).

The reference value Y is the smallest unit of storage grid into which a data block is divided. In this embodiment the storage grids of different data blocks can have different sizes, but every storage grid size is an integer multiple of the reference value, such as 1, 2 or 3 times. Y therefore satisfies the following formula: Y × N = data block size, where N is an integer. For example, for a data block of size 8M, Y can be 8k. For a processed file whose size is an integer multiple of 8k, say 5 times, that is, a 40k processed file, a storage grid made up of 5 smallest units can be marked off, with size 5 × 8K = 40k. For a processed file whose size is not an integer multiple of 8k, for example a 42k processed file, a storage grid made up of 6 smallest units can be marked off, with size 6 × 8K = 48k. Thus, given a large number of data blocks and their free storage grids, a processed file of any size always has a storage grid suited to it, and the storage space of each data block is fully used.
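The S2-1 rule and the two worked examples can be checked with a few lines of Python. The function name `grid_size` is hypothetical, and int is read here as integer (floor) division, which is the reading the 42k example requires.

```python
def grid_size(file_size: int, y: int) -> int:
    """Storage grid size for a processed file, following step S2-1."""
    if file_size % y == 0:
        # Exact multiple of the reference value: the grid matches the file.
        return y * (file_size // y)
    # Otherwise round up to the next multiple of the reference value.
    return y * (file_size // y + 1)


K = 1024
print(grid_size(40 * K, 8 * K) // K)   # 40k file -> 5 units of 8k
print(grid_size(42 * K, 8 * K) // K)   # 42k file -> 6 units of 8k
```

Both branches together amount to rounding the file size up to the nearest multiple of Y, so at most Y - 1 bytes of a grid go unused.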

S2-2: Query the processed-file/data-block relation table for the processed file that currently needs to be stored, to check whether it has been stored before. If no record exists, it is a new file: go to step S2-4. If a record exists, it is a file to be updated: go to step S2-3. Steps S2-1 and S2-2 can be performed simultaneously or one after the other.

S2-4: Find all data blocks whose storage grid size equals the size calculated in step S2-1 and traverse them until one with a free storage grid is found; store the processed file in one of that block's free storage grids, then update the related data tables, which include the processed-file storage grid information table, the processed-file/data-block relation table and the data block information table. If no suitable data block exists, take a data block and divide it into multiple storage grids of the size calculated in step S2-1.

Further, in step S2-2, the relative path combined with the file name is used as the query key to check whether the file exists.

This embodiment stores each processed file in a single storage grid of a data block as far as possible, which makes file updates and access more convenient.
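The lookup, update and allocation steps S2-1 to S2-4 can be sketched as an in-memory allocator. This is an illustration under assumptions rather than the patented implementation: the tables are replaced by Python dictionaries and lists, the query key joins relative path and file name with "/", each file occupies exactly one grid as in this embodiment, and the names `Block`, `Store` and `put` are hypothetical.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 8 * 1024 * 1024   # 8M data block
Y = 8 * 1024                   # reference value (smallest grid unit)


@dataclass
class Block:
    gridsize: int
    used: set = field(default_factory=set)   # occupied grid numbers

    @property
    def capacity(self) -> int:
        return BLOCK_SIZE // self.gridsize


class Store:
    def __init__(self):
        self.blocks = []     # stands in for blockinfo; index = block id
        self.location = {}   # stands in for fileblock/filegrid

    def put(self, rel_path: str, name: str, size: int):
        """Place or update a file; returns (block id, grid number)."""
        key = f"{rel_path}/{name}"               # S2-2 query key
        need = max(1, -(-size // Y)) * Y         # S2-1, rounded up to Y
        if key in self.location:                 # S2-3: file to be updated
            bid, grid = self.location[key]
            if self.blocks[bid].gridsize == need:
                return bid, grid                 # same size: update in place
            self.blocks[bid].used.discard(grid)  # release the old grid
            del self.location[key]
        # S2-4: find a non-full block with the right grid size.
        for bid, blk in enumerate(self.blocks):
            if blk.gridsize == need and len(blk.used) < blk.capacity:
                break
        else:
            blk = Block(need)                    # none found: divide a new block
            self.blocks.append(blk)
            bid = len(self.blocks) - 1
        grid = next(g for g in range(blk.capacity) if g not in blk.used)
        blk.used.add(grid)
        self.location[key] = (bid, grid)
        return bid, grid
```

Because a file whose new size still fits its old grid size is updated in place, an update only touches other blocks when the file crosses a multiple of Y.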

The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Although terms such as storage grid, data block, small file, processed file and medium-sized processed file are used frequently herein, the possibility of using other terms is not excluded. These terms are used only to describe and explain the essence of the invention more conveniently; interpreting them as any additional limitation would be contrary to the spirit of the invention.

Claims

1. A method for cloud storage of large batches of small files, characterized in that it comprises the following steps:

S1: File upload: selectively splice files according to a file splicing method to obtain a processed file, and upload it;

S2: Establish or find storage grids: take a data block whose size exceeds a set threshold and divide it into multiple storage grids of equal size, or find a data block whose remaining storage grids are sufficient to store the processed file from step S1;

S3: File storage: parse the processed file from step S1 and store it in one or more storage grids of the data block from step S2.
2. The method for cloud storage of large batches of small files according to claim 1, characterized in that, in step S1, the file splicing method is:

Determine the size of each file. A file larger than the preset size is treated as a non-small file and uploaded independently; a file no larger than the preset size is treated as a small file and needs to be spliced;

Splicing combines small files into a medium-sized processed file whose maximum size does not exceed the set threshold.
3. The method for cloud storage of large batches of small files according to claim 2, characterized in that, in step S1, the following step is also performed before splicing:

Determine in advance whether the medium-sized processed file would exceed the set threshold once the current small file is added. If so, do not splice; otherwise, splice.
4. The method for cloud storage of large batches of small files according to claim 3, characterized in that, if the pre-check finds that the medium-sized processed file would exceed the set threshold once the current small file is added, then, in addition to not splicing, the current small file is marked as waiting to be spliced, so that it can later be spliced with other small files into another medium-sized processed file.
5. The method for cloud storage of large batches of small files according to claim 1, characterized in that, in step S2, each processed file is stored in only one data block, and when a processed file occupies multiple storage grids of a data block, those storage grids may be contiguous or non-contiguous.
6. The method for cloud storage of large batches of small files according to claim 1, characterized in that, when storage grids are established, a data block specification record table is built for the corresponding data blocks according to their storage grid size, and data blocks whose storage space is full are excluded when searching for storage grids for a processed file.
7. The method for cloud storage of large batches of small files according to claim 6, characterized in that, in step S2, when there is a processed file to store, a data block with suitable storage grids is preferentially found or established according to the current processed file, and the current processed file is stored in it.
8. The method for cloud storage of large batches of small files according to claim 7, characterized in that step S2 further comprises the following steps:

S2-1: Set a reference value Y. If the size of the processed file is an integer multiple of Y,

then storage grid size = Y × (processed file size / Y);

otherwise, storage grid size = Y × (int(processed file size / Y) + 1);

where int denotes taking the integer part (rounding down).

S2-2: Query the processed-file/data-block relation table for the processed file that currently needs to be stored, to check whether it has been stored before. If no record exists, it is a new file: go to step S2-4. If a record exists, it is a file to be updated: go to step S2-3;

S2-3: Obtain the data block in which the processed file was originally stored, and compare the storage grid size calculated in step S2-1 with the storage grid size of the original file. If the processed file needs the same storage grid size as the original, directly update the corresponding storage grids in the original data block. If the storage grid sizes differ, first release the storage space of the corresponding storage grids in the original data block, then go to step S2-4;

S2-4: Find all data blocks whose storage grid size equals the size calculated in step S2-1 and traverse them until one with a free storage grid is found; store the processed file in one of that block's free storage grids, then update the related data tables. If no suitable data block exists, take a data block and divide it into multiple storage grids of the size calculated in step S2-1.
9. The method for cloud storage of large batches of small files according to claim 8, characterized in that, in step S2-2, the relative path combined with the file name is used as the query key to check whether the file exists.
10. A device for cloud storage of large batches of small files, which uses the method for cloud storage of large batches of small files according to any one of claims 1-9.