CN106446126B - Mass spatial information data storage management method and storage management system - Google Patents
- Publication number
- CN106446126B (application number CN201610832422.4A)
- Authority
- CN
- China
- Prior art keywords
- file
- data
- storage
- distributed
- storage management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a storage management method for massive spatial information data, comprising the following steps: establishing a distributed file system in master-slave mode, together with a storage management node that sits under the distributed file system and performs full life cycle management of spatial information data; creating a distributed storage cluster that is controlled by instructions from the storage management node, stores file data in a distributed manner with the file block as the basic unit, consists of a plurality of data storage nodes, and can be scaled out horizontally and smoothly; the data storage nodes adopt a distributed storage strategy implemented on the basis of a Hash modulo algorithm. The invention addresses the problem that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous.
Description
Technical Field
The invention relates to the field of research on storage relations of mass spatial data, in particular to a mass spatial information data storage management method and a storage management system.
Background
At present, satellite technology in China is well developed, but some regions lack integration of satellite application data and fall seriously short in overall planning and information fusion. The storage and management of spatial information data is one area with many problems. Spatial information data comes from increasingly diverse sources (space-based, airborne, near-space and so on) and is very large in volume: the multiple sources produce a huge total amount of data, and tens of millions of files need to be stored. High resolution also means that a single high-resolution spatial information data file can reach several GB or even dozens of GB. Building a storage management system that can handle civil spatial information data with both a huge total volume and a large single-file size is therefore an inevitable trend.
Disclosure of Invention
The technical problem is that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous;
in view of this, embodiments of the present invention provide a method and a system for storing and managing mass spatial information data, so as to solve the technical problem.
The solution of the problem is as follows:
the invention provides a storage management method for massive spatial information data, which comprises the following steps:
establishing a distributed file system in a master-slave mode, and a storage management node which is positioned under the distributed file system and is used for carrying out full life cycle management on spatial information data;
creating a distributed storage cluster which is controlled by an instruction of a storage management node, takes a file block as a basic unit, stores file data in a distributed manner, is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is realized based on a Hash modulo algorithm.
Further, the storage management node stores the file system metadata of the distributed file system and, on detecting any operation that modifies the file system metadata, records the operation in the edit log file;
before the storage management node stores actual file data on the data storage nodes, the whole file is split into file blocks of a predefined size, and a globally unique handle is allocated to each file block. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the creation in the edit log file (Editlog); the storage management node stores the edit log file in its local file system; splitting file data into file blocks of a predefined size in advance and then distributing the blocks improves storage efficiency; the file system metadata gives the organization and management of the distributed file system a record to rely on.
Furthermore, before actual file data is stored in the data storage node, the whole file is subjected to data segmentation to be segmented into file blocks with predefined sizes, and an operation of allocating a globally unique handle to each file block is completed by an external client through the storage management node;
the block size of a file block defaults to 64 MB. In this scheme the default block size of 64 MB is far larger than the block size of an ordinary file system. A larger block size reduces the interaction between the client and the storage management node; moreover, the client is likely to perform multiple operations on a given block, so keeping a longer-lived connection to a data storage node can reduce network load; it also reduces the amount of file system metadata held on the storage management node, so that the metadata can be kept in memory, which improves management and storage efficiency.
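The splitting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `FileBlock` and `split_into_blocks` are hypothetical, and using a UUID as the "globally unique handle" is an assumption, since the text does not say how handles are generated.

```python
import uuid
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size stated in the text


@dataclass(frozen=True)
class FileBlock:
    handle: str  # globally unique handle (assumption: a random UUID)
    offset: int  # byte offset of the block within the file
    length: int  # number of bytes in the block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[FileBlock]:
    """Describe how a file of file_size bytes is cut into fixed-size blocks."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        blocks.append(FileBlock(uuid.uuid4().hex, offset, length))
        offset += length
    return blocks


blocks = split_into_blocks(10 * 1024 ** 3)  # a 10 GB file
print(len(blocks))  # 160
```

Only block descriptors are produced here; in a real system the client would also stream each block's bytes to the data storage node chosen by the storage management node.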
Further, all file blocks of the file data are copied into a plurality of copies, the number of the copies is called a copy factor, and the file block size and the copy factor of each file data are configurable. The copy factor can be configured when the file is created and can be changed later; the fault tolerance of the system can be improved by setting the copy factors.
Further, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: redundant fragmentation spaces are added to the data storage nodes, and each file block is replicated into copies;
and if a data storage node is detected to have failed or lost its connection, the storage management node keeps track of the affected file block copies and starts copying them again. The storage management node is responsible for managing the copying of file blocks. When some storage nodes lose contact with the storage management node, it marks the unreachable storage nodes as failed; it then keeps track of the file block copies that need to be copied again and starts copying them; when the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a copy deletion instruction to the relevant storage node at the next heartbeat, whereupon the storage node removes the corresponding file block copies and frees the space; this arrangement effectively guarantees the security of the tile data and the reliability of the system.
Further, the namespace, the mapping of the block to the file, and the attribute of the file of the distributed file system are all stored in the file system metadata, and the file system metadata is stored in the storage management node. A block is the smallest unit of storage and processing in a database, containing header information data or PL/SQL code for the block itself.
Further, the distributed file system is implemented based on a relational database management system (RDSMS);
when the data storage node is damaged, a file content checksum mechanism is adopted;
when a new relational database management system (RDSMS) file is created, a checksum is calculated for each file block of the file data and the checksum is saved as a separate hidden file on the storage management node. Massive remote sensing data is stored frame by frame in independent files, and the relational database management system is responsible for indexing the file data, file blocks and so on, so that an application system can quickly locate the file data to be read; this mode is very convenient for managing mass data, and capacity can be expanded by continuously adding storage equipment or storage nodes according to the data capacity requirement. The checksum is actually stored in the namespace of the storage management node; the scheme addresses the case where a file block on a storage node may be damaged by storage equipment errors, network errors, software defects of the storage node and the like; in this case, RDSMS uses a file content checksum mechanism to ensure data integrity.
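A per-block checksum check of the kind described above might look like the sketch below. The patent does not name a checksum algorithm; CRC-32 from Python's standard `zlib` module is used here purely as an assumption, and the function names are hypothetical.

```python
import zlib
from typing import List


def block_checksums(data: bytes, block_size: int) -> List[int]:
    """Compute one CRC-32 checksum per block of the given data."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def find_corrupt_blocks(data: bytes, block_size: int, expected: List[int]) -> List[int]:
    """Return the indices of blocks whose checksum differs from the stored one."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Checksums are computed once, when the file is created ...
data = bytes(range(256)) * 4
stored = block_checksums(data, 256)

# ... and compared again when the blocks are read back.
damaged = bytearray(data)
damaged[600] ^= 0xFF  # simulate a storage or network error in block 2
print(find_corrupt_blocks(bytes(damaged), 256, stored))  # [2]
```

A block reported as corrupt would then be fetched from another copy, which is where the copy factor described earlier comes in.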
Furthermore, the distributed file system adopts an information separation mixed indexing algorithm for indexing relevant spatio-temporal information, and an improved hash table is adopted in the algorithm for indexing static information, dynamic historical spatio-temporal information and current and future spatio-temporal information. The scheme solves the problems of cooperativity and survivability of massive spatial information.
Furthermore, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data: horizontally, the high-resolution application is the basic unit, which organizes data by spatial geographic resource type; vertically, data is managed at multiple levels of resolution, which achieves combined horizontal and vertical management of multi-level resources of different types.
The invention also provides a mass spatial information data storage management system that uses any one of the methods disclosed by the invention, comprising:
the distributed file system is established in a master-slave mode;
the storage management node is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;
the storage cluster is used for storing file data in a distributed manner by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
an external client coupled to the storage management node.
The invention has the following beneficial effects. With the above technical scheme, the invention achieves at least the following technical effects: because a distributed file system in master-slave mode is adopted, storage capacity is easily expandable; the whole storage system can be scaled out horizontally simply by increasing the capacity and/or the number of data storage nodes, and the physical details of changes in node capacity and node count are hidden from applications that access the data, so the scalable storage requirements of massive spatial information data are better met; the data storage nodes adopt a distributed storage strategy, which ensures that each data storage node stores almost the same amount of data and that, during parallel retrieval, each data storage node retrieves approximately the same amount of tile data, effectively improving the concurrent query performance and load balance of the system; the Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement, and its greatest advantage is excellent dispersion, which effectively keeps the system load balanced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
FIG. 1 is a flow chart of a mass spatial information data storage management method according to the present invention;
FIG. 2 is a schematic diagram of the mass spatial information data storage management system of the present invention;
FIG. 3 is a diagram illustrating the software architecture of a mass spatial information data storage management system according to the present invention;
FIG. 4 is a schematic diagram of a network deployment of the storage management system of the present invention.
Throughout the drawings, it should be noted that like reference numerals are used to depict the same or similar elements, features and structures.
Detailed Description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
The terms and words used in the following description and claims are not limited to their bibliographical meanings, but are used by the inventor only to enable the disclosure to be clearly and consistently understood. Accordingly, it should be apparent to those skilled in the art that the following descriptions of the various embodiments of the present disclosure are provided for illustration only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.
It is to be understood that the singular forms also include the plural reference unless the context clearly dictates otherwise. Thus, for example, reference to a "component surface" includes reference to one or more such surfaces.
The first embodiment is as follows:
FIG. 1 is a flowchart of the mass spatial information data storage management method according to the present invention. Referring to FIG. 1, the method includes the following steps:
S1: establishing a distributed file system in master-slave mode, together with a storage management node that sits under the distributed file system and performs full life cycle management of spatial information data;
S2: creating a distributed storage cluster that is controlled by instructions from the storage management node, stores file data in a distributed manner with the file block as the basic unit, consists of a plurality of data storage nodes, and can be scaled out horizontally and smoothly;
S3: the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is implemented on the basis of a Hash modulo algorithm.
The invention has the following beneficial effects. With the above technical scheme, the invention achieves at least the following technical effects: because a distributed file system in master-slave mode is adopted, storage capacity is easily expandable; the whole storage system can be scaled out horizontally simply by increasing the capacity and/or the number of data storage nodes, and the physical details of changes in node capacity and node count are hidden from applications that access the data, so the scalable storage requirements of massive spatial information data are better met; the data storage nodes adopt a distributed storage strategy, which ensures that each data storage node stores almost the same amount of data and that, during parallel retrieval, each data storage node retrieves approximately the same amount of tile data, effectively improving the concurrent query performance and load balance of the system; the Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement, and its greatest advantage is excellent dispersion, which effectively keeps the system load balanced.
The Hash modulo algorithm applies a modulo-based hash function to the key and uses the result as the entry address of the hash table for that key. Given N nodes, object keys need to be mapped uniformly onto the N nodes, and the node that stores an object is determined by the result of Hash(key) % N.
The distribution strategy of the invention borrows the core idea of the Hash modulo algorithm. The formula is as follows:
H(key) = (3*row + col) % K
Here H(key) is the fragmentation space number, K is the maximum number of virtual nodes, 0 ≤ H(key) < K, row is the row number of the lower left corner of the tile, and col is the column number of the lower left corner. The fragmentation space number sits at the top directory level of the tile file tree structure.
For any 3 × 3 tile request, the 9 fragmentation space numbers obtained in this way are all different. A 3 × 3 request is a reasonable unit given the existing tile size of 512 × 512 pixels and the mainstream display window size of 1280 × 1024 pixels.
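The formula above is small enough to check directly. The sketch below computes the fragmentation space number for a tile and the numbers covered by a 3 × 3 window; the function names are hypothetical, and K = 255 is the value the text gives later for this invention.

```python
from typing import List

K = 255  # maximum number of virtual nodes, as stated for this invention


def shard_number(row: int, col: int, k: int = K) -> int:
    """H(key) = (3*row + col) % K for the tile at (row, col)."""
    return (3 * row + col) % k


def window_shards(row0: int, col0: int, size: int = 3, k: int = K) -> List[int]:
    """Fragmentation space numbers of a size x size window starting at (row0, col0)."""
    return [
        shard_number(row0 + dr, col0 + dc, k)
        for dr in range(size)
        for dc in range(size)
    ]


print(shard_number(2, 3))    # 9
print(window_shards(0, 0))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

A 3 × 3 window always covers the nine consecutive values 3*row0 + col0 through 3*row0 + col0 + 8, which are distinct modulo K whenever K is at least 9. For wider windows the numbers start to repeat, because tile (row, col + 3) and tile (row + 1, col) share the same number.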
For the mapping between fragmentation space numbers and actual physical nodes: if the number of actual physical nodes is n (n ≤ K), the number of fragmentation space numbers allocated to each node should be between INT(K/n) and INT(K/n) + 1, and adjacent fragmentation space numbers on the same node differ by n. In the present invention, the maximum number of storage nodes K is 255 (this value is estimated from the actual storage volume and cannot be changed once set). If n is 9, the fragmentation space numbers are as shown in Table 1.
Table 1: space slicing example
With K = 255, the fragmentation space number in which each tile is stored is detailed in Table 2.
Table 2: tile space slicing
50 | 51 | 52 | 53 | 54 | ||||||||||
0 | 1 | 53 | 54 | |||||||||||
0 | 1 | 2 | 3 | 4 | ||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | |||||||
2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 0 | 1 | ||||
5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 4 | |
8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 3 | 4 | 5 | 6 | 7 | |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 6 | 7 | 8 | 9 | 0 | |
4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 9 | 0 | 1 | 2 | 3 | |
42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 37 | 38 | 39 | 40 | 41 | |
45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 40 | 41 | 42 | 43 | 44 | |
48 | 49 | 50 | 51 | 52 | 53 | 54 | 43 | 44 | 45 | 46 | 47 | |||
51 | 52 | 53 | 54 | 46 | 47 | 48 | 49 | 50 | ||||||
54 | 49 | 50 | 51 | 52 | 53 |
The whole table can be regarded as the original remote sensing image, and each cell represents one block of tile image data, that is, one of the many tiles generated by cutting up the original image. The table shows that for any 3 × 3 request the 9 tiles involved are stored in different fragmentation spaces with consecutive numbers, so concurrent requests are spread as evenly as possible across the nodes, giving the highest degree of concurrency and the highest efficiency. Even for n × n requests, the tiles are spread across nodes rather than concentrated on a few, so efficiency is still guaranteed to some extent. Each row contains all the consecutive fragmentation space numbers (0 to 254), and each fragmentation space stores the same number of tiles, so the number of tiles actually stored on each physical node is almost equal. Consequently, the tile data generated from the whole remote sensing image is dispersed evenly across the storage nodes by the Hash distribution strategy of the invention, and each storage node stores approximately the same amount of tile data, which meets the requirements of query efficiency and load balancing.
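The near-equal load per physical node can be checked with a short sketch. It assumes that fragmentation space number s belongs to physical node (s % n) + 1, which matches the rule that numbers on the same node differ by n; the function names are hypothetical.

```python
from collections import Counter

K = 255  # number of fragmentation spaces (virtual nodes)


def node_for_shard(shard: int, n: int) -> int:
    """Physical node (numbered 1..n) that owns a fragmentation space number."""
    return shard % n + 1


def shards_per_node(n: int, k: int = K) -> Counter:
    """Count how many fragmentation space numbers each physical node owns."""
    return Counter(node_for_shard(s, n) for s in range(k))


counts = shards_per_node(9)
print(sorted(counts.items()))
# [(1, 29), (2, 29), (3, 29), (4, 28), (5, 28), (6, 28), (7, 28), (8, 28), (9, 28)]
```

With K = 255 and n = 9, every node owns either 28 or 29 fragmentation spaces, so the per-node counts differ by at most one.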
Preferably, in this embodiment, the storage management node stores the file system metadata of the distributed file system and, on detecting any operation that modifies the file system metadata, records the operation in the edit log file;
before the storage management node stores actual file data on the data storage nodes, the whole file is split into file blocks of a predefined size, and a globally unique handle is allocated to each file block. Spatial information data is written once and read many times: once spatial information data products of the various levels have been processed and generated, the data itself is in principle not allowed to be modified, and the file system metadata supports this. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the creation in the edit log file (Editlog); the storage management node stores the edit log file in its local file system; splitting file data into file blocks of a predefined size in advance and then distributing the blocks improves storage efficiency; the file system metadata gives the organization and management of the distributed file system a record to rely on.
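The edit log behaviour described above can be illustrated with a small append-only journal. This is a sketch under stated assumptions: the class `MetadataStore`, the JSON-lines record format and the `replay` method are all hypothetical and are not taken from the patent, which only states that metadata-modifying operations are recorded in an edit log kept in the local file system.

```python
import json
import os
import tempfile
from typing import Dict, List


class MetadataStore:
    """In-memory namespace whose changes are journaled to a local edit log."""

    def __init__(self, editlog_path: str) -> None:
        self.editlog_path = editlog_path
        self.namespace: Dict[str, List[str]] = {}  # file path -> block handles

    def _log(self, record: dict) -> None:
        # Append the record and force it to disk before changing memory.
        with open(self.editlog_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def create_file(self, path: str, block_handles: List[str]) -> None:
        self._log({"op": "create", "path": path, "blocks": block_handles})
        self.namespace[path] = list(block_handles)

    def delete_file(self, path: str) -> None:
        self._log({"op": "delete", "path": path})
        self.namespace.pop(path, None)

    @classmethod
    def replay(cls, editlog_path: str) -> "MetadataStore":
        """Rebuild the namespace by replaying the edit log from the start."""
        store = cls(editlog_path)
        if os.path.exists(editlog_path):
            with open(editlog_path, encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    if rec["op"] == "create":
                        store.namespace[rec["path"]] = rec["blocks"]
                    elif rec["op"] == "delete":
                        store.namespace.pop(rec["path"], None)
        return store


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "editlog")
    store = MetadataStore(log_path)
    store.create_file("/images/scene1.tif", ["h1", "h2"])
    recovered = MetadataStore.replay(log_path)
    print(recovered.namespace)  # {'/images/scene1.tif': ['h1', 'h2']}
```

Writing the log record before updating the in-memory namespace means a restart can always rebuild at least the state that was acknowledged.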
Preferably, in this embodiment, before actual file data is stored in the data storage node, the whole file is subjected to data segmentation to be segmented into file blocks of a predefined size, and an operation of allocating a globally unique handle to each file block is completed by the external client through the storage management node;
the block size of a file block defaults to 64 MB. In this scheme the default block size of 64 MB is far larger than the block size of an ordinary file system. A larger block size reduces the interaction between the client and the storage management node; moreover, the client is likely to perform multiple operations on a given block, so keeping a longer-lived connection to a data storage node can reduce network load; it also reduces the amount of file system metadata held on the storage management node, so that the metadata can be kept in memory, which improves management and storage efficiency.
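A quick calculation shows why the larger block size shrinks the metadata. The 4 KB figure below is an assumed block size for an ordinary local file system, chosen only for comparison; it does not come from the patent.

```python
def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


GIB = 1024 ** 3
file_size = 20 * GIB  # a single 20 GB image file

large = block_count(file_size, 64 * 1024 * 1024)  # 64 MB default from the text
small = block_count(file_size, 4 * 1024)          # 4 KB: assumed ordinary block size

print(large, small, small // large)  # 320 5242880 16384
```

Under these assumptions the storage management node tracks 320 block entries for the file instead of more than five million, which is what makes keeping the metadata in memory practical.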
Preferably, in this embodiment, all file blocks of the file data are copied into multiple copies, the number of copies is referred to as a copy factor, and the file block size and the copy factor of each file data are configurable. The copy factor can be configured when the file is created and can be changed later; the fault tolerance of the system can be improved by setting the copy factors.
Preferably, in this embodiment, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: redundant fragmentation spaces are added to the data storage nodes, and each file block is replicated into copies;
and if a data storage node is detected to have failed or lost its connection, the storage management node keeps track of the affected file block copies and starts copying them again. When a local storage device or network device fails, the system can detect the error automatically and recover the data quickly and automatically. In this scheme the storage management node is responsible for managing the copying of file blocks. When some storage nodes lose contact with the storage management node, it marks the unreachable storage nodes as failed; it then keeps track of the file block copies that need to be copied again and starts copying them; when the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a copy deletion instruction to the relevant storage node at the next heartbeat, whereupon the storage node removes the corresponding file block copies and frees the space; this arrangement effectively guarantees the security of the tile data and the reliability of the system.
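The tracking logic described above (mark silent nodes as failed, re-copy under-replicated blocks, delete surplus copies when the copy factor drops) can be sketched as follows. The class `ReplicaTracker`, the timeout value and the choice of which node receives or loses a copy are all hypothetical; the patent does not specify them.

```python
from typing import Dict, List, Set, Tuple


class ReplicaTracker:
    """Tracks block copies and plans copy/delete actions from heartbeats."""

    def __init__(self, copy_factor: int, heartbeat_timeout: float) -> None:
        self.copy_factor = copy_factor
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat: Dict[str, float] = {}  # node -> last heartbeat time
        self.locations: Dict[str, Set[str]] = {}    # block handle -> nodes

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def add_replica(self, block: str, node: str) -> None:
        self.locations.setdefault(block, set()).add(node)

    def live_nodes(self, now: float) -> Set[str]:
        # A node is treated as failed once its heartbeat is older than the timeout.
        return {
            n for n, t in self.last_heartbeat.items()
            if now - t <= self.heartbeat_timeout
        }

    def plan(self, now: float) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
        """Return (blocks to copy -> target nodes, blocks to delete -> nodes)."""
        live = self.live_nodes(now)
        to_copy: Dict[str, List[str]] = {}
        to_delete: Dict[str, List[str]] = {}
        for block, nodes in self.locations.items():
            alive = nodes & live
            missing = self.copy_factor - len(alive)
            if missing > 0 and alive:
                # Under-replicated: pick live nodes that do not hold the block yet.
                to_copy[block] = sorted(live - alive)[:missing]
            elif missing < 0:
                # Over-replicated: drop the surplus copies.
                to_delete[block] = sorted(alive)[missing:]
        return to_copy, to_delete


tracker = ReplicaTracker(copy_factor=3, heartbeat_timeout=10.0)
for node in ("a", "b", "c", "d"):
    tracker.heartbeat(node, now=0.0)
for node in ("a", "b", "c"):
    tracker.add_replica("block-1", node)

for node in ("b", "c", "d"):      # node "a" stops sending heartbeats
    tracker.heartbeat(node, now=20.0)
print(tracker.plan(now=25.0))     # ({'block-1': ['d']}, {})
```

In this sketch the plan is only computed; sending the resulting instructions to storage nodes on their next heartbeat, as the text describes, is left out.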
For the redundancy strategy mentioned therein:
the distributed strategy based on the Hash modulo method guarantees the load balance of the system well, but it lacks any design for security and reliability: once a storage node fails, the tile data on that node is lost, which causes huge loss to the system. Most distributed storage systems adopt fault tolerance to increase reliability. Fault tolerance means tolerating faults: the system is allowed to experience failures, but the related functions and services must keep working when a failure occurs. Fault tolerance is usually implemented with copies, which is in essence a redundancy strategy.
The basic idea of the redundancy strategy is to design a distributed redundant deployment scheme of "fragmentation space copies" on top of the Hash modulo algorithm: redundant fragmentation spaces are added to the physical nodes so that the system holds copies, which effectively improves fault tolerance and ensures the reliability and security of the distributed system. The specific method is as follows: two copies are added for each fragmentation space number obtained by the Hash modulo, and the two copies are placed on the previous node and the next node respectively. Each tile is thus stored on three nodes, as one primary and two copies. If the primary physical node for a fragmentation space fails, the data can be obtained from the other two standby physical nodes. The list of fragmentation space numbers on each node accordingly changes from the original one list to three lists, as described in Table 3.
Table 3: Fragmentation space numbers corresponding to the i-th physical storage node
Copies from node i-1 | Node i | Copies from node i+1 |
---|---|---|
(i-1)-1 | i-1 | (i+1)-1 |
(i-1)-1+n | i-1+n | (i+1)-1+n |
(i-1)-1+2n | i-1+2n | (i+1)-1+2n |
(i-1)-1+3n | i-1+3n | (i+1)-1+3n |
… | … | … |
(i-1)-1+kn | i-1+kn | (i+1)-1+kn |
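As a non-limiting illustration, the placement rule behind Table 3 can be sketched as follows. The sketch assumes that n is the number of physical nodes, that nodes are numbered 1 to n, that the first and last nodes are treated as neighbours (wrap-around), and that CRC32 serves as the hash function; none of these details is fixed by the description, and all function names are hypothetical.

```python
import zlib


def slot_of(tile_key: str, total_slots: int) -> int:
    """Fragmentation space number of a tile: hash of the key modulo the slot count (CRC32 assumed)."""
    return zlib.crc32(tile_key.encode("utf-8")) % total_slots


def primary_node(slot: int, n: int) -> int:
    """1-based index of the node that owns a fragmentation space number."""
    return slot % n + 1


def replica_nodes(slot: int, n: int) -> list[int]:
    """Primary node followed by the previous and next nodes that hold the two copies."""
    p = primary_node(slot, n)
    prev_node = (p - 2) % n + 1  # wrap-around for node 1 is an assumption
    next_node = p % n + 1        # wrap-around for node n is an assumption
    return [p, prev_node, next_node]


def node_lists(i: int, n: int, k_max: int) -> dict[str, list[int]]:
    """The three lists of Table 3 for node i, for k = 0..k_max."""
    prev_node = (i - 2) % n + 1
    next_node = i % n + 1
    ks = range(k_max + 1)
    return {
        "copies_from_prev": [(prev_node - 1) + k * n for k in ks],
        "own": [(i - 1) + k * n for k in ks],
        "copies_from_next": [(next_node - 1) + k * n for k in ks],
    }
```

For example, with n = 4 and i = 2, `node_lists(2, 4, 2)` gives the own list 1, 5, 9 together with the copies 0, 4, 8 from node 1 and 2, 6, 10 from node 3, which matches the three columns of Table 3. At least three nodes are needed for the three copies to land on distinct nodes.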
The multi-resolution tile data structure is currently a common strategy for storing and managing mass remote sensing image data. It makes full use of the multi-resolution image pyramid and image blocking techniques, can be applied effectively to the seamless organization and visualization of mass remote sensing data, and supports a realistic, image-based representation of the real world. High-resolution satellite remote sensing, characterized by high spatial, temporal or spectral resolution, provides abundant data sources for making remote sensing quantitative, dynamic, networked, practical and industrialized, and for extracting ground feature characteristics from remote sensing data.
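As a non-limiting illustration of the image pyramid and image blocking mentioned above, the following sketch computes the tile grid of each pyramid level. The tile size of 256 pixels, the halving of resolution between levels, and the function name are assumptions for this sketch only.

```python
import math


def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int, int, int, int]]:
    """Return (level, width, height, tile_columns, tile_rows) from full resolution upward.

    Level 0 is the full-resolution image; each following level halves the
    resolution until the whole image fits into a single tile.
    """
    levels = []
    level, w, h = 0, width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((level, w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Halve the resolution for the next, coarser level.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
        level += 1
    return levels
```

For a 1024 x 512 image this yields three levels with 4 x 2, 2 x 1 and 1 x 1 tiles respectively.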
Preferably, in this embodiment, the namespace, the block-to-file mapping, and the file attributes of the distributed file system are all stored in the file system metadata, and the file system metadata is stored in the storage management node. A block is the smallest unit of storage and processing in the database; it contains the block's own header information together with data or PL/SQL code.
Preferably, the distributed file system is implemented based on a relational database management system (RDSMS);
when the data storage node is damaged, a file content checksum mechanism is adopted;
when a new relational database management system (RDSMS) file is created, a checksum is calculated for each file block of the file data, and the checksum is saved as a separate hidden file in the storage management node. The mass remote sensing data are stored frame by frame in independent files, and the relational database management system is responsible for indexing the file data, the file blocks and the like, so that an application system can quickly locate the file data to be read. This mode is very convenient for managing mass data, and capacity can be expanded by adding storage devices or storage nodes as the data volume requires. The checksum is actually stored in the namespace of the storage management node. The scheme addresses the case in which a file block on a storage node is damaged because of a storage device error, a network error, a software defect of the storage node, or the like; in this case, the RDSMS adopts the file content checksum mechanism to ensure data integrity.
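As a non-limiting illustration, the per-block checksum mechanism can be sketched as follows. CRC32 is an assumed checksum function, the function names are hypothetical, and the 64 MB default block size is taken from claim 1; how the checksums are persisted as a hidden file is not shown.

```python
import zlib

BLOCK_SIZE = 64 * 1024 * 1024  # default file block size of 64 MB, as stated in claim 1


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute one checksum per file block when a file is created (CRC32 assumed)."""
    return [zlib.crc32(data[offset:offset + block_size])
            for offset in range(0, len(data), block_size)]


def verify_block(block: bytes, expected: int) -> bool:
    """Recompute the checksum of a block read from a storage node and compare it."""
    return zlib.crc32(block) == expected
```

When `verify_block` returns False, a reader could fetch the same file block from another storage node that holds a copy.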
Preferably, the distributed file system indexes the relevant spatiotemporal information with an information-separation hybrid indexing algorithm, which uses an improved hash table to index static information, dynamic historical spatiotemporal information, and current and future spatiotemporal information. The scheme solves the problems of cooperativity and survivability of massive spatial information.
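As a non-limiting illustration of keeping the three kinds of information in separate hash tables, consider the following sketch. It uses ordinary Python dictionaries rather than the improved hash table of the disclosure, whose details are not given here; the class name, method names, and the assumption that updates arrive in time order are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class HybridIndex:
    """Separate hash tables for static, historical and current/future information (sketch)."""
    static: dict = field(default_factory=dict)    # object id -> static attributes
    history: dict = field(default_factory=dict)   # object id -> list of (time, position)
    current: dict = field(default_factory=dict)   # object id -> latest (time, position)

    def put_static(self, oid: str, attrs: dict) -> None:
        self.static[oid] = attrs

    def update(self, oid: str, t: float, pos: tuple) -> None:
        """Record a new position; the previous one moves to the history table.

        Updates are assumed to arrive in increasing time order.
        """
        prev = self.current.get(oid)
        if prev is not None:
            self.history.setdefault(oid, []).append(prev)
        self.current[oid] = (t, pos)

    def position_at(self, oid: str, t: float):
        """Position valid at time t, or None if t precedes every record."""
        cur = self.current.get(oid)
        if cur is not None and t >= cur[0]:
            return cur[1]  # current record also answers queries about later times
        past = self.history.get(oid, [])
        idx = bisect_right([rec[0] for rec in past], t) - 1
        return past[idx][1] if idx >= 0 else None
```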
Preferably, in this embodiment, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data. Horizontally, the high-resolution application is taken as the basic unit, so that data of different spatial geographic resource body types can be organized; vertically, the data are managed at multiple levels of resolution. Together these realize horizontal and vertical management of multi-level resource bodies of different types.
Example two:
Fig. 2 is a schematic diagram of a mass spatial information data storage management system according to the present invention. As can be seen from Fig. 2, the present invention further provides a mass spatial information data storage management system that uses the storage management method of any of the above disclosures, including:
the distributed file system 1 is established in a master-slave mode;
the storage management node 10 is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;
the storage cluster 20 is used for distributed storage of file data by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
an external client coupled to the storage management node.
The beneficial effects of the invention are as follows. By adopting the above technical scheme, the invention can obtain at least the following technical effects. Because a distributed file system in master-slave mode is adopted, the storage capacity is easy to expand: horizontal expansion of the whole storage system can be achieved simply by increasing the capacity and/or the number of the data storage nodes, the physical details of changes in node capacity and node count are hidden from the applications that access the data, and the scalable storage requirement of mass spatial information data is better met. The data storage nodes adopt a distributed storage strategy, so that each data storage node stores almost the same amount of data and, during parallel retrieval, each data storage node retrieves approximately the same amount of tile data, which effectively improves the concurrent query performance and the load balance of the system. The Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement; its greatest advantage is excellent dispersion, which effectively guarantees the load balance of the system.
Fig. 3 is a software architecture diagram of a mass spatial information data storage management system according to the present invention, and it can be known with reference to fig. 2 and 3 that:
the system also comprises a storage virtualization layer arranged between the storage management node and the storage cluster. The storage virtualization layer holds resource metadata and interface protocols and specifications: the resource metadata include the heartbeat information of the storage nodes and the like, and the interface protocols and specifications include the heartbeat information reporting interface of the storage management node and the like, as shown in detail in Fig. 3. In addition, as can be seen from Fig. 3, a data storage node may be a Linux server, a Unix server, a Windows server, or the like. Fig. 3 also shows some other contents of the system; since they are not the main points of the present invention, they are not described in detail.
Fig. 4 is a schematic diagram of network deployment of the storage management system of the present invention, and it can be known with reference to fig. 4 that:
a) The client computer is the access medium of the mass spatial information data storage and management system. The client computer interacts with the back-end storage server in B/S (browser/server) mode, and data is transmitted between the storage server and the client computer; an ownership unit can log in to its own account on the server from the client computer and exercise the rights it is entitled to.
b) The dedicated-line access router connects the dedicated line to the spatial information data center and provides the data communication link. It is connected downward to the client computers of all ownership units and upward to the core switch of the spatial information data center.
c) The core switch is a data communication core of the spatial information data center and provides high-speed packet switching capacity for communication between the storage server and the management server. The core switch is connected with the access router of the spatial information data center downwards and is connected with the access switch upwards.
d) The access switch provides sufficient interfaces to provide access capabilities for the storage server and the management server. The access switch is connected with the storage server and the management server upwards and is connected with the core switch downwards.
e) The storage server runs the services of the ownership units; an ownership unit logs in to the spatial information data storage and management system with its own account and obtains service through the storage server. The storage server is connected downward to the access switch.
f) The management server organizes and manages the storage server, and the management server is connected with the access switch.
It should be noted that the various embodiments of the present disclosure as described above generally relate to the processing of input data and the generation of output data to some extent. This input data processing and output data generation may be implemented in hardware or software in combination with hardware. For example, certain electronic components may be employed in a mobile device or similar or related circuitry for implementing the functions associated with the various embodiments of the present disclosure as described above. Alternatively, one or more processors operating in accordance with stored instructions may implement the functions associated with the various embodiments of the present disclosure as described above. If so, it is within the scope of the present disclosure that these instructions may be stored on one or more non-transitory processor-readable media. Examples of the processor-readable medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. In addition, functional computer programs, instructions, and instruction segments for implementing the present disclosure can be easily construed by programmers skilled in the art to which the present disclosure pertains.
While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.
Claims (7)
1. A mass space information data storage management method comprises the following steps:
establishing a distributed file system in a master-slave mode, and a storage management node which is positioned under the distributed file system and is used for carrying out full life cycle management on spatial information data;
creating a distributed storage cluster which is controlled by an instruction of a storage management node, takes a file block as a basic unit, stores file data in a distributed manner, is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is realized based on a Hash modulo algorithm;
wherein,
before actual file data is stored in a data storage node through the storage management node, an external client performs data segmentation on the whole file, segments the file into file blocks with predefined sizes, and allocates a globally unique handle to each file block; wherein the block size of the file block is defaulted to 64 MB;
all file blocks of the file data are copied into a plurality of copies;
on the basis of a Hash modular algorithm, a redundancy strategy is adopted, a redundancy fragmentation space is added to a data storage node, and each file block is copied into a copy; if the data storage node is detected to be invalid or disconnected, continuously tracking the file block copy by using the storage management node, and starting the copy of the copy;
the distributed file system adopts an information separation mixed indexing algorithm for indexing related spatio-temporal information, and an improved hash table is adopted in the algorithm for indexing static information, dynamic historical spatio-temporal information and current and future spatio-temporal information.
2. The method for storage management of mass spatial information data as claimed in claim 1, wherein said storage management node stores file system metadata of the distributed file system and records using the edit log file when detecting any operation for modifying the file system metadata.
3. The method for storage management of mass spatial information data as claimed in claim 2, wherein the number of said copies is called copy factor, and the file block size and copy factor of each file data are configurable.
4. The method for storage and management of mass spatial information data according to claim 2, wherein the namespace, block-to-file mapping, and file attributes of the distributed file system are stored in the file system metadata, and the file system metadata is stored in the storage management node.
5. The method for storage and management of mass spatial information data as claimed in claim 1, wherein said distributed file system is implemented based on a relational database management system;
when the data storage node is damaged, a file content checksum mechanism is adopted;
when a new relational database management system file is created, a checksum is calculated for each file block of the file data and stored as a separate hidden file in the storage management node.
6. The method for storing and managing mass spatial information data according to claim 1, wherein the distributed file system adopts a remote sensing spatial data multi-scale organization management mode, and uses high-resolution application as a basic unit in the transverse direction; the vertical direction is managed with multiple levels of resolution.
7. A mass spatial information data storage management system using the mass spatial information data storage management method according to any one of claims 1 to 6, comprising:
the distributed file system is established in a master-slave mode;
the storage management node is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;
the storage cluster is used for storing file data in a distributed manner by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;
an external client coupled to the storage management node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610832422.4A CN106446126B (en) | 2016-09-19 | 2016-09-19 | Mass spatial information data storage management method and storage management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610832422.4A CN106446126B (en) | 2016-09-19 | 2016-09-19 | Mass spatial information data storage management method and storage management system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106446126A (en) | 2017-02-22 |
CN106446126B (en) | 2021-04-20 |
Family
ID=58165641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610832422.4A Active CN106446126B (en) | 2016-09-19 | 2016-09-19 | Mass spatial information data storage management method and storage management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446126B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107301094A (en) * | 2017-05-10 | 2017-10-27 | 南开大学 | The dynamic self-adapting data model inquired about towards extensive dynamic transaction |
CN107423422B (en) * | 2017-08-01 | 2019-09-24 | 武大吉奥信息技术有限公司 | Spatial data distributed storage and search method and system based on grid |
CN107656980B (en) * | 2017-09-07 | 2020-09-22 | 北京神州绿盟信息安全科技股份有限公司 | Method applied to distributed database system and distributed database system |
CN107766645B (en) * | 2017-10-19 | 2020-10-16 | 上海营邑城市规划设计股份有限公司 | Planning BIM pipeline collision detection system storage management device |
CN107943867B (en) * | 2017-11-10 | 2021-11-23 | 中国电子科技集团公司第三十二研究所 | High-performance hierarchical storage system supporting heterogeneous storage |
CN109144966A (en) * | 2018-07-06 | 2019-01-04 | 航天星图科技(北京)有限公司 | A kind of high-efficiency tissue and management method of massive spatio-temporal data |
CN111176584B (en) * | 2019-12-31 | 2023-10-31 | 曙光信息产业(北京)有限公司 | Data processing method and device based on hybrid memory |
CN111414346A (en) * | 2020-04-30 | 2020-07-14 | 武汉众邦银行股份有限公司 | Distributed granulation storage method for massive unstructured data files |
CN111930711B (en) * | 2020-09-10 | 2020-12-29 | 北京志翔科技股份有限公司 | Method, device and equipment for adding nodes to distributed file system cluster |
CN112100146B (en) * | 2020-09-21 | 2021-06-29 | 重庆紫光华山智安科技有限公司 | Efficient erasure correction distributed storage writing method, system, medium and terminal |
CN113094572A (en) * | 2021-04-16 | 2021-07-09 | 中国工商银行股份有限公司 | Service data processing method, device and equipment |
CN113220234B (en) * | 2021-05-17 | 2022-10-21 | 南京林洋电力科技有限公司 | Terminal data storage management method and manager |
CN113793110A (en) * | 2021-07-01 | 2021-12-14 | 科尔比乐(广州)智能装备有限公司 | Industrial equipment data acquisition and analysis method based on cloud computing and cloud service platform |
CN113610484A (en) * | 2021-07-01 | 2021-11-05 | 科尔比乐(广州)智能装备有限公司 | Intelligent manufacturing production execution system L-MES and manufacturing method thereof |
CN114925073B (en) * | 2022-06-14 | 2024-04-16 | 深圳九有数据库有限公司 | Distributed database system supporting flexible dynamic fragmentation and implementation method thereof |
CN116915510B (en) * | 2023-09-13 | 2023-12-01 | 北京数盾信息科技有限公司 | Distributed storage system based on high-speed encryption algorithm |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5423044A (en) * | 1992-06-16 | 1995-06-06 | International Business Machines Corporation | Shared, distributed lock manager for loosely coupled processing systems |
CN102855239B (en) * | 2011-06-28 | 2016-04-20 | 清华大学 | A kind of distributed geographical file system |
CN103793442B (en) * | 2012-11-05 | 2019-05-07 | 北京超图软件股份有限公司 | The processing method and system of spatial data |
CN105608155B (en) * | 2015-12-17 | 2018-09-25 | 北京华油信通科技有限公司 | Mass data distributed memory system |
Also Published As
Publication number | Publication date |
---|---|
CN106446126A (en) | 2017-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||