CN106446126B

CN106446126B - Mass spatial information data storage management method and storage management system

Info

Publication number: CN106446126B
Application number: CN201610832422.4A
Authority: CN
Inventors: 王景光; 邹同元; 褚鹏飞; 沈洋; 侯伟; 刘宗玥
Original assignee: Harbin Space Star Data System Technology Co ltd
Current assignee: Harbin Space Star Data System Technology Co ltd
Priority date: 2016-09-19
Filing date: 2016-09-19
Publication date: 2021-04-20
Anticipated expiration: 2036-09-19
Also published as: CN106446126A

Abstract

The invention discloses a storage management method for massive spatial information data, which comprises the following steps: establishing a distributed file system in a master-slave mode, together with a storage management node that sits under the distributed file system and performs full life cycle management of spatial information data; creating a distributed storage cluster that is controlled by instructions from the storage management node, stores file data in a distributed manner with the file block as the basic unit, is formed by a plurality of data storage nodes, and can be smoothly scaled out horizontally; the data storage nodes adopt a distributed storage strategy implemented on the basis of a Hash modulo algorithm. The invention solves the problem that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous.

Description

Mass spatial information data storage management method and storage management system

Technical Field

The invention relates to the field of research on storage relations of mass spatial data, in particular to a mass spatial information data storage management method and a storage management system.

Background

At present, satellite technology in China is well developed, but some regions lack integration of satellite application data and fall seriously short in overall planning and information fusion. The storage and management of spatial information data is one example with many problems. Spatial information data come from increasingly diverse sources (space-based, airborne, near-space and the like), and the multiple sources make the total data volume huge: tens of millions of files need to be stored. High resolution also means that a single high-resolution spatial information data file can reach several GB or even dozens of GB. Building a storage management system that can accommodate civil spatial information data with both a huge total volume and a large single-file size is therefore an inevitable trend.

Disclosure of Invention

The technical problem is that existing storage management methods cannot effectively store and manage spatial information data that is massive, multi-source and heterogeneous;

in view of this, embodiments of the present invention provide a method and a system for storing and managing mass spatial information data, so as to solve the technical problem.

The solution of the problem is as follows:

the invention provides a storage management method for massive spatial information data, which comprises the following steps:

establishing a distributed file system in a master-slave mode, and a storage management node which is positioned under the distributed file system and is used for carrying out full life cycle management on spatial information data;

creating a distributed storage cluster which is controlled by an instruction of a storage management node, takes a file block as a basic unit, stores file data in a distributed manner, is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;

the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is realized based on a Hash modulo algorithm.

Further, the storage management node stores file system metadata of the distributed file system and, on detecting any operation that modifies the file system metadata, records that operation in the edit log file;

before the storage management node stores actual file data in the data storage nodes, the whole file is segmented into file blocks of a predefined size, and a globally unique handle is allocated to each file block. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the file-creation log entry in an edit log file (EditLog), which it stores in its local file system; cutting file data into file blocks of a predefined size in advance and then distributing the storage by file block improves storage efficiency; and the file system metadata gives the organization and management of the distributed file system a recorded basis to rely on.

Furthermore, before actual file data is stored in the data storage node, the whole file is subjected to data segmentation to be segmented into file blocks with predefined sizes, and an operation of allocating a globally unique handle to each file block is completed by an external client through the storage management node;

the block size of a file block defaults to 64 MB. This default is far larger than the block size of a common file system. Choosing a larger block size reduces the interaction between the client and the storage management node; moreover, the client is likely to perform multiple operations on a given block, so keeping a longer-lived connection with a data storage node can reduce network load; at the same time, the volume of file system metadata held on the storage management node shrinks, so the metadata can be kept in memory, which improves management and storage efficiency.
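As an illustration only, the segmentation step can be sketched as follows. The names `FileBlock` and `plan_blocks` are hypothetical, and using a random UUID as the globally unique handle is an assumption; the text does not say how handles are generated.

```python
import uuid
from dataclasses import dataclass

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size from the text


@dataclass(frozen=True)
class FileBlock:
    handle: str  # globally unique handle (assumption: a random UUID)
    index: int   # position of the block within the file
    offset: int  # byte offset of the block within the file
    length: int  # number of bytes in the block


def plan_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list[FileBlock]:
    """Cut a file of file_size bytes into fixed-size blocks; the last may be shorter."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append(FileBlock(uuid.uuid4().hex, len(blocks), offset, length))
        offset += length
    return blocks
```

Under these assumptions, a 200 MB file yields four blocks: three of 64 MB and a final one of 8 MB.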

Further, all file blocks of the file data are copied into a plurality of copies, the number of the copies is called a copy factor, and the file block size and the copy factor of each file data are configurable. The copy factor can be configured when the file is created and can be changed later; the fault tolerance of the system can be improved by setting the copy factors.

Further, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: redundant fragmentation spaces are added to the data storage nodes so that each file block is replicated into copies;

and if a data storage node is detected to have failed or lost its connection, the storage management node keeps track of the affected file block copies and starts re-copying them. The storage management node is responsible for managing the copying of file blocks. When some storage nodes lose contact with the storage management node, it marks those unreachable nodes as failed; it then keeps track of the file block copies that need to be copied again and starts copying them. When the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a copy-deletion instruction to the relevant storage node at the next heartbeat, whereupon that storage node removes the corresponding file block copies and frees the space. This arrangement effectively guarantees the security of the system's tile data and the reliability of the system.
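The re-copying decision described above can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the function name `plan_recopy`, the dictionary layout and the choice of target nodes in sorted order are all assumptions.

```python
def plan_recopy(block_locations: dict[str, set[str]],
                live_nodes: set[str],
                copy_factor: int) -> dict[str, list[str]]:
    """For each block with too few live copies, pick nodes to receive new copies."""
    plan = {}
    for block, holders in block_locations.items():
        live_holders = holders & live_nodes
        missing = copy_factor - len(live_holders)
        if missing <= 0 or not live_holders:
            # Enough copies already, or no surviving copy left to read from.
            continue
        # Assumption: any live node not yet holding the block is a valid target.
        candidates = sorted(live_nodes - live_holders)
        plan[block] = candidates[:missing]
    return plan
```

For example, if node `n2` fails while holding copies of two blocks with a copy factor of 3, each of those blocks is scheduled for one new copy on a live node that does not yet hold it.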

Further, the namespace, the mapping of the block to the file, and the attribute of the file of the distributed file system are all stored in the file system metadata, and the file system metadata is stored in the storage management node. A block is the smallest unit of storage and processing in a database, containing header information data or PL/SQL code for the block itself.

Further, the distributed file system is implemented based on a relational database management system (RDSMS);

when the data storage node is damaged, a file content checksum mechanism is adopted;

when a new relational database management system (RDSMS) file is created, a checksum is calculated for each file block of the file data and the checksum is saved as a separate hidden file to the storage management node. Massive remote sensing data are stored frame by frame in independent files, and the relational database management system is responsible for indexing the file data, file blocks and the like, so that an application system can quickly locate the file data to be read; this mode is very convenient for managing mass data, and capacity can be expanded by continuously adding storage devices or storage nodes as the data capacity requirement grows. The checksum is actually stored in the namespace of the storage management node; this scheme addresses the case where a file block on a storage node may be damaged by storage device errors, network errors, software defects of the storage node and the like; in this case, RDSMS adopts a file content checksum mechanism to ensure data integrity.
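A per-block checksum check of this kind might look like the sketch below. The text does not name a checksum algorithm; CRC-32 from Python's standard `zlib` module is used here purely as an assumption, and the function names are hypothetical.

```python
import zlib


def block_checksums(data: bytes, block_size: int) -> list[int]:
    """Compute one CRC-32 checksum per fixed-size block of data."""
    if block_size <= 0:
        raise ValueError("block_size must be > 0")
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data: bytes, block_size: int, expected: list[int]) -> list[int]:
    """Return the indices of blocks whose checksum differs from the stored one."""
    actual = block_checksums(data, block_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

The checksums would be computed once when the file is created and compared again when a block is read; a mismatch identifies the damaged block so that another copy can be used instead.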

Furthermore, the distributed file system adopts an information separation mixed indexing algorithm for indexing relevant spatio-temporal information, and an improved hash table is adopted in the algorithm for indexing static information, dynamic historical spatio-temporal information and current and future spatio-temporal information. The scheme solves the problems of cooperativity and survivability of massive spatial information.

Furthermore, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data. Horizontally, it takes the high-resolution application as the basic unit, thereby organizing data of different spatial geographic resource body types; vertically, it manages the data at multiple resolution levels, thereby achieving horizontal and vertical management of multi-level resource bodies of different types.

The invention also provides a mass spatial information data storage management system adopting any one of the methods disclosed by the invention, which comprises the following components:

the distributed file system is established in a master-slave mode;

the storage management node is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;

the storage cluster is used for storing file data in a distributed manner by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;

an external client coupled to the storage management node.

The invention has the following beneficial effects. By adopting the above technical scheme, the invention can obtain at least the following technical effects: because a distributed file system in master-slave mode is adopted, the storage capacity is easy to expand; horizontal scalability of the whole storage system can be achieved simply by increasing the capacity of the data storage nodes and/or the number of data storage nodes, and the physical details of changes in node capacity and node count are hidden from the applications that access the data, so the scalable storage requirement of mass spatial information data is better met; because the data storage nodes adopt a distributed storage strategy, each data storage node stores almost the same amount of data and, during parallel retrieval, each node retrieves approximately the same number of tiles, which effectively improves the concurrent query performance and load balance of the system; the Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement, and its greatest advantage is excellent dispersion, which effectively guarantees the load balance of the system.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.

FIG. 1 is a flow chart of a mass spatial information data storage management method according to the present invention;

FIG. 2 is a schematic diagram of the mass spatial information data storage management system of the present invention;

FIG. 3 is a diagram illustrating the software architecture of a mass spatial information data storage management system according to the present invention;

FIG. 4 is a schematic diagram of a network deployment of the storage management system of the present invention.

Throughout the drawings, it should be noted that like reference numerals are used to depict the same or similar elements, features and structures.

Detailed Description

The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of various embodiments of the disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but these details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the various embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.

The terms and words used in the following description and claims are not limited to their literal meanings, but are used by the inventor only to enable the disclosure to be clearly and consistently understood. Accordingly, it should be apparent to those skilled in the art that the following descriptions of the various embodiments of the present disclosure are provided for illustration only and not for the purpose of limiting the disclosure as defined by the appended claims and their equivalents.

It is to be understood that the singular forms also include the plural reference unless the context clearly dictates otherwise. Thus, for example, reference to a "component surface" includes reference to one or more such surfaces.

The first embodiment is as follows:

Fig. 1 is a flowchart of the mass spatial information data storage management method of the present invention. Referring to Fig. 1, the present invention discloses a mass spatial information data storage management method, which includes the steps of:

S1: establishing a distributed file system in a master-slave mode, and a storage management node which is positioned under the distributed file system and is used for carrying out full life cycle management on spatial information data;

S2: creating a distributed storage cluster which is controlled by the instructions of the storage management node, stores file data in a distributed manner with the file block as the basic unit, is formed by a plurality of data storage nodes, and can be smoothly scaled out horizontally;

S3: the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is realized based on a Hash modulo algorithm.

The Hash modulo algorithm is a distribution strategy that is simple in principle and easy to implement; its greatest advantage is excellent dispersion, which effectively guarantees the load balance of the system.

The Hash modulo algorithm applies a modulo-based Hash function to the key and takes the result as the entry address of the Hash table corresponding to the key. If there are N nodes, the object keys need to be mapped uniformly onto the N nodes, and the node on which an object is stored is determined by the result of Hash(key) % N.
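The generic Hash(key) % N rule can be sketched as below. The text does not specify the Hash function; CRC-32 from Python's `zlib` module is an assumption chosen because it gives the same result on every run, and `node_for_key` is a hypothetical name.

```python
import zlib


def node_for_key(key: str, node_count: int) -> int:
    """Map an object key to a node index in the range 0 .. node_count - 1."""
    if node_count <= 0:
        raise ValueError("node_count must be > 0")
    # Assumption: CRC-32 stands in for the unspecified Hash function.
    return zlib.crc32(key.encode("utf-8")) % node_count
```

The same key always lands on the same node, and every result falls within the valid node range, which is what allows a reader to locate an object without consulting a lookup table.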

The distribution strategy of the invention borrows the core idea of the Hash modulo algorithm. The formula is as follows:

H(key) = (3 × row + col) mod K

where H(key) is the fragmentation space number, K is the maximum number of virtual nodes, 0 ≤ H(key) < K, row is the row number of the lower left corner of the tile, and col is the column number of the lower left corner. The fragmentation space number sits at the top directory level of the tile file tree structure.

With this formula, the 9 fragmentation space numbers in any 3 × 3 tile request are all different. A 3 × 3 request is a reasonable unit given the existing tile size of 512 × 512 pixels and the mainstream display window size of 1280 × 1024 pixels.
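The 3 × 3 property can be checked with a short sketch. The function names are hypothetical; the formula is the one given above. Within a 3 × 3 window the offset 3 × dr + dc (dr, dc in 0..2) takes each value from 0 to 8 exactly once, so the nine numbers are consecutive and distinct whenever K ≥ 9.

```python
def shard_number(row: int, col: int, k: int = 255) -> int:
    """Fragmentation space number H(key) = (3 * row + col) mod K."""
    return (3 * row + col) % k


def window_shards(row0: int, col0: int, size: int = 3, k: int = 255) -> list[int]:
    """Fragmentation space numbers of a size x size tile window starting at (row0, col0)."""
    return [shard_number(row0 + dr, col0 + dc, k)
            for dr in range(size)
            for dc in range(size)]


print(window_shards(0, 0))               # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(len(set(window_shards(0, 0, 4))))  # 13
```

The second line illustrates the weaker guarantee for larger requests: the 16 tiles of a 4 × 4 window fall into 13 distinct fragmentation spaces, so they remain well spread even though some numbers repeat.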

For the mapping between fragmentation space numbers and actual physical nodes: if the number of actual physical nodes is n (n ≤ K), the number of fragmentation space numbers allocated to each node should be between INT(K/n) and INT(K/n) + 1, and adjacent fragmentation space numbers on each node differ by n. In the present invention, the maximum number of storage nodes (K) is 255 (this value is estimated from the actual storage volume and cannot be changed once set). For n = 9, the fragmentation space numbers are as shown in Table 1.

Table 1: space slicing example

With K = 255, the fragmentation space number under which each tile is stored is detailed in Table 2.

Table 2: tile space slicing

[Table 2: grid of tiles, each cell labelled with its fragmentation space number; the number increases by 1 from one column to the next and by 3 from one row to the next, wrapping to 0 after the highest number.]

The whole table can be regarded as the original remote sensing image data; each cell represents one block of tile image data, that is, one of the many tiles generated by segmenting the original image. The table shows that for any 3 × 3 request, the 9 tiles involved are stored in different fragmentation spaces whose numbers are consecutive, so concurrent requests are spread as evenly as possible across the nodes, which maximizes concurrency and efficiency. Even for n × n requests, the tiles are spread over relatively many nodes rather than concentrated on a few, so efficiency is still guaranteed to some extent. Each row contains all the consecutive fragmentation space numbers (0 to 254), and each fragmentation space stores the same number of tiles, so the number of tiles actually stored on each physical node is almost equal. Consequently, the tile data generated by processing the whole remote sensing image is dispersed evenly over the different storage nodes by the hash distribution strategy of the invention, and every storage node stores approximately the same amount of tile data, which satisfies the requirements of query efficiency and load balancing.
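The load-balance claim can be illustrated with a small count. This sketch assumes that fragmentation space s is assigned to physical node (s mod n) + 1, which is one reading of the statement that adjacent fragmentation space numbers on a node differ by n; the function names and the grid size are hypothetical.

```python
from collections import Counter


def shard_number(row: int, col: int, k: int) -> int:
    """Fragmentation space number H(key) = (3 * row + col) mod K."""
    return (3 * row + col) % k


def node_for_shard(shard: int, n: int) -> int:
    """Assumed mapping: 1-based node index, shards on one node differ by n."""
    return shard % n + 1


def tiles_per_node(rows: int, cols: int, k: int, n: int) -> Counter:
    """Count how many tiles of a rows x cols grid land on each physical node."""
    counts = Counter()
    for row in range(rows):
        for col in range(cols):
            counts[node_for_shard(shard_number(row, col, k), n)] += 1
    return counts


counts = tiles_per_node(rows=10, cols=255, k=255, n=9)
print(sorted(set(counts.values())))  # [280, 290]
```

With K = 255 and n = 9, three nodes hold 29 fragmentation spaces and six hold 28, so for this 2550-tile grid every node stores either 280 or 290 tiles.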

Preferably, in this embodiment, the storage management node stores the file system metadata of the distributed file system and, on detecting any operation that modifies the file system metadata, records that operation in the edit log file;

before the storage management node stores actual file data in the data storage nodes, the whole file is segmented into file blocks of a predefined size, and a globally unique handle is allocated to each file block. Spatial information data is written once and read many times: once spatial information data products of the various levels have been processed and generated, the data itself is in principle not allowed to be modified, and the file system metadata supports this. For example, when a file is created in the relational database management system (RDSMS), the storage management node records the file-creation log entry in an edit log file (EditLog), which it stores in its local file system; cutting file data into file blocks of a predefined size in advance and then distributing the storage by file block improves storage efficiency; and the file system metadata gives the organization and management of the distributed file system a recorded basis to rely on.
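An edit log of this kind can be sketched as an append-only list of records that is replayed to rebuild the metadata. Everything here is illustrative: the record fields, the operation names and the in-memory list (standing in for the file kept in the local file system) are assumptions, not details given in the text.

```python
import json


class EditLog:
    """Append-only record of metadata changes (in-memory stand-in for a local file)."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def append(self, op: str, **args) -> int:
        """Record one metadata operation and return its sequence number."""
        record = {"txid": len(self.entries) + 1, "op": op, "args": args}
        self.entries.append(json.dumps(record, sort_keys=True))
        return record["txid"]


def replay(entries: list[str]) -> dict:
    """Rebuild the namespace (path -> attributes and block handles) from the log."""
    namespace: dict = {}
    for line in entries:
        record = json.loads(line)
        op, args = record["op"], record["args"]
        if op == "create":
            namespace[args["path"]] = {"copy_factor": args["copy_factor"], "blocks": []}
        elif op == "add_block":
            namespace[args["path"]]["blocks"].append(args["handle"])
        elif op == "set_copy_factor":
            namespace[args["path"]]["copy_factor"] = args["copy_factor"]
        elif op == "delete":
            namespace.pop(args["path"], None)
    return namespace
```

Because the file data itself is written once and not modified, only metadata operations such as file creation, block allocation and copy-factor changes need to appear in such a log.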

Preferably, in this embodiment, before actual file data is stored in the data storage node, the whole file is subjected to data segmentation to be segmented into file blocks of a predefined size, and an operation of allocating a globally unique handle to each file block is completed by the external client through the storage management node;

the block size of a file block defaults to 64 MB. This default is far larger than the block size of a common file system. Choosing a larger block size reduces the interaction between the client and the storage management node; moreover, the client is likely to perform multiple operations on a given block, so keeping a longer-lived connection with a data storage node can reduce network load; at the same time, the volume of file system metadata held on the storage management node shrinks, so the metadata can be kept in memory, which improves management and storage efficiency.
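The effect of block size on metadata volume can be made concrete with a quick calculation. The 10 GB file size and the 4 KB comparison block size are assumptions chosen for illustration; only the 64 MB default comes from the text.

```python
def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks needed to hold file_size bytes (ceiling division)."""
    if block_size <= 0:
        raise ValueError("block_size must be > 0")
    return -(-file_size // block_size)


GIB = 1024 ** 3
file_size = 10 * GIB  # assumed size of one high-resolution image file

print(block_count(file_size, 64 * 1024 * 1024))  # 160
print(block_count(file_size, 4 * 1024))          # 2621440
```

At 64 MB per block the storage management node tracks 160 block entries for this file, compared with more than 2.6 million entries at 4 KB per block, which is why the metadata can plausibly be held in memory.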

Preferably, in this embodiment, all file blocks of the file data are copied into multiple copies, the number of copies is referred to as a copy factor, and the file block size and the copy factor of each file data are configurable. The copy factor can be configured when the file is created and can be changed later; the fault tolerance of the system can be improved by setting the copy factors.

Preferably, in this embodiment, on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: redundant fragmentation spaces are added to the data storage nodes so that each file block is replicated into copies;

and if a data storage node is detected to have failed or lost its connection, the storage management node keeps track of the affected file block copies and starts re-copying them. When a local storage device or network device fails, the system is able to detect the error automatically and recover the data quickly and automatically. In this scheme, the storage management node is responsible for managing the copying of file blocks. When some storage nodes lose contact with the storage management node, it marks those unreachable nodes as failed; it then keeps track of the file block copies that need to be copied again and starts copying them. When the copy factor of a file is reduced, the storage management node selects the surplus file block copies to delete and sends a copy-deletion instruction to the relevant storage node at the next heartbeat, whereupon that storage node removes the corresponding file block copies and frees the space. This arrangement effectively guarantees the security of the system's tile data and the reliability of the system.
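Two of the decisions described in this embodiment, marking silent nodes as failed and choosing surplus copies to delete, can be sketched as follows. The timeout-based failure test, the function names and the rule of deleting copies from the last nodes in sorted order are assumptions made for illustration.

```python
def mark_failed(last_heartbeat: dict[str, float], now: float, timeout: float) -> set[str]:
    """Nodes whose last heartbeat is older than timeout seconds are treated as failed."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def plan_deletions(block_locations: dict[str, set[str]], copy_factor: int) -> dict[str, list[str]]:
    """For each block with more copies than copy_factor, pick the nodes to drop it from."""
    plan = {}
    for block, holders in block_locations.items():
        surplus = len(holders) - copy_factor
        if surplus > 0:
            # Assumption: which surplus copies to remove is arbitrary; take the last by name.
            plan[block] = sorted(holders)[-surplus:]
    return plan
```

The deletion plan would be delivered to each affected storage node with its next heartbeat exchange, after which the node removes the copy and frees the space.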

For the redundancy strategy mentioned therein:

The distribution strategy based on the Hash modulo method guarantees the load balance of the system well, but it lacks any design for security and reliability: once a storage node fails, the tile data on that node is lost, which causes huge losses to the system. Most distributed storage systems therefore adopt a fault-tolerant approach to increase reliability. Fault tolerance means that the system is allowed to experience faults, but the related functions and services must not fail when a fault occurs. Fault tolerance is usually implemented with copies, and in essence it is a redundancy strategy.

The basic idea of the redundancy strategy is to build a distributed redundant deployment scheme of "fragmentation space copies" on top of the Hash modulo algorithm: redundant fragmentation spaces are added to the physical nodes so that the system holds copies, which effectively improves fault tolerance and ensures the reliability and security of the distributed system. Specifically, two copies are added for each fragmentation space number obtained from the Hash modulo and are placed on the previous node and the next node respectively. Each tile is thus stored three times on three nodes, one primary and two copies; if the primary physical node of a fragmentation space fails, the data can be obtained from the other two standby physical nodes. The list of fragmentation space numbers on each node accordingly grows from the original one column to three columns, as shown in Table 3.

Table 3: slicing space number table corresponding to ith physical storage node

Copies from node i-1	Node i	Copies from node i+1
(i-1)-1	i-1	(i+1)-1
(i-1)-1+n	i-1+n	(i+1)-1+n
(i-1)-1+2n	i-1+2n	(i+1)-1+2n
(i-1)-1+3n	i-1+3n	(i+1)-1+3n
…	…	…
(i-1)-1+kn	i-1+kn	(i+1)-1+kn
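Table 3 can be reproduced with a short sketch. Node indices are 1-based as in the table. Treating the first and last nodes as neighbours (wrap-around) is an assumption, since the text does not say what happens at the ends; the function names are hypothetical.

```python
def primary_shards(node: int, n: int, k: int) -> list[int]:
    """Fragmentation spaces i-1, i-1+n, i-1+2n, ... (below K) owned by 1-based node i."""
    return list(range(node - 1, k, n))


def neighbours(node: int, n: int) -> tuple[int, int]:
    """Previous and next node; assumption: nodes 1 and n wrap around to each other."""
    prev_node = n if node == 1 else node - 1
    next_node = 1 if node == n else node + 1
    return prev_node, next_node


def shards_on_node(node: int, n: int, k: int) -> dict[str, list[int]]:
    """The three columns of Table 3 for one physical node."""
    prev_node, next_node = neighbours(node, n)
    return {
        "copy_of_prev": primary_shards(prev_node, n, k),
        "primary": primary_shards(node, n, k),
        "copy_of_next": primary_shards(next_node, n, k),
    }


def nodes_holding(shard: int, n: int) -> list[int]:
    """Primary node first, then the two nodes holding copies."""
    primary = shard % n + 1
    prev_node, next_node = neighbours(primary, n)
    return [primary, prev_node, next_node]
```

With n = 9 and K = 255, node 2 holds its own fragmentation spaces 1, 10, 19, ... together with copies of 0, 9, 18, ... from node 1 and 2, 11, 20, ... from node 3, so every fragmentation space remains readable if any single node fails.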

The multi-resolution tile data structure is currently a common strategy for storing and managing mass remote sensing image data. It makes full use of the multi-resolution image pyramid and image blocking techniques, can be applied effectively to the seamless organization and visualization of mass remote sensing data, and supports realistic, image-based representation of the real world. High-resolution satellite remote sensing, characterized by high spatial, temporal or spectral resolution, provides abundant data sources for quantitative, dynamic, networked, practical and industrialized remote sensing and for extracting ground feature characteristics from remote sensing data.

Preferably, in this embodiment, the namespace, the block-to-file mapping, and the attributes of the file of the distributed file system are all stored in the file system metadata, and the file system metadata is stored in the storage management node. A block is the smallest unit of storage and processing in a database, containing header information data or PL/SQL code for the block itself.

Preferably, the distributed file system is implemented based on a relational database management system (RDSMS);

when a new relational database management system (RDSMS) file is created, a checksum is calculated for each file block of the file data and the checksum is saved as a separate hidden file to the storage management node. Massive remote sensing data are stored frame by frame in independent files, and the relational database management system is responsible for indexing the file data, file blocks and the like, so that an application system can quickly locate the file data to be read; this mode is very convenient for managing mass data, and capacity can be expanded by continuously adding storage devices or storage nodes as the data capacity requirement grows. The checksum is actually stored in the namespace of the storage management node; this scheme addresses the case where a file block on a storage node may be damaged by storage device errors, network errors, software defects of the storage node and the like; in this case, RDSMS adopts a file content checksum mechanism to ensure data integrity.

Preferably, the distributed file system employs an information separation hybrid indexing algorithm for indexing relevant spatiotemporal information, which employs an improved hash table for indexing static information, dynamic historical spatiotemporal information, current and future spatiotemporal information. The scheme solves the problems of cooperativity and survivability of massive spatial information.

Preferably, in this embodiment, the distributed file system adopts a multi-scale organization and management mode for remote sensing spatial data. Horizontally, it takes the high-resolution application as the basic unit, thereby organizing data of different spatial geographic resource body types; vertically, it manages the data at multiple resolution levels, thereby achieving horizontal and vertical management of multi-level resource bodies of different types.

Example two:

Fig. 2 is a schematic diagram of the mass spatial information data storage management system of the present invention. As can be seen from Fig. 2, the present invention further provides a mass spatial information data storage management system adopting any one of the methods disclosed by the invention, including:

the distributed file system 1 is established in a master-slave mode;

the storage management node 10 is positioned under the distributed file system and used for carrying out full life cycle management on the spatial information data and storing file system metadata of the distributed file system;

the storage cluster 20 is used for distributed storage of file data by taking a file block as a basic unit under the instruction control of the storage management node, and is formed by a plurality of data storage nodes and can be horizontally and smoothly expanded;

an external client coupled to the storage management node.

Fig. 3 is a software architecture diagram of a mass spatial information data storage management system according to the present invention, and it can be known with reference to fig. 2 and 3 that:

The system also comprises a storage virtualization layer arranged between the storage management node and the storage cluster. The storage virtualization layer holds resource metadata and interface protocols and specifications: the resource metadata include the heartbeat information of the storage nodes and the like, and the interface protocols and specifications include the heartbeat reporting interface of the storage management node and the like, as shown in detail in Fig. 3. In addition, as Fig. 3 shows, a data storage node may be a Linux server, a Unix server, a Windows server and the like. Fig. 3 also shows some other contents of the system; since they are not the main points of the present invention, they are not described in detail.

Fig. 4 is a schematic diagram of the network deployment of the storage management system of the present invention. With reference to fig. 4:

a) The client computer is the medium through which the mass spatial information data storage and management system is used. The client computer interacts with the back-end storage server in B/S (browser/server) mode, and data is transmitted between the storage server and the client computer. Through the client computer, an ownership unit can log in to its own account on the server and exercise the rights to which it is entitled.

b) The private-line access router connects the private line to the spatial information data center and provides the data communication link. It is connected downward to the client computers of all ownership units and upward to the core switch of the spatial information data center.

c) The core switch is the data communication core of the spatial information data center and provides high-speed packet switching capacity for communication between the storage server and the management server. It is connected downward to the access router of the spatial information data center and upward to the access switch.

d) The access switch provides sufficient interfaces to give the storage server and the management server network access. It is connected upward to the storage server and the management server and downward to the core switch.

e) The storage server serves the ownership units: an ownership unit logs in to the spatial information data storage and management system with its own account and obtains service through the storage server. The storage server is connected downward to the access switch.

f) The management server organizes and manages the storage server, and is connected to the access switch.
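The connections listed in items a) to f) can be written down as a small graph to check which devices a request passes through. This is only an illustrative sketch: the node labels are paraphrased from the list above, and the helper names `build_graph` and `shortest_path` are hypothetical.

```python
from collections import deque

# Links as described in items a) to f); labels are paraphrased from the text.
LINKS = [
    ("client computer", "private-line access router"),
    ("private-line access router", "core switch"),
    ("core switch", "access switch"),
    ("access switch", "storage server"),
    ("access switch", "management server"),
]


def build_graph(links):
    """Build an undirected adjacency map from a list of (a, b) links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of devices on the path, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None
```

Under these assumed links, a request from a client computer reaches the storage server through the private-line access router, the core switch, and the access switch, while the storage server and the management server communicate through the access switch alone.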

It should be noted that the various embodiments of the present disclosure as described above generally relate to the processing of input data and the generation of output data to some extent. This input data processing and output data generation may be implemented in hardware or software in combination with hardware. For example, certain electronic components may be employed in a mobile device or similar or related circuitry for implementing the functions associated with the various embodiments of the present disclosure as described above. Alternatively, one or more processors operating in accordance with stored instructions may implement the functions associated with the various embodiments of the present disclosure as described above. If so, it is within the scope of the present disclosure that these instructions may be stored on one or more non-transitory processor-readable media. Examples of the processor-readable medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, and optical data storage devices. In addition, functional computer programs, instructions, and instruction segments for implementing the present disclosure can be easily construed by programmers skilled in the art to which the present disclosure pertains.

While the disclosure has been shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims and their equivalents.

Claims

1. A mass spatial information data storage management method, comprising the following steps:

the data storage nodes adopt a distributed storage strategy, and the distributed storage strategy is realized based on a Hash modulo algorithm;

wherein,

before actual file data is stored in a data storage node through the storage management node, an external client segments the whole file into file blocks of a predefined size and allocates a globally unique handle to each file block; wherein the default block size of a file block is 64 MB;

all file blocks of the file data are copied into a plurality of copies;

on the basis of the Hash modulo algorithm, a redundancy strategy is adopted: redundant fragment space is added to the data storage nodes, and a copy is made of each file block; if a data storage node is detected to be invalid or disconnected, the storage management node continues to track the file block copies and starts re-copying of the copies;

the distributed file system adopts an information separation mixed indexing algorithm for indexing related spatio-temporal information, and an improved hash table is adopted in the algorithm for indexing static information, dynamic historical spatio-temporal information and current and future spatio-temporal information.

2. The method for storage management of mass spatial information data as claimed in claim 1, wherein said storage management node stores file system metadata of the distributed file system and, when detecting any operation that modifies the file system metadata, records the operation in an edit log file.

3. The method for storage management of mass spatial information data as claimed in claim 2, wherein the number of said copies is called copy factor, and the file block size and copy factor of each file data are configurable.

4. The method for storage and management of mass spatial information data according to claim 2, wherein the namespace, block-to-file mapping, and file attributes of the distributed file system are stored in the file system metadata, and the file system metadata is stored in the storage management node.

5. The method for storage and management of mass spatial information data as claimed in claim 1, wherein said distributed file system is implemented based on a relational database management system;

when a new relational database management system file is created, a checksum is calculated for each file block of the file data and stored as a separate hidden file in the storage management node.

6. The method for storing and managing mass spatial information data according to claim 1, wherein the distributed file system adopts a remote sensing spatial data multi-scale organization management mode, takes the high-resolution application as the basic unit in the horizontal direction, and performs management at multiple levels of resolution in the vertical direction.

7. A mass spatial information data storage management system using the mass spatial information data storage management method according to any one of claims 1 to 6, comprising:

the distributed file system is established in a master-slave mode;

an external client coupled to the storage management node.